Nowadays, cities must address the challenge of sustainable mobility. Traffic state forecasting plays a key role in mitigating traffic congestion in urban areas. For example, predicting path travel time is a crucial issue in navigation and route planning applications.

The smart mobility research group of inLab FIB faces this problem in multiple projects and has traditionally solved it through simulation techniques. In recent months, the group has also studied a data-driven approach, in particular one based on deep learning methods.

The traffic forecasting problem is broad and comprises several subproblems, each defined by a different aspect of the task. These aspects are described in the following sections.

Scenario

One of the most important features of the traffic forecasting problem is the kind of network on which the predictions are made. Networks are usually classified as urban networks or freeways. Their topologies are very different: urban networks contain many short links, while freeway networks are composed of fewer but longer links. Forecasting in urban areas is typically more difficult because driver behavior inside the network is less predictable.

Prediction horizon

Another key feature of this problem is the prediction horizon, which is classified as short-term or long-term. There is no exact agreement on where the boundary lies: depending on the author, short-term covers predictions from 1 minute up to around 30 minutes or 1 hour, and anything beyond that is called long-term. The required horizon depends on the forecasting goal. For example, real-time navigation software needs short-term forecasts so that it can update the directions given to the driver and avoid network congestion. A traffic management system, by contrast, may need long-term forecasts so that decisions can be made early enough to be implemented.
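
As an illustration of how the horizon enters a data-driven formulation, the following is a minimal sketch, not the group's method. It assumes a regularly sampled series and turns it into (inputs, target) pairs, where the target lies a given number of steps after the last input. The function name make_samples, its parameters and the speed values are hypothetical.

```python
def make_samples(series, n_lags, horizon):
    """Build (inputs, target) pairs from a regularly sampled series.

    inputs: the n_lags most recent observations
    target: the observation `horizon` steps after the last input
    """
    samples = []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + n_lags]
        target = series[start + n_lags + horizon - 1]
        samples.append((inputs, target))
    return samples


# Hypothetical average speeds (km/h), one value every 5 minutes.
speeds = [52, 50, 47, 41, 35, 30, 28, 33, 40, 46]

# Three past values as input, target 2 steps (10 minutes) ahead.
short_term = make_samples(speeds, n_lags=3, horizon=2)
```

With 5-minute data, a horizon of 2 steps corresponds to a 10-minute forecast. A larger horizon yields fewer samples from the same series, since more of its tail is needed for targets.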

Predicted variable and scale

Besides the kind of network and the prediction horizon, the traffic prediction problem is determined by which variable is predicted and at which scale. Traffic forecasting can target different variables; the four most common are:

Traffic flow: number of vehicles that pass through a given site in a given amount of time (measured in vehicles/second or vehicles/hour).
Traffic density: number of vehicles located in a given area at the same time (measured in vehicles/meter or vehicles/kilometer).
Average speed: average speed of the vehicles at a site (measured in kilometers/hour or meters/second).
Travel time: time a vehicle takes to travel from an origin point to its destination (measured in seconds, minutes or hours).

These variables can be predicted at different scales: a specific point in the network, a section of the network (a link or part of a link), or the whole network (or a sector of it).
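
The definitions of the four variables can be made concrete with a few lines of arithmetic. This is only a sketch under simple assumptions (a single counting point, a single section, a constant speed along the section); the function names and the sample figures are hypothetical.

```python
def traffic_flow(vehicle_count, interval_s):
    """Vehicles per hour passing a point, from a count over interval_s seconds."""
    return vehicle_count * 3600 / interval_s


def traffic_density(vehicle_count, length_m):
    """Vehicles per kilometer on a section that is length_m meters long."""
    return vehicle_count * 1000 / length_m


def average_speed(speeds_kmh):
    """Arithmetic mean of the speeds (km/h) observed at a site."""
    return sum(speeds_kmh) / len(speeds_kmh)


def travel_time_s(length_m, speed_kmh):
    """Seconds needed to cover length_m meters at a constant speed_kmh."""
    return length_m / (speed_kmh / 3.6)


flow = traffic_flow(30, 300)           # 30 vehicles counted in 5 minutes
density = traffic_density(12, 400)     # 12 vehicles on a 400 m section
speed = average_speed([40, 50, 60])    # three observed speeds
time_s = travel_time_s(400, speed)     # time to cross the section
```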

Classification or regression

As in most machine learning problems, the prediction can be framed as regression or as classification. The choice depends on whether the target is a continuous value (one of the variables presented above) or a finite set of values (a discretization of those variables). So, depending on the final goal, a traffic forecasting solution could try to predict the average speed of each section, the general traffic state of a network (for example free, medium or congested), the number of vehicles that will pass through a given point, etc.
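
The step from a regression target to a classification target can be sketched as a simple discretization. The example below maps an average speed to one of the three states mentioned above; the thresholds (75% and 40% of an assumed free-flow speed of 50 km/h) are hypothetical choices for illustration, not values taken from any project or standard.

```python
def traffic_state(speed_kmh, free_flow_kmh=50.0):
    """Map a continuous speed to a discrete traffic state.

    The 0.75 and 0.40 thresholds are illustrative assumptions.
    """
    ratio = speed_kmh / free_flow_kmh
    if ratio >= 0.75:
        return "free"
    if ratio >= 0.40:
        return "medium"
    return "congestion"


# Regression targets (km/h) and their classification counterparts.
speeds = [48.0, 30.0, 12.0]
states = [traffic_state(s) for s in speeds]
```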

Data source

Finally, the last issue to consider is the data source. Nowadays, traffic data is generated in multiple ways, and many types of systems can be used to collect it. The most common traffic data sources are listed below:

User surveys: Before the other options existed, traffic was measured through surveys carried out directly among the population. This data source has since been replaced by automatic methods (described below), which are cheaper and collect better data.
Sensors: These devices register traffic features such as the presence of a vehicle or its speed. In recent years, with the development of smart cities, traffic sensors have become both more common and more capable. One of the first sensors used was the loop detector, which detects vehicles at a specific position. Because these devices require roadworks to install, relocating them is very expensive. More modern options such as ANPR (Automatic Number-Plate Recognition) are easier to install and can identify which cars pass through a point. Despite these improvements, sensors can only register activity at a fixed location, and a very large number of them is needed to cover a whole city.
Cameras: Although the main goal of traffic cameras is to offer real-time monitoring of the network state, in recent years they have also been used as traffic data collectors. This is a very good way to reuse systems that are already installed, but using camera footage as traffic data requires a computer vision process to translate the images into, for example, traffic flow data. In addition, cameras share the fixed-position limitation of sensors.
GPS-FCD: The systems installed in modern vehicles make it possible to locate them in real time with high precision. This kind of data (GPS floating car data) is the most desirable for traffic forecasting systems because it is individual, of high quality and free of location limitations. The main problem is that a sufficient penetration rate is needed for the data to be representative.
Cellphones: Because most people now carry connected smartphones (even while driving), the data generated by these devices can be used in the same way as GPS-FCD.
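
Because ANPR identifies which cars pass through a point, readings from two points can in principle be combined into travel times. The following is a minimal sketch of that idea, assuming each reading is a (plate, timestamp in seconds) pair; the function name, the data layout and the plates are hypothetical, and a real system would also need to handle misreads and vehicles that stop along the way.

```python
def match_travel_times(upstream, downstream):
    """Pair plate readings from two points and return travel times in seconds.

    upstream, downstream: lists of (plate, timestamp_s) tuples.
    Only the first reading of each plate at each point is used.
    """
    first_seen = {}
    for plate, t in upstream:
        first_seen.setdefault(plate, t)

    travel_times = {}
    for plate, t in downstream:
        if plate in travel_times or plate not in first_seen:
            continue
        if t > first_seen[plate]:
            travel_times[plate] = t - first_seen[plate]
    return travel_times


# Hypothetical readings at two ANPR points on the same route.
point_a = [("1234ABC", 0), ("5678DEF", 10), ("9999ZZZ", 20)]
point_b = [("5678DEF", 95), ("1234ABC", 100)]

times = match_travel_times(point_a, point_b)
```

Vehicles seen at only one of the two points, such as the third plate here, produce no travel time, which is one way the limited coverage of fixed sensors shows up in the data.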

Exogenous variables

In addition to traffic data, the use of external data is increasingly common in the literature. These inputs are called exogenous variables, and they make it possible to adjust the predictions to external conditions. In the temporal dimension, for example, information such as the time of day, the day of the week, the season or public holidays can be decisive in improving forecasting accuracy. Other factors that can change the traffic situation, and that are also used as exogenous variables, include weather conditions, city events (both one-off and recurring), roadworks and traffic incidents.
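
The temporal exogenous variables mentioned above can be derived directly from the timestamp of each observation. The sketch below shows one possible encoding; the feature names, the holiday set and the northern-hemisphere season mapping are assumptions made for illustration.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real one would come from an external source.
HOLIDAYS = {date(2024, 12, 25)}

# Meteorological seasons, northern hemisphere (an assumption).
SEASONS = ["winter", "spring", "summer", "autumn"]


def temporal_features(ts, holidays=HOLIDAYS):
    """Derive simple temporal exogenous variables from a timestamp."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),           # Monday = 0, Sunday = 6
        "is_weekend": ts.weekday() >= 5,
        "season": SEASONS[(ts.month % 12) // 3],
        "is_holiday": ts.date() in holidays,
    }


features = temporal_features(datetime(2024, 12, 25, 8, 30))
```

Features like these would be supplied to the forecasting model alongside the traffic measurements for the same time step.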

The Traffic Forecasting problem