QUICK REVIEW

[Paper Review] Forecasting of the Montreal Subway Smart Card Entry Logs with Event Data

Florian Toqué, Étienne Côme|arXiv (Cornell University)|Aug 22, 2020
Human Mobility and Location-Based Analysis|1 citation
TL;DR

This study proposes a generic data shaping framework for long-term (up to one year ahead) forecasting of Montreal Metro passenger demand at 15-minute intervals, using smart card data, calendar information, and planned event data. The random forest model performed best, and adding event data improved forecast accuracy, especially for occasional pass usage during events.

ABSTRACT

One of the major goals of transport operators is to adapt the transport supply scheduling to the passenger demand for existing transport networks during each specific period. Another problem mentioned by operators is accurately estimating the demand for disposable ticket or pass to adapt ticket availability to passenger demand. In this context, we propose generic data shaping, allowing the use of well-known regression models (basic, statistical and machine learning models) for the long-term forecasting of passenger demand with fine-grained temporal resolution. Specifically, this paper investigates the forecasting until one year ahead of the number of passengers entering each station of a transport network with a quarter-hour aggregation by taking planned events into account (e.g., concerts, shows, and so forth). To compare the models and the quality of the prediction, we use a real smart card and event data set from the city of Montréal, Canada, that span a three-year period with two years for training and one year for testing.

Motivation & Objective

  • To enable long-term forecasting of passenger demand at fine temporal resolution (15 minutes) for urban transit networks.
  • To develop a generic data shaping method that integrates smart card, calendar, and event data for use with standard regression models.
  • To improve transport supply planning and ticket availability by forecasting demand for different ticket types, especially during events.
  • To evaluate the impact of event data on forecasting accuracy, particularly for occasional and disposable passes.
  • To provide a reproducible methodology applicable to other cities with similar data availability.

Proposed method

  • Data shaping transforms calendar and event data into numerical features (e.g., day of week, event presence, event type) for model input.
  • A 15-minute temporal aggregation is used to capture fine-grained demand patterns and event impacts.
  • Multiple models are compared: basic (linear), statistical (SARIMAX), and machine learning (random forest, XGBoost, LightGBM).
  • The training set spans two years, and the test set covers one year of real-world data to evaluate long-term forecasting performance.
  • A trend adjustment method is applied to account for year-to-year changes in passenger volume.
  • Model performance is evaluated using MAE, RMSE, and R² metrics across global and ticket-type-specific forecasts.
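The data shaping step above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature set (`day_of_week`, `slot_of_day`, `month`, `event`), the event-window convention, and the toy timestamps are all assumptions made for illustration. It only shows the general idea of flooring entry logs to 15-minute bins and pairing each bin with features that are known in advance.

```python
from collections import Counter
from datetime import datetime


def quarter_hour(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute bin."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def shape_features(slot: datetime, events: list[tuple[datetime, datetime]]) -> dict:
    """Build calendar and event features for one 15-minute slot.

    `events` is a hypothetical list of (start, end) windows; a slot is
    flagged when its start time falls inside any window.
    """
    return {
        "day_of_week": slot.weekday(),                      # 0 = Monday
        "slot_of_day": slot.hour * 4 + slot.minute // 15,   # 0..95
        "month": slot.month,
        "event": int(any(start <= slot < end for start, end in events)),
    }


# Toy entry timestamps and one toy event window (made up, not from the data set).
taps = [
    datetime(2017, 3, 4, 22, 16, 5),
    datetime(2017, 3, 4, 22, 29, 59),
    datetime(2017, 3, 4, 22, 31, 0),
]
events = [(datetime(2017, 3, 4, 19, 0), datetime(2017, 3, 4, 22, 30))]

# Count entries per 15-minute bin, then attach the features to each bin.
counts = Counter(quarter_hour(t) for t in taps)
rows = [
    {**shape_features(slot, events), "entries": n}
    for slot, n in sorted(counts.items())
]

for row in rows:
    print(row)
```

Each row is a plain numeric record, so it could be passed to any standard regression model. A real pipeline would also generate rows for bins with zero entries and would likely encode event type, which this sketch omits.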

Experimental results

Research questions

  • RQ1: Can passenger demand be accurately forecasted up to one year in advance using only calendar and event data available in advance?
  • RQ2: How does the inclusion of event data affect forecasting accuracy, particularly during event periods?
  • RQ3: Which forecasting model (basic, statistical, or machine learning) performs best for long-term, fine-grained metro demand prediction?
  • RQ4: How does demand vary by ticket type (e.g., occasional pass) during events, and can this be captured in the forecast?
  • RQ5: To what extent does a trend adjustment method improve long-term forecasting performance?

Key findings

  • The random forest model achieved the lowest MAE and RMSE, outperforming both statistical and basic models in global demand forecasting.
  • Event data significantly improved forecast accuracy, especially during event periods, with the most notable impact observed on occasional pass usage.
  • The number of passengers using occasional passes increased by up to 1,000 during peak event hours (e.g., 10:15 p.m.) at Lucien-L’Allier station.
  • Forecasting performance varied by station, with some stations—particularly those near event venues—being more predictable when event data was included.
  • The trend adjustment method effectively captured the year-to-year increase in passenger volume, improving forecast stability over long horizons.
  • The proposed data shaping framework is generic and reproducible, enabling application to other transit networks with similar data availability.
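
For reference, the error metrics named in this review (MAE, RMSE, R²) have the following standard definitions. The notation is an assumption introduced here, not taken from the paper: $y_i$ is the observed number of entries in slot $i$, $\hat{y}_i$ the forecast, $\bar{y}$ the mean of the observations, and $n$ the number of slots evaluated.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Lower MAE and RMSE indicate a better forecast, while R² closer to 1 indicates that more of the variance in observed entries is explained. RMSE penalises large errors more heavily than MAE, which matters for sharp event-driven peaks.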


This review was created by AI and reviewed by human editors.