QUICK REVIEW

[Paper Review] Stacked Bidirectional and Unidirectional LSTM Recurrent Neural Network for Forecasting Network-wide Traffic State with Missing Values

Zhiyong Cui, Ruimin Ke|arXiv (Cornell University)|May 24, 2020

Traffic Prediction and Management Techniques|53 references|37 citations

TL;DR

This paper introduces SBU-LSTM, a stacked architecture combining Bidirectional LSTM with Imputation (BDLSTM-I) and unidirectional LSTM layers to forecast network-wide traffic speeds and handle missing data via an integrated imputation unit.

ABSTRACT

Short-term traffic forecasting based on deep learning methods, especially recurrent neural networks (RNN), has received much attention in recent years. However, the potential of RNN-based models in traffic forecasting has not yet been fully exploited in terms of the predictive power of spatial-temporal data and the capability of handling missing data. In this paper, we focus on RNN-based models and attempt to reformulate the way to incorporate RNN and its variants into traffic prediction models. A stacked bidirectional and unidirectional LSTM network architecture (SBU-LSTM) is proposed to assist the design of neural network structures for traffic state forecasting. As a key component of the architecture, the bidirectional LSTM (BDLSTM) is exploited to capture the forward and backward temporal dependencies in spatiotemporal data. To deal with missing values in spatial-temporal data, we also propose a data imputation mechanism in the LSTM structure (LSTM-I) by designing an imputation unit to infer missing values and assist traffic prediction. The bidirectional version of LSTM-I is incorporated in the SBU-LSTM architecture. Two real-world network-wide traffic state datasets are used to conduct experiments and published to facilitate further traffic prediction research. The prediction performance of multiple types of multi-layer LSTM or BDLSTM models is evaluated. Experimental results indicate that the proposed SBU-LSTM architecture, especially the two-layer BDLSTM network, can achieve superior performance for the network-wide traffic prediction in both accuracy and robustness. Further, comprehensive comparison results show that the proposed data imputation mechanism in the RNN-based models can achieve outstanding prediction performance when the model's input data contains different patterns of missing values.

Motivation & Objective

Improve short-term network-wide traffic forecasting so that it can handle missing sensor data.
Propose an LSTM variant with an imputation unit (LSTM-I) to infer missing values within prediction models.
Introduce a stacked architecture (SBU-LSTM) that combines bidirectional and unidirectional LSTM components for better spatial-temporal feature learning.
Evaluate model performance on real-world datasets and analyze trade-offs between model capacity and complexity.
Publish the LOOP-SEA dataset to facilitate further research.

Proposed method

Define X as a T x D traffic state sequence with a masking matrix M indicating missing values.
Introduce LSTM-I with an imputation unit that infers missing x_t from C_{t-1} and h_{t-1} and updates inputs via a masking mechanism.
Add a regularization term to the LSTM-I loss that penalizes imputation error (lambda * sum of |x_t - xhat_t|).
Extend to Bidirectional LSTM with Imputation (BDLSTM-I) to impute missing values using forward and backward passes and combine via an averaging operator.
Stack BDLSTM-I and LSTM/BDLSTM layers into a flexible SBU-LSTM architecture in which the first layer is BDLSTM-I when the input contains missing data, followed by additional layers as needed.
Train using MSE with Adam, early stopping, and learning-rate decay; evaluate under random and non-random missing data patterns at varying missing rates.
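
The masking, averaging, and regularization steps listed above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the mask convention (1 = observed, 0 = missing), and the choice to count imputation error only on observed entries are all assumptions made for illustration. The inferred values xhat_t are passed in as plain numbers rather than computed from C_{t-1} and h_{t-1}, and only the random missing pattern is simulated.

```python
import random


def make_random_mask(T, D, missing_rate, seed=0):
    """Build a T x D mask for a random missing pattern.

    Assumed convention: 1 = observed, 0 = missing.
    """
    rng = random.Random(seed)
    return [[0 if rng.random() < missing_rate else 1 for _ in range(D)]
            for _ in range(T)]


def apply_mask(x_t, x_hat_t, m_t):
    """Keep observed entries of x_t; use the inferred value where m_t is 0."""
    return [m * x + (1 - m) * xh for x, xh, m in zip(x_t, x_hat_t, m_t)]


def average_directions(forward, backward):
    """Combine forward and backward estimates with an element-wise mean."""
    return [(f + b) / 2 for f, b in zip(forward, backward)]


def regularized_loss(y_true, y_pred, X, X_hat, M, lam):
    """MSE on the prediction plus lam * sum of |x_t - xhat_t|.

    Assumption: the absolute error is summed over observed entries only,
    because a missing entry has no ground truth to compare against.
    """
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
    imputation_error = sum(
        m * abs(x - xh)
        for row_x, row_xh, row_m in zip(X, X_hat, M)
        for x, xh, m in zip(row_x, row_xh, row_m)
    )
    return mse + lam * imputation_error


if __name__ == "__main__":
    # One time step, two sensors; the second sensor reading is missing.
    x_t, x_hat_t, m_t = [60.0, 0.0], [58.0, 55.0], [1, 0]
    print(apply_mask(x_t, x_hat_t, m_t))
    print(average_directions([50.0, 60.0], [54.0, 62.0]))
    print(regularized_loss([1.0, 2.0], [1.0, 4.0],
                           [x_t], [x_hat_t], [m_t], lam=0.5))
```

In a full model the inferred values would come from the recurrent state at each step, and the mask would be applied before the input reaches the LSTM gates; the sketch only shows how the three quantities fit together.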

Experimental results

Research questions

RQ1: Can a stacked architecture that uses bidirectional temporal dependencies improve network-wide traffic prediction accuracy compared to unidirectional or single-layer models?
RQ2: Does integrating a data imputation unit within the RNN (LSTM-I/BDLSTM-I) enhance robustness and prediction accuracy in the presence of missing sensor data?
RQ3: What is the impact of model depth and layer type (BDLSTM-I vs LSTM/BDLSTM) on predictive performance and computational trade-offs?
RQ4: How does the proposed approach perform on real-world network-wide traffic datasets with different missing data patterns?

Key findings

BDLSTM-based architectures, especially a two-layer BDLSTM, achieve the best predictive accuracy on LOOP-SEA and PEMS-BAY datasets.
Two-layer BDLSTM generally outperforms single-layer models and deeper architectures, indicating a sweet spot in model capacity vs. complexity.
BDLSTM-I (with missing data handling) often yields superior performance when input data contains missing values, with imputation integrated into the training objective.
The proposed data imputation mechanism (LSTM-I/BDLSTM-I) improves prediction robustness across various missing-value patterns (random and non-random).
The results suggest BDLSTM-based stacks are more effective as the final layer for network-wide traffic prediction, and the architecture can be flexibly extended with LSTM/BDLSTM layers.

This review was created by AI and reviewed by human editors.