[Paper Review] Adaptive Traffic Signal Control: Deep Reinforcement Learning Algorithm with Experience Replay and Target Network
The paper proposes a deep reinforcement learning approach with a CNN-based feature extractor, experience replay, and a target network to adaptively control traffic signals using raw real-time data, improving stability and reducing vehicle delay.
Adaptive traffic signal control, which adjusts traffic signal timing according to real-time traffic, has been shown to be an effective method to reduce traffic congestion. Available works on adaptive traffic signal control make responsive traffic signal control decisions based on human-crafted features (e.g. vehicle queue length). However, human-crafted features are abstractions of raw traffic data (e.g., position and speed of vehicles), which ignore some useful traffic information and lead to suboptimal traffic signal controls. In this paper, we propose a deep reinforcement learning algorithm that automatically extracts all useful features (machine-crafted features) from raw real-time traffic data and learns the optimal policy for adaptive traffic signal control. To improve algorithm stability, we adopt experience replay and target network mechanisms. Simulation results show that our algorithm reduces vehicle delay by up to 47% and 86% when compared to another two popular traffic signal control algorithms, longest queue first algorithm and fixed time control algorithm, respectively.
Motivation & Objective
- Show that adaptive traffic signal control can handle dynamic real-time traffic better than fixed-time or queue-based methods.
- Eliminate dependence on human-crafted features by learning from raw traffic data.
- Develop a stable DRL framework using experience replay and a target network.
- Demonstrate effectiveness via simulation against popular baseline controllers.
Proposed method
- Model the intersection control as a Markov decision process and define state, action, and reward based on real-time traffic data.
- Use a deep convolutional neural network to extract features from vehicle position and speed matrices and signal state.
- Implement a DQN-like architecture with a separate target network for stabilizing learning and experience replay for efficient training.
- Train with an epsilon-greedy exploration policy, minimize the temporal-difference error with RMSProp, and update the target network with a soft update.
- Represent the input for each road as two matrices, P (vehicle positions) and V (normalized speeds), together with a vector L that encodes the green-light configuration selected by the two available actions.
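The mechanisms listed above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function and parameter names (`encode_road`, `cell_length`, `speed_limit`, `beta`, `gamma`) are hypothetical, the grid discretization of each road into cells is an assumption, and network weights are stood in for by a flat list of floats rather than a real CNN.

```python
import random
from collections import deque


def encode_road(vehicles, n_lanes, road_length, cell_length, speed_limit):
    """Build position (P) and normalized-speed (V) matrices for one road.

    vehicles: list of (lane_index, distance_to_stop_line, speed) tuples.
    The cell grid is an assumed discretization, chosen for illustration.
    """
    n_cells = int(road_length // cell_length)
    P = [[0.0] * n_cells for _ in range(n_lanes)]
    V = [[0.0] * n_cells for _ in range(n_lanes)]
    for lane, dist, speed in vehicles:
        cell = min(int(dist // cell_length), n_cells - 1)
        P[lane][cell] = 1.0                          # a vehicle occupies this cell
        V[lane][cell] = min(speed / speed_limit, 1.0)  # speed scaled to [0, 1]
    return P, V


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random minibatch, which breaks correlation between
        # consecutive transitions.
        items = list(self.buffer)
        return random.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_from_target_net, gamma):
    """One-step TD target computed from the target network's Q-values."""
    return reward + gamma * max(next_q_from_target_net)


def soft_update(target_params, online_params, beta):
    """Move each target weight a small step (beta) toward the online weight."""
    return [beta * w + (1.0 - beta) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a full training loop, the agent would call `epsilon_greedy` on the online network's Q-values, `push` the resulting transition, `sample` a minibatch, regress the online network toward `td_target`, and then call `soft_update`. Keeping the target weights slow-moving is what stabilizes the regression targets.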
Experimental results
Research questions
- RQ1: Can a deep reinforcement learning agent learn effective adaptive traffic signal control directly from raw traffic data without hand-crafted features?
- RQ2: Do experience replay and a target network improve the stability and performance of DRL-based traffic signal control?
- RQ3: How does the proposed method compare to fixed-time and longest-queue-first baselines under varying traffic demands?
Key findings
- The DRL agent learns a policy that reduces the total vehicle staying time, which converges to small, stable values after sufficient training.
- Average vehicle delays on all roads decrease as training progresses, indicating effective learning of a fair control policy.
- Under higher traffic demands, the DRL method yields substantial delay reductions compared to the baselines: up to 47% versus longest-queue-first (LQF) and up to 86% versus fixed-time control.
- The method demonstrates robustness to changing demand, with delays increasing only slightly on busier roads as demand grows.
This review was created by AI and reviewed by human editors.