QUICK REVIEW

[Paper Review] Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control.

Cathy Wu, Aboudy Kreidieh|arXiv (Cornell University)|Oct 16, 2017

Traffic control and management | Cited by 155

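One-Sentence Summary

Flow is a deep reinforcement learning framework that integrates SUMO and rllab to benchmark learned and classical traffic controllers in mixed-autonomy settings. It shows that simple neural network policies can stabilize ring-road traffic across a range of traffic densities and generalize to out-of-distribution settings, where they outperform state-of-the-art hand-designed controllers.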

ABSTRACT

Flow is a new computational framework, built to support a key need triggered by the rapid growth of autonomy in ground traffic: controllers for autonomous vehicles in the presence of complex nonlinear dynamics in traffic. Leveraging recent advances in deep Reinforcement Learning (RL), Flow enables the use of RL methods such as policy gradient for traffic control and enables benchmarking the performance of classical (including hand-designed) controllers with learned policies (control laws). Flow integrates traffic microsimulator SUMO with deep reinforcement learning library rllab and enables the easy design of traffic tasks, including different network configurations and vehicle dynamics. We use Flow to develop reliable controllers for complex problems, such as controlling mixed-autonomy traffic (involving both autonomous and human-driven vehicles) in a ring road. For this, we first show that state-of-the-art hand-designed controllers excel when in-distribution, but fail to generalize; then, we show that even simple neural network policies can solve the stabilization task across density settings and generalize to out-of-distribution settings.

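Motivation and Objectives

Address the growing need for scalable, adaptive traffic control in mixed-autonomy settings, where autonomous and human-driven vehicles share the road.
Enable benchmarking of deep reinforcement learning policies against classical hand-designed controllers in realistic traffic scenarios.
Design a flexible framework that supports diverse traffic network configurations, vehicle dynamics, and task definitions.
Study how well learned policies generalize across traffic densities and to out-of-distribution conditions.
Show that learned policies can stabilize complex traffic dynamics under distribution shift, where classical controllers fail.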

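Proposed Method

Integrate the traffic microsimulator SUMO with the deep reinforcement learning library rllab to build a unified training and evaluation environment.
Define traffic control tasks in SUMO through modular network configurations and customizable vehicle dynamics.
Implement policy-gradient reinforcement learning algorithms to train neural network policies for traffic signal and vehicle control.
Use a mixed-autonomy ring-road scenario as the core benchmark and evaluate controller performance at different vehicle densities.
Train policies with an end-to-end continuous-control formulation, so that the traffic stabilization objective is optimized directly.
Evaluate learned policies and classical controllers under in-distribution and out-of-distribution traffic conditions to assess robustness and generalization.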
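
To make the ring-road benchmark concrete, here is a minimal NumPy sketch of a mixed-autonomy ring. It is a toy stand-in, not Flow's or SUMO's API: the class `ToyRingRoad`, the choice of the Intelligent Driver Model (IDM) for human drivers, every parameter value, and the reward are assumptions made for illustration. Vehicle 0 takes an external acceleration command; all other vehicles follow IDM. Collisions are not modelled.

```python
import numpy as np

class ToyRingRoad:
    """Toy mixed-autonomy ring road. Not Flow's or SUMO's API."""

    def __init__(self, num_vehicles=22, ring_length=230.0, dt=0.1, seed=0):
        self.n, self.length, self.dt = num_vehicles, ring_length, dt
        self.rng = np.random.default_rng(seed)
        # Assumed IDM parameters; illustrative values, not taken from the paper.
        self.v0, self.T, self.a_max, self.b, self.s0 = 30.0, 1.0, 1.0, 1.5, 2.0
        self.car_len = 5.0
        self.reset()

    def reset(self):
        # Roughly even spacing with a small jitter to perturb the uniform flow.
        spacing = self.length / self.n
        jitter = self.rng.uniform(-0.5, 0.5, self.n)
        self.pos = (np.arange(self.n) * spacing + jitter) % self.length
        self.vel = np.zeros(self.n)
        return self._observe()

    def _gaps(self):
        # Bumper-to-bumper gap to the vehicle ahead (index i + 1), wrapping around.
        lead = np.roll(self.pos, -1)
        return np.maximum((lead - self.pos - self.car_len) % self.length, 1e-3)

    def _idm_accel(self):
        gap = self._gaps()
        dv = self.vel - np.roll(self.vel, -1)  # closing speed on the leader
        s_star = self.s0 + np.maximum(
            0.0, self.vel * self.T + self.vel * dv / (2 * np.sqrt(self.a_max * self.b))
        )
        return self.a_max * (1 - (self.vel / self.v0) ** 4 - (s_star / gap) ** 2)

    def _observe(self):
        # Normalised speeds followed by normalised positions.
        return np.concatenate([self.vel / self.v0, self.pos / self.length])

    def step(self, av_accel):
        accel = self._idm_accel()
        accel[0] = np.clip(av_accel, -self.b, self.a_max)  # vehicle 0 is autonomous
        self.vel = np.maximum(self.vel + accel * self.dt, 0.0)
        self.pos = (self.pos + self.vel * self.dt) % self.length
        # Assumed reward: favour high mean speed and low speed variation.
        reward = float(self.vel.mean() - self.vel.std())
        return self._observe(), reward

env = ToyRingRoad()
obs = env.reset()
total = 0.0
for _ in range(200):
    # Hypothetical hand-written rule standing in for a learned policy.
    av_accel = 0.5 * (3.0 - env.vel[0])
    obs, reward = env.step(av_accel)
    total += reward
```

Changing `ring_length` while keeping `num_vehicles` fixed is one simple way to vary density in a sketch like this.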

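Experimental Results

Research Questions

RQ1: Can deep reinforcement learning policies stabilize mixed-autonomy ring-road traffic at different traffic densities?
RQ2: How do learned policies compare with state-of-the-art hand-designed controllers in performance and generalization across density settings?
RQ3: Can learned policies generalize to out-of-distribution traffic conditions in which classical controllers fail?
RQ4: How do network configuration and vehicle dynamics affect policy learning and control stability?
RQ5: Can simple neural network architectures achieve robust traffic stabilization without extensive architecture engineering?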
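
The density comparison behind RQ2 and RQ3 can be sketched as a sweep that scores a controller at several ring lengths and tags each setting as in- or out-of-distribution. This is only an illustrative harness: `sweep`, `controller_score`, the 22-vehicle default, and the ring lengths in the usage lines are hypothetical and are not taken from the paper.

```python
def ring_density(num_vehicles, ring_length_m):
    """Vehicles per metre on a single-lane ring."""
    return num_vehicles / ring_length_m

def sweep(controller_score, ring_lengths, design_range, num_vehicles=22):
    """Score a controller at each ring length and tag the setting.

    controller_score: hypothetical callable (ring_length_m -> float), for
        example a stability metric computed from a simulation rollout.
    design_range: (min_len, max_len) the controller was tuned or trained on.
    """
    lo, hi = design_range
    rows = []
    for length in ring_lengths:
        rows.append({
            "ring_length_m": length,
            "density_veh_per_m": ring_density(num_vehicles, length),
            "in_distribution": lo <= length <= hi,
            "score": controller_score(length),
        })
    return rows

# Placeholder scorer; a real study would run a rollout here instead.
rows = sweep(lambda length: 0.0, [200.0, 250.0, 300.0], (220.0, 270.0))
```

Running the same sweep for a learned policy and for a hand-designed controller gives two tables that can be compared row by row.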

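Key Findings

State-of-the-art hand-designed controllers perform well in-distribution but fail to generalize when traffic density falls outside the range they were designed for.
Simple neural network policies trained with deep reinforcement learning stabilize mixed-autonomy traffic across a wide range of traffic densities.
Learned policies generalize to out-of-distribution settings and maintain stability where classical controllers fail.
The Flow framework supports reliable training and evaluation of learned and classical controllers in a single, extensible environment.
The integration of SUMO and rllab makes it practical to experiment across diverse traffic scenarios and control objectives.
Policy-gradient methods in Flow prove practical on realistic, simulated traffic control problems that involve nonlinear dynamics and mixed autonomy.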
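
For readers unfamiliar with policy gradients, the following is a minimal REINFORCE sketch on a one-parameter Gaussian policy. It is not rllab's API and not the algorithm configuration used in the paper: the function name, the quadratic toy reward, and all hyperparameters are assumptions chosen so that the update rule is easy to follow.

```python
import numpy as np

def reinforce_gaussian(target=2.0, sigma=0.5, lr=0.05, batch=64, iters=200, seed=0):
    """REINFORCE for a policy a ~ N(mu, sigma^2) with a single learnable mean.

    Toy reward: -(a - target)^2, so the policy is rewarded for issuing an
    action (say, an acceleration command) close to a hypothetical target.
    """
    rng = np.random.default_rng(seed)
    mu = 0.0
    for _ in range(iters):
        actions = mu + sigma * rng.standard_normal(batch)
        rewards = -(actions - target) ** 2
        baseline = rewards.mean()               # variance-reduction baseline
        score = (actions - mu) / sigma ** 2     # d/dmu of log N(a | mu, sigma^2)
        grad = np.mean((rewards - baseline) * score)
        mu += lr * grad                         # gradient ascent on expected reward
    return mu

mu_learned = reinforce_gaussian()
```

With these settings the learned mean should drift from 0 toward the target. A neural network policy replaces the scalar `mu` with network outputs, but the score-function gradient has the same form.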


This review was generated by AI and reviewed by a human editor.