QUICK REVIEW

[Paper Review] Centralized Conflict-free Cooperation for Connected and Automated Vehicles at Intersections by Proximal Policy Optimization.

Yang Guan, Yangang Ren|arXiv (Cornell University)|Dec 18, 2019

Traffic control and management | Cited by 6

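One-sentence summary

This paper proposes a centralized reinforcement-learning coordination method, based on a model-accelerated proximal policy optimization (MA-PPO) algorithm, for connected and automated vehicles at unsignalized intersections. By incorporating a prior model into PPO and formulating trajectory optimization as a Markov decision process with custom state, action, and reward designs, the method achieves collision-free traffic flow with offline training and significantly improves intersection efficiency.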

ABSTRACT

Connected vehicles will change the modes of future transportation management and organization, especially at intersections. There are mainly two categories of coordination methods at unsignalized intersections, i.e. centralized and distributed methods. Centralized coordination methods need huge computation resources since they rely on a centralized controller to optimize the trajectories of all approaching vehicles, while in distributed methods each approaching vehicle owns an individual controller that optimizes its trajectory considering the motion information of, and the conflict relationship with, its neighboring vehicles, which avoids huge computation but needs sophisticated manual design. In this paper, we propose a centralized conflict-free cooperation method for multiple connected vehicles at unsignalized intersections using reinforcement learning (RL), which addresses the computation burden naturally by training offline. We first incorporate a prior model into the proximal policy optimization (PPO) algorithm to accelerate the learning process. Then we present the design of state, action and reward to formulate centralized cooperation as an RL problem. Finally, we train a coordination policy by our model-accelerated PPO (MA-PPO) in a simulation setting and analyze the results. Results show that the proposed method improves the traffic efficiency of the intersection on the premise of ensuring no collision.

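Research motivation and objectives

Address the high computational burden that centralized coordination imposes on connected-vehicle systems at unsignalized intersections.
Use reinforcement learning to optimize trajectories automatically, reducing the reliance on manual design found in distributed methods.
Improve intersection traffic efficiency while guaranteeing collision-free vehicle cooperation.
Accelerate learning in deep reinforcement learning by integrating a prior model into the PPO algorithm.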

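Proposed method

The method formulates centralized vehicle coordination as a Markov decision process, defining state, action, and reward components for multi-vehicle trajectory optimization.
A prior model is integrated into the proximal policy optimization (PPO) algorithm to speed up training convergence.
The MA-PPO algorithm is trained offline in a simulation environment to learn a coordination policy for conflict-free passage.
The state representation includes vehicle positions, speeds, and conflict relationships with neighboring vehicles.
The action space defines a trajectory adjustment for each vehicle (for example, a change in speed) to avoid collisions.
The reward function is designed to encourage timely passage while penalizing collisions and excessive deceleration.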
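The components listed above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Vehicle` fields, the flat state layout, and every reward weight (`step_penalty`, `pass_bonus`, `collision_penalty`, `decel_weight`) are hypothetical choices made for the example. The conflict-relationship part of the state is left out, since the summary does not say how it is encoded. `ppo_clip_term` is the standard per-sample PPO clipped surrogate; the prior-model acceleration that distinguishes MA-PPO is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Vehicle:
    """Hypothetical per-vehicle record; field names are assumptions."""
    distance_to_exit: float  # metres left until the vehicle clears the intersection
    speed: float             # metres per second


def build_state(vehicles: Sequence[Vehicle]) -> List[float]:
    """Flatten all vehicles into one centralized state vector [d1, v1, d2, v2, ...]."""
    state: List[float] = []
    for v in vehicles:
        state.extend([v.distance_to_exit, v.speed])
    return state


def apply_action(v: Vehicle, accel: float, dt: float = 0.1, v_max: float = 15.0) -> Vehicle:
    """Apply a speed adjustment (the action) for one time step of length dt."""
    new_speed = min(max(v.speed + accel * dt, 0.0), v_max)
    return Vehicle(v.distance_to_exit - new_speed * dt, new_speed)


def step_reward(
    n_passed: int,
    collided: bool,
    accels: Sequence[float],
    step_penalty: float = -1.0,        # assumed: small cost per step to encourage timely passage
    pass_bonus: float = 10.0,          # assumed: bonus per vehicle that clears the intersection
    collision_penalty: float = -50.0,  # assumed: large cost for any collision
    decel_weight: float = 0.1,         # assumed: cost on braking magnitude
) -> float:
    """Reward that favours timely passage and penalizes collisions and hard braking."""
    r = step_penalty + pass_bonus * n_passed
    if collided:
        r += collision_penalty
    r -= decel_weight * sum(max(0.0, -a) for a in accels)
    return r


def ppo_clip_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

In a centralized setup like the one described, a single policy would map the output of `build_state` to one acceleration per vehicle, and the clipped term would be averaged over sampled transitions to form the training objective.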

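Experimental results

Research questions

RQ1: Can a centralized reinforcement-learning method effectively coordinate multiple connected vehicles at an unsignalized intersection without collisions?
RQ2: How does integrating a prior model into PPO improve learning efficiency in the vehicle coordination task?
RQ3: To what extent does the proposed method improve intersection traffic efficiency compared with traditional methods?
RQ4: How do the designed state, action, and reward components affect the stability and convergence of learning?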

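Key findings

The proposed MA-PPO method successfully learns a coordination policy that ensures collision-free passage of vehicles through an unsignalized intersection.
Integrating a prior model into PPO significantly speeds up learning during offline training.
The method improves intersection traffic efficiency, reflected in reduced vehicle delay and smoother traffic flow.
The reward design effectively balances safety and efficiency, reducing unnecessary deceleration while preventing conflicts.
Simulation results confirm that the centralized method maintains high performance in multi-vehicle coordination scenarios and scales well.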

This review was generated by AI and reviewed by human editors.