QUICK REVIEW

[Paper Review] Agent-based Learning for Driving Policy Learning in Connected and Autonomous Vehicles

Xiongzhao Wang, De Silva|arXiv (Cornell University)|Sep 14, 2017

Traffic control and management | References: 13 | Cited by: 1

One-Sentence Summary

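This paper proposes an agent-based reinforcement learning framework for connected and autonomous vehicles (CAVs) that learn driving policies by self-evolving on real-time data, including data exchanged through vehicle-to-vehicle (V2V) communication. The results show that V2V communication improves learning efficiency and that, over time, CAVs autonomously develop policies that avoid crashes and achieve their objectives.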

ABSTRACT

Due to the complexity of the natural world, a programmer cannot foresee all possible situations a connected and autonomous vehicle (CAV) will face during its operation, and hence, CAVs will need to learn to make decisions autonomously. Through sensing of its surroundings and information exchanged with other vehicles and road infrastructure, a CAV will have access to large amounts of useful data. This paper investigates a data-driven driving policy learning framework based on agent-based learning. A reinforcement learning framework is presented in the paper, which simulates the self-evolution of a CAV over its lifetime. The results indicate that over time the CAVs are able to learn useful policies to avoid crashes and achieve their objectives in more efficient ways. Vehicle-to-vehicle (V2V) communication, in particular, enables CAVs to acquire additional useful information, which in turn enables them to learn driving policies more efficiently. The simulation results indicate that while a CAV can learn to make autonomous decisions, V2V communication of information improves this capability. Future work will investigate complex driving policies such as roundabout negotiation, cooperative learning between CAVs, and deep reinforcement learning to traverse larger state spaces.

Research Motivation and Objectives

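Address the challenge that pre-programmed rules are insufficient for unpredictable real-world driving scenarios.
Enable CAVs to learn optimal driving policies autonomously through continuous interaction and data collection.
Study how vehicle-to-vehicle (V2V) communication improves the efficiency of driving policy learning.
Develop a scalable, data-driven framework that supports long-term adaptation and self-evolution of CAVs.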

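Proposed Method

The framework adopts a reinforcement learning (RL) paradigm in which CAVs learn policies through trial-and-error interaction in a simulated environment.
Each CAV acts as an autonomous agent: it observes the environment, takes actions, and receives rewards based on safety and efficiency criteria.
V2V communication is integrated to supply additional contextual data, such as the positions and intentions of surrounding vehicles, enriching the state space used for learning.
The learning process simulates the evolution of a CAV over its lifetime, so that repeated exposure to diverse traffic scenarios continually improves its policy.
By drawing on real-time data streams from connected vehicles and infrastructure, the framework supports scalable policy learning.
The architecture is designed to allow future integration of deep reinforcement learning to handle larger, more complex state spaces.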
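The review does not state which RL algorithm, simulator, or reward values the paper uses, so the following is only a minimal sketch of the observe/act/reward loop described above, assuming tabular Q-learning on a hypothetical two-lane toy road. `ToyRoad`, `train`, `rollout`, and all constants are illustrative names and values, not taken from the paper. The `use_v2v` flag adds the stalled vehicle's broadcast lane to the observation; without it, the agent perceives the obstacle only within on-board sensor range. The reward encodes safety (a crash penalty) and efficiency (per-step and lane-change costs).

```python
import random
from collections import defaultdict

# Hypothetical toy scenario (illustrative, not from the paper): a two-lane
# road with a stalled vehicle at a fixed cell; its lane is random per episode.
ROAD_LEN = 10      # cells 0..9; reaching cell 9 ends the trip successfully
OBSTACLE_POS = 6   # cell occupied by the stalled vehicle
SENSOR_RANGE = 1   # on-board sensors see only one cell ahead
ACTIONS = (0, 1)   # 0 = advance in lane, 1 = change lane and advance


class ToyRoad:
    """Single ego CAV; observation = (position, lane, obstacle info)."""

    def __init__(self, use_v2v, rng):
        self.use_v2v = use_v2v
        self.rng = rng

    def reset(self, obstacle_lane=None):
        self.pos, self.lane = 0, 0
        if obstacle_lane is None:
            obstacle_lane = self.rng.randrange(2)
        self.obstacle_lane = obstacle_lane
        return self.observe()

    def observe(self):
        gap = OBSTACLE_POS - self.pos
        # With V2V the stalled vehicle's lane is always known (broadcast);
        # otherwise it is known only when within on-board sensor range.
        if self.use_v2v or 0 < gap <= SENSOR_RANGE:
            info = 1 if self.obstacle_lane == self.lane else 2  # 1 = same lane
        else:
            info = 0  # unknown
        return (self.pos, self.lane, info)

    def step(self, action):
        if action == 1:
            self.lane = 1 - self.lane
        self.pos += 1
        if self.pos == OBSTACLE_POS and self.lane == self.obstacle_lane:
            return self.observe(), -100.0, True, "crash"  # safety penalty
        if self.pos >= ROAD_LEN - 1:
            return self.observe(), 100.0, True, "goal"    # objective reached
        # Efficiency costs: each step costs time, lane changes cost extra.
        return self.observe(), (-2.0 if action == 1 else -1.0), False, None


def greedy(q, state):
    # max() returns the first maximal action, so ties favour "advance".
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(use_v2v, episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    env = ToyRoad(use_v2v, rng)
    q = defaultdict(float)
    crashes = 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state)
            nxt, reward, done, outcome = env.step(action)
            # Terminal transitions have no bootstrapped future value.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
        if outcome == "crash":
            crashes += 1
    return q, crashes


def rollout(q, use_v2v, obstacle_lane):
    """Run the greedy policy once; returns "goal" or "crash"."""
    env = ToyRoad(use_v2v, random.Random(0))
    state, done = env.reset(obstacle_lane), False
    while not done:
        state, _, done, outcome = env.step(greedy(q, state))
    return outcome


if __name__ == "__main__":
    for v2v in (False, True):
        q, crashes = train(v2v)
        outcomes = [rollout(q, v2v, lane) for lane in (0, 1)]
        print(f"V2V={v2v}: training crashes={crashes}, greedy outcomes={outcomes}")
```

In this tiny world the on-board sensor already sees the obstacle one step before the only decision that matters, so toggling `use_v2v` is not expected to reproduce the paper's efficiency gains; the sketch only shows where V2V data would enter the state. On richer scenarios, comparing the training crash counts returned by `train` with and without V2V is one simple way to probe questions about convergence speed.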

Experimental Results

Research Questions

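RQ1: How can CAVs learn effective driving policies in complex, unpredictable traffic environments without pre-programmed rules?
RQ2: To what extent does vehicle-to-vehicle (V2V) communication improve the efficiency and effectiveness of CAV driving policy learning?
RQ3: Can a self-evolving, agent-based reinforcement learning framework enable CAVs to develop safe and efficient driving behaviors autonomously over time?
RQ4: How does incorporating V2V data affect the convergence speed and performance of the learned driving policies?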

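Key Findings

CAVs can learn driving policies autonomously through reinforcement learning, without predefined rules.
Integrating V2V communication made driving policy learning more efficient in the simulated environment.
Over time, CAVs developed policies that avoided collisions and achieved their task objectives more efficiently.
V2V communication supplies additional contextual information that enhances the agents' perception and decision-making.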
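The framework is intended to extend to more complex maneuvers, such as roundabout negotiation, which the authors leave to future work.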
Future work will explore deep reinforcement learning to enable learning in larger, more complex state spaces.

This review was generated by AI and checked by human editors.