QUICK REVIEW

[Paper Review] Autonomous Braking System via Deep Reinforcement Learning

Hyunmin Chae, Chang Mook Kang | arXiv (Cornell University) | Feb 8, 2017

Traffic control and management | References: 8 | Cited by: 24

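One-Sentence Summary

This paper proposes an autonomous braking system based on deep reinforcement learning that uses a deep Q-network (DQN) to learn a braking policy which decides, at each time step, how to brake in order to avoid colliding with a pedestrian. By modeling the braking decision as a Markov decision process and designing a reward function that balances safety against efficiency, the system achieved a 100% collision-avoidance rate for time-to-collision (TTC) values ≥ 1.5 s and passed all Euro NCAP (European New Car Assessment Programme) AEB pedestrian tests without a single collision.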

ABSTRACT

In this paper, we propose a new autonomous braking system based on deep reinforcement learning. The proposed autonomous braking system automatically decides whether to apply the brake at each time step when confronting the risk of collision using the information on the obstacle obtained by the sensors. The problem of designing brake control is formulated as searching for the optimal policy in Markov decision process (MDP) model where the state is given by the relative position of the obstacle and the vehicle's speed, and the action space is defined as whether brake is stepped or not. The policy used for brake control is learned through computer simulations using the deep reinforcement learning method called deep Q-network (DQN). In order to derive desirable braking policy, we propose the reward function which balances the damage imposed to the obstacle in case of accident and the reward achieved when the vehicle runs out of risk as soon as possible. DQN is trained for the scenario where a vehicle is encountered with a pedestrian crossing the urban road. Experiments show that the control agent exhibits desirable control behavior and avoids collision without any mistake in various uncertain environments.

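Research Motivation and Objectives

Develop an intelligent autonomous braking system that adapts to dynamic, uncertain pedestrian-crossing scenarios in urban environments.
Overcome the limited generalization of rule-based systems across diverse and unpredictable real-world traffic scenarios.
Design a reward function that balances collision penalties against rapid risk mitigation, encouraging braking decisions that are both safe and timely.
Improve training stability and sample efficiency on rare but critical collision scenarios through a trauma memory mechanism in DQN.
Validate system performance under diverse test conditions, including the standardized Euro NCAP AEB pedestrian tests.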

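Proposed Method

Model autonomous braking as a Markov decision process (MDP) in which the state is defined by the obstacle's relative position and the vehicle's speed.
Define the action space as four discrete braking actions: no braking, weak braking, medium braking, and strong braking.
Approximate the Q-value function with a deep Q-network (DQN) built on a fully connected feedforward architecture (15-100-70-50-70-100-4).
Design a custom reward function with parameters α = 0.001, β = 0.1, η = 0.01, and λ = 100 to balance accident penalties against early risk clearance.
Introduce a trauma memory buffer of size 1,000 that stores high-penalty (collision) experiences and replays them during training, improving learning stability and convergence.
Train with the RMSProp optimizer at a learning rate of 0.0005, using experience replay with a memory size of 10,000 and a batch size of 32.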
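To make the two-buffer idea concrete, here is a minimal sketch of an experience replay memory paired with a smaller trauma memory, using the buffer and batch sizes listed above (10,000, 1,000, and 32). It is an illustration under assumptions, not the authors' implementation: the class name `DualMemory`, the `collided` flag, the `Transition` fields, and the `trauma_batch_size` value are all hypothetical, and this review does not state how the paper mixes the two batches during training.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition record; `collided` marks collision experiences.
Transition = namedtuple(
    "Transition", "state action reward next_state done collided"
)


class DualMemory:
    """Regular replay memory plus a smaller 'trauma' memory for collisions."""

    def __init__(self, replay_size=10_000, trauma_size=1_000, seed=None):
        # deque(maxlen=...) evicts the oldest entry once the buffer is full.
        self.replay = deque(maxlen=replay_size)
        self.trauma = deque(maxlen=trauma_size)
        self.rng = random.Random(seed)

    def push(self, transition):
        # Every transition goes to the replay memory; collision transitions
        # are additionally kept in the trauma memory so they stay available.
        self.replay.append(transition)
        if transition.collided:
            self.trauma.append(transition)

    def sample(self, batch_size=32, trauma_batch_size=8):
        # trauma_batch_size is an assumed value, not taken from the paper.
        regular = self.rng.sample(
            list(self.replay), min(batch_size, len(self.replay))
        )
        trauma = self.rng.sample(
            list(self.trauma), min(trauma_batch_size, len(self.trauma))
        )
        return regular, trauma


# Usage with placeholder data: 200 steps, a collision on every 50th step.
memory = DualMemory(seed=0)
for step in range(200):
    collided = step % 50 == 49
    memory.push(
        Transition(
            state=(0.0,) * 15,          # placeholder matching the 15 inputs
            action=step % 4,            # one of the four braking actions
            reward=-1.0 if collided else 0.0,  # placeholder reward
            next_state=(0.0,) * 15,
            done=collided,
            collided=collided,
        )
    )

regular_batch, trauma_batch = memory.sample()
print(len(regular_batch), len(trauma_batch))
```

Keeping collision transitions in their own buffer means they are not evicted by the much more frequent non-collision transitions, which is one plausible reason such a mechanism would help with rare events.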

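Experimental Results

Research Questions

RQ1: Can a deep reinforcement learning agent learn a robust and safe braking policy for pedestrian collision avoidance in uncertain urban environments?
RQ2: How well does the proposed reward function balance safety (collision avoidance) against efficiency (early risk clearance) in autonomous braking decisions?
RQ3: To what extent does the trauma memory mechanism improve learning convergence and performance on rare but high-consequence collision scenarios?
RQ4: Can the DRL-based braking system satisfy standardized safety protocols such as the Euro NCAP AEB pedestrian tests?
RQ5: How does the system perform under different initial conditions, including vehicle speed, pedestrian position, and crossing timing?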

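Key Findings

The DQN agent with trauma memory converged stably within 2,000 training episodes and maintained a consistently high total cumulative reward, whereas the baseline DQN (without trauma memory) failed to converge and showed pronounced performance fluctuations.
In the test scenarios, the system achieved a 0% collision rate for all time-to-collision (TTC) values ≥ 1.5 s, indicating effective collision avoidance under realistic conditions.
For TTC values below 1.5 s, the collision rate reached 61.29% at 0.9 s, indicating that collisions become unavoidable even under full braking because the initial vehicle speed is too high.
The system passed all Euro NCAP AEB pedestrian tests (CVFA and CVNA) across the 20–60 km/h speed range, achieving full compliance with no collisions.
The average stopping position was about 5 m in front of the pedestrian, consistent with the 3 m safety distance, and can be tuned by adjusting the reward parameters.
Trajectory analysis shows that the agent initially applies weak braking and escalates to strong braking as the pedestrian approaches, exhibiting intelligent, adaptive control behavior.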
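The TTC findings above can be related to simple constant-deceleration kinematics. The sketch below is only an illustration: it assumes a stationary obstacle directly ahead, immediate full braking, and a hypothetical maximum deceleration of 7.0 m/s², none of which comes from the paper. Under those assumptions, a vehicle at speed v has a gap of v·TTC and needs v²/(2a) to stop, so stopping is feasible only when v ≤ 2·a·TTC.

```python
def ttc_seconds(gap_m, speed_mps):
    """Time-to-collision for a constant closing speed."""
    if speed_mps <= 0:
        return float("inf")
    return gap_m / speed_mps


def stopping_distance_m(speed_mps, decel_mps2):
    """Distance needed to stop under constant deceleration: v^2 / (2a)."""
    return speed_mps ** 2 / (2.0 * decel_mps2)


def can_stop_in_time(speed_kmh, ttc_s, decel_mps2=7.0):
    """True if full braking from now stops the car within the available gap.

    decel_mps2 defaults to an assumed value, not a figure from the paper.
    """
    speed_mps = speed_kmh / 3.6
    gap_m = speed_mps * ttc_s  # gap implied by the TTC at this speed
    return stopping_distance_m(speed_mps, decel_mps2) <= gap_m


def critical_speed_kmh(ttc_s, decel_mps2=7.0):
    """Highest speed at which stopping is still possible: v = 2 * a * TTC."""
    return 2.0 * decel_mps2 * ttc_s * 3.6


for ttc in (0.9, 1.5):
    print(
        f"TTC {ttc} s: critical speed {critical_speed_kmh(ttc):.1f} km/h, "
        f"can stop from 60 km/h: {can_stop_in_time(60, ttc)}"
    )
```

With a different assumed deceleration the critical speeds shift, so these numbers should be read as a qualitative picture of why short TTC values combined with high initial speeds leave no feasible braking action, not as a reproduction of the reported collision rates.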

This review was generated by AI and reviewed by a human editor.