QUICK REVIEW

[Paper Review] Conditional Affordance Learning for Driving in Urban Environments

Axel Sauer, Nikolay Savinov|arXiv (Cornell University)|Jun 18, 2018

Autonomous Vehicle Technology and Safety | Cited by 29

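One-Sentence Summary

This paper proposes Conditional Affordance Learning (CAL), a direct perception approach that maps video input and high-level navigation commands to a low-dimensional affordance representation, enabling safe and smooth autonomous driving in complex urban environments. By conditioning its predictions on directional intent and using image-level labels, CAL improves goal-directed navigation on CARLA by up to 68% and learns traffic light and speed sign recognition end to end, significantly reducing accidents in simulation.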

ABSTRACT

Most existing approaches to autonomous driving fall into one of two categories: modular pipelines, that build an extensive model of the environment, and imitation learning approaches, that map images directly to control outputs. A recently proposed third paradigm, direct perception, aims to combine the advantages of both by using a neural network to learn appropriate low-dimensional intermediate representations. However, existing direct perception approaches are restricted to simple highway situations, lacking the ability to navigate intersections, stop at traffic lights or respect speed limits. In this work, we propose a direct perception approach which maps video input to intermediate representations suitable for autonomous navigation in complex urban environments given high-level directional inputs. Compared to state-of-the-art reinforcement and conditional imitation learning approaches, we achieve an improvement of up to 68 % in goal-directed navigation on the challenging CARLA simulation benchmark. In addition, our approach is the first to handle traffic lights and speed signs by using image-level labels only, as well as smooth car-following, resulting in a significant reduction of traffic accidents in simulation.

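Motivation and Objectives

Extend direct perception to complex urban driving, where modular pipelines are overly complex and imitation learning lacks interpretability.
Develop a conditional neural network that uses high-level navigation commands (e.g., "turn left") to guide affordance prediction.
Learn traffic light and speed sign recognition end to end from image-level annotations only, avoiding costly pixel-level annotation.
Improve driving smoothness and safety by learning intermediate representations that support stable control and reduce jerk.
Achieve state-of-the-art navigation performance in simulation while preserving interpretability and robustness.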

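Proposed Method

A video encoder processes consecutive RGB frames to extract spatiotemporal features.
Conditional heads, conditioned on the high-level navigation command, predict a set of low-dimensional affordances (e.g., distance to the centerline, relative angle, distance to the vehicle ahead).
A conventional controller turns the predicted affordances into control commands (steering, throttle).
The model is trained with supervised learning; the traffic light and speed sign labels are image-level, so no pixel-level annotation is needed.
Temporal modeling with an LSTM, GRU, or temporal convolutional network improves the stability of predictions over time.
A multi-task learning objective optimizes several affordance types jointly, with a task-specific head for each.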
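
The flow from command-conditioned affordance prediction to control can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Affordances` fields, the sign conventions, the `predict_affordances` and `control` functions, the proportional gains, the `safe_gap` threshold, and the brake output are all assumptions made for the example, and the fixed-value heads stand in for a trained network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class Affordances:
    """Low-dimensional scene description; field names are illustrative."""
    centerline_distance: float  # metres; positive = right of centerline (assumed sign)
    relative_angle: float       # radians; positive = heading right of road (assumed sign)
    vehicle_distance: float     # metres to the vehicle ahead
    red_light: bool             # True if a red light applies to the ego lane
    speed_limit: float          # metres per second


# A head maps shared encoder features to affordances.
Head = Callable[[Sequence[float]], Affordances]


def predict_affordances(features: Sequence[float], command: str,
                        heads: Dict[str, Head]) -> Affordances:
    """Conditioning step: the directional command picks which head is used."""
    if command not in heads:
        raise ValueError(f"unknown command: {command!r}")
    return heads[command](features)


def control(aff: Affordances, speed: float, k_dist: float = 0.5,
            k_angle: float = 1.0, k_speed: float = 0.2,
            safe_gap: float = 10.0) -> Tuple[float, float, float]:
    """Hypothetical proportional controller returning (steer, throttle, brake)."""
    # Steer against the lateral offset and the heading error, clipped to [-1, 1].
    steer = -(k_dist * aff.centerline_distance + k_angle * aff.relative_angle)
    steer = max(-1.0, min(1.0, steer))
    # Brake fully for a red light or a lead vehicle closer than safe_gap.
    if aff.red_light or aff.vehicle_distance < safe_gap:
        return steer, 0.0, 1.0
    # Otherwise accelerate in proportion to the gap to the speed limit.
    throttle = max(0.0, min(1.0, k_speed * (aff.speed_limit - speed)))
    return steer, throttle, 0.0


# Stand-in heads that return fixed values; a trained network would go here.
heads: Dict[str, Head] = {
    "follow": lambda features: Affordances(0.2, 0.0, 50.0, False, 8.0),
    "left": lambda features: Affordances(0.0, 0.1, 50.0, True, 8.0),
}

aff = predict_affordances([0.0], "follow", heads)
print(control(aff, speed=5.0))
```

Keeping the controller separate from the network is what makes the intermediate representation inspectable: each affordance can be logged and checked on its own before it influences steering or throttle.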

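Experimental Results

Research Questions

RQ1: Does conditional affordance learning improve goal-directed navigation in complex urban environments compared with end-to-end imitation learning and reinforcement learning?
RQ2: Can traffic lights and speed signs be recognized effectively within a direct perception framework using image-level labels only?
RQ3: How does conditioning affordance prediction on high-level navigation commands improve driving stability and safety?
RQ4: To what extent does temporal modeling improve the robustness and smoothness of driving behavior?
RQ5: Can the model generalize well across diverse urban scenarios, including intersections and dynamic obstacles, with only limited supervision?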

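Key Findings

Compared with state-of-the-art reinforcement learning (RL) and conditional imitation learning (CIL) approaches, CAL improves goal-directed navigation on the CARLA benchmark by up to 68%.
The model learns to recognize traffic lights and speed signs from image-level labels only, a first for direct perception in urban driving.
By following other vehicles smoothly and obeying traffic lights correctly, the CAL agent significantly reduces traffic accidents in simulation.
The median distance from the centerline is 0.334 m, better than CIL (0.390 m) and RL (0.755 m), indicating better lane keeping.
Longitudinal jerk drops to 0.333 m/s³, well below RL (1.368 m/s³), indicating smoother acceleration and braking.
Lateral jerk while turning is 0.065 m/s³, far below RL (0.548 m/s³), indicating smoother steering transitions.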
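
For readers unfamiliar with the jerk figures above: jerk is the rate of change of acceleration, hence the unit m/s³. The sketch below shows one way such a metric could be computed from speed samples logged at a fixed time step. It is an assumption-laden illustration, not the paper's evaluation code: the function names, the finite-difference scheme, and the choice of mean absolute jerk as the aggregate are all hypothetical.

```python
from typing import List, Sequence


def finite_difference(values: Sequence[float], dt: float) -> List[float]:
    """Forward difference between consecutive samples, divided by the time step."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def jerk_from_speed(speeds: Sequence[float], dt: float) -> List[float]:
    """Differentiate speed twice: speed -> acceleration -> jerk (m/s^3)."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    if len(speeds) < 3:
        raise ValueError("need at least three speed samples")
    acceleration = finite_difference(speeds, dt)
    return finite_difference(acceleration, dt)


def mean_abs(values: Sequence[float]) -> float:
    """Average magnitude, so positive and negative jerk do not cancel out."""
    return sum(abs(v) for v in values) / len(values)


# Speeds in m/s sampled once per second (made-up numbers for illustration).
steady = [0.0, 2.0, 4.0, 6.0]   # constant acceleration
uneven = [0.0, 1.0, 3.0, 6.0]   # acceleration keeps increasing

print(mean_abs(jerk_from_speed(steady, dt=1.0)))
print(mean_abs(jerk_from_speed(uneven, dt=1.0)))
```

A trajectory with constant acceleration has zero jerk under this definition, which is why lower values correspond to a smoother ride.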

This review was generated by AI and reviewed by a human editor.