QUICK REVIEW

[Paper Review] End-to-end Multi-Modal Multi-Task Vehicle Control for Self-Driving Cars with Visual Perception

Zhengyuan Yang, Yixuan Zhang|arXiv (Cornell University)|Jan 20, 2018

Autonomous Vehicle Technology and Safety|References: 17|Cited by: 40

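One-Sentence Summary

This paper proposes multi-task networks that predict the steering angle together with speed: first as discrete speed commands from image sequences, then as continuous speed values from front-view images plus previous feedback speeds. The models are evaluated on the Udacity dataset and a new SAIC dataset, and data synthesis is used to mitigate error accumulation in real road tests.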

ABSTRACT

Convolutional Neural Networks (CNN) have been successfully applied to autonomous driving tasks, many in an end-to-end manner. Previous end-to-end steering control methods take an image or an image sequence as the input and directly predict the steering angle with CNN. Although single task learning on steering angles has reported good performances, the steering angle alone is not sufficient for vehicle control. In this work, we propose a multi-task learning framework to predict the steering angle and speed control simultaneously in an end-to-end manner. Since it is nontrivial to predict accurate speed values with only visual inputs, we first propose a network to predict discrete speed commands and steering angles with image sequences. Moreover, we propose a multi-modal multi-task network to predict speed values and steering angles by taking previous feedback speeds and visual recordings as inputs. Experiments are conducted on the public Udacity dataset and a newly collected SAIC dataset. Results show that the proposed model predicts steering angles and speed values accurately. Furthermore, we improve the failure data synthesis methods to solve the problem of error accumulation in real road tests.

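Motivation and Objectives

Move end-to-end vehicle control beyond steering alone by jointly predicting speed and steering angle.
Develop a multi-task framework that first predicts discrete speed commands and is then refined to predict speed values.
Introduce previous feedback speeds as an additional modality to improve speed and steering prediction.
Evaluate on the public Udacity dataset and a newly collected SAIC dataset (daytime and nighttime driving).
Improve failure-data synthesis to reduce error accumulation in real road tests.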

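Proposed Method

Propose a base CNN model for end-to-end steering angle prediction that uses larger convolution kernels and a 1:1 input aspect ratio.
Introduce a discrete speed command network that takes image sequences, obtains temporal context through an LSTM, and predicts one of three commands: accelerate, decelerate, or maintain speed.
Develop a multi-modal multi-task network that adds a speed encoder for previous speeds and fuses visual features with speed features to predict steering angle and speed simultaneously.
Use mean absolute error (MAE) for the steering angle and cross-entropy for the discrete speed commands, combined into a weighted joint loss across tasks.
Apply data augmentation, including HSV color-space transforms, slight rotations, horizontal flips, and side-camera data synthesis to generate failure samples.
Evaluate on the Udacity and SAIC datasets against PilotNet and other baselines, and extend data synthesis in road tests to address error accumulation.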
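
The weighted joint loss described above can be illustrated with a minimal sketch in plain Python. It assumes MAE for the steering angle and cross-entropy over three discrete speed commands, as stated in the list. The function names, the class-index ordering, and the default weights `w_angle` and `w_cmd` are hypothetical; this review does not report the paper's actual weights.

```python
import math

# Hypothetical class indices for the three speed commands (ordering is an assumption).
ACCELERATE, DECELERATE, MAINTAIN = 0, 1, 2


def mean_absolute_error(predicted, target):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_batch, labels):
    """Mean cross-entropy over a batch; labels are integer class indices."""
    if len(logits_batch) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    losses = [-math.log(softmax(logits)[label])
              for logits, label in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


def joint_loss(angle_pred, angle_true, cmd_logits, cmd_true,
               w_angle=1.0, w_cmd=1.0):
    """Weighted sum of the steering MAE and the speed-command cross-entropy.

    The weights are illustrative placeholders, not values from the paper.
    """
    return (w_angle * mean_absolute_error(angle_pred, angle_true)
            + w_cmd * cross_entropy(cmd_logits, cmd_true))
```

In a training framework the same structure would typically be built from the framework's own L1 and cross-entropy losses so that gradients flow to both task heads; the weights then control how strongly each task shapes the shared visual features.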

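Experimental Results

Research Questions

RQ1: Can end-to-end learning predict both steering angle and speed (as discrete commands and/or continuous values) from visual input?
RQ2: Does adding previous feedback speeds as an extra modality improve the accuracy of steering and speed prediction?
RQ3: How does the proposed multi-modal multi-task network compare with single-task or vision-only steering models on standard driving datasets?
RQ4: Which data augmentation and failure-sample synthesis strategies improve real-world robustness and reduce error accumulation in road tests?
RQ5: How does the newly collected SAIC dataset (daytime and nighttime) affect model performance and generalization?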

Key Findings

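Model / Dataset	Steering angle MAE (degrees)	Speed MAE (m/s)
Nvidia's PilotNet	4.26	-
Cg Network	4.18	-
Base Steering Model	2.84	-
Discrete Speed Command Network	1.85	-
Multi-modal Multi-task Network	1.26	-
Udacity (Multi-modal Multi-task Network)	-	0.19
SAIC (Multi-modal Multi-task Network)	-	0.45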
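
The steering-angle column can also be read as relative reductions against a chosen baseline. The sketch below does that arithmetic using only the values in the table. It assumes all five angle figures come from a comparable evaluation setup, which this review does not state explicitly; the helper names are illustrative.

```python
def relative_reduction(baseline_mae, model_mae):
    """Fractional reduction in MAE relative to a baseline (0.25 means 25% lower)."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - model_mae) / baseline_mae


# Steering-angle MAE values in degrees, copied from the table.
ANGLE_MAE = {
    "Nvidia's PilotNet": 4.26,
    "Cg Network": 4.18,
    "Base Steering Model": 2.84,
    "Discrete Speed Command Network": 1.85,
    "Multi-modal Multi-task Network": 1.26,
}


def reductions_vs(baseline_name, table=ANGLE_MAE):
    """Relative MAE reduction of every other model against the named baseline."""
    baseline = table[baseline_name]
    return {
        name: relative_reduction(baseline, mae)
        for name, mae in table.items()
        if name != baseline_name
    }


if __name__ == "__main__":
    for name, frac in reductions_vs("Nvidia's PilotNet").items():
        print(f"{name}: {frac:.1%} lower steering MAE than PilotNet")
```

For instance, 1.26 degrees against PilotNet's 4.26 degrees is a reduction of 3.00 / 4.26, or roughly 70%.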

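Steering angle MAE: the multi-modal multi-task network reaches 1.26 degrees on Udacity, improving on both the base steering model and the discrete speed command network.
The base steering model reaches 2.84 degrees MAE, outperforming PilotNet and the simpler Cg Network in the evaluation setting.
The discrete speed command network reaches a steering angle MAE of 1.85 degrees while also predicting discrete commands, which suggests a benefit from joint training.
Speed value MAE: 0.19 m/s on Udacity and 0.45 m/s on SAIC, using the multi-modal multi-task network.
Using the previous speed as an additional input modality improves both speed-value and steering-angle prediction.
The SAIC dataset covers daytime and nighttime driving; after filtering, two hours of data are used for training, validation, and testing.
Road tests indicate that data augmentation and synthesized failure samples help mitigate error accumulation and enable autonomous driving under the tested conditions.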
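
Two of the augmentation steps mentioned in this review, horizontal flipping and side-camera synthesis, change the steering label as well as the image. The sketch below shows one common way to handle that and is not necessarily the paper's exact procedure. It assumes images are lists of pixel rows, that positive angles mean steering right, and that side-camera frames are relabeled with a fixed offset; `correction_deg` and its default value are hypothetical.

```python
def flip_horizontal(image, angle_deg):
    """Mirror an image left-right and negate its steering angle.

    `image` is a list of rows, each row a list of pixel values.
    """
    flipped = [list(reversed(row)) for row in image]
    return flipped, -angle_deg


def side_camera_label(center_angle_deg, camera, correction_deg=3.0):
    """Steering label for a frame from a side-mounted camera.

    Assumes positive angles steer right. A left-camera frame looks like the
    car has drifted left, so the label steers further right; a right-camera
    frame gets the opposite shift. `correction_deg` is an illustrative
    placeholder, not a value from the paper.
    """
    if camera == "left":
        return center_angle_deg + correction_deg
    if camera == "right":
        return center_angle_deg - correction_deg
    if camera == "center":
        return center_angle_deg
    raise ValueError(f"unknown camera: {camera!r}")
```

Relabeled side-camera frames act as synthetic recovery examples: they show the model what an off-center view looks like together with a corrective command, which is the kind of data a car drifting during a road test would otherwise never have seen in training.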

This review was generated by AI and reviewed by human editors.