QUICK REVIEW

[Paper Review] DeepRoad: GAN-based Metamorphic Autonomous Driving System Testing

Mengshi Zhang, Yuqun Zhang|arXiv (Cornell University)|Feb 7, 2018

Software Testing and Debugging Techniques | References: 12 | Cited by: 63

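One-Sentence Summary

DeepRoad uses GAN-based image-to-image translation to synthesize realistic driving scenes under different weather conditions, then applies metamorphic testing to check whether DNN-based autonomous driving systems behave consistently in extreme weather (snow and rain).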

ABSTRACT

While Deep Neural Networks (DNNs) have established the fundamentals of DNN-based autonomous driving systems, they may exhibit erroneous behaviors and cause fatal accidents. To resolve the safety issues of autonomous driving systems, a recent set of testing techniques have been designed to automatically generate test cases, e.g., new input images transformed from the original ones. Unfortunately, many such generated input images often render inferior authenticity, lacking accurate semantic information of the driving scenes and hence compromising the resulting efficacy and reliability. In this paper, we propose DeepRoad, an unsupervised framework to automatically generate large amounts of accurate driving scenes to test the consistency of DNN-based autonomous driving systems across different scenes. In particular, DeepRoad delivers driving scenes with various weather conditions (including those with rather extreme conditions) by applying the Generative Adversarial Networks (GANs) along with the corresponding real-world weather scenes. Moreover, we have implemented DeepRoad to test three well-recognized DNN-based autonomous driving systems. Experimental results demonstrate that DeepRoad can detect thousands of behavioral inconsistencies in these systems.

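Motivation and Objectives

Motivate robustness testing of DNN-based autonomous driving systems that goes beyond simple image filtering.
Introduce an unsupervised GAN-based framework that synthesizes realistic driving scenes under extreme weather.
Define metamorphic relations for testing whether driving decisions stay consistent across weather-transformed scenes.
Evaluate the framework on real-world autonomous driving models to expose robustness gaps.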
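The metamorphic relation named in the objectives can be written compactly. The following is one possible formalization, not necessarily the paper's exact notation: i and tau(i) follow the symbols used in this review, while f (the driving model's predicted steering angle), I (the set of original scenes), T (the set of weather transformations), and epsilon (the error bound) are symbols assumed here for illustration.

```latex
\forall i \in I,\ \forall \tau \in T:\quad
\bigl| f(i) - f(\tau(i)) \bigr| \le \epsilon
```

Under this reading, an input i is counted as an inconsistency whenever the difference exceeds epsilon for some transformation tau.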

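Proposed Method

Use UNIT (GAN + VAE) to perform unsupervised image-to-image translation between scene domains (fine-weather scenes and snowy/rainy scenes).
Train UNIT on unpaired real driving images from the two domains so that it learns a shared latent space.
Apply the trained UNIT to transform each original driving scene i into its weather-transformed version tau(i).
Feed each pair of original and transformed images to the autonomous driving DNN and compare the steering outputs to detect inconsistencies.
Quantify inconsistency by thresholding the difference between the steering angles predicted for i and tau(i).
Evaluate on three driving models from Udacity (Autumn, Chauffeur, Rwightman), using Udacity data and weather scenes sourced from YouTube.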
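The comparison and thresholding steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepRoad's implementation: `count_inconsistencies`, `model`, and `transform` are hypothetical names, the model is assumed to return a steering angle in degrees, and the strict `>` comparison is an assumption about how the bound is applied.

```python
from typing import Callable, Sequence


def count_inconsistencies(
    model: Callable[[object], float],
    transform: Callable[[object], object],
    images: Sequence[object],
    error_bound_deg: float,
) -> int:
    """Count inputs whose predicted steering angle shifts by more than the bound.

    `model` stands in for a driving DNN (image -> steering angle in degrees) and
    `transform` stands in for the trained weather translation tau. Both are
    hypothetical placeholders supplied by the caller.
    """
    count = 0
    for image in images:
        original_angle = model(image)                # prediction on i
        transformed_angle = model(transform(image))  # prediction on tau(i)
        # Assumption: a difference strictly above the bound is an inconsistency.
        if abs(original_angle - transformed_angle) > error_bound_deg:
            count += 1
    return count


# Toy stand-ins so the sketch runs without any images or trained networks:
# each "image" is a number, the "model" doubles it, the "transform" scales it.
toy_images = [0.0, 5.0, 10.0, 20.0]
toy_model = lambda image: 2.0 * image
toy_transform = lambda image: 1.5 * image

for bound in (4.0, 10.0, 40.0):
    flagged = count_inconsistencies(toy_model, toy_transform, toy_images, bound)
    print(f"error bound {bound:>4}°: {flagged} inconsistent inputs")
```

Running the same loop with several bounds mirrors how the evaluation reports counts at 10°, 20°, 30°, and 40°. In a real setup, `model` would wrap one of the driving networks and `transform` would wrap the trained translation network.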

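Experimental Results

Research Questions

RQ1: Can GAN-based metamorphic transformations generate driving scenes that look realistic under different weather conditions?
RQ2: Are the steering predictions of DNN-based autonomous driving systems inconsistent when driving scenes are transformed to snowy or rainy conditions?
RQ3: Which models are more robust or more vulnerable under metamorphic weather transformations?
RQ4: How do different error bounds affect the number of inconsistencies detected for each model?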

Key Findings

Model	Inconsistent behaviors	Scene	10°	20°	30°	40°
Autumn	11635	Snowy	11635	11602	11388	10239
Chauffeur	4839	Snowy	4839	2105	1093	653
Rwightman	334	Snowy	334	115	45	14
Autumn	5279	Rainy	5279	5279	5279	5279
Chauffeur	710	Rainy	710	175	94	71
Rwightman	656	Rainy	656	92	23	0
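One way to read the table above is to ask what share of the inconsistencies flagged at the 10° bound is still flagged at each wider bound. The sketch below does that arithmetic on the counts copied from the table. The names `ERROR_BOUNDS_DEG`, `INCONSISTENCY_COUNTS`, and `retained_fraction` are illustrative and do not come from the paper.

```python
# Counts copied from the table: (model, scene) -> counts at 10°, 20°, 30°, 40°.
ERROR_BOUNDS_DEG = (10, 20, 30, 40)
INCONSISTENCY_COUNTS = {
    ("Autumn", "Snowy"): (11635, 11602, 11388, 10239),
    ("Chauffeur", "Snowy"): (4839, 2105, 1093, 653),
    ("Rwightman", "Snowy"): (334, 115, 45, 14),
    ("Autumn", "Rainy"): (5279, 5279, 5279, 5279),
    ("Chauffeur", "Rainy"): (710, 175, 94, 71),
    ("Rwightman", "Rainy"): (656, 92, 23, 0),
}


def retained_fraction(counts):
    """Return each count as a fraction of the first (10° bound) count."""
    baseline = counts[0]
    return tuple(count / baseline for count in counts)


for (model, scene), counts in INCONSISTENCY_COUNTS.items():
    fractions = retained_fraction(counts)
    summary = ", ".join(
        f"{bound}°: {fraction:.1%}"
        for bound, fraction in zip(ERROR_BOUNDS_DEG, fractions)
    )
    print(f"{model:<10} {scene:<6} {summary}")
```

Autumn's rainy row stays at 100% across all four bounds and its snowy row keeps about 88% at 40°, whereas Chauffeur's snowy row drops to under 14% and Rwightman's rainy row falls to zero.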

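DeepRoad identifies thousands of inconsistent steering predictions across the three Udacity driving models under snowy and rainy conditions.
Autumn shows the most inconsistencies in both weather scenarios, while Rwightman is the most stable across conditions.
In rainy scenes with a 10° error bound, the detected inconsistencies number 5279 (Autumn), 710 (Chauffeur), and 656 (Rwightman).
In snowy scenes with a 10° error bound, the detected inconsistencies number 11635 (Autumn), 4839 (Chauffeur), and 334 (Rwightman).
Raising the error bound generally reduces the number of inconsistencies detected for each model, but the rate of decline differs by model (Autumn's rainy-scene count stays at 5279 across all four bounds), which points to differences in robustness.
The GAN-generated scenes qualitatively resemble real weather scenes and preserve the main semantic content, such as road structure and objects.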


This review was generated by AI and checked by human editors.