QUICK REVIEW

[Paper Review] Efficient Black-box Assessment of Autonomous Vehicle Safety

Justin Norden, Matthew O’Kelly|arXiv (Cornell University)|Dec 8, 2019

Autonomous Vehicle Technology and Safety|References: 43|Cited by: 54

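One-sentence summary

Summary: The paper proposes a black-box, risk-based framework that uses adaptive multilevel splitting and adaptive importance sampling to efficiently estimate rare-event accident probabilities in autonomous-driving simulation, and demonstrates it by evaluating Comma AI's OpenPilot.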

ABSTRACT

While autonomous vehicle (AV) technology has shown substantial progress, we still lack tools for rigorous and scalable testing. Real-world testing, the $\textit{de-facto}$ evaluation method, is dangerous to the public. Moreover, due to the rare nature of failures, billions of miles of driving are needed to statistically validate performance claims. Thus, the industry has largely turned to simulation to evaluate AV systems. However, having a simulation stack alone is not a solution. A simulation testing framework needs to prioritize which scenarios to run, learn how the chosen scenarios provide coverage of failure modes, and rank failure scenarios in order of importance. We implement a simulation testing framework that evaluates an entire modern AV system as a black box. This framework estimates the probability of accidents under a base distribution governing standard traffic behavior. In order to accelerate rare-event probability evaluation, we efficiently learn to identify and rank failure scenarios via adaptive importance-sampling methods. Using this framework, we conduct the first independent evaluation of a full-stack commercial AV system, Comma AI's OpenPilot.

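Motivation and objectives

Build a risk-based framework that quantifies AV safety as a rare-event probability under a base traffic distribution.
Develop an adaptive, unbiased rare-event estimator that requires only black-box access to the AV policy.
Learn a generative model of failure modes from the detected failures, in order to prioritize and rank risks.
Implement a scalable simulation system capable of deterministic, synchronous evaluation of full-stack AV policies.
Estimate OpenPilot's accident probability in a case study, and show that confident estimates are reached with significantly fewer simulations.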

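Proposed method

Formalize safety as p_gamma = P0(f(X) < gamma), where the base distribution P0 models standard traffic behavior.
Use adaptive nonparametric importance sampling via multilevel splitting, which decomposes p_gamma into a product of conditional probabilities across intermediate levels.
Apply Markov chain Monte Carlo (MCMC) to estimate each conditional probability, adapting the levels online with a fixed discard fraction delta.
Propose adaptive multilevel splitting (AMS) with bias/variance guarantees, where the variance scales as O(log(1/p_gamma)).
Use normalizing flows to learn a generative model of failure modes from the failure distribution discovered by AMS.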
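
The splitting procedure listed above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the base distribution P0 is taken to be a standard Gaussian, `safety_measure` is a hypothetical stand-in for f, and the MCMC kernel is a preconditioned Crank-Nicolson proposal (it leaves the Gaussian invariant, so a move is accepted whenever it stays below the current level). The function names and all parameter defaults are chosen for illustration only. MCMC moves are applied only to the resampled particles, which is one common AMS variant.

```python
import math
import random


def safety_measure(x):
    """Hypothetical stand-in for f(X): smaller values are more dangerous."""
    return -x[0]


def ams_estimate(f, gamma, dim=2, n=500, delta=0.1, mcmc_steps=10,
                 step=0.5, max_levels=10_000, seed=0):
    """Estimate p_gamma = P0(f(X) < gamma), assuming P0 = N(0, I)."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    fs = [f(x) for x in xs]
    n_discard = max(1, int(delta * n))
    shrink = math.sqrt(1.0 - step * step)
    log_p = 0.0

    for _level in range(max_levels):
        # Next level: the smallest f among the n_discard safest particles.
        level = sorted(fs, reverse=True)[n_discard - 1]
        if level <= gamma:
            break  # the target level gamma has been reached
        survivors = [i for i in range(n) if fs[i] < level]
        if not survivors:
            raise RuntimeError("all particles tied at the current level")
        # Conditional probability of passing from the previous level to this one.
        log_p += math.log(len(survivors) / n)

        for i in range(n):
            if fs[i] < level:
                continue
            # Replace a discarded particle with a clone of a random survivor.
            j = rng.choice(survivors)
            x, fx = list(xs[j]), fs[j]
            # Decorrelate the clone with MCMC restricted to {f < level}.
            for _move in range(mcmc_steps):
                prop = [shrink * v + step * rng.gauss(0.0, 1.0) for v in x]
                f_prop = f(prop)
                if f_prop < level:
                    x, fx = prop, f_prop
            xs[i], fs[i] = x, fx
    else:
        raise RuntimeError("max_levels reached before hitting gamma")

    # Fraction of the final population that is below gamma.
    final_fraction = sum(1 for v in fs if v < gamma) / n
    return math.exp(log_p) * final_fraction


if __name__ == "__main__":
    print(ams_estimate(safety_measure, gamma=-3.0))
```

In this toy setting the failure event is x[0] > 3, whose exact probability under a standard Gaussian is about 1.35e-3, so the estimate can be checked against a known reference. In the paper's setting each call to f would instead be a full simulation rollout of the AV policy.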

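Experimental results

Research questions

RQ1: Can a black-box AV policy be assessed for safety without exposing its internals?
RQ2: How can rare, high-risk events in autonomous-driving simulation be estimated efficiently with an unbiased risk measure?
RQ3: Can adaptive sampling identify and prioritize high-likelihood failure scenarios in a scalable way?
RQ4: Can a risk-based, simulation-driven framework provide reliable safety quantification for a full-stack AV system?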

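Key findings

The framework delivers the first independent evaluation of a full-stack commercial AV policy (OpenPilot).
The method estimates OpenPilot's failure rate at 1 per 1250 miles, using far fewer simulations than standard methods.
AMS provides unbiased estimates whose relative variance scales as log(1/p_gamma) rather than 1/p_gamma.
A probabilistic, risk-based world model can be learned and used to rank failure scenarios by likelihood.
A scalable distributed simulation setup supports deterministic, synchronous execution of real-time, asynchronous software.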
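
To make the variance-scaling finding concrete, the sketch below compares how many samples each scaling implies for a target relative standard deviation. The naive Monte Carlo figure uses the exact relative variance (1 - p) / (N p) of a sample mean of Bernoulli draws. The AMS figure uses only the log(1/p_gamma) scaling quoted above with all constants omitted, so it is an order-of-magnitude illustration; it counts particles, not total simulation rollouts (each particle also incurs MCMC steps at every level). The function names and the example probability are hypothetical.

```python
import math


def naive_mc_samples(p, rel_std):
    """Samples N so that naive Monte Carlo has relative variance (1 - p) / (N p) = rel_std**2."""
    return math.ceil((1.0 - p) / (p * rel_std ** 2))


def ams_particles_order(p, rel_std):
    """Order-of-magnitude particle count, assuming relative variance ~ log(1/p) / N.

    Constants are omitted (an assumption), so treat the result as a scaling guide only.
    """
    return math.ceil(math.log(1.0 / p) / rel_std ** 2)


if __name__ == "__main__":
    p = 1e-6  # hypothetical rare-event probability
    print(naive_mc_samples(p, 0.1), ams_particles_order(p, 0.1))
```

The gap widens as p shrinks: the naive count grows like 1/p while the AMS scaling grows only like log(1/p).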

This review was generated by AI and reviewed by human editors.