QUICK REVIEW

[Paper Review] Scaling Out-of-Distribution Detection for Real-World Settings

Dan Hendrycks, Steven Basart | arXiv (Cornell University) | Nov 25, 2019

Anomaly Detection Techniques and Applications | References: 35 | Cited by: 179

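One-Sentence Summary

This paper shows that a simple MaxLogit detector outperforms MSP (maximum softmax probability) on large-scale multiclass, multi-label, and anomaly segmentation OOD tasks, and it introduces new benchmarks (Species and CAOS) for real-world OOD evaluation.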

ABSTRACT

Detecting out-of-distribution examples is important for safety-critical machine learning applications such as detecting novel biological phenomena and self-driving cars. However, existing research mainly focuses on simple small-scale settings. To set the stage for more realistic out-of-distribution detection, we depart from small-scale settings and explore large-scale multiclass and multi-label settings with high-resolution images and thousands of classes. To make future work in real-world settings possible, we create new benchmarks for three large-scale settings. To test ImageNet multiclass anomaly detectors, we introduce the Species dataset containing over 700,000 images and over a thousand anomalous species. We leverage ImageNet-21K to evaluate PASCAL VOC and COCO multilabel anomaly detectors. Third, we introduce a new benchmark for anomaly segmentation by introducing a segmentation benchmark with road anomalies. We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work.

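Motivation and Objectives

Motivate OOD detection research in realistic, large-scale settings rather than only on small-scale benchmarks.
Create benchmarks for large-scale multiclass (ImageNet-21K), multi-label, and segmentation OOD scenarios.
Evaluate existing baselines and establish a simple but strong baseline for real-world OOD detection.
Test whether Vision Transformers pretrained on ImageNet-21K inherently solve OOD detection in large-scale settings.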

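Proposed Method

Propose MaxLogit: use the negative of the maximum unnormalized logit as the OOD score, avoiding the bias that softmax normalization introduces as the number of classes grows.
Build the Species dataset: a large OOD set with over 700,000 images and more than 1,000 anomalous species, kept strictly disjoint from the training data so that OOD detection is tested without train/test overlap.
Develop and evaluate a multi-label OOD setting on PASCAL VOC and MS-COCO, using 20 ImageNet-21K classes as OOD data, and compare MSP, LogitAvg, and MaxLogit.
Create the CAOS benchmark, which combines StreetHazards (simulated anomalies) with BDD-Anomaly (real-world anomalies) for anomaly segmentation.
Compare MaxLogit against baselines (MSP, Background, Dropout, and autoencoder reconstruction) on StreetHazards and BDD-Anomaly.
Evaluate OOD detection performance using ImageNet-21K-P representations with ResNet-50, ViT, and Mixer backbones.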
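
The scores compared in the list above can be written in a few lines. The following is a minimal sketch in plain Python, assuming the detector only sees a list of raw logits per example. The function names (`msp_score`, `max_logit_score`, `logit_avg_score`) and the example logits are illustrative and are not taken from the paper's code. Each function returns an anomaly score where a higher value means "more likely OOD". `logit_avg_score` assumes LogitAvg means the negative mean of the logits, which is a reading of the name rather than a confirmed definition.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """MSP baseline: negative maximum softmax probability."""
    return -max(softmax(logits))


def max_logit_score(logits):
    """MaxLogit: negative of the largest unnormalized logit."""
    return -max(logits)


def logit_avg_score(logits):
    """Assumed LogitAvg: negative mean of the logits (illustrative)."""
    return -sum(logits) / len(logits)


# Hypothetical logits: one confident prediction and one flat, uncertain one.
confident = [9.0, 1.0, 0.5, 0.2]
flat = [1.1, 1.0, 0.9, 1.0]

for name, fn in [("MSP", msp_score),
                 ("MaxLogit", max_logit_score),
                 ("LogitAvg", logit_avg_score)]:
    print(name, "confident:", fn(confident), "flat:", fn(flat))
```

For the MSP and MaxLogit scores, the flat example receives the higher (more anomalous) score. The only difference between the two is whether the logits are normalized by softmax before taking the maximum.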

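Experimental Results

Research Questions

RQ1: Does MSP perform poorly in large-scale OOD detection with thousands of classes?
RQ2: Is MaxLogit a stronger, generally applicable baseline for large-scale multiclass and multi-label OOD detection?
RQ3: Do Vision Transformers pretrained on ImageNet-21K inherently solve OOD detection in large-scale settings?
RQ4: Can realistic benchmarks (Species, CAOS) be built to evaluate OOD detection under real-world conditions?
RQ5: How well do OOD detectors perform on anomaly segmentation in driving scenes?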
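
RQ1 can be made concrete with a toy example. The sketch below uses plain Python and made-up logits (not results from the paper) to show one way MSP can degrade with many classes. When an in-distribution image strongly activates two visually similar classes, softmax splits the probability mass between them and the MSP falls to roughly 0.5, while the maximum logit stays high. The logit values and the 1,000-class setup are assumptions chosen for illustration.

```python
import math


def max_softmax_prob(logits):
    """Maximum softmax probability, computed in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def max_logit(logits):
    """Largest unnormalized logit."""
    return max(logits)


num_classes = 1000

# Hypothetical in-distribution image: two similar classes fire equally strongly.
split = [0.0] * num_classes
split[0] = 12.0
split[1] = 12.0

# Hypothetical OOD image: a single class fires, but less strongly.
ood = [0.0] * num_classes
ood[0] = 8.0

# MSP gives the in-distribution image LOWER confidence than the OOD image,
# so MSP would rank it as the more anomalous of the two.
print("MSP      in-dist:", max_softmax_prob(split), "OOD:", max_softmax_prob(ood))

# MaxLogit keeps the in-distribution image above the OOD image.
print("MaxLogit in-dist:", max_logit(split), "OOD:", max_logit(ood))
```

In this constructed case the two detectors disagree on the ranking, and only MaxLogit orders the examples correctly.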

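Key Findings

MaxLogit consistently outperforms MSP and the other baselines on large-scale multiclass, multi-label, and anomaly segmentation tasks.
The Species dataset shows that, once evaluation is designed carefully to avoid data leakage, ImageNet-21K-pretrained Vision Transformers do not trivially solve OOD detection.
MaxLogit generalizes well to the multi-label setting, outperforming MSP, LogitAvg, and classical detectors.
On the CAOS benchmark, MaxLogit achieves the best pixel-level anomaly segmentation performance among the MSP, Background, Dropout, and autoencoder baselines.
Across StreetHazards and BDD-Anomaly, MaxLogit delivers strong, consistent improvements, indicating that it is a robust baseline for real-world OOD detection.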
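
For the segmentation findings, the same score is applied per pixel. The sketch below assumes a segmentation network has already produced per-pixel class logits, stored here as a nested list `logits[row][col][class]`. The helper names and the tiny 2x2 example are hypothetical. It builds a MaxLogit anomaly map and computes AUROC with a pairwise estimate. This is one common way to compare detectors and is not necessarily the evaluation code used in the paper.

```python
def max_logit_anomaly_map(logits):
    """Per-pixel anomaly score: negative max logit over the class axis."""
    return [[-max(pixel) for pixel in row] for row in logits]


def auroc(scores, labels):
    """Probability that a random anomalous item scores higher than a random
    normal item; ties count as 0.5. Quadratic time, fine for a small sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 2x2 image with 3 classes; the bottom-left pixel is anomalous
# and has no strongly activated class.
logits = [
    [[8.0, 1.0, 0.5], [0.4, 7.5, 1.0]],
    [[1.2, 1.0, 0.9], [0.3, 0.8, 6.9]],
]
anomaly_mask = [[0, 0], [1, 0]]  # 1 marks an anomalous pixel

amap = max_logit_anomaly_map(logits)
flat_scores = [s for row in amap for s in row]
flat_labels = [y for row in anomaly_mask for y in row]
print("AUROC:", auroc(flat_scores, flat_labels))
```

For full-resolution images the pairwise loop would be too slow, and a sort-based AUROC from a numerical library would be the practical choice.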

This review was generated by AI and reviewed by human editors.