QUICK REVIEW

[Paper Review] WILDS: A Benchmark of in-the-Wild Distribution Shifts

Pang Wei Koh|CaltechAUTHORS (California Institute of Technology)|Dec 14, 2020

AI in cancer detection|References 393|Cited by 286

One-Sentence Summary

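WILDS presents a curated benchmark of 10 datasets with real-world distribution shifts across multiple modalities, documents that both standard training and baseline robustness methods underperform on out-of-distribution data, and provides an open-source package and leaderboards to support method development.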

ABSTRACT

Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in the real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training yields substantially lower out-of-distribution than in-distribution performance. This gap remains even with models trained by existing methods for tackling distribution shifts, underscoring the need for new methods for training models that are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at https://wilds.stanford.edu.
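
The abstract's central measurement is the gap between in-distribution (ID) and out-of-distribution (OOD) performance. The sketch below is a minimal illustration, assuming a classification task scored by plain accuracy (some WILDS datasets use other metrics), that computes this gap from predictions on an ID test set and an OOD test set. The helpers `accuracy` and `id_ood_gap` are hypothetical, not part of the WILDS package, and the toy labels are invented, not results from the paper.

```python
from typing import Sequence


def accuracy(y_pred: Sequence[int], y_true: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if not y_true or len(y_pred) != len(y_true):
        raise ValueError("y_pred and y_true must be non-empty and of equal length")
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


def id_ood_gap(id_pred, id_true, ood_pred, ood_true):
    """Return (ID accuracy, OOD accuracy, ID minus OOD gap)."""
    id_acc = accuracy(id_pred, id_true)
    ood_acc = accuracy(ood_pred, ood_true)
    return id_acc, ood_acc, id_acc - ood_acc


# Toy predictions for illustration only (not numbers from the paper).
id_acc, ood_acc, gap = id_ood_gap(
    id_pred=[1, 0, 1, 1], id_true=[1, 0, 1, 0],    # 3 of 4 correct
    ood_pred=[1, 1, 0, 0], ood_true=[0, 1, 1, 0],  # 2 of 4 correct
)
print(f"ID={id_acc:.2f} OOD={ood_acc:.2f} gap={gap:.2f}")  # prints: ID=0.75 OOD=0.50 gap=0.25
```

Swapping `accuracy` for another metric leaves the gap computation unchanged.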

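Motivation and Goals

Motivate and quantify the impact of real-world distribution shifts on machine learning models.
Provide a diverse, realistic set of benchmarks for domain generalization and subpopulation shift.
Provide an open-source data-loading and evaluation package, along with leaderboards, to standardize progress on robustness.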

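Proposed Approach

Curate 10 real-world datasets spanning multiple modalities, covering both domain generalization and subpopulation shift.
Use domain annotations to define train/test splits, enabling domain-aware learning.
Evaluate standard training and existing baselines for robustness to distribution shift to quantify the gap between in-distribution (ID) and out-of-distribution (OOD) performance.
Provide the open-source WILDS package, which automates data loading, baseline models, and evaluation.
Host a public leaderboard to track robustness to distribution shifts.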
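
Defining splits with domain annotations means holding out entire domains, so the OOD test set contains only domains the model never saw during training. Below is a minimal sketch, assuming each example carries a single domain label such as the hospital it came from. `Example`, `split_by_domain`, and the hospital names are hypothetical illustrations, not the WILDS package's own data structures or its actual splits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    x: str       # stand-in for the input (e.g. an image path)
    y: int       # label
    domain: str  # domain annotation, e.g. the hospital that produced the slide


def split_by_domain(examples, ood_domains):
    """Hold out whole domains so that no OOD-test domain is seen in training."""
    held_out = set(ood_domains)
    train = [e for e in examples if e.domain not in held_out]
    test_ood = [e for e in examples if e.domain in held_out]
    return train, test_ood


# Hypothetical data: three "hospitals", one held out as the unseen test domain.
data = [
    Example("slide_a1", 1, "hospital_A"),
    Example("slide_a2", 0, "hospital_A"),
    Example("slide_b1", 1, "hospital_B"),
    Example("slide_c1", 0, "hospital_C"),
    Example("slide_c2", 1, "hospital_C"),
]
train, test_ood = split_by_domain(data, ood_domains=["hospital_C"])
print(sorted({e.domain for e in train}), sorted({e.domain for e in test_ood}))
```

An ID test set would instead be drawn from held-out examples of the training domains, so comparing the two isolates the effect of the domain shift.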

Experimental Results

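Research Questions

RQ1: To what extent does standard training degrade under real-world distribution shifts across domains and subpopulations?
RQ2: Can existing methods for robustness to distribution shift close the performance gap on the WILDS datasets?
RQ3: Can domain annotations be leveraged to improve robustness to unseen domains or subpopulations?
RQ4: How does the robustness gap manifest across diverse data modalities and tasks?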
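
RQ3 asks whether domain annotations can improve robustness. Group DRO (Sagawa et al., 2020), one of the existing baselines evaluated in WILDS, uses them by keeping a weight per domain and shifting weight toward the domains with the highest loss. The sketch below shows only that weight update, under simplifying assumptions: per-group losses are fixed numbers, and the model's gradient step on the weighted loss is omitted. The function name, the step size `eta`, and the loss values are hypothetical.

```python
import math


def group_dro_step(q, group_losses, eta=0.1):
    """One exponentiated-gradient update of the Group DRO group weights.

    q: current weights over groups (sums to 1).
    group_losses: average loss of each group on the current batch.
    Returns (new_q, robust_loss), where robust_loss = sum_g new_q[g] * loss[g]
    is the weighted loss the model would be trained on.
    """
    if len(q) != len(group_losses):
        raise ValueError("need exactly one loss per group")
    # Upweight groups in proportion to exp(eta * loss), then renormalize.
    unnormalized = [w * math.exp(eta * loss) for w, loss in zip(q, group_losses)]
    total = sum(unnormalized)
    new_q = [u / total for u in unnormalized]
    robust_loss = sum(w * loss for w, loss in zip(new_q, group_losses))
    return new_q, robust_loss


# Hypothetical fixed losses for three domains; domain 2 is the hardest.
# In real training the losses would be recomputed every batch and the model
# updated with the gradient of robust_loss.
q = [1 / 3, 1 / 3, 1 / 3]
for _ in range(5):
    q, robust = group_dro_step(q, [0.2, 0.3, 1.0], eta=0.5)
print([round(w, 3) for w in q], round(robust, 3))
```

Because the losses are held fixed here, the weights simply concentrate on the worst domain; with a model in the loop, that pressure is what pushes training to reduce the worst domain's loss.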

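Key Findings

On every dataset, standard training yields lower out-of-distribution (OOD) performance than in-distribution (ID) performance.
Existing baselines for distribution shift still leave a persistent gap, underscoring the need for new robustness methods.
The benchmark spans shifts across hospitals, camera traps, satellite imagery, and users/regions, reflecting distribution shifts that arise in real-world applications.
An open-source package standardizes data loading, models, hyperparameters, and evaluation, and a public leaderboard tracks progress.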
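
Under subpopulation shift, a good average can mask a poorly served group, which is why several WILDS datasets report a worst-group or worst-region metric. Below is a minimal sketch, assuming accuracy as the metric and one group label per example; the function names and region labels are hypothetical, not WILDS's own evaluation code.

```python
from collections import defaultdict


def group_accuracies(y_pred, y_true, groups):
    """Accuracy computed separately within each group (domain or subpopulation)."""
    if not y_true or not (len(y_pred) == len(y_true) == len(groups)):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, t, g in zip(y_pred, y_true, groups):
        total[g] += 1
        correct[g] += int(p == t)
    return {g: correct[g] / total[g] for g in total}


def average_and_worst(y_pred, y_true, groups):
    """Return (overall accuracy, worst-group accuracy, per-group accuracies)."""
    per_group = group_accuracies(y_pred, y_true, groups)
    overall = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
    return overall, min(per_group.values()), per_group


# Hypothetical predictions: region_2 is small and poorly served.
overall, worst, per_group = average_and_worst(
    y_pred=[1, 0, 1, 1, 0, 0],
    y_true=[1, 0, 1, 1, 0, 1],
    groups=["region_1"] * 4 + ["region_2"] * 2,
)
print(f"average={overall:.3f} worst-group={worst:.3f}", per_group)
```

Here the average (about 0.83) looks healthy while the worst group sits at 0.5, which is exactly the kind of gap a single averaged score hides.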

This review was generated by AI and reviewed by a human editor.