QUICK REVIEW

[Paper Review] The Clever Hans Mirage: A Comprehensive Survey on Spurious Correlations in Machine Learning

Wenqian Ye, Jiang, Luyang|arXiv (Cornell University)|Feb 20, 2024

Neural Networks and Applications | Cited by 12

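One-Sentence Summary

This survey gives a formal definition of spurious correlations in machine learning, reviews a taxonomy of methods for mitigating them, and discusses datasets, metrics, and future challenges.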

ABSTRACT

Back in the early 20th century, a horse named Hans appeared to perform arithmetic and other intellectual tasks during exhibitions in Germany, while it actually relied solely on involuntary cues in the body language from the human trainer. Modern machine learning models are no different. These models are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. Such features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions, which can negatively impact the model's generalization and robustness. In this paper, we provide a comprehensive survey of this emerging issue, along with a fine-grained taxonomy of existing state-of-the-art methods for addressing spurious correlations in machine learning models. Additionally, we summarize existing datasets, benchmarks, and metrics to facilitate future research. The paper concludes with a discussion of the broader impacts, the recent advancements, and future challenges in the era of generative AI, aiming to provide valuable insights for researchers in the related domains of the machine learning community.

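Research Motivation and Goals

Give a formal definition of spurious correlations in machine learning.
Provide a comprehensive taxonomy of state-of-the-art mitigation methods.
Summarize the datasets, benchmarks, and evaluation metrics used to study spurious correlations.
Discuss the field's challenges, future directions, and the role of foundation models.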

Proposed Approach

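Introduces a formal definition of spurious correlations in terms of group labels (y, a), where y is the class label and a is the spurious attribute, and the group set G = Y × A.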
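Divides mitigation methods into data manipulation, representation learning, learning strategies, and other methods.
Reviews data augmentation, concept/pseudo-label discovery, causal intervention, invariant learning, feature disentanglement, and contrastive learning.
Discusses optimization-based methods, ensemble learning, identify-then-mitigate approaches, fine-tuning strategies, and adversarial training.
Gives an overview of datasets and metrics, emphasizing worst-group accuracy as a robustness measure.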
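
To make the worst-group accuracy metric concrete, here is a minimal sketch in plain Python. It assumes each example comes with a class label y and a known spurious attribute a, so that its group is the pair (y, a). The function names and the toy data are hypothetical and chosen only for illustration; they are not taken from the survey. Only groups that actually appear in the data are scored.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, attrs):
    """Return accuracy per group, where a group is the pair (label y, attribute a)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat, a in zip(y_true, y_pred, attrs):
        group = (y, a)
        total[group] += 1
        correct[group] += int(y == y_hat)
    return {group: correct[group] / total[group] for group in total}


def average_accuracy(y_true, y_pred):
    """Plain accuracy over all examples, ignoring group membership."""
    hits = sum(int(y == y_hat) for y, y_hat in zip(y_true, y_pred))
    return hits / len(y_true)


def worst_group_accuracy(y_true, y_pred, attrs):
    """Minimum accuracy over the groups present in the data."""
    return min(group_accuracies(y_true, y_pred, attrs).values())


# Toy data (hypothetical): the attribute agrees with the label on 6 of 8 examples.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
attrs = [0, 0, 0, 1, 1, 1, 1, 0]
# A "Clever Hans" predictor that simply outputs the spurious attribute.
y_pred = list(attrs)

print(average_accuracy(y_true, y_pred))             # 0.75
print(worst_group_accuracy(y_true, y_pred, attrs))  # 0.0
```

In this toy case the attribute-only predictor looks acceptable on average but fails completely on the two minority groups where label and attribute disagree, which is the gap that worst-group accuracy is meant to expose.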

Figure 2: Hospital tags, strips, and medical devices exemplify several unknown group labels in the MIMIC-CXR dataset, which might spuriously correlate with the ground truth diagnosis results.

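Results

Research Questions

RQ1: What is the formal definition of spurious correlations in machine learning, and how can they be detected and characterized?
RQ2: Which taxonomy best organizes current methods for mitigating spurious correlations across data manipulation, representation learning, and learning strategies?
RQ3: Which datasets and metrics are used to evaluate robustness to spurious correlations, and what are their trade-offs?
RQ4: What are the key challenges and future directions in mitigating spurious correlations, including dependence on group labels and the role of foundation models?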

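Key Findings

Provides a formal definition of spurious correlations, including the mapping from spurious attributes to groups.
Proposes a comprehensive taxonomy of mitigation methods covering data manipulation, representation learning, learning strategies, and other methods.
Summarizes the common datasets and metrics used to evaluate worst-group performance and related measures.
Discusses the trade-off between worst-group and average accuracy, and highlights challenges such as dependence on group labels and scalability.
The foundational discussion connects spurious correlations to domain generalization, invariant learning, group robustness, and shortcut learning.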


This review was generated by AI and reviewed by human editors.