QUICK REVIEW

[Paper Review] A Hierarchy of Limitations in Machine Learning

Momin M. Malik|arXiv (Cornell University)|Feb 12, 2020

Explainable Artificial Intelligence (XAI) | References: 232 | Cited by: 49

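ONE-SENTENCE SUMMARY

A hierarchically structured critique of the conceptual, procedural, and statistical limitations of machine learning as applied to society, focusing on how downstream consequences propagate through the levels from quantification to cross-validation.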

ABSTRACT

"All models are wrong, but some are useful", wrote George E. P. Box (1979). Machine learning has focused on the usefulness of probability models for prediction in social systems, but is only now coming to grips with the ways in which these models are wrong---and the consequences of those shortcomings. This paper attempts a comprehensive, structured overview of the specific conceptual, procedural, and statistical limitations of models in machine learning when applied to society. Machine learning modelers themselves can use the described hierarchy to identify possible failure points and think through how to address them, and consumers of machine learning models can know what to question when confronted with the decision about if, where, and how to apply machine learning. The limitations go from commitments inherent in quantification itself, through to showing how unmodeled dependencies can lead to cross-validation being overly optimistic as a way of assessing model performance.

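MOTIVATION AND OBJECTIVES

Identify and organize the basic assumptions and limitations involved in applying machine learning to social systems.
Explain how dependencies and measurement choices bias model evaluation, particularly through cross-validation.
Recommend how mixed methods and alternative approaches can address the limitations of workflows that rely on machine learning alone.
Give modelers and users guidance on where the use of ML should be questioned in practice.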

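PROPOSED APPROACH

Proposes a four-level hierarchy of decisions involved in using ML: (1) quantitative over qualitative analysis, (2) probabilistic modeling over other kinds of modeling, (3) predictive over explanatory modeling, and (4) cross-validation as the evaluation tool.
Develops a chain of reasoning that traces how limitations propagate from quantification through to cross-validation.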
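Extends the notion of optimism from Efron (2004), the gap between apparent and true prediction error, into an account of how dependencies bias cross-validation.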
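Draws on critiques from philosophy, sociology, statistics, and machine learning to connect ML's abstractions to their social context.
Discusses constructs, latent variables, and the relationship between ground truth and constructs in measurement.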
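
For orientation, the optimism idea referenced above can be written down in its standard squared-error form. This is a sketch of the textbook covariance-penalty identity associated with Efron (2004), not a reproduction of the paper's own extension; the symbols (observed outcome y_i, fitted value \hat{\mu}_i, and an independent copy y_i^0 of y_i) are notation assumed here for illustration.

```latex
% Squared-error setting: y_i is observed, \hat{\mu}_i is the fitted value,
% and y_i^0 is a hypothetical independent copy of y_i with the same distribution.
\begin{align*}
\mathrm{err}_i &= (y_i - \hat{\mu}_i)^2
  && \text{(apparent error)} \\
\mathrm{Err}_i &= \mathbb{E}_0\!\left[(y_i^0 - \hat{\mu}_i)^2\right]
  && \text{(true prediction error)} \\
\mathbb{E}[\mathrm{Err}_i] &= \mathbb{E}[\mathrm{err}_i]
  + 2\,\operatorname{cov}(\hat{\mu}_i,\, y_i)
  && \text{(expected optimism)}
\end{align*}
```

The more closely a fitted value tracks its own outcome, the larger the covariance term and the more the apparent error understates the true error. Read this way, the summary's point about dependence is that a held-out observation that is correlated with the training data is not a genuinely new copy, so the held-out error can inherit some of this optimism.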

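RESULTS

Research Questions

RQ1: What are the basic assumptions and limitations of using machine learning for social analysis?
RQ2: How do dependencies and measurement choices bias cross-validation and model evaluation?
RQ3: What are the consequences of prioritizing quantitative, probabilistic, and predictive methods for understanding social phenomena?
RQ4: How can mixed methods address the limitations of ML applied to society?
RQ5: What framework can help modelers and users question ML claims more effectively?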

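Key Findings

A hierarchical chain of limitations propagates from quantification, through constructs, to cross-validation.
When dependencies exist and are not adequately accounted for, cross-validation can be overly optimistic about generalization.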
Quantification brings an orientation toward central tendencies, which can misrepresent meaning-making and lived experience.
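Problems with constructs and measurement can yield ground-truth proxies that do not match the underlying latent factors.
Mixed methods and alternative validation approaches can mitigate some limitations of ML, though this requires careful collaboration and methodological work.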
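
The finding about cross-validation and dependence can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating process, the 1-nearest-neighbor predictor, and every function name are hypothetical choices made for illustration. It assumes observations come in groups whose members resemble each other, then compares row-wise folds (group-mates split between training and test) with group-wise folds (each group held out whole).

```python
import random


def make_clustered_data(n_groups=30, per_group=8, seed=0):
    """Simulate rows that are dependent within groups (illustrative assumption).

    Each group has its own feature location and its own outcome offset,
    so rows from the same group resemble each other in both x and y.
    """
    rng = random.Random(seed)
    rows = []
    for g in range(n_groups):
        center = rng.uniform(0.0, 100.0)  # group-specific feature location
        offset = rng.gauss(0.0, 1.0)      # group-specific outcome effect
        for _ in range(per_group):
            x = center + rng.gauss(0.0, 0.1)
            y = offset + rng.gauss(0.0, 0.1)
            rows.append((g, x, y))
    return rows


def one_nn_predict(train, x):
    """Predict y from the training row whose x is closest to the query."""
    nearest = min(train, key=lambda row: abs(row[1] - x))
    return nearest[2]


def cv_mse(rows, fold_of_row, k):
    """Mean squared error of 1-NN under a given row-to-fold assignment."""
    total = 0.0
    for fold in range(k):
        train = [r for r, f in zip(rows, fold_of_row) if f != fold]
        test = [r for r, f in zip(rows, fold_of_row) if f == fold]
        for _, x, y in test:
            total += (one_nn_predict(train, x) - y) ** 2
    return total / len(rows)


def compare(k=5, seed=0):
    """Return (row-wise CV MSE, group-wise CV MSE) on the same simulated data."""
    rows = make_clustered_data(seed=seed)
    rng = random.Random(seed + 1)
    # Row-wise folds: group-mates are scattered across training and test sets.
    random_folds = [i % k for i in range(len(rows))]
    rng.shuffle(random_folds)
    # Group-wise folds: every row of a group lands in the same fold.
    grouped_folds = [g % k for g, _, _ in rows]
    return cv_mse(rows, random_folds, k), cv_mse(rows, grouped_folds, k)


if __name__ == "__main__":
    random_mse, grouped_mse = compare()
    print(f"row-wise CV MSE:   {random_mse:.3f}")
    print(f"group-wise CV MSE: {grouped_mse:.3f}")
```

Under these assumptions the row-wise estimate comes out much lower than the group-wise one, because the predictor can look up a test row's group-mates in the training set. If the deployment target is new groups rather than new rows from known groups, the group-wise number is the more honest estimate. Which split is appropriate depends on the dependence structure of the real data, which this toy example does not attempt to model.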

This review was generated by AI and reviewed by a human editor.