QUICK REVIEW

[Paper Digest] A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results

Beau Coker, Cynthia Rudin|arXiv (Cornell University)|Apr 23, 2018

Statistical and Computational Modeling | References: 69 | Cited by: 15

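One-Sentence Summary

This paper proposes hacking intervals, a new theory of statistical inference that aims to improve the robustness and reproducibility of scientific results by quantifying the range of results that reasonable, honest data-analysis choices can produce. Unlike classical confidence intervals, hacking intervals do not appeal to imaginary superpopulations or probability theory; they offer a more intuitive, transparent, and interpretable measure of the uncertainty created by researcher degrees of freedom in model specification.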

ABSTRACT

Inference is the process of using facts we know to learn about facts we do not know. A theory of inference gives assumptions necessary to get from the former to the latter, along with a definition for and summary of the resulting uncertainty. Any one theory of inference is neither right nor wrong, but merely an axiom that may or may not be useful. Each of the many diverse theories of inference can be valuable for certain applications. However, no existing theory of inference addresses the tendency to choose, from the range of plausible data analysis specifications consistent with prior evidence, those that inadvertently favor one's own hypotheses. Since the biases from these choices are a growing concern across scientific fields, and in a sense the reason the scientific community was invented in the first place, we introduce a new theory of inference designed to address this critical problem. We introduce hacking intervals, which are the range of a summary statistic one may obtain given a class of possible endogenous manipulations of the data. Hacking intervals require no appeal to hypothetical data sets drawn from imaginary superpopulations. A scientific result with a small hacking interval is more robust to researcher manipulation than one with a larger interval, and is often easier to interpret than a classical confidence interval. Some versions of hacking intervals turn out to be equivalent to classical confidence intervals, which means they may also provide a more intuitive and potentially more useful interpretation of classical confidence intervals.

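Motivation and Objectives

To address the growing concern that researcher degrees of freedom in data analysis lead to biased, non-robust, and irreproducible results.
To develop a theory of inference that quantifies how reasonable, honest analysis choices affect empirical conclusions.
To provide a transparent, interpretable alternative to classical confidence intervals that requires no probability theory and reflects the uncertainty arising from model and analysis specification.
To strengthen the integrity of scientific research by letting researchers and readers assess whether a result would change under different but reasonable analysis decisions.
To formalize a framework that supports reproducibility and robustness in observational and causal-inference settings, especially in the presence of model dependence and unmeasured confounding.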

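Proposed Method

Proposes two types of hacking intervals, prescriptively constrained and tethered; both define the interval as the range of a summary statistic (such as a regression coefficient) over a set of reasonable analysis choices.
Uses constraints on the model class, the loss function, and predictive performance to define the interval's bounds, so that they stay consistent with analyses a reasonable researcher would consider valid.
Tethered hacking intervals require only that the chosen model achieve a small loss on the observed data, which avoids explicitly enumerating every analysis path.
Proves that, in the maximum likelihood setting, tethered hacking intervals are mathematically equivalent to profile likelihood confidence intervals, giving a novel, intuitive interpretation of classical intervals that requires no probability theory.
Applies the framework to least-squares estimation and, using properties of the t and chi-squared distributions, derives an exact expression for the variance of the hacking interval bounds.
Uses Vapnik-Chervonenkis theory to derive generalization error bounds for "hacked" data, linking model complexity to generalization error under analysis perturbations.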
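
A prescriptively constrained hacking interval can be illustrated with a toy enumeration. The sketch below is not taken from the paper: the summary statistic (a difference in group means), the set of allowed analysis choices (outlier-trimming cutoffs), the data, and the function names `trimmed` and `prescriptive_hacking_interval` are all hypothetical choices made for illustration. It simply computes the statistic under every allowed choice and reports the minimum and maximum.

```python
from statistics import mean, stdev


def trimmed(values, k):
    """Drop points more than k sample standard deviations from the mean.

    k=None means "keep every observation".
    """
    if k is None:
        return list(values)
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]


def prescriptive_hacking_interval(treated, control, cutoffs=(None, 3.0, 2.5, 2.0)):
    """Min and max difference in means over every allowed pair of trimming choices.

    The cutoffs are a hypothetical stand-in for the set of analysis choices a
    reasonable researcher might defend.
    """
    estimates = []
    for k_treated in cutoffs:
        for k_control in cutoffs:
            diff = mean(trimmed(treated, k_treated)) - mean(trimmed(control, k_control))
            estimates.append(diff)
    return min(estimates), max(estimates)


# Made-up data: each group contains one suspicious observation.
treated = [5.1, 4.8, 5.6, 5.0, 9.7, 5.3, 4.9, 5.2]
control = [4.2, 4.0, 4.5, 3.9, 4.1, 4.4, 0.5, 4.3]

lo, hi = prescriptive_hacking_interval(treated, control)
print(f"hacking interval for the difference in means: [{lo:.3f}, {hi:.3f}]")
```

A narrow interval here would mean the trimming decision barely matters; a wide one signals that the conclusion depends on a discretionary choice.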

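Experimental Results

Research Questions

RQ1: In real-world studies, to what extent do reasonable, honest data-analysis choices lead to different empirical conclusions?
RQ2: Can we quantify the robustness of scientific results to these analysis choices in a way that is transparent, interpretable, and free of imaginary superpopulations?
RQ3: How do hacking intervals compare with classical confidence intervals in interpretation and statistical properties?
RQ4: Can hacking intervals be formally connected to existing statistical methods, such as profile likelihood intervals?
RQ5: What is the generalization error of a model when the analysis procedure is subject to honest but varied choices (that is, "hacking")?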

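Key Findings

Hacking intervals give a direct, intuitive measure of how robust a result is to the choices of an honest researcher; a smaller interval indicates a more robust result.
Tethered hacking intervals are mathematically equivalent to profile likelihood confidence intervals, which gives classical intervals a novel interpretation that requires no probability theory.
For the least-squares average treatment effect (ATE), the variance of the hacking interval bounds depends on the residual sum of squares and the degrees of freedom; the exact formula is derived from properties of the chi-squared distribution.
For individual treatment effect estimates, the hacking interval bounds are derived from an optimization with complementary slackness conditions, yielding a symmetric interval centered on the least-squares estimate.
When the scaling factor √(θ − SSE) / ||XΥ|| equals the t-distribution critical value times the standard error, the hacking interval bounds for the individual treatment effect coincide with those of the standard confidence interval.
VC theory yields a generalization error bound for hacked models, showing that when model complexity is bounded, the true risk stays close to the empirical risk with high probability even under perturbations.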
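
The link between a tethered interval and a classical t interval can be checked numerically in the simplest least-squares case. The sketch below is an illustration under stated assumptions, not the paper's derivation for treatment effects: it uses a one-covariate regression, takes the slope as the summary statistic, and treats θ as a cap on the sum of squared errors. The function names, the data, and the critical value 2.306 (the two-sided 95% t quantile for 8 degrees of freedom) are choices made here. For a fixed slope b, the best achievable SSE is SSE_min + (b − b̂)² · Sxx, so the slopes with SSE ≤ θ form a symmetric interval around the least-squares slope b̂.

```python
import math


def fit_simple_ols(x, y):
    """Return (intercept, slope, sse, sxx) for the least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse, sxx


def tethered_slope_interval(x, y, theta):
    """Range of slopes among all lines whose sum of squared errors is <= theta."""
    _, slope, sse, sxx = fit_simple_ols(x, y)
    if theta < sse:
        raise ValueError("theta must be at least the minimum SSE")
    half_width = math.sqrt((theta - sse) / sxx)
    return slope - half_width, slope + half_width


def theta_matching_t_interval(sse, n, t_crit, n_params=2):
    """Loss threshold at which the tethered interval equals the t interval."""
    return sse * (1.0 + t_crit ** 2 / (n - n_params))


# Made-up data with n = 10, so the residual degrees of freedom are 8.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.3]

_, slope, sse, sxx = fit_simple_ols(x, y)
t_crit = 2.306  # assumed 95% two-sided critical value for 8 degrees of freedom
theta = theta_matching_t_interval(sse, len(x), t_crit)
lo, hi = tethered_slope_interval(x, y, theta)
std_err = math.sqrt(sse / (len(x) - 2) / sxx)
print(f"tethered interval: [{lo:.4f}, {hi:.4f}]")
print(f"t interval:        [{slope - t_crit * std_err:.4f}, {slope + t_crit * std_err:.4f}]")
```

Read in this direction, the classical interval answers a probability-free question: which slopes can be reached without letting the squared-error loss exceed θ?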

This digest was generated by AI and reviewed by human editors.