QUICK REVIEW

[Paper Review] Towards a Theoretical Framework of Out-of-Distribution Generalization

Haotian Ye, Chuanlong Xie|arXiv (Cornell University)|Jun 8, 2021

Domain Adaptation and Few-Shot Learning | References: 60 | Cited by: 41

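One-Sentence Summary

This paper proposes a quantitative framework for OOD generalization built on variation, informativeness, and an expansion function. It derives OOD generalization bounds and proposes a model selection criterion that improves OOD accuracy in experiments.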

ABSTRACT

Generalization to out-of-distribution (OOD) data is one of the central problems in modern machine learning. Recently, there is a surge of attempts to propose algorithms that mainly build upon the idea of extracting invariant features. Although intuitively reasonable, theoretical understanding of what kind of invariance can guarantee OOD generalization is still limited, and generalization to arbitrary out-of-distribution is clearly impossible. In this work, we take the first step towards rigorous and quantitative definitions of 1) what is OOD; and 2) what does it mean by saying an OOD problem is learnable. We also introduce a new concept of expansion function, which characterizes to what extent the variance is amplified in the test domains over the training domains, and therefore give a quantitative meaning of invariant features. Based on these, we prove OOD generalization error bounds. It turns out that OOD generalization largely depends on the expansion function. As recently pointed out by Gulrajani and Lopez-Paz (2020), any OOD learning algorithm without a model selection module is incomplete. Our theory naturally induces a model selection criterion. Extensive experiments on benchmark OOD datasets demonstrate that our model selection criterion has a significant advantage over baselines.

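Motivation and Objectives

Formalize OOD generalization as a relationship between the feature distributions of the available domains and the unseen domains.
Introduce variation, informativeness, and the expansion function to quantify invariance and learnability.
Derive upper and lower bounds on the OOD generalization error from the expansion function and feature variation.
Propose a model selection criterion that balances validation accuracy against feature variation to improve OOD performance.
Demonstrate the method with experiments on standard OOD datasets and analyze the learnability of real-world OOD problems.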

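Proposed Method

Define the variation and informativeness of one-dimensional features across domains using a distribution distance ρ.
Introduce the expansion function s(·) to relate feature variation on the available domains to feature variation on the unseen domains.
Express OOD learnability as (s(·), δ)-learnability at a given informativeness threshold δ.
Give a generalization bound under the regularity conditions stated in the paper: err(f) ≤ O(s(V^sup(h, E_avail))^(α^2/(α+d)^2)).
Extend the bound to models with a linear top layer, where the rate can become linear: err(f) ≤ O(s(V^sup(h, E_avail))).
Propose a model selection algorithm that maximizes Acc - r0 · V, combining validation accuracy with feature variation.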
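The first step above, measuring variation and informativeness of a one-dimensional feature with a distribution distance ρ, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes variation is the largest class-conditional distance between any two domains, and informativeness is the average over class pairs of the smallest within-domain distance between the two classes. A histogram-based total variation distance stands in for ρ, and every function name and the synthetic data are hypothetical.

```python
import itertools
import numpy as np


def tv_distance(a, b, bins=20, value_range=(-5.0, 5.0)):
    """Total variation distance between two 1-D samples (stand-in for rho).

    Both samples are binned on the same grid; values outside value_range
    are ignored by np.histogram.
    """
    pa, _ = np.histogram(a, bins=bins, range=value_range)
    pb, _ = np.histogram(b, bins=bins, range=value_range)
    pa = pa / pa.sum()
    pb = pb / pb.sum()
    return 0.5 * np.abs(pa - pb).sum()


def variation(feats, labels, rho=tv_distance):
    """Assumed form: max over classes and domain pairs of the distance
    between class-conditional feature distributions.

    feats[e] and labels[e] hold the feature values and labels of domain e.
    """
    classes = np.unique(np.concatenate(labels))
    worst = 0.0
    for y in classes:
        for e1, e2 in itertools.combinations(range(len(feats)), 2):
            d = rho(feats[e1][labels[e1] == y], feats[e2][labels[e2] == y])
            worst = max(worst, d)
    return worst


def informativeness(feats, labels, rho=tv_distance):
    """Assumed form: average over class pairs of the smallest
    within-domain distance between the two classes."""
    classes = np.unique(np.concatenate(labels))
    seps = []
    for y1, y2 in itertools.combinations(classes, 2):
        seps.append(min(rho(f[l == y1], f[l == y2])
                        for f, l in zip(feats, labels)))
    return float(np.mean(seps))


# Synthetic data (made up for illustration): two domains, two classes.
rng = np.random.default_rng(0)
labels = [np.repeat([0, 1], 500) for _ in range(2)]
# Invariant feature: same class-conditional distribution in both domains.
invariant = [np.where(l == 0, -2.0, 2.0) + 0.5 * rng.standard_normal(l.size)
             for l in labels]
# Spurious feature: the class means swap sign in the second domain.
spurious = [s * np.where(l == 0, -2.0, 2.0) + 0.5 * rng.standard_normal(l.size)
            for s, l in zip((1.0, -1.0), labels)]

print("invariant:", variation(invariant, labels), informativeness(invariant, labels))
print("spurious: ", variation(spurious, labels), informativeness(spurious, labels))
```

In this toy setup both features separate the classes within each domain, so both are informative, but only the spurious one has high variation across domains. That is the distinction the framework uses to decide which features count as invariant.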

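Experimental Results

Research Questions

RQ1: How can OOD generalization be rigorously characterized when the training and test domains differ?
RQ2: What role do feature variation and informativeness play in maintaining invariance across unseen domains?
RQ3: Can quantitative bounds on the OOD generalization error be derived that depend on the expansion function and feature variation?
RQ4: Does accounting for both validation performance and feature variation when predicting OOD performance improve model selection?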

Key Findings

Dataset	A	C	P	S	Avg	Acc. increase
PACS	88.72%	81.74%	96.83%	79.00%	86.57%	1.66% ↑
OfficeHome	65.76%	55.07%	75.20%	76.31%	68.09%	1.00% ↑
VLCS	97.81%	66.98%	69.50%	70.97%	76.32%	0.63% ↑

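The expansion function s(·) quantifies how much feature variation grows from the available domains to all domains, and it determines the difficulty of the OOD problem.
The OOD generalization error is bounded above and below by quantities tied to feature variation and informativeness, and both bounds tighten as variation decreases.
OOD generalization improves for features that have low variation and high informativeness on the available domains, and under certain conditions the error can approach zero.
A model selection criterion that combines validation accuracy with a variation penalty outperforms accuracy-only selection on several OOD benchmarks.
Empirical analysis on OfficeHome shows an identifiable expansion function; increasing δ (the informativeness threshold) lowers the expansion function and makes learning more feasible.
Experiments on PACS and OfficeHome show that the proposed selection method attains higher OOD accuracy than selection based on validation accuracy.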
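The selection criterion referred to above scores each candidate model by validation accuracy minus r0 times its feature variation and keeps the highest-scoring one. The sketch below shows only that scoring step. The `Candidate` class, the checkpoint names, and all numbers are hypothetical and chosen for illustration; how V is aggregated over the feature dimensions of a model and how r0 is set are left to the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    val_acc: float    # validation accuracy on the available domains, in [0, 1]
    variation: float  # feature variation V measured on the available domains


def selection_score(c: Candidate, r0: float) -> float:
    """Score from the review text: Acc - r0 * V."""
    return c.val_acc - r0 * c.variation


def select_model(candidates, r0: float) -> Candidate:
    """Return the candidate with the highest penalized score."""
    return max(candidates, key=lambda c: selection_score(c, r0))


# Made-up candidates: ckpt_a is slightly more accurate on validation data,
# but its features vary much more across the available domains.
candidates = [
    Candidate("ckpt_a", val_acc=0.90, variation=0.50),
    Candidate("ckpt_b", val_acc=0.88, variation=0.10),
]

print(select_model(candidates, r0=0.0).name)  # accuracy-only selection
print(select_model(candidates, r0=0.2).name)  # variation-penalized selection
```

With r0 = 0 the criterion reduces to plain validation-accuracy selection. A positive r0 trades a small loss in validation accuracy for lower feature variation, which the bounds associate with smaller OOD error.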


This review was generated by AI and reviewed by a human editor.