QUICK REVIEW

[Paper Review] Model Agnostic Sample Reweighting for Out-of-Distribution Learning

Xiaofang Zhou, Yong Lin|arXiv (Cornell University)|Jan 24, 2023

Domain Adaptation and Few-Shot Learning | Cited by 10

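One-Sentence Summary

MAPLE introduces a bilevel optimization framework that learns sample weights for reweighting the training data, so that weighted empirical risk minimization (weighted ERM) can outperform state-of-the-art OOD methods while remaining model-agnostic and resistant to overfitting in large networks.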

ABSTRACT

Distributionally robust optimization (DRO) and invariant risk minimization (IRM) are two popular methods proposed to improve out-of-distribution (OOD) generalization performance of machine learning models. While effective for small models, it has been observed that these methods can be vulnerable to overfitting with large overparameterized models. This work proposes a principled method, Model Agnostic samPLe rEweighting (MAPLE), to effectively address OOD problem, especially in overparameterized scenarios. Our key idea is to find an effective reweighting of the training samples so that the standard empirical risk minimization training of a large model on the weighted training data leads to superior OOD generalization performance. The overfitting issue is addressed by considering a bilevel formulation to search for the sample reweighting, in which the generalization complexity depends on the search space of sample weights instead of the model size. We present theoretical analysis in linear case to prove the insensitivity of MAPLE to model size, and empirically verify its superiority in surpassing state-of-the-art methods by a large margin. Code is available at https://github.com/x-zho14/MAPLE.

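Motivation and Objectives

Address OOD generalization, particularly for overparameterized models, by avoiding the overfitting seen in regularization-based methods.
Shift the optimization from model parameters to sample weights in order to reduce generalization risk.
Learn sample weights automatically, without strong priors or group labels.
Provide theoretical insight in the linear case and validate empirically across datasets and model sizes.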

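Proposed Method

Formulate MAPLE as a bilevel optimization problem: the inner loop minimizes the weighted ERM loss on the training data, and the outer loop updates the sample weights by minimizing an OOD criterion on a validation set.
Solve the bilevel problem with projected gradient descent combined with truncated backpropagation, which computes the gradient of the outer objective with respect to the weights.
Represent the weight space as a smaller, model-agnostic space to mitigate the overfitting associated with large neural networks.
Optionally impose sparsity so that only a subset of training samples influences the inner optimization, reducing computation.
Give theoretical results showing identifiability of the ideal sample weights in the linear case, together with a finite-sample generalization bound that depends on the complexity of the weight space rather than on model size.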
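The bilevel loop described above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a scalar model y = theta * x, a box constraint [0, 1] as the weight space, and mean squared error on a small clean validation set as a stand-in for the OOD criterion. All function names and hyperparameters are hypothetical. The weight gradient is propagated only through the last few inner steps, which for this scalar model gives the same result as truncated backpropagation.

```python
def inner_train(weights, xs, ys, steps=50, lr=0.05, truncate=10):
    """Inner loop: weighted ERM for y = theta * x by gradient descent.

    Returns theta and d(theta)/d(weight_i). The derivative is propagated
    only through the last `truncate` steps (truncated differentiation).
    """
    n = len(xs)
    theta = 0.0
    dtheta = [0.0] * n
    for step in range(steps):
        # Curvature and gradient of the weighted squared loss w.r.t. theta.
        hess = 2.0 / n * sum(w * x * x for w, x in zip(weights, xs))
        grad = 2.0 / n * sum(
            w * x * (theta * x - y) for w, x, y in zip(weights, xs, ys)
        )
        if step >= steps - truncate:
            # Differentiate the update theta <- theta - lr * grad w.r.t. w_i.
            dtheta = [
                (1.0 - lr * hess) * d - lr * 2.0 / n * x * (theta * x - y)
                for d, x, y in zip(dtheta, xs, ys)
            ]
        theta -= lr * grad
    return theta, dtheta


def val_loss_grad(theta, xs, ys):
    """Derivative of the validation mean squared error w.r.t. theta."""
    m = len(xs)
    return 2.0 / m * sum(x * (theta * x - y) for x, y in zip(xs, ys))


def learn_weights(train_x, train_y, val_x, val_y, outer_steps=200, outer_lr=0.1):
    """Outer loop: projected gradient descent on the sample weights."""
    weights = [0.5] * len(train_x)
    for _ in range(outer_steps):
        theta, dtheta = inner_train(weights, train_x, train_y)
        g = val_loss_grad(theta, val_x, val_y)
        # Chain rule gives the hypergradient g * d(theta)/d(w_i);
        # clipping to [0, 1] is the projection step.
        weights = [
            min(1.0, max(0.0, w - outer_lr * g * d))
            for w, d in zip(weights, dtheta)
        ]
    theta, _ = inner_train(weights, train_x, train_y)
    return weights, theta


# Toy data: the first three samples follow y = 2x, the last three are corrupted.
train_x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
train_y = [2.0, 4.0, 6.0, -2.0, -4.0, -6.0]
weights, theta = learn_weights(train_x, train_y, [1.0, 2.0], [2.0, 4.0])
```

In this toy setting the weights of the corrupted samples shrink toward zero while the clean samples keep their influence, so plain weighted ERM recovers a slope close to 2. The search happens over six weights regardless of how large the model is, which is the intuition behind the claim that generalization depends on the weight space rather than model size.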

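Experimental Results

Research Questions

RQ1: Can MAPLE identify sample weights that do not rely on spurious features while maintaining or improving OOD performance?
RQ2: Is the learned sample-weight mapping robust across model sizes and architectures?
RQ3: How does MAPLE compare with IRM and GroupDRO under different data scenarios and degrees of overparameterization?
RQ4: Does introducing sparsity into the weight design improve generalization and computational efficiency?
RQ5: What theoretical guarantees does MAPLE have in the linear and finite-sample settings?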

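Key Findings

MAPLE achieves better OOD performance than state-of-the-art methods across multiple tasks and models.
In the linear setting, there exists a weighting function that yields a debiased optimal predictor, and under suitable conditions MAPLE can identifiably recover it.
Finite-sample analysis shows that the generalization bound depends on the complexity of the weight space and the size of the validation data, not on model capacity.
Empirically, in some cases MAPLE matches or exceeds the worst-group accuracy of the Oracle (ERM with spurious features removed).
The learned sample weights transfer across network backbones on the same task (for example, weights learned with ResNet-18 can be used for ResNet-50).
MAPLE avoids overfitting by searching in the sample-weight space rather than the full model-parameter space, which allows it to outperform regularization-based methods.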
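Worst-group accuracy, the metric used in the Oracle comparison above, is the minimum of the per-group accuracies. The helper below is a minimal sketch of that definition; the function name and the group identifiers are hypothetical and are not taken from the paper's code.

```python
from collections import defaultdict


def worst_group_accuracy(preds, labels, groups):
    """Return (worst accuracy, per-group accuracies) over the given groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


# Hypothetical predictions: group "a" is fully correct, group "b" is not.
preds = [1, 1, 0, 0, 1, 0]
labels = [1, 1, 0, 1, 1, 1]
groups = ["a", "a", "a", "b", "b", "b"]
worst, per_group = worst_group_accuracy(preds, labels, groups)
```

A model with high average accuracy can still score poorly on this metric if it relies on a spurious feature that fails for one group, which is why it is commonly reported in OOD evaluations.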


This review was generated by AI and reviewed by a human editor.