QUICK REVIEW

[Paper Review] In Search of Lost Domain Generalization

Ishaan Gulrajani, David López-Paz|arXiv (Cornell University)|Jul 2, 2020

Domain Adaptation and Few-Shot Learning|References: 84|Cited by: 85

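One-Sentence Summary

The paper argues that a domain generalization (DG) method must include a model selection strategy, and that, given careful implementation and strong baselines, empirical risk minimization (ERM) matches or exceeds state-of-the-art methods across multiple datasets. It also introduces DomainBed, a PyTorch testbed for fair, reproducible evaluation of DG methods.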

ABSTRACT

The goal of domain generalization algorithms is to predict well on distributions different from those seen during training. While a myriad of domain generalization algorithms exist, inconsistencies in experimental conditions -- datasets, architectures, and model selection criteria -- render fair and realistic comparisons difficult. In this paper, we are interested in understanding how useful domain generalization algorithms are in realistic settings. As a first step, we realize that model selection is non-trivial for domain generalization tasks. Contrary to prior work, we argue that domain generalization algorithms without a model selection strategy should be regarded as incomplete. Next, we implement DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria. We conduct extensive experiments using DomainBed and find that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets. Looking forward, we hope that the release of DomainBed, along with contributions from fellow researchers, will streamline reproducible and rigorous research in domain generalization.

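Research Motivation and Objectives

Assess how realistic domain generalization evaluations are when datasets, architectures, and model selection criteria differ between studies.
Examine how the model selection strategy affects the performance of DG methods.
Provide a standardized, reusable testbed for DG experiments to improve reproducibility.
Encourage stronger baselines and fair comparisons in domain generalization research.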

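Proposed Approach

Discuss and clarify the challenges of model selection in domain generalization.
Implement and compare nine DG algorithms on seven multi-domain datasets in DomainBed.
Evaluate three model selection criteria: training-domain validation, leave-one-domain-out validation, and test-domain validation (oracle).
Run a random search over 20 hyperparameter configurations for each algorithm and dataset pair, repeated in three independent runs.
Report results as averages over the repeated runs, with standard errors.
Release DomainBed so that new algorithms and datasets are easy to add and experiments can be run end to end.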
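
Two of the selection criteria listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DomainBed's implementation: the record layout (`hparams`, `val_acc` per domain), the function names, and every number are hypothetical, and leave-one-domain-out validation is omitted because it requires retraining with each training domain held out.

```python
from statistics import mean

def select_training_domain_validation(runs, test_domain):
    """Pick the run with the best mean held-out accuracy over the training domains."""
    def score(run):
        # Ignore the test domain entirely: only training-domain validation splits count.
        return mean(acc for d, acc in run["val_acc"].items() if d != test_domain)
    return max(runs, key=score)

def select_test_domain_oracle(runs, test_domain):
    """Pick the run with the best accuracy on a validation split of the test domain."""
    return max(runs, key=lambda run: run["val_acc"][test_domain])

# Hypothetical results of a tiny hyperparameter search; domain "C" is the test domain.
runs = [
    {"hparams": {"lr": 1e-3}, "val_acc": {"A": 0.90, "B": 0.88, "C": 0.60}},
    {"hparams": {"lr": 1e-4}, "val_acc": {"A": 0.86, "B": 0.85, "C": 0.70}},
]

by_training = select_training_domain_validation(runs, "C")
by_oracle = select_test_domain_oracle(runs, "C")
print(by_training["hparams"], by_oracle["hparams"])
```

In this made-up example the two criteria pick different runs, which is the kind of disagreement that makes the choice of selection strategy part of the method.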

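Experimental Results

Research Questions

RQ1: How do different model selection strategies affect domain generalization performance?
RQ2: Under realistic, standardized evaluation conditions, do DG algorithms consistently outperform a strong ERM baseline?
RQ3: Can a standardized testbed (DomainBed) enable fairer, more reproducible domain generalization research?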

Key Findings

Algorithm	CMNIST	RMNIST	VLCS	PACS	Office-Home	TerraInc	DomainNet	Average
ERM	52.0 ± 0.1	98.0 ± 0.0	77.4 ± 0.3	85.7 ± 0.5	67.5 ± 0.5	47.2 ± 0.4	41.2 ± 0.2	67.0
IRM	51.8 ± 0.1	97.9 ± 0.0	78.1 ± 0.0	84.4 ± 1.1	66.6 ± 1.0	47.9 ± 0.7	35.7 ± 1.9	66.0
DRO	52.0 ± 0.1	98.1 ± 0.0	77.2 ± 0.6	84.1 ± 0.4	66.9 ± 0.3	47.0 ± 0.3	33.7 ± 0.2	65.5
Mixup	51.9 ± 0.1	98.1 ± 0.0	77.7 ± 0.4	84.3 ± 0.5	69.0 ± 0.1	48.9 ± 0.8	39.6 ± 0.1	67.1
MLDG	51.6 ± 0.1	98.0 ± 0.0	77.1 ± 0.4	84.8 ± 0.6	68.2 ± 0.1	46.1 ± 0.8	41.8 ± 0.4	66.8
CORAL	51.7 ± 0.1	98.1 ± 0.1	77.7 ± 0.5	86.0 ± 0.2	68.6 ± 0.4	46.4 ± 0.8	41.8 ± 0.2	67.2
MMD	51.8 ± 0.1	98.1 ± 0.0	76.7 ± 0.9	85.0 ± 0.2	67.7 ± 0.1	49.3 ± 1.4	39.4 ± 0.8	66.8
DANN	51.5 ± 0.3	97.9 ± 0.1	78.7 ± 0.3	84.6 ± 1.1	65.4 ± 0.6	48.4 ± 0.5	38.4 ± 0.0	66.4
C-DANN	51.9 ± 0.1	98.0 ± 0.0	78.2 ± 0.4	82.8 ± 1.5	65.6 ± 0.5	47.6 ± 0.8	38.9 ± 0.1	66.1
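
Each table entry is a mean with a standard error over repeated runs, and the last column averages across datasets. The sketch below shows one way to compute both. It assumes the sample standard deviation (n − 1 denominator) and an unweighted mean over the seven datasets; the summary states neither convention. The function name and the three trial accuracies are hypothetical, while the ERM row is copied from the table.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_standard_error(accuracies):
    """Return (mean, standard error of the mean) for accuracies from independent trials."""
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two trials to estimate a standard error")
    # Standard error = sample standard deviation / sqrt(number of trials).
    return mean(accuracies), stdev(accuracies) / sqrt(n)

# Hypothetical accuracies (%) from three independent runs of one algorithm on one dataset.
hypothetical_trials = [84.0, 85.0, 86.0]
avg, sem = mean_and_standard_error(hypothetical_trials)
print(f"{avg:.1f} +/- {sem:.1f}")  # prints "85.0 +/- 0.6"

# ERM row from the table: the unweighted mean over datasets matches the reported 67.0.
erm_row = [52.0, 98.0, 77.4, 85.7, 67.5, 47.2, 41.2]
erm_average = round(mean(erm_row), 1)
print(erm_average)  # prints 67.0
```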

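With modern architectures, data augmentation, and careful hyperparameter tuning, ERM achieves state-of-the-art performance on the evaluated datasets.
Under identical datasets and configurations, no DG algorithm consistently outperforms ERM by more than a small margin.
The model selection strategy significantly affects DG results: training-domain validation generally outperforms leave-one-domain-out validation, and the gap to oracle (test-domain) selection indicates remaining room for improvement.
DomainBed provides an extensible, reproducible framework for running DG experiments, and adding a new algorithm or dataset takes little effort.
Larger networks (such as ResNet-50), stronger data augmentation, and a thorough hyperparameter search together account for ERM's strong performance.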
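
For reference, the ERM baseline discussed in these findings can be written as a single objective. The notation is an assumption introduced here and does not come from the summary: $D$ training domains, $n_d$ examples $(x_i^d, y_i^d)$ in domain $d$, a predictor $f_\theta$, and a loss $\ell$. Whether domains are weighted equally, as below, or in proportion to their size is an implementation choice.

```latex
\hat{\theta} \in \arg\min_{\theta} \; \frac{1}{D} \sum_{d=1}^{D} \frac{1}{n_d} \sum_{i=1}^{n_d} \ell\bigl(f_\theta(x_i^d),\, y_i^d\bigr)
```

The objective pools the training domains and uses no domain-specific penalty, which is why it serves as the baseline against which the other algorithms in the table are compared.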

This review was generated by AI and reviewed by a human editor.