QUICK REVIEW

[Paper Review] Nonlinear Invariant Risk Minimization: A Causal Approach

Chaochao Lu, Yuhuai Wu|arXiv (Cornell University)|Feb 24, 2021

Domain Adaptation and Few-Shot Learning | References: 96 | Cited by: 26

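One-sentence summary

Proposes iCaRL, a nonlinear invariant risk minimization framework that identifiably learns the latent causes of the target and, under a general exponential family prior, achieves out-of-distribution (OOD) generalization.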

ABSTRACT

Due to spurious correlations, machine learning systems often fail to generalize to environments whose distributions differ from the ones used at training time. Prior work addressing this, either explicitly or implicitly, attempted to find a data representation that has an invariant relationship with the target. This is done by leveraging a diverse set of training environments to reduce the effect of spurious features and build an invariant predictor. However, these methods have generalization guarantees only when both data representation and classifiers come from a linear model class. We propose invariant Causal Representation Learning (iCaRL), an approach that enables out-of-distribution (OOD) generalization in the nonlinear setting (i.e., nonlinear representations and nonlinear classifiers). It builds upon a practical and general assumption: the prior over the data representation (i.e., a set of latent variables encoding the data) given the target and the environment belongs to general exponential family distributions. Based on this, we show that it is possible to identify the data representation up to simple transformations. We also prove that all direct causes of the target can be fully discovered, which further enables us to obtain generalization guarantees in the nonlinear setting. Extensive experiments on both synthetic and real-world datasets show that our approach outperforms a variety of baseline methods. Finally, in the discussion, we further explore the aforementioned assumption and propose a more general hypothesis, called the Agnostic Hypothesis: there exist a set of hidden causal factors affecting both inputs and outcomes. The Agnostic Hypothesis can provide a unifying view of machine learning. More importantly, it can inspire a new direction to explore a general theory for identifying hidden causal factors, which is key to enabling the OOD generalization guarantees.

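Motivation and goals

Improve robustness to distribution shift and spurious correlations across environments.
Propose a general nonlinear framework (iCaRL) that provides identifiability and OOD generalization guarantees.
Extend iVAE to a general exponential family prior that captures dependencies among latent factors.
Identify the direct causes of the target and learn an invariant predictor from them.
Discuss the Agnostic Hypothesis as a unifying view of representation learning across supervised, unsupervised, and reinforcement learning.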
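
For orientation, a general exponential family prior over the latent variables can be sketched as follows. The notation is an assumption chosen for illustration (Q for the base measure, Z for the normalizer, \lambda for the natural parameters, and a split of the sufficient statistics into a factorized part T_f and a neural-network part T_{NN}); the paper's exact equation may differ in detail.

```latex
p_{T,\lambda}(X \mid Y, E)
  = \frac{Q(X)}{Z(Y, E)} \exp\!\big[ T(X)^{\top} \lambda(Y, E) \big],
\qquad
T(X) = \begin{bmatrix} T_f(X) \\ T_{NN}(X) \end{bmatrix}
```

Because T_{NN} takes the whole vector X as input, a prior of this form need not factorize across latent dimensions, which is how it can capture dependencies among latent factors.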

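Proposed method

Extends iVAE with a general non-factorized latent prior that includes a neural-network-based component (T_NN) to capture dependencies among the latent variables.
Stage 1: Train NF-iVAE on data (O, Y, E), using score matching for the prior parameters, to identify the latent variables X up to permutation and simple transformations.
Stage 2: Run pairwise and conditional independence tests on the latent X to discover the direct causes of the target, Pa(Y).
Stage 3: Learn an invariant predictor by minimizing risk across environments with Pa(Y) as features; in a new environment, infer the value of Pa(Y) from O by approximate maximum a posteriori (MAP) inference (Equation 12).
Theoretical results establish identifiability of X up to simple transformations (Theorems 1–3) and a guarantee of OOD generalization (Proposition 1).
The framework relies on Assumption 1 (causal graph and invariance) and Assumption 2 (general exponential family prior) for identifiability and generalization.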
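
The Stage 2 idea of finding direct causes with pairwise and conditional independence tests can be illustrated on toy data. The sketch below is not the paper's procedure: it assumes linear-Gaussian latents that are already identified, uses partial correlation with Fisher's z-test as the independence test, and applies a collider heuristic (two latents that are marginally independent but become dependent given Y are both treated as parents of Y). The names `make_toy_latents` and `find_direct_causes` are hypothetical.

```python
import math
import numpy as np

def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: (partial) correlation is zero, via Fisher's z."""
    r = float(np.clip(r, -0.999999, 0.999999))
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c (with intercept)."""
    design = np.column_stack([np.ones_like(c), c])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

def find_direct_causes(X, y, alpha=1e-3):
    """Collider heuristic (an assumption of this sketch): flag latents i, j as
    parents of y when they are independent marginally but dependent given y."""
    n, d = X.shape
    parents = set()
    for i in range(d):
        for j in range(i + 1, d):
            r_marg = np.corrcoef(X[:, i], X[:, j])[0, 1]
            p_marg = fisher_z_pvalue(r_marg, n, 0)
            p_cond = fisher_z_pvalue(partial_corr(X[:, i], X[:, j], y), n, 1)
            if p_marg > alpha and p_cond < alpha:
                parents.update((i, j))
    return sorted(parents)

def make_toy_latents(n=5000, seed=0):
    """x1, x2 are independent causes of y; x3 is an effect of y."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = x1 + x2 + 0.5 * rng.normal(size=n)
    x3 = y + rng.normal(size=n)
    return np.column_stack([x1, x2, x3]), y

X, y = make_toy_latents()
causes = find_direct_causes(X, y)
print("indices flagged as direct causes:", causes)
```

In the nonlinear setting the summary describes, partial correlation would be too weak a test; a kernel-based or other nonparametric conditional independence test would be the natural substitute.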

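Experimental results

Research questions

RQ1: With a flexible latent prior, can nonlinear data representations and classifiers yield invariant predictions across environments?
RQ2: Under Assumptions 1 and 2, can the latent causes of the target be identifiably recovered, with guaranteed OOD generalization?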
RQ3: Does extending iVAE to a general exponential family prior achieve identifiability beyond factorized priors?
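RQ4: Can the direct causes of the target be reliably discovered from the inferred latent variables through independence tests?
RQ5: Does a predictor learned from the discovered causes perform robustly in unseen environments?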

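Main findings

Under a general exponential family prior, iCaRL identifies the latent variables X up to permutation and simple transformations.
The direct causes of the target, Pa(Y), can be fully discovered from the identified latent variables using independence tests.
An invariant predictor that generalizes across all environments can be learned from Pa(Y), providing OOD guarantees in the nonlinear setting.
NF-iVAE with the general prior identifiably estimates X from (O, Y, E), supporting the subsequent causal discovery and invariant prediction.
The Agnostic Hypothesis is proposed as a unifying view of representation learning across supervised, unsupervised, and reinforcement learning.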
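
Why a predictor built on Pa(Y) should generalize while one that also uses spurious features may not can be seen in a small simulation. This is a toy sketch, not the paper's experiment: it assumes the latents are already known, uses a linear least-squares predictor in place of a nonlinear one, and invents an environment parameter `beta` that controls how an effect of Y (x3) depends on Y. All function names are hypothetical.

```python
import numpy as np

def make_env(beta, n, rng):
    """Toy environment: x1, x2 cause y; x3 is an effect of y whose
    strength beta changes from one environment to another."""
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = x1 + x2 + 0.5 * rng.normal(size=n)
    x3 = beta * y + 0.5 * rng.normal(size=n)
    return np.column_stack([x1, x2, x3]), y

def fit_linear(X, y):
    """Least-squares fit with intercept (pooled risk minimization)."""
    design = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def mse(w, X, y):
    """Mean squared error of the linear predictor w on (X, y)."""
    design = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((design @ w - y) ** 2))

rng = np.random.default_rng(0)

# Two training environments, pooled, and one unseen test environment
# in which the spurious relation between x3 and y flips sign.
train = [make_env(beta, 5000, rng) for beta in (1.0, 2.0)]
X_train = np.vstack([X for X, _ in train])
y_train = np.concatenate([y for _, y in train])
X_test, y_test = make_env(-1.0, 5000, rng)

causes = [0, 1]  # indices of Pa(Y), assumed already discovered
w_causal = fit_linear(X_train[:, causes], y_train)
w_all = fit_linear(X_train, y_train)

mse_causal = mse(w_causal, X_test[:, causes], y_test)
mse_all = mse(w_all, X_test, y_test)
print(f"test MSE using Pa(Y) only: {mse_causal:.3f}")
print(f"test MSE using all latents: {mse_all:.3f}")
```

The predictor restricted to the causes depends only on the mechanism from (x1, x2) to y, which is the same in every environment here, so its test error stays close to the noise variance of y. The predictor that also uses x3 inherits a training-time correlation that no longer holds in the test environment.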

This review was generated by AI and reviewed by a human editor.