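[Paper Walkthrough] Learning Causally Invariant Representations for Out-of-Distribution Generalization on Graphs
CIGA is a causality-inspired framework that extracts invariant subgraphs preserving the causal information about the label, which gives provable OOD generalization for graph classification without environment labels. It introduces causal models based on FIIF/PIIF (fully/partially informative invariant features) and two practical objectives for identifying invariant subgraphs and improving OOD robustness, and it reports strong empirical results on 16 datasets, including DrugOOD.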
Despite recent success in using the invariance principle for out-of-distribution (OOD) generalization on Euclidean data (e.g., images), studies on graph data are still limited. Different from images, the complex nature of graphs poses unique challenges to adopting the invariance principle. In particular, distribution shifts on graphs can appear in a variety of forms such as attributes and structures, making it difficult to identify the invariance. Moreover, domain or environment partitions, which are often required by OOD methods on Euclidean data, could be highly expensive to obtain for graphs. To bridge this gap, we propose a new framework, called Causality Inspired Invariant Graph LeArning (CIGA), to capture the invariance of graphs for guaranteed OOD generalization under various distribution shifts. Specifically, we characterize potential distribution shifts on graphs with causal models, concluding that OOD generalization on graphs is achievable when models focus only on subgraphs containing the most information about the causes of labels. Accordingly, we propose an information-theoretic objective to extract the desired subgraphs that maximally preserve the invariant intra-class information. Learning with these subgraphs is immune to distribution shifts. Extensive experiments on 16 synthetic or real-world datasets, including a challenging setting -- DrugOOD, from AI-aided drug discovery, validate the superior OOD performance of CIGA.
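Motivation and Goals
- Motivate OOD generalization on graphs and the challenges posed by structure-level and attribute-level distribution shifts.
- Propose CIGA, which identifies the invariant subgraphs that encode the causes of the label under FIIF/PIIF structural causal models (SCMs).
- Provide an information-theoretic objective and a practical implementation for extracting invariant subgraphs.
- Prove the OOD guarantees theoretically and validate them empirically on a diverse set of graph datasets.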
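Proposed Method
- Decompose the GNN into a featurizer g, which extracts the invariant subgraph G_c, and a classifier f_c, which predicts Y from G_c.
- Formulate an information-theoretic objective that maximizes I(Ĝ_c; Y) subject to a constraint that makes the estimated subgraph Ĝ_c independent of the environment E.
- Derive CIGA v2 by adding a mutual-information-based constraint that minimizes leakage of information from the spurious subgraph G_s into Ĝ_c while preserving predictive power.
- Approximate the mutual-information term with a contrastive learning objective (Eq. 5) so that the model can be trained in practice.
- Provide a theoretical guarantee (Theorem 3.1) that solutions to the proposed objectives yield an invariant GNN under the FIIF/PIIF SCMs.
- Discuss practical implementation choices, such as using an interpretable GNN architecture and a hinge-style loss to enforce the information bound.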
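The contrastive approximation mentioned above can be illustrated with a minimal sketch. It assumes a supervised InfoNCE-style loss in which embeddings of estimated subgraphs from the same class are treated as positives; the paper's Eq. 5 may differ in its details. The function name `supervised_contrastive_loss`, the `temperature` parameter, and the NumPy inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.5):
    """InfoNCE-style supervised contrastive loss (illustrative sketch only).

    z:      (n, d) embeddings of the estimated invariant subgraphs (assumed input).
    labels: (n,) integer class labels; same-class samples act as positives.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    n = z.shape[0]

    # L2-normalise so that the dot product is a cosine similarity.
    norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    z = z / norms
    sim = (z @ z.T) / temperature

    # Exclude each anchor's similarity with itself from the softmax.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over every other sample in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denominator

    # Positives: same label, different sample.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no positive pairs in this batch

    # Average the log-probability over positives, then over valid anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    loss_per_anchor = -pos_log_prob[has_pos] / pos_count[has_pos]
    return float(loss_per_anchor.mean())


# Toy batch: two clusters of embeddings.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_loss = supervised_contrastive_loss(embeddings, [0, 0, 1, 1])
misaligned_loss = supervised_contrastive_loss(embeddings, [0, 1, 0, 1])
```

On the toy batch, the loss is lower when the labels agree with the clusters than when they cut across them. Minimizing such a loss pulls same-class subgraph embeddings together, which is the intuition behind using it as a proxy for a mutual-information term.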
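Experimental Results
Research Questions
- RQ1: How can the invariance principle be extended to graph data so that OOD generalization is guaranteed?
- RQ2: Can we identify an invariant subgraph G_c that encodes the causes of the label and is unaffected by environment changes?
- RQ3: How can the information-theoretic objective be made practical for extracting invariant structure when no environment labels are available?
- RQ4: Do the proposed CIGA objectives give provable OOD guarantees across diverse graph shifts and real-world datasets?
Key Findings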
| Method | SPMotif-Struc (bias=0.33) | SPMotif-Struc (bias=0.60) | SPMotif-Struc (bias=0.90) | SPMotif-Mixed (bias=0.33) | SPMotif-Mixed (bias=0.60) | SPMotif-Mixed (bias=0.90) | Avg |
|---|---|---|---|---|---|---|---|
| ERM | 59.49 (3.50) | 55.48 (4.84) | 49.64 (4.63) | 58.18 (4.30) | 49.29 (8.17) | 41.36 (3.29) | 52.24 |
| ASAP | 64.87 (13.8) | 64.85 (10.6) | 57.29 (14.5) | 66.88 (15.0) | 59.78 (6.78) | 50.45 (4.90) | 60.69 |
| DIR | 58.73 (11.9) | 48.72 (14.8) | 41.90 (9.39) | 67.28 (4.06) | 51.66 (14.1) | 38.58 (5.88) | 51.14 |
| IRM | 57.15 (3.98) | 61.74 (1.32) | 45.68 (4.88) | 58.20 (1.97) | 49.29 (3.67) | 40.73 (1.93) | 52.13 |
| V-Rex | 54.64 (3.05) | 53.60 (3.74) | 48.86 (9.69) | 57.82 (5.93) | 48.25 (2.79) | 43.27 (1.32) | 51.07 |
| EIIL | 56.48 (2.56) | 60.07 (4.47) | 55.79 (6.54) | 53.91 (3.15) | 48.41 (5.53) | 41.75 (4.97) | 52.73 |
| IB-IRM | 58.30 (6.37) | 54.37 (7.35) | 45.14 (4.07) | 57.70 (2.11) | 50.83 (1.51) | 40.27 (3.68) | 51.10 |
| CNC | 70.44 (2.55) | 66.79 (9.42) | 50.25 (10.7) | 65.75 (4.35) | 59.27 (5.29) | 41.58 (1.90) | 59.01 |
| CIGA v1 | 71.07 (3.60) | 63.23 (9.61) | 51.78 (7.29) | 74.35 (1.85) | 64.54 (8.19) | 49.01 (9.92) | 62.33 |
| CIGA v2 | 77.33 (9.13) | 69.29 (3.06) | 63.41 (7.38) | 72.42 (4.80) | 70.83 (7.54) | 54.25 (5.38) | 67.92 |
| Oracle (IID) | - | 88.70 (0.17) | - | - | 88.73 (0.25) | - | - |
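- CIGA substantially improves OOD performance over existing methods on 16 synthetic and real-world datasets.
- On the SPMotif benchmarks, CIGA v2 reaches the highest average score (67.92), ahead of CIGA v1 (62.33), the strongest baseline ASAP (60.69), and ERM (52.24).
- The framework comes with a theoretical guarantee (Theorem 3.1) that it yields an invariant GNN under the FIIF/PIIF SCMs.
- The contrastive approximation makes the mutual-information objective practical to optimize without environment labels.
- CIGA outperforms strong baselines in the challenging DrugOOD setting.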
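As a quick sanity check on the averages in the SPMotif table, the short sketch below recomputes the mean column for a few rows and the gap between CIGA v2 and two reference methods. The numbers are copied from the table; reading them as accuracy in percent is an assumption, and the names `results` and `mean_score` are illustrative only.

```python
# Per-setting scores copied from the SPMotif table
# (Struc 0.33/0.60/0.90, then Mixed 0.33/0.60/0.90).
results = {
    "ERM":     [59.49, 55.48, 49.64, 58.18, 49.29, 41.36],
    "ASAP":    [64.87, 64.85, 57.29, 66.88, 59.78, 50.45],
    "CIGA v1": [71.07, 63.23, 51.78, 74.35, 64.54, 49.01],
    "CIGA v2": [77.33, 69.29, 63.41, 72.42, 70.83, 54.25],
}

def mean_score(scores):
    """Arithmetic mean over the six SPMotif settings."""
    return sum(scores) / len(scores)

# Rounded to two decimals to match the table's Avg column.
averages = {method: round(mean_score(s), 2) for method, s in results.items()}

# Gap between CIGA v2 and the ERM / ASAP rows, in the table's units.
gap_vs_erm = round(averages["CIGA v2"] - averages["ERM"], 2)
gap_vs_asap = round(averages["CIGA v2"] - averages["ASAP"], 2)

for method, avg in averages.items():
    print(f"{method}: {avg:.2f}")
print(f"CIGA v2 vs ERM: +{gap_vs_erm:.2f}, vs ASAP: +{gap_vs_asap:.2f}")
```

The recomputed means agree with the table's last column, so the reported averages are plain arithmetic means over the six settings.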
This walkthrough was generated by AI and reviewed by a human editor.