QUICK REVIEW

[Paper Review] Context-aware Graph Causality Inference for Few-Shot Molecular Property Prediction

Van Thuy Hoang, O. Lee | arXiv (Cornell University) | Jan 16, 2026
Advanced Graph Neural Networks · Cited by 0
One-Sentence Summary

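CaMol introduces a context-aware causal framework that uses a context graph, atom masking, and backdoor adjustment to identify causal substructures for few-shot molecular property prediction, improving both accuracy and interpretability.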

ABSTRACT

Molecular property prediction is becoming one of the major applications of graph learning in Web-based services, e.g., online protein structure prediction and drug discovery. A key challenge arises in few-shot scenarios, where only a few labeled molecules are available for predicting unseen properties. Recently, several studies have used in-context learning to capture relationships among molecules and properties, but they face two limitations: (1) exploiting prior knowledge of functional groups that are causally linked to properties and (2) identifying key substructures directly correlated with properties. We propose CaMol, a context-aware graph causality inference framework, to address these challenges from a causal inference perspective, assuming that each molecule contains a latent causal structure that determines a specific property. First, we introduce a context graph that encodes chemical knowledge by linking functional groups, molecules, and properties to guide the discovery of causal substructures. Second, we propose a learnable atom masking strategy to disentangle causal substructures from confounding ones. Third, we introduce a distribution intervener that applies backdoor adjustment by combining causal substructures with chemically grounded confounders, disentangling causal effects from real-world chemical variations. Experiments on diverse molecular datasets showed that CaMol achieved superior accuracy and sample efficiency in few-shot tasks, demonstrating its generalizability to unseen properties. Also, the discovered causal substructures were strongly aligned with chemical knowledge about functional groups, supporting the model's interpretability.
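For reference, the backdoor adjustment named above is the standard formula from Pearl's causal inference. Written with the causal substructure C, the confounding substructure S, and the property label Y, it reads as below. The second line is an illustrative assumption, not necessarily CaMol's exact estimator: it approximates the sum by averaging over K confounders s_1, ..., s_K, which is valid when they are sampled from P(S), or when P(S) is taken to be uniform over a bank of K confounders.

```latex
\begin{aligned}
P\bigl(Y \mid \mathrm{do}(C)\bigr) &= \sum_{s} P\bigl(Y \mid C, S = s\bigr)\, P(S = s) \\
&\approx \frac{1}{K} \sum_{k=1}^{K} P\bigl(Y \mid C, S = s_k\bigr)
\end{aligned}
```

Unlike the observational P(Y | C), the adjusted quantity pairs C with confounders drawn independently of it, so a correlation between a molecule's own C and S cannot leak into the estimate.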

Research Motivation and Goals

  • Motivate few-shot molecular property prediction (MPP) and the need to leverage functional-group causality.
  • Propose CaMol to discover causal substructures by integrating chemical priors via a context graph.
  • Disentangle causal substructures from confounders using learnable atom masking and a distribution-based backdoor intervention.
  • Align discovered substructures with chemical knowledge to improve interpretability and transferability.
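A minimal Python sketch of the context-graph idea: functional groups, molecules, and properties become typed nodes. Edges link each molecule to its functional groups and to the properties it is labeled for in the support set. The toy data, the `("group" | "mol" | "prop", name)` node naming, and storing labels on molecule-property edges are all hypothetical. The paper's exact construction (edge types, weights, how query molecules attach) may differ, and the BRICS step is replaced by hard-coded fragments so no cheminformatics library is needed.

```python
from collections import defaultdict

# Hypothetical toy episode. In CaMol the functional groups come from a BRICS
# decomposition; here they are hard-coded so the sketch needs no RDKit.
MOLECULE_GROUPS = {
    "mol_a": ["hydroxyl", "benzene"],
    "mol_b": ["carboxyl", "benzene"],
    "mol_c": ["amine"],
}
# Hypothetical support-set labels: (molecule, property) -> active (1) / inactive (0).
SUPPORT_LABELS = {
    ("mol_a", "prop_1"): 1,
    ("mol_b", "prop_1"): 0,
    ("mol_c", "prop_2"): 1,
}


def build_context_graph(molecule_groups, support_labels):
    """Build an undirected graph over typed nodes ("group"|"mol"|"prop", name).

    Returns the adjacency map and a dict of labels on molecule-property edges.
    """
    adj = defaultdict(set)
    edge_labels = {}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for mol, groups in molecule_groups.items():
        for group in groups:
            link(("group", group), ("mol", mol))  # functional group -- molecule
    for (mol, prop), label in support_labels.items():
        link(("mol", mol), ("prop", prop))  # molecule -- property (known label)
        edge_labels[frozenset({("mol", mol), ("prop", prop)})] = label
    return adj, edge_labels


def molecules_sharing_groups(adj, mol):
    """Molecules reachable in two hops through a shared functional-group node."""
    start = ("mol", mol)
    found = set()
    for nb in adj[start]:
        if nb[0] == "group":
            found.update(n[1] for n in adj[nb] if n[0] == "mol" and n != start)
    return found
```

For example, `graph, labels = build_context_graph(MOLECULE_GROUPS, SUPPORT_LABELS)` followed by `molecules_sharing_groups(graph, "mol_a")` returns `{"mol_b"}`, because both molecules contain a benzene node. This is the kind of functional-group bridge the context graph is meant to expose. In the paper a GNN encoder would then run over such a graph; this sketch stops at the structure.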

Proposed Method

  • Construct a context graph encoding functional groups, molecules, and properties within each episode.
  • Decompose molecules into BRICS-based functional groups and learn contextual representations via a GNN encoder.
  • Introduce a learnable atom masking mechanism to separate causal substructures C from confounding substructures S.
  • Apply a distribution intervention with backdoor adjustment to estimate P(Y|do(C)) by marginalizing over S using chemically grounded confounders.
  • Optimize a total loss combining causal prediction loss, KL divergence to a uniform prior over S, and a variance/invariance term across interventional subgraphs.
  • Use MAML-style meta-training with inner-loop causal updates and outer-loop evaluation to encourage few-shot generalization.
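The masking, intervention, and loss steps above can be sketched in plain Python (no deep-learning framework) to show how the pieces fit together. Every concrete choice here is a hypothetical assumption, not the authors' implementation:

- Atom embeddings are given, standing in for the GNN encoder's output.
- A per-atom logit becomes a soft mask via a sigmoid, and C and S are mean-pooled.
- The "chemically grounded confounders" are a list of S vectors from other molecules, with a uniform prior.
- The prediction head is a linear layer on c + s.
- The KL term is read as pushing the S-only prediction toward a uniform distribution.
- The weights `lam_kl` and `lam_var` are arbitrary.

Gradients and the MAML inner/outer loops are omitted.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def split_by_mask(node_feats, mask_logits):
    """Soft-mask atoms into mean-pooled causal (C) and confounding (S) vectors."""
    n, dim = len(node_feats), len(node_feats[0])
    causal, confound = [0.0] * dim, [0.0] * dim
    for h, logit in zip(node_feats, mask_logits):
        m = sigmoid(logit)  # soft probability that this atom is causal
        for d in range(dim):
            causal[d] += m * h[d] / n
            confound[d] += (1.0 - m) * h[d] / n
    return causal, confound


def predict(weights, bias, causal, confound):
    """Hypothetical linear head: P(Y=1 | C=c, S=s) = sigmoid(w . (c + s) + b)."""
    z = bias + sum(w * (c + s) for w, c, s in zip(weights, causal, confound))
    return sigmoid(z)


def backdoor_predict(weights, bias, causal, confounder_bank):
    """P(Y=1 | do(C)) ~ mean over the bank of P(Y=1 | C, s), i.e. uniform P(s)."""
    probs = [predict(weights, bias, causal, s) for s in confounder_bank]
    return sum(probs) / len(probs)


def kl_to_uniform(p, eps=1e-12):
    """KL(Bernoulli(p) || Bernoulli(0.5)); zero (up to eps) when p == 0.5."""
    return (p * math.log((p + eps) / 0.5)
            + (1.0 - p) * math.log((1.0 - p + eps) / 0.5))


def total_loss(weights, bias, node_feats, mask_logits, confounder_bank, y,
               lam_kl=0.5, lam_var=0.5):
    """Causal cross-entropy + lam_kl * KL(S-only pred || uniform) + lam_var * variance."""
    causal, confound = split_by_mask(node_feats, mask_logits)
    # Pair this molecule's C with every confounder in the bank (the intervention).
    probs = [predict(weights, bias, causal, s) for s in confounder_bank]
    mean_p = sum(probs) / len(probs)
    p_do = min(max(mean_p, 1e-12), 1.0 - 1e-12)  # clamp to keep log finite
    ce = -(y * math.log(p_do) + (1 - y) * math.log(1.0 - p_do))
    # S alone (C zeroed out) should carry no label information.
    p_s = predict(weights, bias, [0.0] * len(causal), confound)
    # Predictions should stay stable across interventional pairings.
    var = sum((p - mean_p) ** 2 for p in probs) / len(probs)
    return ce + lam_kl * kl_to_uniform(p_s) + lam_var * var
```

With all-zero weights every prediction equals 0.5, so `total_loss` returns about ln 2 ≈ 0.693 and both penalties are essentially zero. In a trained model the mask logits, encoder, and head would be updated by gradient descent inside the meta-learning inner loop. Averaging over the confounder bank is intended to make the prediction depend on C rather than on any single molecule's S.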
Figure 1: (a) The seen properties are relevant to the unseen property prediction. (b) The causal substructures vary and depend on molecular property prediction tasks.

Experimental Results

Research Questions

  • RQ1: How can a context graph bridging functional groups, molecules, and properties improve few-shot molecular property prediction?
  • RQ2: Can learnable atom masking effectively disentangle causal substructures from confounding substructures in molecular graphs?
  • RQ3: Does backdoor-adjusted distribution intervention improve robustness to confounders across molecules and properties?
  • RQ4: Do discovered causal substructures align with chemical knowledge and enhance model interpretability?

Key Findings

  • CaMol achieves superior accuracy across six MoleculeNet datasets in few-shot settings versus strong baselines.
  • Discovered causal substructures show strong alignment with known functional groups and support interpretability.
  • The framework demonstrates strong sample efficiency, particularly on high-diversity and imbalanced datasets (e.g., MUV, PCBA).
  • Backdoor-adjusted causal inference with context guidance yields more robust predictions than models relying on molecule–property relations alone.
  • The approach provides faithful, model-consistent explanations for predicted properties.
Figure 2: Causal relationships between variables in MPP.


This review was generated by AI and checked by human editors.