[Paper Review] Chemical-Reaction-Aware Molecule Representation Learning
MolR learns molecule embeddings by enforcing chemical-reaction equivalence in embedding space using a GNN encoder, achieving state-of-the-art results across reaction prediction, molecule property prediction, and GED tasks.
Molecule representation learning (MRL) methods aim to embed molecules into a real vector space. However, existing SMILES-based (Simplified Molecular-Input Line-Entry System) or GNN-based (Graph Neural Networks) MRL methods either take SMILES strings as input that have difficulty in encoding molecule structure information, or over-emphasize the importance of GNN architectures but neglect their generalization ability. Here we propose using chemical reactions to assist learning molecule representation. The key idea of our approach is to preserve the equivalence of molecules with respect to chemical reactions in the embedding space, i.e., forcing the sum of reactant embeddings and the sum of product embeddings to be equal for each chemical equation. This constraint is proven effective to 1) keep the embedding space well-organized and 2) improve the generalization ability of molecule embeddings. Moreover, our model can use any GNN as the molecule encoder and is thus agnostic to GNN architectures. Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks, e.g., 17.4% absolute Hit@1 gain in chemical reaction prediction, 2.3% absolute AUC gain in molecule property prediction, and 18.5% relative RMSE gain in graph-edit-distance prediction, respectively, over the best baseline method. The code is available at https://github.com/hwwang55/MolR.
Motivation & Objective
- Motivate robust molecule representations that generalize across tasks by leveraging chemical reaction structure.
- Propose a reaction-equivalence constraint to organize the embedding space and enable reaction templates to emerge.
- Show that the method is agnostic to the choice of GNN encoder and improves multiple downstream tasks.
- Demonstrate strong empirical gains on chemical reaction prediction, molecule property prediction, and graph-edit-distance prediction.
- Visualize embeddings to illustrate reaction-awareness and structural encodings.
Proposed method
- Represent molecules as graphs with atom and bond features and encode them with a GNN-based molecular encoder.
- Impose a reaction-equivalence constraint: the sum of embeddings of reactants equals the sum of embeddings of products for each reaction.
- Train with a minibatch contrastive objective that pulls correct reactant-product sums together and pushes incorrect pairings apart (margin-based loss).
- Show that, with a summation readout, the constraint induces reaction templates that generalize to unseen reactions (Proposition 2).
- Use end-to-end training with various GNN backbones (GCN, GAT, SAGE, TAG) and evaluate on reaction prediction, property prediction, and GED tasks.
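The margin-based minibatch objective described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes Euclidean distance, treats every mismatched reactant/product pair in the batch as a negative, and uses an arbitrary illustrative margin. The function name `reaction_contrastive_loss` and its parameters are hypothetical.

```python
import numpy as np

def reaction_contrastive_loss(reactant_sums, product_sums, margin=4.0):
    """Sketch of a margin-based contrastive loss over a minibatch of reactions.

    reactant_sums, product_sums: (B, d) arrays. Row i holds the summed
    embeddings of the reactants / products of reaction i (assumed to be
    precomputed by some GNN encoder, which is not modelled here).
    margin: illustrative value, not taken from the paper.
    """
    r = np.asarray(reactant_sums, dtype=float)
    p = np.asarray(product_sums, dtype=float)
    batch = r.shape[0]

    # dist[i, j] = Euclidean distance between reactant sum i and product sum j
    diff = r[:, None, :] - p[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Matched pairs (diagonal) are pulled together.
    pos = np.trace(dist) / batch

    # Mismatched pairs (off-diagonal) are pushed at least `margin` apart.
    if batch > 1:
        off_diag = ~np.eye(batch, dtype=bool)
        neg = np.maximum(margin - dist, 0.0)[off_diag].mean()
    else:
        neg = 0.0

    return pos + neg


# Two reactions whose reactant and product sums already coincide,
# but whose embeddings sit only 3 units apart (closer than the margin).
r = np.array([[0.0, 0.0], [3.0, 0.0]])
p = np.array([[0.0, 0.0], [3.0, 0.0]])
loss = reaction_contrastive_loss(r, p, margin=4.0)
```

In this toy batch the matched pairs contribute nothing, so the whole loss comes from the hinge term on the mismatched pairs, which is what pushes different reactions apart in embedding space.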
Experimental results
Research questions
- RQ1: Can chemical reactions be used to regularize molecule embeddings to improve generalization across tasks?
- RQ2: Do reaction constraints induce compositional embeddings and learnable reaction templates within GNN-based representations?
- RQ3: How does MolR perform across reaction prediction, molecule property prediction, and graph-edit-distance prediction compared to baselines?
- RQ4: Is MolR agnostic to the choice of GNN architecture while retaining performance gains?
Key findings
- MolR achieves 17.4% absolute Hit@1 gain in chemical reaction prediction over the best baseline.
- MolR achieves a 2.3% absolute AUC gain on the BBBP dataset for molecule property prediction over the best baseline.
- MolR achieves 18.5% relative RMSE gain in graph-edit-distance prediction over the best baseline.
- MolR variants with different GNNs (GCN, GAT, SAGE, TAG) all surpass baselines, with MolR-TAG often strongest.
- Even with only 1% of training data, MolR-TAG maintains strong performance, supporting few-shot generalization claims.
- Embedding visualizations show reaction-aware organization, correlation with molecule size and ring count, and learned reaction templates.
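For readers unfamiliar with the Hit@1 metric quoted above, the sketch below shows one way such a score could be computed if reaction prediction is treated as ranking candidate products by their distance to the summed reactant embedding. That ranking setup, the Euclidean distance, and the names `hit_at_k`, `query_embs`, and `candidate_embs` are assumptions made for illustration. They are not details confirmed by this review.

```python
import numpy as np

def hit_at_k(query_embs, candidate_embs, true_idx, k=1):
    """Fraction of queries whose true candidate is among the k nearest.

    query_embs: (N, d) array, e.g. summed reactant embeddings (assumed).
    candidate_embs: (M, d) array of candidate product embeddings (assumed).
    true_idx: length-N sequence with the index of the correct candidate.
    """
    q = np.asarray(query_embs, dtype=float)
    c = np.asarray(candidate_embs, dtype=float)
    true_idx = np.asarray(true_idx)

    # dist[i, j] = Euclidean distance between query i and candidate j
    dist = np.linalg.norm(q[:, None, :] - c[None, :, :], axis=-1)

    # Indices of the k closest candidates for each query
    topk = np.argsort(dist, axis=1)[:, :k]

    # A query is a hit if its true candidate appears in its top-k list
    hits = (topk == true_idx[:, None]).any(axis=1)
    return hits.mean()


candidates = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
queries = np.array([[0.1, 0.0], [4.0, 4.0], [0.9, 0.0]])
truth = [0, 2, 0]

hit1 = hit_at_k(queries, candidates, truth, k=1)
hit2 = hit_at_k(queries, candidates, truth, k=2)
```

Here the third query lies closer to candidate 1 than to its true candidate 0, so it counts as a miss at k=1 but as a hit at k=2. An "absolute" gain such as the one reported is a difference between two such scores, in percentage points.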
This review was created by AI and reviewed by human editors.