[Paper Review] Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network
A template-free approach using Weisfeiler-Lehman networks identifies reaction centers and ranks candidate products, achieving about 84% and 78% accuracy on USPTO datasets, outperforming template-based methods while running about 140x faster.
The prediction of organic reaction outcomes is a fundamental problem in computational chemistry. Since a reaction may involve hundreds of atoms, fully exploring the space of possible transformations is intractable. The current solution utilizes reaction templates to limit the space, but it suffers from coverage and efficiency issues. In this paper, we propose a template-free approach to efficiently explore the space of product molecules by first pinpointing the reaction center -- the set of nodes and edges where graph edits occur. Since only a small number of atoms contribute to the reaction center, we can directly enumerate candidate products. The generated candidates are scored by a Weisfeiler-Lehman Difference Network that models high-order interactions between changes occurring at nodes across the molecule. Our framework outperforms the top-performing template-based approach with a 10% margin, while running orders of magnitude faster. Finally, we demonstrate that the model accuracy rivals the performance of domain experts.
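To make the notion of a reaction center concrete, here is a minimal sketch that treats a reaction as a set of graph edits. It assumes atom-mapped molecules stored as plain dictionaries from atom-index pairs to bond orders; the function name `reaction_center` and the toy molecules are hypothetical and not taken from the paper.

```python
def reaction_center(reactant_bonds, product_bonds):
    """Return the bond edits (pair, old_order, new_order) and the atoms they touch.

    Both arguments map a sorted atom-index pair to a bond order;
    a missing pair is treated as bond order 0 (no bond).
    """
    edits = []
    for pair in sorted(set(reactant_bonds) | set(product_bonds)):
        old = reactant_bonds.get(pair, 0)
        new = product_bonds.get(pair, 0)
        if old != new:
            edits.append((pair, old, new))
    # The reaction center is the set of atoms involved in at least one edit.
    atoms = sorted({atom for pair, _, _ in edits for atom in pair})
    return edits, atoms


# Toy atom-mapped example (hypothetical): bond 1-2 breaks, bond 2-3 forms.
reactants = {(1, 2): 1, (3, 4): 1}
products = {(2, 3): 1, (3, 4): 1}
edits, atoms = reaction_center(reactants, products)
print(edits)  # [((1, 2), 1, 0), ((2, 3), 0, 1)]
print(atoms)  # [1, 2, 3]
```

Only three of the four atoms take part in an edit, which illustrates why restricting the search to the reaction center keeps the candidate space small.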
Motivation & Objective
- Address the challenge of predicting organic reaction outcomes without predefined reaction templates.
- Identify minimal reaction centers where graph edits occur to constrain the search space.
- Enumerate chemically feasible candidate products and rank them to select the true product.
Proposed method
- Represent molecules as labeled graphs and frame reactions as graph edits transforming reactants to products.
- Use a Weisfeiler-Lehman Network (WLN) to learn atom-level embeddings and predict pairwise atom reactivity scores.
- Incorporate a global attention mechanism to capture distal chemical effects on reaction centers.
- Select top K atom pairs to form a reaction center and enumerate feasible bond configurations within this center to generate candidates.
- Rank candidate products using a Weisfeiler-Lehman Difference Network (WLDN) that models higher-order interactions between difference vectors of reactants and candidates.
- Train end-to-end with a loss over predicted reactivities and a softmax-based ranking objective for candidates.
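The selection and enumeration steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the pairwise reactivity scores are already available as a dictionary, caps the number of simultaneous bond changes with a hypothetical `max_changes` parameter, and omits the chemical feasibility filtering (for example, valence checks) that a real pipeline would apply.

```python
from itertools import combinations, product

BOND_ORDERS = (0, 1, 2, 3)  # 0 means "no bond"


def top_k_pairs(scores, k):
    """Pick the k atom pairs with the highest predicted reactivity score."""
    return sorted(scores, key=lambda pair: (-scores[pair], pair))[:k]


def enumerate_candidates(bonds, center_pairs, max_changes=2):
    """Yield candidate bond dicts that edit at most max_changes pairs of the center.

    No feasibility filter is applied here (assumption made for brevity).
    """
    for n in range(1, max_changes + 1):
        for pairs in combinations(center_pairs, n):
            # For each chosen pair, try every bond order that differs from the current one.
            options = [[o for o in BOND_ORDERS if o != bonds.get(p, 0)] for p in pairs]
            for new_orders in product(*options):
                candidate = dict(bonds)
                for pair, order in zip(pairs, new_orders):
                    if order == 0:
                        candidate.pop(pair, None)  # bond removed
                    else:
                        candidate[pair] = order    # bond formed or order changed
                yield candidate


# Toy inputs (hypothetical scores and molecule).
scores = {(1, 2): 0.9, (2, 3): 0.8, (1, 3): 0.1, (3, 4): 0.05}
bonds = {(1, 2): 1, (3, 4): 1}

center = top_k_pairs(scores, k=2)
candidates = list(enumerate_candidates(bonds, center, max_changes=2))
print(center)           # [(1, 2), (2, 3)]
print(len(candidates))  # 15
```

Each candidate produced this way would then be passed to a ranking model; in the reviewed paper that role is played by the WLDN.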
Experimental results
Research questions
- RQ1: Can a template-free approach identify the reaction center efficiently for diverse organic reactions?
- RQ2: Do WLN-based representations plus attention capture distal effects necessary for accurate reaction center prediction?
- RQ3: Does enumerating candidates within the predicted reaction center and ranking them with WLDN outperform template-based methods in coverage and accuracy?
- RQ4: How does the template-free method compare in speed and scalability to template-based approaches on large USPTO-derived datasets?
Key findings
- The global WLN model (with attention) improves reaction center identification over the local model, achieving high coverage (roughly 90% or more at K=8) and better prediction of centers influenced by distal reagents.
- Candidate generation using the predicted reaction center yields a compact set of candidates (e.g., ~60 candidates on average at K=6) with coverage competitive with template-based methods, without relying on a template library.
- WLDN outperforms WLN in ranking accuracy, demonstrating that higher-order interactions between reaction-center differences improve product ranking.
- On USPTO-15K, WLDN(*) achieves 83.9% P@1, 93.2% P@3, and 95.2% P@5; on USPTO, WLDN(*) maintains strong performance with higher coverage and ranking than WLN.
- Human evaluation shows the model attains 69.1% accuracy on 80 reactions, surpassing average chemist performance in the study.
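For readers unfamiliar with the P@k figures reported above, the sketch below shows one common way to compute them: a reaction counts as a hit if the true product appears among the top k ranked candidates. The function name `precision_at_k` and the toy rankings are hypothetical, and the paper's exact evaluation protocol may differ in details.

```python
def precision_at_k(ranked_lists, true_products, k):
    """Fraction of reactions whose true product is among the top-k ranked candidates."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_lists, true_products)
        if truth in ranked[:k]
    )
    return hits / len(true_products)


# Toy rankings for four reactions (hypothetical labels, best candidate first).
ranked = [
    ["A", "B", "C"],
    ["X", "Y", "Z"],
    ["P", "Q", "R"],
    ["M", "N", "O"],
]
truth = ["A", "Y", "R", "Z"]  # the last true product was never generated

print(precision_at_k(ranked, truth, 1))  # 0.25
print(precision_at_k(ranked, truth, 3))  # 0.75
```

The last reaction illustrates how candidate coverage bounds ranking accuracy: if the true product is missing from the candidate set, no ranking model can recover it.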
This review was created by AI and reviewed by human editors.