QUICK REVIEW

[Paper Review] Retrosynthesis Prediction with Conditional Graph Logic Network

Hanjun Dai, Chengtao Li | arXiv (Cornell University) | Jan 6, 2020

Machine Learning in Materials Science | 45 citations

TL;DR

The paper introduces the Conditional Graph Logic Network (GLN), a graph-neural-network–based probabilistic model that learns when retrosynthesis templates apply, achieving state-of-the-art single-step retrosynthesis accuracy with efficient hierarchical sampling and interpretable predictions.

ABSTRACT

Retrosynthesis is one of the fundamental problems in organic chemistry. The task is to identify reactants that can be used to synthesize a specified product molecule. Recently, computer-aided retrosynthesis is finding renewed interest from both chemistry and computer science communities. Most existing approaches rely on template-based models that define subgraph matching rules, but whether or not a chemical reaction can proceed is not defined by hard decision rules. In this work, we propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks that learns when rules from reaction templates should be applied, implicitly considering whether the resulting reaction would be both chemically feasible and strategic. We also propose an efficient hierarchical sampling to alleviate the computation cost. While achieving a significant improvement of $8.1\%$ over current state-of-the-art methods on the benchmark dataset, our model also offers interpretations for the prediction.

Motivation & Objective

Address the single-step retrosynthesis problem by combining chemical reaction templates with neural reasoning.
Encode chemistry knowledge as logic rules and learn when to apply them through a probabilistic graphical model.
Improve scalability and interpretability over purely rule-based or purely neural approaches.
Provide an efficient training/inference framework using hierarchical sampling and graph embeddings.

Proposed method

Model retrosynthesis as a conditional graphical model over templates T and reactant sets R given product O, factorized as p(R, T|O) = p(T|O) p(R|T, O).
Represent templates as logic rules with decomposition: match product centers o^T in O and match reactants r_i^T inside R via subgraph isomorphism.
Parameterize the energy terms w1 and w2 with graph neural networks that embed molecules and subgraph patterns (through the component scores v1, v2, and w2); the phi functions act as the template-matching constraints.
Decompose p(T|O) into p(o^T|O) and p({r^T}|O) to speed up learning and inference, with a tractable partition function Z(O) and hierarchical sampling.
Train via maximum likelihood with efficient gradient estimation using importance sampling that leverages the logic-driven sparsity.
Use beam search and caching strategies to accelerate prediction and provide interpretable reaction centers and templates.
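The two-level decomposition of p(T|O) listed above can be illustrated with a small toy sketch. It is not the authors' implementation: the hand-set scores stand in for graph-neural-network outputs, the boolean `center_matches` table stands in for the result of subgraph matching against the product, and every name (`masked_softmax`, `template_distribution`, `sample_template`, the pattern labels) is hypothetical. The sketch first draws a reaction center among the centers that match the product, then draws a reactant pattern among the templates sharing that center.

```python
import math
import random


def masked_softmax(scores, mask):
    """Softmax over entries whose mask is True; masked entries get probability 0."""
    allowed = [s for s, keep in zip(scores, mask) if keep]
    if not allowed:
        raise ValueError("no candidate satisfies the matching constraint")
    top = max(allowed)  # subtract the max for numerical stability
    weights = [math.exp(s - top) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def center_probabilities(center_scores, center_matches):
    """p(center | product), restricted to centers that match the product."""
    centers = list(center_scores)
    probs = masked_softmax(
        [center_scores[c] for c in centers],
        [center_matches[c] for c in centers],
    )
    return centers, probs


def pattern_probabilities(pattern_scores, center):
    """p(reactant pattern | center, product) over templates sharing the center."""
    patterns = list(pattern_scores[center])
    probs = masked_softmax(
        [pattern_scores[center][r] for r in patterns], [True] * len(patterns)
    )
    return patterns, probs


def template_distribution(center_scores, center_matches, pattern_scores):
    """Exact template probabilities as the product of the two levels."""
    dist = {}
    centers, p_center = center_probabilities(center_scores, center_matches)
    for c, pc in zip(centers, p_center):
        if pc == 0.0:
            continue  # center does not match the product, so its templates are excluded
        patterns, p_pattern = pattern_probabilities(pattern_scores, c)
        for r, pr in zip(patterns, p_pattern):
            dist[(c, r)] = pc * pr
    return dist


def sample_template(center_scores, center_matches, pattern_scores, rng):
    """Hierarchical sampling: draw a center first, then a reactant pattern."""
    centers, p_center = center_probabilities(center_scores, center_matches)
    center = rng.choices(centers, weights=p_center, k=1)[0]
    patterns, p_pattern = pattern_probabilities(pattern_scores, center)
    pattern = rng.choices(patterns, weights=p_pattern, k=1)[0]
    return center, pattern


# Toy inputs (all values are made up for illustration).
center_scores = {"amide C-N": 2.0, "ester C-O": 0.5, "aryl C-Br": 3.0}
center_matches = {"amide C-N": True, "ester C-O": True, "aryl C-Br": False}
pattern_scores = {
    "amide C-N": {"acid + amine": 1.0, "acyl chloride + amine": 0.2},
    "ester C-O": {"acid + alcohol": 0.0},
    "aryl C-Br": {"arene + Br2": 0.0},
}

dist = template_distribution(center_scores, center_matches, pattern_scores)
for template, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(template, round(prob, 3))
```

Note that the highest-scoring center in the toy data ("aryl C-Br") receives zero probability because it does not match the product, which is the role the matching constraint plays in restricting the template set before any neural score is normalized.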

Experimental results

Research questions

RQ1: Can a conditional graphical model over reaction templates and reactants improve single-step retrosynthesis accuracy?
RQ2: How can logic-rule–driven matching be integrated with neural embeddings to provide both interpretability and scalability?
RQ3: What efficient inference techniques (e.g., hierarchical sampling, beam search) enable scalable learning on large template sets?

Key findings

GLN significantly improves on state-of-the-art baselines on USPTO-50k, including an 8.1% top-1 accuracy gain in the setting where the reaction class is unknown.
GLN with reaction-class prior matches or exceeds performance of rule-based and neural seq2seq baselines across top-k metrics.
The method scales to large datasets (USPTO-full) and maintains competitive top-k accuracies against strong baselines.
The model provides interpretable predictions by visualizing reaction centers and subgraph pattern embeddings aligned with ground-truth cores.
Efficient inference via decomposed template modeling, caching, and hierarchical sampling yields feasible training (≈12 hours on a GTX 1080 Ti for USPTO-50k) and practical prediction times.
The framework supports optional conditioning on a known reaction type c through restricted template sets, enabling targeted retrosynthesis planning.
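The top-k accuracies mentioned in these findings can be computed along the following lines. This is a minimal sketch that assumes each prediction and each ground-truth reactant set has already been reduced to a canonical string (for example, canonical SMILES), so that exact string equality counts as a hit; the function name `top_k_accuracy` and the placeholder strings are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def top_k_accuracy(ranked_predictions, ground_truth, ks=(1, 3, 5, 10)):
    """Fraction of products whose true reactant set appears in the top-k predictions.

    ranked_predictions: one list per product, candidates ordered best first.
    ground_truth: one canonical reactant string per product.
    """
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must have the same length")
    if not ground_truth:
        raise ValueError("need at least one example")
    hits = {k: 0 for k in ks}
    for preds, truth in zip(ranked_predictions, ground_truth):
        for k in ks:
            if truth in preds[:k]:
                hits[k] += 1
    n = len(ground_truth)
    return {k: hits[k] / n for k in ks}


# Toy example with placeholder strings instead of real molecules.
predictions = [["A.B", "C.D"], ["E", "F.G"], ["H"]]
truths = ["A.B", "F.G", "X"]
accuracy = top_k_accuracy(predictions, truths, ks=(1, 2))
print(accuracy)
```

In the toy example, the first product is solved at rank 1, the second at rank 2, and the third is never solved, so top-1 accuracy is 1/3 and top-2 accuracy is 2/3.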

This review was created by AI and reviewed by human editors.