QUICK REVIEW

[Paper Review] A Graph to Graphs Framework for Retrosynthesis Prediction

Chence Shi, Minkai Xu | arXiv (Cornell University) | Mar 28, 2020

Advanced Graph Neural Networks | References: 35 | Cited by: 74

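One-Sentence Summary

G2Gs is a template-free retrosynthesis model that translates a target molecular graph into reactant graphs through reaction-center identification followed by variational graph translation. It reaches top-1 accuracy close to that of template-based methods while scaling well.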

ABSTRACT

A fundamental problem in computational chemistry is to find a set of reactants to synthesize a target molecule, a.k.a. retrosynthesis prediction. Existing state-of-the-art methods rely on matching the target molecule with a large set of reaction templates, which are very computationally expensive and also suffer from the problem of coverage. In this paper, we propose a novel template-free approach called G2Gs by transforming a target molecular graph into a set of reactant molecular graphs. G2Gs first splits the target molecular graph into a set of synthons by identifying the reaction centers, and then translates the synthons to the final reactant graphs via a variational graph translation framework. Experimental results show that G2Gs significantly outperforms existing template-free approaches by up to 63% in terms of the top-1 accuracy and achieves a performance close to that of state-of-the-art template based approaches, but does not require domain knowledge and is much more scalable.
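
The first stage described in the abstract, splitting the target graph into synthons at the reaction center, amounts to removing bonds and collecting the connected components that remain. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes the reaction center is given as a set of bonds to break, represents a molecule only by atom indices and bonds (no atom or bond types, charges, or hydrogen bookkeeping), and the function name `split_into_synthons` is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Bond = Tuple[int, int]


def split_into_synthons(num_atoms: int, bonds: Iterable[Bond],
                        reaction_center: Iterable[Bond]) -> List[Set[int]]:
    """Hypothetical helper: break reaction-center bonds, return synthons.

    Assumes a bond-level reaction center; each synthon is returned as the
    set of atom indices in one connected component of the remaining graph.
    """
    removed = {frozenset(b) for b in reaction_center}
    adjacency: Dict[int, List[int]] = {i: [] for i in range(num_atoms)}
    for u, v in bonds:
        if frozenset((u, v)) in removed:
            continue  # this bond is broken at the reaction center
        adjacency[u].append(v)
        adjacency[v].append(u)

    seen: Set[int] = set()
    synthons: List[Set[int]] = []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        component: Set[int] = set()
        stack = [start]
        seen.add(start)
        while stack:
            atom = stack.pop()
            component.add(atom)
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        synthons.append(component)
    return synthons
```

For a toy five-atom chain with the reaction center at bond (2, 3), `split_into_synthons(5, [(0, 1), (1, 2), (2, 3), (3, 4)], [(2, 3)])` returns two synthons, `{0, 1, 2}` and `{3, 4}`. In G2Gs each such synthon would then be passed to the graph translator.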

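Research Motivation and Goals

Motivate retrosynthesis prediction as a core problem and address the limitations of template-based methods: high computational cost and limited template coverage (generalization).
Propose a template-free graph-to-graphs framework (G2Gs) that operates directly on molecular graphs.
Identify reaction centers to obtain synthons, then translate the synthons into reactants with a variational graph translator.
Capture the multimodality and diversity of predictions through latent variables in the graph generation framework.
Demonstrate scalability and competitive performance against baselines on the USPTO-50k dataset.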

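Proposed Method

Represent molecules as graphs and identify reaction centers with a reaction-center scoring network built on a relational GCN (R-GCN).
Split the product into synthons by breaking the bonds at the reaction center, then translate each synthon S into a reactant with a variational graph translation model that uses a latent variable z.
Model the distribution P(G|S) by autoregressively generating a sequence of graph transformation actions conditioned on z and S.
Train the translator with an amortized variational objective (ELBO) using a Gaussian approximate posterior q(z|G,S).
At inference, use beam search to generate diverse and valid reactant graphs and to mitigate exposure bias.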
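
The training objective in the list above follows the standard bound log P(G|S) ≥ E_q[log P(G|z,S)] − KL(q(z|G,S) ‖ p(z)), where the reconstruction term factorizes over the autoregressive graph transformation actions. The sketch below computes a single-sample estimate of the negative ELBO. It assumes a diagonal Gaussian posterior, a standard normal prior p(z) = N(0, I), and per-action log-probabilities already produced by some decoder; the decoder network is omitted and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    # z = mu + sigma * eps with eps ~ N(0, I): the reparameterization trick
    # keeps the sample differentiable with respect to mu and log_var.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float],
                          log_var: Sequence[float]) -> float:
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).
    # Assumption: the prior p(z) is a standard normal.
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def action_log_likelihood(action_log_probs: Sequence[float]) -> float:
    # Autoregressive factorization:
    # log P(G | z, S) = sum_t log p(a_t | a_<t, z, S).
    return sum(action_log_probs)


def negative_elbo(action_log_probs: Sequence[float],
                  mu: Sequence[float], log_var: Sequence[float]) -> float:
    # Loss to minimize: -(reconstruction term - KL term). action_log_probs are
    # assumed to come from a decoder run on one sample z ~ q(z | G, S).
    return -(action_log_likelihood(action_log_probs)
             - kl_to_standard_normal(mu, log_var))
```

With mu = [1.0, 0.0] and log_var = [0.0, 0.0] the KL term is 0.5; if the decoder assigns log-probabilities [-1.0, -2.0] to the ground-truth actions, the loss is 3.5. Under this assumed setup, sampling different z from the prior at inference is what lets the same synthon map to different reactants.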

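Experimental Results

Research Questions

RQ1: Without domain-specific reaction templates, can a template-free, graph-based model achieve competitive retrosynthesis accuracy?
RQ2: How effectively can reaction centers be identified from the product graph, so that retrosynthesis decomposes into synthon-level translation?
RQ3: Can the variational graph translation module capture the multimodal distribution of possible reactants for a given synthon while preserving chemical validity?
RQ4: How do the scalability and performance of G2Gs on USPTO-50k compare with template-based and other template-free methods?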

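Key Findings

Top-k exact-match accuracy on USPTO-50k (reaction class given):

Method	Type	Top-1 (%)	Top-3 (%)	Top-5 (%)	Top-10 (%)
Seq2seq	Template-free	37.4	52.4	57.0	61.7
G2Gs	Template-free	61.0	81.3	86.0	88.7
Retrosim	Template-based	52.9	73.8	81.2	88.1
Neuralsym	Template-based	55.3	76.0	81.4	85.1
GLN	Template-based	64.2	79.1	85.2	90.0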
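
Top-k accuracy in the table counts a test product as solved when the ground-truth reactant set appears among the model's k highest-ranked predictions. The sketch below is a minimal illustration, assuming every prediction and ground truth has already been canonicalized into one comparable string (for example, sorted canonical SMILES joined by "."); the canonicalization step and the name `top_k_accuracy` are assumptions, not part of the source.

```python
from typing import Dict, Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   ground_truth: Sequence[str],
                   ks: Sequence[int] = (1, 3, 5, 10)) -> Dict[int, float]:
    """Percentage of examples whose ground truth is in the top-k predictions.

    ranked_predictions[i] is the beam output for example i, best first;
    strings are assumed to be canonicalized so equality means same reactants.
    """
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("need one prediction list per ground-truth entry")
    n = len(ground_truth)
    result: Dict[int, float] = {}
    for k in ks:
        hits = sum(1 for preds, truth in zip(ranked_predictions, ground_truth)
                   if truth in preds[:k])
        result[k] = 100.0 * hits / n
    return result
```

On toy inputs `top_k_accuracy([["A", "B"], ["C", "D"], ["E", "F"]], ["A", "D", "X"], ks=(1, 2))`, one of three examples is correct at k = 1 and two of three at k = 2, so top-k accuracy can only grow with k, as every row of the table does.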

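On USPTO-50k, G2Gs improves top-1 accuracy over the template-free Seq2seq baseline by up to 63% in relative terms (61.0% vs. 37.4%).
G2Gs comes close to the state-of-the-art template-based method GLN at top-1 (61.0% vs. 64.2%) and outperforms it at top-3 and top-5, without relying on reaction templates or other domain knowledge.
Reaction center identification is highly accurate: top-1 90.2% when the reaction class is known, and still strong (top-1 75.8%) when it is unknown.
The variational graph translation module achieves high top-k accuracy (with the reaction class known: top-1 66.8%, top-5 91.5%, top-10 93.9%).
The latent variable enables diverse and valid reactant generation, as shown by multiple plausible translations of the same synthon.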
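
The "up to 63%" figure is a relative improvement over Seq2seq and can be checked directly from the table: (61.0 − 37.4) / 37.4 ≈ 0.631 at top-1. The short script below recomputes the relative gain at every k using only the table values; the variable layout and the helper name `relative_gain` are illustrative.

```python
# Top-1/3/5/10 accuracies (%) copied from the results table above.
SEQ2SEQ = (37.4, 52.4, 57.0, 61.7)
G2GS = (61.0, 81.3, 86.0, 88.7)
KS = (1, 3, 5, 10)


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


gains = {k: relative_gain(g, s) for k, g, s in zip(KS, G2GS, SEQ2SEQ)}
for k, gain in gains.items():
    print(f"Top-{k}: +{gain:.1f}% relative to Seq2seq")
```

The top-1 gain (about 63.1%) is the largest; the gains shrink to roughly 55%, 51%, and 44% at top-3, top-5, and top-10, which is why the claim is phrased as "up to".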

This review was generated by AI and reviewed and edited by humans.