QUICK REVIEW

[Paper Review] Generative Artificial Intelligence for Navigating Synthesizable Chemical Space

Wenhao Gao, Shitong Luo | arXiv (Cornell University) | Oct 4, 2024
Advanced Data Processing Techniques | Cited by 13
One-Sentence Summary

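SynFormer is a Transformer-based framework that generates synthetic pathways for molecules rather than bare structures. It uses diffusion-based building-block selection and end-to-end differentiability to explore both local and global synthesizable chemical space.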

ABSTRACT

We introduce SynFormer, a generative modeling framework designed to efficiently explore and navigate synthesizable chemical space. Unlike traditional molecular generation approaches, we generate synthetic pathways for molecules to ensure that designs are synthetically tractable. By incorporating a scalable transformer architecture and a diffusion module for building block selection, SynFormer surpasses existing models in synthesizable molecular design. We demonstrate SynFormer's effectiveness in two key applications: (1) local chemical space exploration, where the model generates synthesizable analogs of a reference molecule, and (2) global chemical space exploration, where the model aims to identify optimal molecules according to a black-box property prediction oracle. Additionally, we demonstrate the scalability of our approach via the improvement in performance as more computational resources become available. With our code and trained models openly available, we hope that SynFormer will find use across applications in drug discovery and materials science.

Motivation and Goals

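  • Make the case for synthesis-centric molecular design that guarantees synthetic feasibility.
  • Develop a scalable generative framework whose outputs are synthesizable pathways, not just molecular structures.
  • Use a transformer backbone paired with a diffusion module to select building blocks and reactions.
  • Demonstrate exploration of both local (reference-molecule-driven) and global (black-box-objective-driven) synthesizable chemical space.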

Proposed Method

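  • Represent synthetic pathways in postfix notation using the tokens START, END, RXN (reaction), and BB (building block).
  • Generate pathway tokens autoregressively with a transformer, classifying the token type at each step.
  • Add a denoising diffusion probabilistic module that predicts building-block fingerprints, which are then used to select a BB.
  • Train two variants: SynFormer-D (decoder-only) and SynFormer-ED (an encoder-decoder conditioned on an input SMILES string).
  • Train on a simulated chemical space built from 115 reaction templates and 223,244 building blocks, designed to extend Enamine REAL Space.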
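The postfix route encoding in the method list can be illustrated with a small stack interpreter. This is a minimal sketch and not SynFormer's implementation. Building blocks are placeholder strings rather than molecules. The `Reaction` class and both reactions are hypothetical stand-ins for real reaction templates. Reading left to right, each BB pushes a building block. Each RXN pops as many reactants as its template consumes and pushes the product. A well-formed route therefore reduces to exactly one molecule at END.

```python
# Toy interpreter for a postfix-encoded synthetic route.
# Hypothetical stand-ins: building blocks are placeholder strings and
# reactions are plain Python functions, not real reaction templates.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Reaction:
    name: str
    arity: int                  # number of reactants the template consumes
    apply: Callable[..., str]   # builds a product label from the reactants

    def __post_init__(self) -> None:
        if self.arity < 1:
            raise ValueError("a reaction needs at least one reactant")


def execute_route(tokens: Sequence[tuple]) -> str:
    """Evaluate START, BB/RXN tokens, END with a stack and return the product."""
    if not tokens or tokens[0][0] != "START":
        raise ValueError("route must begin with START")
    stack: List[str] = []
    for kind, *payload in tokens[1:]:
        if kind == "BB":
            stack.append(payload[0])             # push a building block
        elif kind == "RXN":
            rxn: Reaction = payload[0]
            if len(stack) < rxn.arity:
                raise ValueError(f"{rxn.name} needs {rxn.arity} reactant(s)")
            reactants = stack[-rxn.arity:]       # take the most recent reactants
            del stack[-rxn.arity:]               # and pop them off the stack
            stack.append(rxn.apply(*reactants))  # push the product
        elif kind == "END":
            if len(stack) != 1:
                raise ValueError("route must reduce to exactly one product")
            return stack[0]
        else:
            raise ValueError(f"unknown token type {kind!r}")
    raise ValueError("route is missing END")


# Hypothetical two-step route: couple two building blocks, then deprotect.
amide = Reaction("amide_coupling", 2, lambda acid, amine: f"amide({acid},{amine})")
deprotect = Reaction("deprotection", 1, lambda m: f"deprotect({m})")
route = [("START",), ("BB", "acid_1"), ("BB", "amine_7"),
         ("RXN", amide), ("RXN", deprotect), ("END",)]
print(execute_route(route))  # deprotect(amide(acid_1,amine_7))
```

Postfix order puts every reactant before the reaction that consumes it. Because of this, a left-to-right generator never has to refer back to earlier parts of the route by index.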
Figure 1: Schematic illustration of the SynFormer framework and architecture. (A) The SynFormer-ED architecture is an encoder-decoder that takes a molecule as input and outputs a synthetic route to the same or an analogous molecule. (B) SynFormer-D is a decoder-only framework designed to generate sy
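The method list says the diffusion module predicts a building-block fingerprint, which is then used to select a BB. One simple way to map a predicted fingerprint onto a concrete building block is nearest-neighbour retrieval by Tanimoto similarity over the building-block library. The sketch assumes that approach and thresholds predicted bit probabilities at 0.5. It uses a toy 8-bit library with hypothetical names and omits the diffusion model itself. Real fingerprints would normally come from a cheminformatics toolkit and be much longer.

```python
# Nearest-neighbour building-block retrieval from a predicted fingerprint.
# Assumptions: fingerprints are sets of on-bit indices, predictions are
# per-bit probabilities, and the library below is hypothetical toy data.
from typing import Dict, FrozenSet, Sequence


def binarize(probs: Sequence[float], threshold: float = 0.5) -> FrozenSet[int]:
    """Turn predicted bit probabilities into the set of bits that are on."""
    return frozenset(i for i, p in enumerate(probs) if p >= threshold)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Intersection size over union size; two empty sets score 0.0 by convention."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_building_block(pred_probs: Sequence[float],
                          library: Dict[str, FrozenSet[int]],
                          threshold: float = 0.5) -> str:
    """Return the library entry most similar to the predicted fingerprint."""
    query = binarize(pred_probs, threshold)
    return max(library, key=lambda name: tanimoto(query, library[name]))


library = {
    "bb_A": frozenset({0, 2, 5}),
    "bb_B": frozenset({1, 3}),
    "bb_C": frozenset({0, 2, 4, 5}),
}
predicted = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.05, 0.0]  # toy 8-bit prediction
print(select_building_block(predicted, library))  # bb_C
```

Retrieval guarantees that the selected BB is a purchasable library entry, even when the predicted fingerprint itself matches no real molecule exactly.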

Experimental Results

Research Questions

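  • RQ1: Can SynFormer accurately reconstruct molecules and cover a large synthesizable chemical space?
  • RQ2: Can SynFormer generate synthesizable analogs of unsynthesizable inputs while preserving their key features?
  • RQ3: Can SynFormer navigate global chemical space to optimize properties while respecting synthetic feasibility?
  • RQ4: How does SynFormer perform as a mutation operator, or within a reinforcement-learning-guided optimization framework?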

Key Findings

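  • SynFormer-ED reaches a higher reconstruction rate than prior models on REAL Space (66%) and also improves on ChEMBL (20%).
  • Model performance (fingerprint BCE) improves with model size and training data, so further scaling calls for more data and compute.
  • SynFormer-ED generates synthesizable analogs of unsynthesizable designs, improving synthetic accessibility while maintaining target scores.
  • SynFormer-D fine-tuned with RL (SF-RL) biases generation toward high-scoring molecules for DRD2 binding and outperforms several methods in some settings.
  • Using SynFormer-ED as the mutation operator in GraphGA (GraphGA-SF) gives competitive optimization on GuacaMol tasks with improved synthetic feasibility.
  • The framework supports local-space projection, hit expansion, and global optimization while guaranteeing that a synthetic route exists for every design.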
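The scaling finding above is measured with a fingerprint binary cross-entropy (BCE). The sketch below assumes the metric is the per-bit BCE between predicted bit probabilities and the target building-block fingerprint, averaged over bits. The exact reduction used in the paper is not stated in this summary. Predictions are clipped so that the logarithm stays finite.

```python
# Mean per-bit binary cross-entropy between a predicted fingerprint and its target.
# Assumption: the loss is averaged over bits; the paper's exact reduction may differ.
import math
from typing import Sequence


def fingerprint_bce(pred_probs: Sequence[float], target_bits: Sequence[int],
                    eps: float = 1e-7) -> float:
    if len(pred_probs) != len(target_bits) or not pred_probs:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for p, y in zip(pred_probs, target_bits):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(pred_probs)


target = [1, 0, 1, 0]
print(fingerprint_bce([0.5, 0.5, 0.5, 0.5], target))  # ln 2, about 0.693
print(fingerprint_bce([0.9, 0.1, 0.8, 0.2], target))  # lower: closer to the target
```

An uninformative prediction of 0.5 on every bit scores ln 2. Lower values mean the predicted fingerprint is closer to the true building block.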
Figure 2: Model performance on molecular reconstruction. (A and B) Comparison of the reconstruction rate and average structural (Tanimoto) similarity between input and output molecules for SynFormer-ED, ChemProjector [67], and SynNet [65] on 1,000 randomly selected molecules from (A) REAL Diver
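Figure 2 compares models by reconstruction rate and by average Tanimoto similarity between input and output molecules. The sketch below computes both metrics under two assumptions. First, molecules are keyed by canonical identifiers, so string equality stands in for "same molecule". Second, fingerprints are given as sets of on-bit indices. The molecule names and bit sets are hypothetical toy data. In practice, canonical SMILES and fingerprints would come from a toolkit such as RDKit.

```python
# Reconstruction rate and mean Tanimoto similarity over (input, output) pairs.
# Assumptions: molecules are keyed by canonical identifiers, so string equality
# means "same molecule"; fingerprints are sets of on-bit indices (toy data).
from typing import Dict, FrozenSet, Sequence, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def reconstruction_metrics(pairs: Sequence[Tuple[str, str]],
                           fps: Dict[str, FrozenSet[int]]) -> Tuple[float, float]:
    """Return (fraction of exact reconstructions, mean input/output similarity)."""
    if not pairs:
        raise ValueError("no pairs to evaluate")
    exact = sum(1 for src, out in pairs if src == out)
    sims = [tanimoto(fps[src], fps[out]) for src, out in pairs]
    return exact / len(pairs), sum(sims) / len(sims)


fps = {
    "mol_1": frozenset({0, 1, 2}),
    "mol_2": frozenset({3, 4}),
    "mol_2_analog": frozenset({3, 4, 5}),
    "mol_3": frozenset({6}),
    "mol_3_analog": frozenset({7}),
}
pairs = [("mol_1", "mol_1"), ("mol_2", "mol_2_analog"), ("mol_3", "mol_3_analog")]
rate, mean_sim = reconstruction_metrics(pairs, fps)
print(f"reconstruction rate {rate:.3f}, mean similarity {mean_sim:.3f}")
# reconstruction rate 0.333, mean similarity 0.556
```

The two metrics are complementary. The reconstruction rate counts only exact hits. The mean similarity still credits close analogs when the exact input is not reachable in the synthesizable space.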

This summary was generated by AI and reviewed by human editors.