QUICK REVIEW

[Paper Review] RGFN: Synthesizable Molecular Generation Using GFlowNets

Michał Koziarski, Andrei Rekesh|arXiv (Cornell University)|Jun 1, 2024

Machine Learning in Materials Science|Cited by 8

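One-Sentence Summary

RGFN extends GFlowNets to the space of chemical reactions, achieving synthesizability by construction while maintaining comparable optimization quality, and it can explore a very large search space of low-cost molecules. It demonstrates effective ligand generation on docking and activity tasks and scales to larger fragment libraries.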

ABSTRACT

Generative models hold great promise for small molecule discovery, significantly increasing the size of search space compared to traditional in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries coupled with low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.

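Motivation and Objectives

Motivate the challenge of generating synthesizable molecules at scale for drug discovery and propose a solution.
Propose Reaction-GFlowNet (RGFN), an extension of GFlowNets that samples from a predefined set of reactions and fragments to guarantee synthesizability.
Design domain-specific action representations and training components that scale to large fragment libraries.
Demonstrate the effectiveness of RGFN on docking, proxy-activity, and senolytic classification tasks, and compare it against baseline methods.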

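Proposed Method

RGFN generates a molecule by sequentially selecting an initial fragment, a reaction template, and a second fragment, then simulating the reaction in silico to produce a candidate molecule.
The forward policy uses a graph transformer f to embed molecules and reactions, producing action probabilities for selecting fragments, reactions, and reaction products.
The action embedding includes a mechanism g(m_i) that captures structural similarity between fragments via fingerprints, improving scalability to larger fragment libraries.
A curated set of 17 high-yield reactions and 350 affordable building blocks steers generation toward feasible, cost-efficient synthesis.
A backward policy is introduced to align intermediate steps with feasible retrosynthetic routes, ensuring coherent generation paths.
The framework uses RDKit RunReactants for reaction simulation and mixes synthesis- and cost-aware metrics into the reward.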
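The fingerprint-based action embedding g(m_i) can be illustrated with a small sketch. It assumes, as a simplification not confirmed by this summary, that g is a single linear map applied to a binary fingerprint and that the fragment logit is the dot product of the state embedding f(s) with g(m_i). The names `embed_fragment`, `fragment_probabilities`, the 4-bit fingerprints, and the random weights are all hypothetical stand-ins for learned quantities.

```python
import math
import random


def embed_fragment(fingerprint, weights):
    """Hypothetical g(m_i): a linear map W applied to a binary fingerprint."""
    return [sum(w * x for w, x in zip(row, fingerprint)) for row in weights]


def fragment_probabilities(state_embedding, fingerprints, weights):
    """Softmax over the logits f(s) . g(m_i), one logit per fragment."""
    logits = []
    for fp in fingerprints:
        g = embed_fragment(fp, weights)
        logits.append(sum(a * b for a, b in zip(state_embedding, g)))
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - shift) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 4-bit fingerprints (assumption: real fingerprints are much longer).
fingerprints = [
    [1, 0, 1, 0],  # fragment A
    [1, 0, 1, 0],  # fragment B, same bits as A
    [0, 1, 0, 1],  # fragment C
]

rng = random.Random(0)
# Random stand-ins for the learned matrix W and the state embedding f(s).
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
state_embedding = [rng.uniform(-1, 1) for _ in range(3)]

probs = fragment_probabilities(state_embedding, fingerprints, weights)
```

Because the parameters live in W rather than in one independent logit per fragment, fragments with identical fingerprints (A and B here) receive identical probabilities, and adding fragments to the library adds no new parameters. This is one plausible reading of why such an embedding would help with larger libraries.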

Figure 1 : Illustration of RGFN sampling process. At the beginning, the RGFN selects an initial molecular building block. In the next two steps, a reaction and a proper reactant are chosen. Then the in silico reaction is simulated with RDKit’s RunReactants functionality and one of the resulting mole

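Experimental Results

Research Questions

RQ1: Can a generative model operating in reaction space exceed traditional fragment libraries in the number of attainable molecules?
RQ2: Are reaction-based GFlowNets competitive in optimization quality and diversity with baselines that do not enforce synthesizability?
RQ3: How does expanding the fragment library affect learning efficiency and the quality of the generated space?
RQ4: Do the generated ligands have realistic docking poses, and are they diverse across multiple targets?
RQ5: How do fingerprint-based action embeddings affect scalability and convergence?

Key Findings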

Task	Method	Mol. weight ↓	QED ↑	SAScore ↓	AiZynth ↑
sEH	GraphGA	528.6 ± 42.3	0.21 ± 0.06	3.87 ± 0.24	0.04
sEH	SyntheMol	411.1 ± 66.7	0.57 ± 0.18	2.85 ± 0.55	0.80
sEH	FGFN	473.4 ± 58.9	0.39 ± 0.13	3.43 ± 0.48	0.14
sEH	RGFN	495.2 ± 49.6	0.29 ± 0.10	3.09 ± 0.39	0.56
Senolytics	GraphGA	485.7 ± 75.6	0.09 ± 0.05	2.92 ± 0.26	0.05
Senolytics	SyntheMol	441.4 ± 83.5	0.48 ± 0.19	2.77 ± 0.40	0.53
Senolytics	FGFN	467.9 ± 57.3	0.41 ± 0.14	3.74 ± 0.54	0.01
Senolytics	RGFN	558.7 ± 62.8	0.21 ± 0.09	3.24 ± 0.32	0.58
ClpP	GraphGA	521.0 ± 31.8	0.32 ± 0.07	4.14 ± 0.51	0.00
ClpP	SyntheMol	458.2 ± 60.7	0.45 ± 0.16	2.86 ± 0.56	0.56
ClpP	FGFN	548.6 ± 42.9	0.22 ± 0.03	2.94 ± 0.54	0.25
ClpP	RGFN	526.2 ± 37.6	0.23 ± 0.04	2.83 ± 0.22	0.65

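RGFN's search space is orders of magnitude larger than typical screening libraries while keeping synthesis cost low.
RGFN achieves average rewards close to those of the baselines and, on several tasks (including senolytic discovery), outperforms methods that do not enforce synthesizability.
For its top-k modes, RGFN attains high synthesizability scores (comparable to SyntheMol), with practical retrosynthetic routes found by AiZynthFinder, across all tasks.
Using fingerprint-based action embeddings when expanding the fragment library significantly improves convergence and performance in large action spaces.
The generated ligands adopt realistic docking poses and cluster by target in chemical space, indicating meaningful diversity and target-specific chemistry.
RGFN performs robustly under docking rewards across multiple targets (sEH, ClpP, Mpro) and provides feasible synthesis routes for the generated compounds.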

Figure 2 : Estimation of the state space size of RGFN as a function of the maximum number of allowed reactions. RGFN (350) indicates a variant using 350 hand-picked inexpensive building blocks, while RGFN (8350) also uses 8,000 randomly selected Enamine building blocks. Enamine REAL (6.5B compounds)
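To give a feel for why the state space grows so quickly with the number of allowed reactions, the sketch below computes a crude upper bound on the number of generation sequences. It is not the estimation procedure behind Figure 2. It assumes that every reaction applies to every pair of intermediate and building block and that each application yields exactly one product, and it counts sequences rather than distinct molecules. The function name `trajectory_upper_bound` is hypothetical.

```python
def trajectory_upper_bound(num_blocks, num_reactions, max_reactions):
    """Crude upper bound on the number of generation sequences.

    Assumption: every reaction is applicable to every (intermediate, block)
    pair and gives a single product. Real reaction templates are far more
    selective, so the true count is smaller.
    """
    total = 0
    sequences = num_blocks  # choices for the initial building block
    for _ in range(max_reactions + 1):
        total += sequences  # sequences that stop after this many reactions
        sequences *= num_reactions * num_blocks  # one more (reaction, block) step
    return total


# Building-block and reaction counts taken from the text above.
bound_one_step = trajectory_upper_bound(350, 17, 1)

for n in range(1, 5):
    print(n, trajectory_upper_bound(350, 17, n))
```

Each additional reaction multiplies the bound by the number of reactions times the number of building blocks, which is why enlarging the fragment library and allowing more steps both expand the space so sharply.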

This review was generated by AI and reviewed by human editors.