QUICK REVIEW

[Paper Review] RGFN: Synthesizable Molecular Generation Using GFlowNets

Michał Koziarski, Andrei Rekesh|arXiv (Cornell University)|Jun 1, 2024

Machine Learning in Materials Science | Citations: 8

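One-Sentence Summary

RGFN runs GFlowNets in the space of chemical reactions, which ensures synthesizability while keeping optimization quality comparable to existing methods and opens up a much larger, low-cost search space. It demonstrates ligand generation on docking and activity tasks using a scalable fragment library.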

ABSTRACT

Generative models hold great promise for small molecule discovery, significantly increasing the size of search space compared to traditional in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries coupled with low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.

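Motivation and Objectives

Motivate and address the challenge of generating synthesizable molecules at the scale needed for drug discovery.
Propose Reaction-GFlowNet (RGFN), an extension of GFlowNets that samples through a predefined set of reactions and fragments so that generated molecules are synthesizable.
Design domain-specific action representations and training components that scale to large fragment libraries.
Demonstrate the effectiveness of RGFN against baselines on docking, proxy-based activity, and senolytic classification tasks.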

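Proposed Method

RGFN sequentially selects an initial fragment, a reaction template, and a second fragment, then runs the reaction in silico to produce a candidate molecule.
The forward policy uses a graph transformer f to embed molecules and reactions and outputs action probabilities for selecting fragments, reactions, and reaction products.
The action embedding includes a fingerprint-based mechanism g(m_i) that captures structural similarity between fragments, improving scalability to larger fragment libraries.
A curated set of 17 high-yield reactions and 350 affordable building blocks keeps synthesis feasible and cost-efficient.
A backward policy is incorporated to align intermediate steps with feasible retrosynthetic routes, keeping generation paths consistent.
The framework uses RDKit's RunReactants for reaction simulation and combines synthesis- and cost-aware metrics in the reward.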
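
The role of the fingerprint mechanism g(m_i) in fragment selection can be illustrated with a small sketch. It is not the authors' implementation: the 6-bit fingerprints, the projection matrix W, and the state embedding are all made-up values, and a plain dot product followed by a softmax stands in for the learned policy head. The point is only that fragments with similar fingerprints receive similar embeddings, and therefore similar probabilities.

```python
import math

# Hypothetical 6-bit fingerprints; real ones would come from a cheminformatics toolkit.
FINGERPRINTS = {
    "frag_a": [1, 1, 0, 0, 1, 0],
    "frag_b": [1, 1, 0, 0, 0, 0],  # structurally close to frag_a
    "frag_c": [0, 0, 1, 1, 0, 1],  # structurally different
}

# Toy 2 x 6 projection matrix standing in for a learned linear layer (assumption).
W = [
    [0.5, 0.5, -0.5, -0.5, 0.2, -0.2],
    [-0.3, 0.1, 0.4, 0.4, 0.0, 0.3],
]


def embed_action(fingerprint):
    """Project a fingerprint g(m_i) into a low-dimensional action embedding."""
    return [sum(w * bit for w, bit in zip(row, fingerprint)) for row in W]


def action_probabilities(state_embedding, fingerprints):
    """Score each fragment by a dot product with the state, then apply a softmax."""
    logits = {
        name: sum(s * e for s, e in zip(state_embedding, embed_action(fp)))
        for name, fp in fingerprints.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - top) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Made-up state embedding; in RGFN this would come from the graph transformer f.
probs = action_probabilities([1.0, 0.0], FINGERPRINTS)
```

Because the embedding is a function of the fingerprint rather than a free per-fragment parameter, adding a new fragment to the library does not require learning its embedding from scratch, which is one plausible reading of why this design helps with large libraries.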

Figure 1 : Illustration of RGFN sampling process. At the beginning, the RGFN selects an initial molecular building block. In the next two steps, a reaction and a proper reactant are chosen. Then the in silico reaction is simulated with RDKit’s RunReactants functionality and one of the resulting mole
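
The sampling sequence described above (initial building block, then a reaction, then a compatible reactant) can be sketched as follows. This is a toy under stated assumptions: the building-block names, functional-group tags, and two-entry reaction table are invented for illustration and are not the paper's curated set, and a uniform random choice stands in for the learned forward policy. A real implementation would run the chosen reaction in silico (the review mentions RDKit's RunReactants); here the trajectory of choices is simply recorded.

```python
import random

# Hypothetical building blocks tagged with the functional groups they expose.
BUILDING_BLOCKS = {
    "acid_1": {"carboxylic_acid"},
    "amine_1": {"amine"},
    "amine_2": {"amine"},
    "boronic_1": {"boronic_acid"},
    "halide_1": {"aryl_halide"},
}

# Illustrative reactions: name -> (group needed on the current molecule,
#                                  group needed on the second reactant).
REACTIONS = {
    "amide_coupling": ("carboxylic_acid", "amine"),
    "suzuki_coupling": ("aryl_halide", "boronic_acid"),
}


def sample_trajectory(rng):
    """Sample one trajectory: block -> reaction -> compatible second block."""
    first = rng.choice(sorted(BUILDING_BLOCKS))
    groups = BUILDING_BLOCKS[first]

    # Only reactions whose first-reactant requirement is met are valid actions.
    feasible = sorted(
        name for name, (needed, _) in REACTIONS.items() if needed in groups
    )
    if not feasible:
        return [first]  # no applicable reaction, so the trajectory stops here

    reaction = rng.choice(feasible)
    partner_group = REACTIONS[reaction][1]

    # Mask the second-reactant choice to blocks carrying the required group.
    partners = sorted(
        name for name, g in BUILDING_BLOCKS.items() if partner_group in g
    )
    second = rng.choice(partners)
    return [first, reaction, second]


trajectory = sample_trajectory(random.Random(0))
```

Masking invalid reactions and reactants at each step is what keeps every sampled trajectory a valid synthesis plan in this toy setting.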

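Experimental Results

Research Questions

RQ1: Can a generative model over reaction space produce synthesizable molecules at a scale exceeding conventional fragment-based libraries?
RQ2: Can reaction-based GFlowNets achieve competitive optimization quality and diversity compared with baselines that do not enforce synthesizability?
RQ3: How does scaling up the fragment library affect training efficiency and the quality of the generated space?
RQ4: Are the generated ligands realistic as docking poses and diverse across multiple targets?
RQ5: How do fingerprint-based action embeddings affect scalability and convergence?

Key Findings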

Task	Method	Mol. weight ↓	QED ↑	SAScore ↓	AiZynth ↑
sEH	GraphGA	528.6 ± 42.3	0.21 ± 0.06	3.87 ± 0.24	0.04
sEH	SyntheMol	411.1 ± 66.7	0.57 ± 0.18	2.85 ± 0.55	0.80
sEH	FGFN	473.4 ± 58.9	0.39 ± 0.13	3.43 ± 0.48	0.14
sEH	RGFN	495.2 ± 49.6	0.29 ± 0.10	3.09 ± 0.39	0.56
Senolytics	GraphGA	485.7 ± 75.6	0.09 ± 0.05	2.92 ± 0.26	0.05
Senolytics	SyntheMol	441.4 ± 83.5	0.48 ± 0.19	2.77 ± 0.40	0.53
Senolytics	FGFN	467.9 ± 57.3	0.41 ± 0.14	3.74 ± 0.54	0.01
Senolytics	RGFN	558.7 ± 62.8	0.21 ± 0.09	3.24 ± 0.32	0.58
ClpP	GraphGA	521.0 ± 31.8	0.32 ± 0.07	4.14 ± 0.51	0.00
ClpP	SyntheMol	458.2 ± 60.7	0.45 ± 0.16	2.86 ± 0.56	0.56
ClpP	FGFN	548.6 ± 42.9	0.22 ± 0.03	2.94 ± 0.54	0.25
ClpP	RGFN	526.2 ± 37.6	0.23 ± 0.04	2.83 ± 0.22	0.65
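
As a quick way to read the AiZynth column, the sketch below averages it per method across the three tasks. The values are copied from the table above; the unweighted average across tasks is an aggregation introduced here for illustration and does not appear in the source. It also assumes that a higher AiZynth value means a retrosynthetic route was found more often, as the up arrow in the header suggests.

```python
# AiZynth column from the table above, keyed by method and task.
AIZYNTH = {
    "GraphGA": {"sEH": 0.04, "Senolytics": 0.05, "ClpP": 0.00},
    "SyntheMol": {"sEH": 0.80, "Senolytics": 0.53, "ClpP": 0.56},
    "FGFN": {"sEH": 0.14, "Senolytics": 0.01, "ClpP": 0.25},
    "RGFN": {"sEH": 0.56, "Senolytics": 0.58, "ClpP": 0.65},
}

# Unweighted mean across tasks (an illustrative aggregation, not from the paper).
mean_by_method = {
    method: sum(scores.values()) / len(scores)
    for method, scores in AIZYNTH.items()
}

# Methods ordered from highest to lowest mean AiZynth value.
ranking = sorted(mean_by_method, key=mean_by_method.get, reverse=True)

for method in ranking:
    print(f"{method:10s} {mean_by_method[method]:.2f}")
```

On this rough view, the two reaction-based methods (SyntheMol and RGFN) sit well above GraphGA and FGFN, with RGFN ahead of SyntheMol on Senolytics and ClpP and behind it on sEH.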

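RGFN yields a search space several orders of magnitude larger than typical screening libraries while keeping synthesis costs low.
RGFN achieves average rewards comparable to baseline methods and outperforms approaches that do not enforce synthesizability on several tasks, including senolytic discovery.
RGFN produces molecules with high synthesizability scores (comparable to SyntheMol), and AiZynthFinder finds practical retrosynthetic routes for its top-k modes.
Scaling the fragment library with fingerprint-based action embeddings substantially improves convergence and performance in large action spaces.
Generated ligands form realistic docking poses and cluster by target in chemical space, indicating meaningful diversity and target-specific chemistry.
RGFN shows robust performance with docking-based rewards across multiple targets (sEH, ClpP, Mpro) and provides practical synthesis routes for the generated compounds.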

Figure 2 : Estimation of the state space size of RGFN as a function of the maximum number of allowed reactions. RGFN (350) indicates a variant using 350 hand-picked inexpensive building blocks, while RGFN (8350) also uses 8,000 randomly selected Enamine building blocks. Enamine REAL (6.5B compounds)
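
To see why the state space grows so quickly with the number of allowed reactions, the sketch below counts raw choice sequences of the form (block, reaction, block, reaction, block, ...). This is not the estimation procedure behind Figure 2: it ignores reactant compatibility, duplicate products, and reactions that yield several products, so it should be read only as a rough illustration of the exponential trend. The function name and its parameters are introduced here for illustration.

```python
def count_choice_sequences(n_blocks, n_reactions, max_reactions):
    """Count sequences with 0..max_reactions reaction steps.

    Each sequence starts with one building block; every reaction step then
    picks one reaction and one further building block. No chemistry is
    checked, so this over-counts the set of valid molecules.
    """
    total = 0
    for steps in range(max_reactions + 1):
        total += n_blocks * (n_reactions * n_blocks) ** steps
    return total


# 17 reactions, with the two library sizes named in the figure caption.
for max_reactions in range(1, 5):
    small = count_choice_sequences(350, 17, max_reactions)
    large = count_choice_sequences(8350, 17, max_reactions)
    print(f"max reactions = {max_reactions}: {small:.2e} (350) vs {large:.2e} (8350)")
```

Each additional reaction step multiplies the count by the number of reactions times the number of building blocks, which is why enlarging the library from 350 to 8,350 blocks compounds with every step.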

This review was generated by AI and checked by a human editor.