QUICK REVIEW

[Paper Review] RGFN: Synthesizable Molecular Generation Using GFlowNets

Michał Koziarski, Andrei Rekesh|arXiv (Cornell University)|2024. 06. 01.

Machine Learning in Materials Science | Citations: 8

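One-Line Summary

RGFN extends GFlowNets to operate in the space of chemical reactions, guaranteeing synthesizability while achieving comparable optimization quality and opening up a much larger, low-cost search space. With a scalable fragment library, it generates effective ligands on docking and activity tasks.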

ABSTRACT

Generative models hold great promise for small molecule discovery, significantly increasing the size of search space compared to traditional in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries coupled with low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.

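Motivation and Objectives

Motivate and address the challenge of generating synthesizable molecules at scale for drug discovery.
Propose Reaction-GFlowNet (RGFN), an extension of GFlowNets that guarantees synthesizability by sampling from a predefined set of reactions and fragments.
Design domain-specific action representations and training components that can handle large fragment libraries.
Demonstrate RGFN's effectiveness against baselines on docking, proxy-based activity, and senolytic classification tasks.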

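Proposed Method

RGFN sequentially selects an initial fragment, a reaction template, and a second fragment, then runs the reaction in silico to produce a candidate molecule.
The forward policy uses a graph transformer f to embed molecules and reactions, and outputs action probabilities for choosing fragments, reactions, and reaction products.
The action embedding includes a mechanism g(m_i) that captures structural similarity between fragments through fingerprints, which improves scalability to larger fragment libraries.
A curated set of 17 high-yield reactions and 350 low-cost building blocks is used to drive synthesizability and cost efficiency.
A backward policy is included so that intermediate steps stay consistent with feasible retrosynthetic routes, which keeps generation trajectories consistent.
The framework uses RDKit RunReactants to simulate reactions and uses a mix of synthesis-aware and cost-aware metrics within the reward.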
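The fingerprint-based action embedding described above can be illustrated with a minimal sketch. It assumes that the logit for choosing fragment m_i is a dot product between a state embedding (standing in for the output of the graph transformer f) and a linear projection of the fragment's binary fingerprint (standing in for g(m_i)). The toy fingerprints, the weights, and the function names are hypothetical and are not taken from the paper.

```python
import math

def embed_fragment(fingerprint, weights):
    """Hypothetical g(m_i): linear projection of a binary fingerprint."""
    return [sum(w * bit for w, bit in zip(row, fingerprint)) for row in weights]

def fragment_probabilities(state_embedding, fingerprints, weights):
    """Softmax over dot products between the state and each fragment embedding."""
    logits = []
    for fp in fingerprints:
        g = embed_fragment(fp, weights)
        logits.append(sum(s * e for s, e in zip(state_embedding, g)))
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 4-bit fingerprints: fragments 0 and 1 are identical, fragment 2 differs.
fingerprints = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
# Toy 2x4 projection matrix (assumed; it would be learned in practice).
weights = [
    [0.5, 0.5, -0.5, -0.5],
    [0.1, 0.0, 0.0, 0.1],
]
state_embedding = [1.0, 0.0]  # stand-in for the graph transformer output

probs = fragment_probabilities(state_embedding, fingerprints, weights)
print(probs)
```

Because the projection is shared across all fragments in this sketch, fragments with similar fingerprints receive similar logits, and adding a fragment to the library does not add a separate parameter vector. That is one plausible reading of why such an embedding helps with large libraries.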

Figure 1 : Illustration of RGFN sampling process. At the beginning, the RGFN selects an initial molecular building block. In the next two steps, a reaction and a proper reactant are chosen. Then the in silico reaction is simulated with RDKit’s RunReactants functionality and one of the resulting mole
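The sampling loop in Figure 1 can be sketched without a chemistry toolkit. The toy below replaces RDKit's reaction simulation with string concatenation and uses uniform random choices in place of a learned policy. The building blocks, reaction names, and compatibility rule are all hypothetical, so this shows only the control flow, not the authors' implementation.

```python
import random

# Hypothetical building blocks, each tagged with one reactive group.
BLOCKS = {"A1": "amine", "A2": "amine", "C1": "acid", "B1": "boronate", "X1": "halide"}
# Hypothetical reaction templates: name -> (group on current molecule, group on partner).
REACTIONS = {"amide_coupling": ("amine", "acid"), "suzuki": ("halide", "boronate")}

def sample_trajectory(rng, max_reactions=1):
    """Pick a block, then repeatedly pick a compatible reaction and partner."""
    name = rng.choice(sorted(BLOCKS))
    molecule, group = name, BLOCKS[name]
    steps = [("start", name)]
    for _ in range(max_reactions):
        # Mask out reactions whose first reactant pattern does not match.
        valid = [r for r, (g, _) in sorted(REACTIONS.items()) if g == group]
        if not valid:
            break  # no applicable reaction: terminate early
        reaction = rng.choice(valid)
        partner_group = REACTIONS[reaction][1]
        partners = [b for b, g in sorted(BLOCKS.items()) if g == partner_group]
        partner = rng.choice(partners)
        molecule = f"{molecule}-{partner}"  # stand-in for the simulated product
        group = None  # toy simplification: the product has no reactive group left
        steps.append((reaction, partner))
    return molecule, steps

molecule, steps = sample_trajectory(random.Random(0))
print(molecule, steps)
```

Every sampled object comes with the list of steps that produced it, which is the sense in which a reaction-based generator yields a synthesis recipe alongside each molecule.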

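Experimental Results

Research Questions

RQ1: Can a reaction-space generative model produce synthesizable molecules at a scale beyond existing fragment-based libraries?
RQ2: Do reaction-based GFlowNets achieve competitive optimization quality and diversity compared with baselines that do not guarantee synthesizability?
RQ3: How does expanding the fragment library affect training efficiency and the quality of the generated space?
RQ4: Are the generated ligands realistic in their docking poses and diverse across multiple targets?
RQ5: How do fingerprint-based action embeddings affect scalability and convergence?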

Key Results

Task	Method	Mol. weight ↓	QED ↑	SAScore ↓	AiZynth ↑
sEH	GraphGA	528.6 ± 42.3	0.21 ± 0.06	3.87 ± 0.24	0.04
sEH	SyntheMol	411.1 ± 66.7	0.57 ± 0.18	2.85 ± 0.55	0.80
sEH	FGFN	473.4 ± 58.9	0.39 ± 0.13	3.43 ± 0.48	0.14
sEH	RGFN	495.2 ± 49.6	0.29 ± 0.10	3.09 ± 0.39	0.56
Senolytics	GraphGA	485.7 ± 75.6	0.09 ± 0.05	2.92 ± 0.26	0.05
Senolytics	SyntheMol	441.4 ± 83.5	0.48 ± 0.19	2.77 ± 0.40	0.53
Senolytics	FGFN	467.9 ± 57.3	0.41 ± 0.14	3.74 ± 0.54	0.01
Senolytics	RGFN	558.7 ± 62.8	0.21 ± 0.09	3.24 ± 0.32	0.58
ClpP	GraphGA	521.0 ± 31.8	0.32 ± 0.07	4.14 ± 0.51	0.00
ClpP	SyntheMol	458.2 ± 60.7	0.45 ± 0.16	2.86 ± 0.56	0.56
ClpP	FGFN	548.6 ± 42.9	0.22 ± 0.03	2.94 ± 0.54	0.25
ClpP	RGFN	526.2 ± 37.6	0.23 ± 0.04	2.83 ± 0.22	0.65
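The AiZynth column of the table above can be compared across methods with a short script. The values are copied from the table; the reading of the column as a retrosynthesis success rate from AiZynthFinder (higher is better) is an assumption based on the column's arrow and name, and the helper names are hypothetical.

```python
# AiZynth values copied from the table above, keyed by task and method.
AIZYNTH = {
    "sEH": {"GraphGA": 0.04, "SyntheMol": 0.80, "FGFN": 0.14, "RGFN": 0.56},
    "Senolytics": {"GraphGA": 0.05, "SyntheMol": 0.53, "FGFN": 0.01, "RGFN": 0.58},
    "ClpP": {"GraphGA": 0.00, "SyntheMol": 0.56, "FGFN": 0.25, "RGFN": 0.65},
}

def best_method(task):
    """Return the method with the highest AiZynth value for a task."""
    scores = AIZYNTH[task]
    return max(scores, key=scores.get)

def gap_to_synthemol(task):
    """RGFN minus SyntheMol on the AiZynth column (positive means RGFN is higher)."""
    scores = AIZYNTH[task]
    return round(scores["RGFN"] - scores["SyntheMol"], 2)

for task in AIZYNTH:
    print(task, best_method(task), gap_to_synthemol(task))
```

On these numbers RGFN has the highest AiZynth value on Senolytics and ClpP, while SyntheMol leads on sEH. SyntheMol also has the lower molecular weight and higher QED on all three tasks, so the table supports "comparable to SyntheMol on synthesizability" rather than a uniform win.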

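RGFN reaches a search space orders of magnitude larger than typical screening libraries while keeping synthesis cost low.
RGFN achieves average rewards similar to the baseline methods and outperforms synthesizability-enforcing methods on several tasks, including senolytic discovery.
Among its top-k modes, RGFN attains high synthesizability scores (comparable to SyntheMol) and yields practical retrosynthetic routes (AiZynthFinder).
Scaling up the fragment library with fingerprint-based action embeddings substantially improves convergence and performance in large action spaces.
The generated ligands form realistic docking poses and cluster by target in chemical space, which suggests meaningful diversity and target-specific chemistry.
RGFN performs robustly with docking-based rewards across multiple targets (sEH, ClpP, Mpro) and provides feasible synthesis routes for the generated compounds.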

Figure 2 : Estimation of the state space size of RGFN as a function of the maximum number of allowed reactions. RGFN (350) indicates a variant using 350 hand-picked inexpensive building blocks, while RGFN (8350) also uses 8,000 randomly selected Enamine building blocks. Enamine REAL (6.5B compounds)
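To get a feel for how quickly such a space can grow with the number of allowed reactions, the sketch below computes a deliberately loose upper bound. It assumes every reaction applies to every molecule and every building block, and it counts synthesis recipes rather than distinct molecules, so it overstates the true size. It is not the estimation procedure behind Figure 2, and the function name is hypothetical.

```python
def recipe_upper_bound(num_blocks, num_reactions, max_reactions):
    """Loose upper bound on the number of recipes with at most max_reactions steps."""
    total = 0
    per_length = num_blocks  # choices for the initial building block
    for _ in range(max_reactions + 1):
        total += per_length
        per_length *= num_reactions * num_blocks  # one more (reaction, partner) pair
    return total

# 17 reactions with 350 or 8,350 building blocks, as in the RGFN variants above.
for blocks in (350, 8350):
    for k in range(5):
        print(blocks, k, f"{recipe_upper_bound(blocks, 17, k):.2e}")
```

Real counts are smaller because most reaction and reactant pairs are incompatible and different recipes can lead to the same molecule, which is presumably why the figure reports an estimate rather than a closed-form count.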

This review was generated by AI and reviewed by a human editor.