QUICK REVIEW

[Paper Review] Reinforced Genetic Algorithm for Structure-based Drug Design

Tianfan Fu, Wenhao Gao | arXiv (Cornell University) | Nov. 28, 2022

Computational Drug Discovery Methods | Citations: 27

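One-line summary

RGA integrates reinforcement learning, guided by 3D target-ligand structures, with a genetic algorithm to improve docking-based optimization in structure-based drug design; pre-training and cross-target knowledge transfer improve performance and robustness.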

ABSTRACT

Structure-based drug design (SBDD) aims to discover drug candidates by finding molecules (ligands) that bind tightly to a disease-related protein (targets), which is the primary approach to computer-aided drug discovery. Recently, applying deep generative models for three-dimensional (3D) molecular design conditioned on protein pockets to solve SBDD has attracted much attention, but their formulation as probabilistic modeling often leads to unsatisfactory optimization performance. On the other hand, traditional combinatorial optimization methods such as genetic algorithms (GA) have demonstrated state-of-the-art performance in various molecular optimization tasks. However, they do not utilize protein target structure to inform design steps but rely on a random-walk-like exploration, which leads to unstable performance and no knowledge transfer between different tasks despite the similar binding physics. To achieve a more stable and efficient SBDD, we propose Reinforced Genetic Algorithm (RGA) that uses neural models to prioritize the profitable design steps and suppress random-walk behavior. The neural models take the 3D structure of the targets and ligands as inputs and are pre-trained using native complex structures to utilize the knowledge of the shared binding physics from different targets and then fine-tuned during optimization. We conduct thorough empirical studies on optimizing binding affinity to various disease targets and show that RGA outperforms the baselines in terms of docking scores and is more robust to random initializations. The ablation study also indicates that the training on different targets helps improve performance by leveraging the shared underlying physics of the binding processes. The code is available at https://github.com/futianfan/reinforced-genetic-algorithm.

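Motivation and objectives

Addresses the inefficiency and instability of traditional GAs in structure-based drug design by incorporating protein structure information.
Reformulates the evolutionary process as an evolutionary Markov decision process (EMDP) so that reinforcement learning can be applied.
Develops target-ligand equivariant neural networks that use 3D structural data to guide crossover and mutation.
Pre-trains the models on native protein-ligand complexes and enables knowledge transfer across targets to capture the shared binding physics.
Demonstrates improved docking scores and robustness on multiple disease targets, including the SARS-CoV-2 main protease.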

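Proposed method

Models the GA as an evolutionary Markov decision process (EMDP) with population-level states and docking-score-based rewards.
Uses two policy networks to guide crossover and two policy networks to guide mutation (two-step parent selection for crossover; reaction selection for mutation).
Uses E(3)-equivariant neural networks (ENNs) to process target-ligand complexes and output action probabilities.
Pre-trains the ENNs on a 3D binding affinity task using CrossDocked2020 data to capture the shared binding physics, then fine-tunes them during optimization.
Optimizes the policies with policy gradient (REINFORCE) to maximize the expected improvement in docking score.
Uses AutoDock Vina as the docking oracle and designs mutations as chemically meaningful uni- and bi-molecular reactions so that the generated molecules remain synthesizable.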
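
The idea of guiding a GA step with a REINFORCE-trained policy can be illustrated with a deliberately small sketch. It is not the RGA implementation: "molecules" are plain floats, `toy_oracle` stands in for a docking oracle such as AutoDock Vina, `features` replaces the 3D equivariant network with three hand-made numbers, and a single linear softmax policy replaces the four policy networks. All function names here are hypothetical and chosen only for this illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_oracle(x):
    """Assumed stand-in for a docking oracle: lower is better."""
    return (x - 3.0) ** 2


def features(x):
    """Hypothetical hand-made features (RGA uses a learned 3D encoder instead)."""
    return [x / 5.0, (x / 5.0) ** 2, 1.0]


def select(population, theta, rng):
    """Sample one individual from a linear softmax policy."""
    logits = [sum(t * f for t, f in zip(theta, features(x))) for x in population]
    probs = softmax(logits)
    idx = rng.choices(range(len(population)), weights=probs, k=1)[0]
    return idx, probs


def reinforce_step(population, theta, rng, lr=0.01):
    """Pick a parent with the policy, mutate it, apply one REINFORCE update."""
    idx, probs = select(population, theta, rng)
    parent = population[idx]
    child = parent + rng.gauss(0.0, 0.5)  # toy mutation
    # Reward is the score improvement: positive when the child is better.
    reward = toy_oracle(parent) - toy_oracle(child)
    # Gradient of log pi(idx) for a linear softmax: f(idx) - sum_i p_i * f(i)
    feats = [features(x) for x in population]
    expected = [sum(p * f[j] for p, f in zip(probs, feats)) for j in range(len(theta))]
    grad = [feats[idx][j] - expected[j] for j in range(len(theta))]
    new_theta = [t + lr * reward * g for t, g in zip(theta, grad)]
    return child, reward, new_theta


def run(generations=50, pop_size=8, seed=0):
    """Run the toy loop with elitist survivor selection."""
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    theta = [0.0, 0.0, 0.0]
    for _ in range(generations):
        child, _, theta = reinforce_step(population, theta, rng)
        # Keep the pop_size individuals with the lowest oracle score.
        population = sorted(population + [child], key=toy_oracle)[:pop_size]
    return population, theta
```

Calling `run()` returns the final population and the learned policy weights. The update rewards selections whose offspring improve the score, which is the mechanism the summary describes as prioritizing profitable design steps over random-walk exploration.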

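Experimental results

Research questions

RQ1: Can an RL-guided GA outperform baseline structure-based design methods in docking score optimization?
RQ2: Does using target structure information reduce randomness and improve robustness across runs?
RQ3: Do pre-training on native complexes and cross-target knowledge transfer improve SBDD performance?
RQ4: How does including long-range crossover affect optimization compared with RL methods that focus on local modifications?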

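Key results

RGA achieves the best TOP-100, TOP-10, and TOP-1 docking scores across the evaluated targets.
RGA shows lower variance over five independent runs, indicating that random-walk behavior is suppressed.
Knowledge transfer and pre-training on diverse targets further improve top-k docking scores.
Compared with Autogrow 4.0, RGA delivers better docking performance, attributed to learned action guidance and long-range exploration.
Long-range crossover exploration outperforms RL methods that focus on local modifications, showing the benefit of structure-informed exploration.
The method also remains competitive on QED and SA scores, suggesting reasonable structural quality and synthesizability.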
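
For readers unfamiliar with the metrics, the following sketch shows one plausible way to compute a top-k docking score and its spread across independent runs. It assumes TOP-k means the average of the k best (most negative) docking scores in a run and that robustness is summarized by the standard deviation of that value across runs; the exact definitions should be checked against the paper. The scores are made up for illustration and are not results from the paper.

```python
import statistics


def top_k_mean(scores, k):
    """Mean of the k best docking scores (most negative = best)."""
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    best = sorted(scores)[:k]
    return sum(best) / k


def summarize_runs(runs, k):
    """Mean and sample standard deviation of the top-k metric across runs."""
    per_run = [top_k_mean(scores, k) for scores in runs]
    spread = statistics.stdev(per_run) if len(per_run) > 1 else 0.0
    return statistics.mean(per_run), spread


# Made-up docking scores (kcal/mol) for three hypothetical runs.
runs = [
    [-9.0, -8.0, -7.0, -6.0],
    [-10.0, -8.0, -6.0, -5.0],
    [-9.0, -9.0, -7.0, -4.0],
]
mean_top2, std_top2 = summarize_runs(runs, k=2)
```

A smaller spread across runs with different random initializations is what the summary refers to as improved robustness.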

This review was generated by AI and reviewed by a human editor.