QUICK REVIEW

[Paper Review] CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models

Vijil Chenthamarakshan, Payel Das|arXiv (Cornell University)|2020. 04. 02.

Computational Drug Discovery Methods | References: 65 | Citations: 36

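One-Line Summary

CogMol is an end-to-end framework that combines adaptive pre-training of a SMILES VAE with multi-attribute controlled sampling to generate novel, target-specific, off-target-selective drug-like molecules for unseen SARS-CoV-2 proteins. It also includes in silico screening for toxicity, synthetic feasibility, and docking.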

ABSTRACT

The novel nature of SARS-CoV-2 calls for the development of efficient de novo drug design approaches. In this study, we propose an end-to-end framework, named CogMol (Controlled Generation of Molecules), for designing new drug-like small molecules targeting novel viral proteins with high affinity and off-target selectivity. CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme that uses guidance from attribute predictors trained on latent features. To generate novel and optimal drug-like molecules for unseen viral targets, CogMol leverages a protein-molecule binding affinity predictor that is trained using SMILES VAE embeddings and protein sequence embeddings learned unsupervised from a large corpus. CogMol framework is applied to three SARS-CoV-2 target proteins: main protease, receptor-binding domain of the spike protein, and non-structural protein 9 replicase. The generated candidates are novel at both molecular and chemical scaffold levels when compared to the training data. CogMol also includes in silico screening for assessing toxicity of parent molecules and their metabolites with a multi-task toxicity classifier, synthetic feasibility with a chemical retrosynthesis predictor, and target structure binding with docking simulations. Docking reveals favorable binding of generated molecules to the target protein structure, where 87-95% of high affinity molecules showed docking free energy < -6 kcal/mol. When compared to approved drugs, the majority of designed compounds show low parent molecule and metabolite toxicity and high synthetic feasibility. In summary, CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity, and does not need target-dependent fine-tuning of the framework or target structure information.

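Motivation and Objectives

Novel viral targets such as SARS-CoV-2 motivate de novo drug design that achieves both high affinity and off-target selectivity.
Develop an end-to-end framework that generalizes to unseen targets without target-dependent retraining.
Integrate multi-constraint control (affinity, selectivity, drug-likeness) into molecule generation.
Introduce in silico screening for toxicity, synthetic feasibility, and target-structure docking.
Demonstrate applicability to three SARS-CoV-2 targets (NSP9, Mpro, RBD) and one cancer target (HDAC1).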

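Proposed Method

Adaptively train a SMILES-based Variational Autoencoder (VAE), starting from ZINC with QED and SA supervision and extending to BindingDB.
Train latent-space attribute predictors (QED, logP, SA) on the VAE embeddings, and train a protein-molecule binding affinity regression model using pretrained protein sequence embeddings.
Use protein embeddings pretrained on UniRef50 to enable generalization to unseen target proteins.
Apply Conditional Latent Space Sampling (CLaSS) to generate molecules conditioned on high affinity, high selectivity, and high QED.
Screen the generated molecules with a multi-task toxicity predictor (MT-DNN), a synthesizability predictor, and docking simulations.
Run docking (AutoDock Vina) against the 3D target pocket and analyze the resulting binding energies.
Compare synthesizability against FDA-approved drugs, and assess novelty with fingerprint-based metrics and PubChem matches.
Share roughly 3,500 CogMol-generated molecules and provide a Molecule Explorer tool for screening and analysis.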
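The controlled sampling step can be illustrated with a simplified accept/reject loop over latent vectors. This is only a sketch under stated assumptions, not the paper's CLaSS implementation: the three predictor functions, the latent dimension, and every threshold below are hypothetical stand-ins chosen so the example runs on its own, and the decoding of accepted vectors back to SMILES is omitted.

```python
import random

# Hypothetical stand-ins for latent-space attribute predictors.
# In the described method these are models trained on VAE embeddings;
# here they are toy functions so the sketch is self-contained.
def predict_affinity(z):
    return sum(z[:4])

def predict_qed(z):
    return 1.0 / (1.0 + abs(z[4]))  # always in (0, 1]

def predict_selectivity(z):
    return z[5] - z[6]

def controlled_sample(n_wanted, dim=8, max_tries=100_000, seed=0,
                      min_affinity=1.0, min_qed=0.5, min_selectivity=0.0):
    """Draw latent vectors from a standard Gaussian prior and keep only
    those whose predicted attributes meet every threshold."""
    rng = random.Random(seed)
    accepted = []
    tries = 0
    while len(accepted) < n_wanted and tries < max_tries:
        tries += 1
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if (predict_affinity(z) >= min_affinity
                and predict_qed(z) >= min_qed
                and predict_selectivity(z) >= min_selectivity):
            accepted.append(z)
    return accepted, tries

samples, tries = controlled_sample(5)
print(f"accepted {len(samples)} latent vectors after {tries} draws")
```

Because the constraints are checked in latent space, no molecule needs to be decoded or scored in chemical space until a vector has already passed every attribute filter, which is the main appeal of this style of sampling.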

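Experimental Results

Research Questions

RQ1: Can CogMol generate novel, high-affinity drug-like molecules for unseen SARS-CoV-2 target sequences without target-dependent fine-tuning?
RQ2: How well can target affinity, off-target selectivity, drug-likeness (QED), and synthesizability be balanced in a multi-constraint setting?
RQ3: Do the generated molecules bind with favorable docking energies to the 3D pockets of the target proteins?
RQ4: Compared with FDA-approved drugs, are CogMol-generated candidates synthetically accessible, and are their metabolites predicted to be non-toxic?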

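Key Results

CogMol generated novel molecules that bind to druggable pockets of three SARS-CoV-2 targets (NSP9, Mpro, RBD), often with favorable docking free energies; 87–95% of high-affinity molecules had a docking free energy < -6 kcal/mol.
The generated molecules showed high novelty relative to the training data, including substantial scaffold novelty; matches of some molecules to PubChem entries suggest potential biological activity.
Across all targets, controlled sampling (CLaSS) produced a higher fraction of molecules meeting the affinity, QED, and selectivity criteria than random sampling.
Synthesizability: CogMol designs for the COVID-19 targets had higher retrosynthetic feasibility than FDA-approved drugs, at 85–90% or more versus roughly 78% for the FDA set; HDAC1 designs showed about 67% feasibility.
In toxicity screening, most CogMol molecules and their predicted metabolites were flagged as toxic on 0–1 of the 13 endpoints, a level comparable to FDA-approved drugs, which suggests a favorable in silico safety signal.
The study demonstrates generalization to unseen targets without target-specific fine-tuning, using protein embeddings learned from a large unlabeled corpus.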
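The screening statistics above reduce to simple threshold checks once each candidate has a docking energy, a toxic-endpoint count, and a retrosynthesis verdict. The following is a minimal sketch of that bookkeeping, assuming a hypothetical `Candidate` record; the four example rows are made-up placeholder values, not data from the paper, and only the -6 kcal/mol cutoff and the "at most 1 toxic endpoint" rule come from the review text.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    smiles: str
    docking_kcal_mol: float   # docking free energy, more negative is better
    toxic_endpoints: int      # number of endpoints flagged toxic (0 to 13)
    retro_feasible: bool      # retrosynthesis predictor found a route

DOCKING_CUTOFF = -6.0         # threshold quoted in the review
MAX_TOXIC_ENDPOINTS = 1       # "0-1 of 13" treated as the low-toxicity band

def fraction_below_cutoff(cands):
    """Share of candidates with docking energy below the cutoff."""
    hits = sum(1 for c in cands if c.docking_kcal_mol < DOCKING_CUTOFF)
    return hits / len(cands)

def passes_screen(c):
    """True if a candidate clears docking, toxicity, and synthesis checks."""
    return (c.docking_kcal_mol < DOCKING_CUTOFF
            and c.toxic_endpoints <= MAX_TOXIC_ENDPOINTS
            and c.retro_feasible)

# Made-up placeholder rows for illustration only.
candidates = [
    Candidate("CCO", -6.8, 0, True),
    Candidate("c1ccccc1", -5.2, 0, True),
    Candidate("CC(=O)O", -7.1, 2, True),
    Candidate("CCN", -6.3, 1, False),
]

docking_fraction = fraction_below_cutoff(candidates)
survivors = [c.smiles for c in candidates if passes_screen(c)]
print(docking_fraction, survivors)
```

Keeping the docking fraction separate from the combined filter mirrors how the review reports them: the 87–95% figure refers to docking alone, while toxicity and synthesizability are reported as separate comparisons against FDA-approved drugs.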


This review was generated by AI and reviewed by a human editor.