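[Paper Review] CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models
CogMol is an end-to-end framework that combines a SMILES VAE with adaptive pre-training and multi-attribute controlled sampling to generate novel, target-specific, off-target-selective drug-like molecules for unseen SARS-CoV-2 proteins. It then screens the candidates in silico for toxicity, synthetic feasibility, and docking.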
The novel nature of SARS-CoV-2 calls for the development of efficient de novo drug design approaches. In this study, we propose an end-to-end framework, named CogMol (Controlled Generation of Molecules), for designing new drug-like small molecules targeting novel viral proteins with high affinity and off-target selectivity. CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme that uses guidance from attribute predictors trained on latent features. To generate novel and optimal drug-like molecules for unseen viral targets, CogMol leverages a protein-molecule binding affinity predictor that is trained using SMILES VAE embeddings and protein sequence embeddings learned unsupervised from a large corpus. The CogMol framework is applied to three SARS-CoV-2 target proteins: main protease, receptor-binding domain of the spike protein, and non-structural protein 9 replicase. The generated candidates are novel at both molecular and chemical scaffold levels when compared to the training data. CogMol also includes in silico screening for assessing toxicity of parent molecules and their metabolites with a multi-task toxicity classifier, synthetic feasibility with a chemical retrosynthesis predictor, and target structure binding with docking simulations. Docking reveals favorable binding of generated molecules to the target protein structure, where 87–95% of high-affinity molecules showed docking free energy < -6 kcal/mol. When compared to approved drugs, the majority of designed compounds show low parent molecule and metabolite toxicity and high synthetic feasibility. In summary, CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity, and does not need target-dependent fine-tuning of the framework or target structure information.
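The multi-attribute controlled sampling described in the abstract can be illustrated with a minimal rejection-sampling sketch in the spirit of CLaSS. It assumes latent vectors are proposed from a standard normal (standing in for whatever density model is fitted to the VAE latent space) and that each attribute predictor returns the probability of the desired attribute given z. The names `make_logistic_predictor` and `controlled_sample`, and the toy weights, are hypothetical and not CogMol's code. Treating the attributes as conditionally independent given z, a proposal is accepted with probability equal to the product of the predictor outputs.

```python
import math
import random
from typing import Callable, List, Sequence

# An attribute predictor maps a latent vector z to P(desired attribute | z).
AttributePredictor = Callable[[Sequence[float]], float]


def make_logistic_predictor(weights: Sequence[float], bias: float) -> AttributePredictor:
    """Hypothetical stand-in for a classifier trained on VAE latent features."""
    def predict(z: Sequence[float]) -> float:
        score = sum(w * x for w, x in zip(weights, z)) + bias
        # Numerically stable logistic function.
        if score >= 0:
            return 1.0 / (1.0 + math.exp(-score))
        expo = math.exp(score)
        return expo / (1.0 + expo)
    return predict


def sample_proposal(dim: int, rng: random.Random) -> List[float]:
    """Draw z from the proposal density; a standard normal is assumed here."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def controlled_sample(
    predictors: Sequence[AttributePredictor],
    dim: int,
    n_samples: int,
    rng: random.Random,
    max_tries: int = 100_000,
) -> List[List[float]]:
    """Rejection sampling from q(z) * prod_i p(a_i | z).

    Each p(a_i | z) is at most 1, so accepting a proposal with probability
    prod_i p(a_i | z) yields samples from the normalized product density.
    """
    accepted: List[List[float]] = []
    for _ in range(max_tries):
        if len(accepted) >= n_samples:
            break
        z = sample_proposal(dim, rng)
        # Attributes are treated as conditionally independent given z.
        accept_prob = math.prod(predict(z) for predict in predictors)
        if rng.random() < accept_prob:
            accepted.append(z)
    return accepted


rng = random.Random(0)
affinity = make_logistic_predictor([3.0, 0.0, 0.0, 0.0], 0.0)  # toy "high affinity"
qed = make_logistic_predictor([0.0, 3.0, 0.0, 0.0], 0.0)       # toy "high QED"
latents = controlled_sample([affinity, qed], dim=4, n_samples=200, rng=rng)
# Each accepted z would then be decoded to a SMILES string by the VAE decoder (not shown).
```

The product rule also explains why adding constraints lowers the acceptance rate: every extra attribute multiplies in another factor of at most 1, so sampling cost grows with the number and strictness of the conditions.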
Motivation and Objectives
- Enable de novo design of molecules with high affinity and off-target selectivity for novel viral targets such as SARS-CoV-2.
- Develop an end-to-end framework that can generalize to unseen targets without target-dependent retraining.
- Integrate multi-constraint control (affinity, selectivity, drug-likeness) in molecule generation.
- Incorporate in silico screening for toxicity, synthetic feasibility, and target-structure docking.
- Demonstrate applicability to three SARS-CoV-2 targets (NSP9, Mpro, RBD) and a cancer target (HDAC1).
Proposed Method
- Train a SMILES-based Variational Autoencoder (VAE) with adaptive pre-training, first on ZINC and then on BindingDB molecules, with QED and SA supervision.
- Learn latent-space attribute predictors (QED, logP, SA) from VAE embeddings and train a protein-molecule binding affinity regressor using pre-trained protein sequence embeddings.
- Use pre-trained UniRef50 protein embeddings to enable target generalization to unseen proteins.
- Apply Conditional Latent Space Sampling (CLaSS) to generate molecules conditioned on high affinity, high selectivity, and high QED.
- Screen generated molecules with a multitask toxicity predictor (MT-DNN), a retrosynthesis-based synthetic feasibility predictor, and docking simulations against target structures.
- Perform in silico docking (AutoDock Vina) to assess binding to 3D target pockets and analyze binding energies.
- Compare synthetic feasibility against FDA-approved drugs and assess novelty via fingerprint-based metrics and PubChem matches.
- Share ~3500 CogMol-generated molecules and provide an interactive Molecule Explorer tool for screening and analysis.
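The fingerprint-based novelty check mentioned above can be sketched as a nearest-neighbour Tanimoto search against the training set. This sketch assumes fingerprints have already been computed and stored as sets of on-bit indices (for example by a cheminformatics toolkit, not shown). The function names and the 0.4 cutoff are illustrative assumptions, not the paper's exact novelty definition.

```python
from typing import FrozenSet, Sequence

# A fingerprint is represented by the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbour_similarity(query: Fingerprint, reference: Sequence[Fingerprint]) -> float:
    """Highest Tanimoto similarity between the query and any reference fingerprint."""
    return max((tanimoto(query, ref) for ref in reference), default=0.0)


def novelty_fraction(
    generated: Sequence[Fingerprint],
    training: Sequence[Fingerprint],
    threshold: float = 0.4,  # illustrative cutoff, not taken from the paper
) -> float:
    """Fraction of generated molecules whose nearest training neighbour is below threshold."""
    if not generated:
        return 0.0
    novel = sum(
        1 for fp in generated if nearest_neighbour_similarity(fp, training) < threshold
    )
    return novel / len(generated)


training_fps = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7})]
generated_fps = [
    frozenset({1, 2, 3, 4}),  # exact copy of a training molecule
    frozenset({1, 2, 8, 9}),  # similarity 2/6 to its nearest neighbour
    frozenset({10, 11, 12}),  # shares no bits with the training set
]
print(novelty_fraction(generated_fps, training_fps))  # 0.6666666666666666
```

This brute-force scan performs one comparison per generated/training pair, so for training sets the size of ZINC or BindingDB a packed bit-vector implementation or a similarity-search index would be the practical choice.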
Experimental Results
Research Questions
- RQ1: Can CogMol generate novel, drug-like molecules with high affinity to unseen SARS-CoV-2 target sequences without target-specific retraining?
- RQ2: How well can the framework balance target affinity, off-target selectivity, drug-likeness (QED), and synthetic feasibility in a multi-constraint setting?
- RQ3: Do generated molecules bind to the actual 3D pockets of target proteins with favorable docking energies?
- RQ4: Compared with FDA-approved drugs, are CogMol-generated candidates synthetically accessible, and are the parent molecules and their metabolites non-toxic?
Key Findings
- CogMol generated novel molecules for three SARS-CoV-2 targets (NSP9, Mpro, RBD) that bind to druggable pockets with favorable docking energies; 87–95% of high-affinity molecules had docking free energy < -6 kcal/mol.
- Generated molecules show high novelty relative to training data, with substantial scaffold novelty and some matches to PubChem entries suggesting potential biological activity.
- Controlled sampling (CLaSS) yields higher fractions of molecules meeting affinity, QED, and selectivity criteria than random sampling across all three targets.
- Synthesizability: CogMol designs for the COVID-19 targets show higher retrosynthetic feasibility than FDA-approved drugs (above 85–90%, versus ~78% for FDA-approved drugs); designs for HDAC1 show ~67% feasibility.
- Toxicity screening indicates that most CogMol-generated molecules and their predicted metabolites are flagged as toxic in at most 1 of 13 endpoints, comparable to FDA-approved drugs, which suggests an acceptable in silico safety profile.
- CogMol generalizes to unseen targets by using protein embeddings learned from large unlabeled corpora, without target-specific fine-tuning.
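The headline screening statistics above (share of molecules with docking free energy below -6 kcal/mol, share flagged in at most one of 13 toxicity endpoints) are simple fractions over per-molecule results. The sketch below assumes a hypothetical `ScreenedCandidate` record holding one docking score and 13 boolean toxicity flags; the helper names, SMILES strings, and numbers are placeholders, not CogMol outputs or the paper's tooling.

```python
from dataclasses import dataclass
from typing import Sequence

N_TOX_ENDPOINTS = 13  # number of toxicity endpoints reported in the review


@dataclass
class ScreenedCandidate:
    """Hypothetical per-molecule record of in silico screening results."""
    smiles: str
    docking_kcal_per_mol: float      # docking free energy; lower is better
    toxic_endpoints: Sequence[bool]  # one flag per toxicity endpoint

    def __post_init__(self) -> None:
        if len(self.toxic_endpoints) != N_TOX_ENDPOINTS:
            raise ValueError(f"expected {N_TOX_ENDPOINTS} toxicity flags")


def fraction_docking_below(cands: Sequence[ScreenedCandidate], cutoff: float = -6.0) -> float:
    """Share of candidates whose docking free energy is below the cutoff."""
    if not cands:
        return 0.0
    return sum(c.docking_kcal_per_mol < cutoff for c in cands) / len(cands)


def fraction_low_toxicity(cands: Sequence[ScreenedCandidate], max_endpoints: int = 1) -> float:
    """Share of candidates flagged toxic in at most `max_endpoints` endpoints."""
    if not cands:
        return 0.0
    return sum(sum(c.toxic_endpoints) <= max_endpoints for c in cands) / len(cands)


# Placeholder data, not CogMol results.
candidates = [
    ScreenedCandidate("CCO", -7.2, [False] * 13),
    ScreenedCandidate("c1ccccc1O", -6.5, [True] + [False] * 12),
    ScreenedCandidate("CC(=O)N", -5.8, [True] * 3 + [False] * 10),
    ScreenedCandidate("CCN(CC)CC", -8.1, [True, False, True] + [False] * 10),
]
print(fraction_docking_below(candidates))  # 0.75
print(fraction_low_toxicity(candidates))   # 0.5
```

Keeping the cutoff and the endpoint tolerance as parameters makes it easy to check how sensitive such summary percentages are to the chosen thresholds.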
This review was generated by AI and checked by a human editor.