QUICK REVIEW

[Paper Review] Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity

Joseph Gomes, Bharath Ramsundar|arXiv (Cornell University)|Mar 30, 2017

Computational Drug Discovery Methods | References: 28 | Citations: 98

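One-line summary

This paper proposes the atomic convolutional neural network (ACNN), an end-to-end 3D model that learns atomic-level interactions directly from atomic coordinates to predict protein-ligand binding affinity, and shows that it matches or outperforms baselines on the PDBBind dataset.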

ABSTRACT

Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction.

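Motivation and objectives

Improve the accuracy of binding affinity prediction in drug discovery without relying on hand-tuned features.
Develop a differentiable, physics-inspired model that learns interatomic interactions from coordinates.
Evaluate ACNN on PDBBind against structure-based and ligand-based baselines.
Show that the approach generalizes to larger systems while remaining competitive.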

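Proposed method

Introduces the atomic convolution operations: atom type convolution and radial pooling.
Builds a neighbor-list-based distance matrix to capture each atom's local 3D environment within a cutoff distance of 12 Å.
Stacks ACNN layers to produce a per-atom energy, then sums these into a whole-molecule energy, giving a size-dependent prediction.
Incorporates the thermodynamic binding cycle using three weight-sharing replicas (complex, protein, ligand) to predict ΔG_complex.
Trains end-to-end with ADAM for 100 epochs on splits (random, stratified, scaffold, temporal) of the PDBBind core and refined sets.
Compares ACNN with grid-based (GRID-RF, GRID-NN), graph convolution (GCNN), and ECFP-based baselines.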
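The neighbor-list distance matrix and the thermodynamic cycle can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the `max_neighbors` parameter, and the Gaussian `toy_energy` (a fixed stand-in for the learned ACNN layers) are assumptions made for illustration. Only the 12 Å cutoff, the summation of per-atom energies, and the use of the same parameters for complex, protein, and ligand come from the description above.

```python
import numpy as np

CUTOFF = 12.0  # Å, the cutoff distance quoted in this review


def neighbor_distance_matrix(coords, max_neighbors, cutoff=CUTOFF):
    """Return an (N, max_neighbors) matrix of distances to the nearest
    neighbors of each atom. Empty slots (no neighbor inside the cutoff)
    are filled with np.inf."""
    coords = np.asarray(coords, dtype=float)
    # Full pairwise distances; acceptable for a toy example.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)   # an atom is not its own neighbor
    dist[dist > cutoff] = np.inf     # discard pairs beyond the cutoff
    dist = np.sort(dist, axis=1)     # nearest neighbors first
    n = dist.shape[0]
    out = np.full((n, max_neighbors), np.inf)
    k = min(max_neighbors, n)
    out[:, :k] = dist[:, :k]
    return out


def toy_energy(coords, weight=-1.0, r_s=2.0, sigma=1.0, max_neighbors=4):
    """Hypothetical stand-in for the learned network: a single Gaussian
    radial filter per atom, summed over atoms to a molecular energy."""
    d = neighbor_distance_matrix(coords, max_neighbors)
    per_atom = weight * np.exp(-((d - r_s) ** 2) / sigma**2).sum(axis=1)
    return float(per_atom.sum())


def binding_energy(protein_coords, ligand_coords, **params):
    """Thermodynamic cycle: E(complex) - E(protein) - E(ligand),
    evaluated with the same (shared) parameters for all three systems."""
    complex_coords = np.vstack([protein_coords, ligand_coords])
    return (toy_energy(complex_coords, **params)
            - toy_energy(protein_coords, **params)
            - toy_energy(ligand_coords, **params))


protein = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
ligand_near = np.array([[0.0, 2.0, 0.0]])
ligand_far = np.array([[100.0, 0.0, 0.0]])

print(binding_energy(protein, ligand_near))  # negative: new contacts form
print(binding_energy(protein, ligand_far))   # zero: nothing within 12 Å
```

Because the protein and ligand terms are subtracted, only contributions that exist in the complex but not in the isolated parts survive. In this toy case, a ligand placed beyond the cutoff yields zero binding energy.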

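Experimental results

Research questions

RQ1: Can a 3D, end-to-end differentiable neural network learn atomic-level interactions directly from coordinates and predict the binding free energy (ΔG) of protein-ligand complexes?
RQ2: Under different data splits (random, stratified, scaffold, temporal), how does ACNN's performance compare with state-of-the-art structure-based and ligand-based methods on PDBBind?
RQ3: Can ACNN generalize to larger systems and maintain chemical accuracy while handling noise in crystal-structure-derived data?
RQ4: How do regularization (e.g., dropout) and dataset quality affect ACNN's generalization across train/test splits?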

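Key findings

ACNN achieves a mean absolute error below 1 kcal/mol on the core test set and shows a Pearson R^2 equal to or better than GRID-RF across splits.
On the refined dataset, ACNN trained end-to-end performs on par with the GRID models and generalizes well. Dropout improves test performance.
The ligand-based baselines (GCNN, ECFP-based) generalize poorly on structure-aware splits because they lack protein structural information, with exceptions on the scaffold split.
ACNN demonstrates the potential of fully differentiable, end-to-end learned representations for structure-based bioactivity prediction and can be extended to larger systems.
The authors acknowledge sensitivity to data quality and regularization, noting overfitting on the core dataset and degraded performance when the lower-quality full PDBBind data is used.
They propose adding higher-quality structures and multiple ligand conformations to improve robustness across datasets.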
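For readers unfamiliar with the two metrics cited above, the following sketch shows how mean absolute error and Pearson R^2 can be computed. The function names and the numbers are made up for illustration and are not results from the paper.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the units of the labels (e.g. kcal/mol)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation between labels and predictions."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return float(r ** 2)


# Illustrative values only: predictions offset by a constant 0.5.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.5, 3.5]

print(mean_absolute_error(y_true, y_pred))
print(pearson_r2(y_true, y_pred))
```

The two metrics answer different questions. A constant offset, as in this toy data, leaves the Pearson R^2 at its maximum while the mean absolute error is nonzero, so reporting both gives a fuller picture.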


This review was generated by AI and checked by a human editor.