QUICK REVIEW

[Paper Review] Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity

Joseph Gomes, Bharath Ramsundar | arXiv (Cornell University) | 2017-03-30

Computational Drug Discovery Methods · References: 28 · Citations: 98

One-Line Summary

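This paper introduces the Atomic Convolutional Neural Network (ACNN), an end-to-end 3D atomic convolutional network that learns atom-level interactions directly from atomic coordinates to predict protein-ligand binding affinity. On the PDBBind dataset it matches or outperforms the baseline models.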

ABSTRACT

Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction.
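As a reading aid, the energy decomposition described in the abstract (separate energies for the complex, protein, and ligand, each built from atomic-level contributions) can be written as below. The symbols E_i (per-atom energy), N(i) (neighbor set of atom i), and R_c (neighbor cutoff, 12 Å per the method summary in this review) are our own notation, not necessarily the paper's.

```latex
\Delta G_{\text{bind}} = G_{\text{complex}} - G_{\text{protein}} - G_{\text{ligand}},
\qquad
G_{X} = \sum_{i \in X} E_i\!\left(\{\, r_{ij} : j \in \mathcal{N}(i),\ r_{ij} < R_c \,\}\right),
\qquad R_c = 12\ \mathrm{\AA}
```

Because the same energy function is applied to all three structures, every atom whose local environment is identical in the complex and in its isolated sub-structure contributes equally to both terms and cancels, so only interactions formed on binding survive in the difference.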

Motivation and Goals

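Motivates the need for accurate binding-affinity prediction in drug discovery without hand-tuned features.
Develops a differentiable, physics-inspired model that learns interatomic interactions directly from coordinates.
Demonstrates ACNN on PDBBind against structure-based and ligand-based baselines.
Shows that the method generalizes to larger systems while remaining competitive.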

Proposed Method

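Introduces atom type convolution and radial pooling.
Builds a neighbor-list-based distance matrix that captures each atom's local 3D environment within a cutoff of 12 Å.
Stacks ACNN layers to produce per-atom energies, which are summed into the total molecular energy so that predictions scale with system size.
Encodes the thermodynamic binding cycle with three weight-sharing copies of the network (complex, protein, ligand) to predict the binding free energy ΔG = G_complex − G_protein − G_ligand.
Evaluates on the PDBBind core and refined sets with random, stratified, scaffold, and temporal splits; trains end to end with ADAM for 100 epochs.
Compares ACNN with grid-featurization baselines (GRID-RF, GRID-NN), a graph convolution model (GCNN), and ECFP-based baselines.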
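The method steps above can be tied together in a minimal plain-Python sketch. It is illustrative only and rests on assumptions not stated in this review: the Gaussian radial filter with a cosine cutoff, the cap of 12 neighbors per atom, the single linear layer standing in for the stacked atomistic layers, and every function name and parameter value are hypothetical. Only the 12 Å cutoff, the per-atom energy sum, and the three-copy thermodynamic cycle come from the summary.

```python
import math

CUTOFF = 12.0  # Å, neighbor cutoff quoted in the review


def neighbor_list(coords, cutoff=CUTOFF, max_neighbors=12):
    """For each atom i, return [(j, r_ij)] for up to max_neighbors nearest atoms closer than cutoff.
    max_neighbors=12 is an assumed value."""
    nbrs = []
    for i, ci in enumerate(coords):
        within = []
        for j, cj in enumerate(coords):
            if j != i:
                r = math.dist(ci, cj)
                if r < cutoff:
                    within.append((r, j))
        within.sort()  # nearest neighbors first
        nbrs.append([(j, r) for r, j in within[:max_neighbors]])
    return nbrs


def cosine_cutoff(r, rc=CUTOFF):
    """Decays smoothly from 1 at r = 0 to 0 at r = rc (assumed functional form)."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0


def radial_filter(r, r_s, sigma_s, rc=CUTOFF):
    """Gaussian radial filter centred at r_s with width sigma_s (assumed functional form)."""
    return math.exp(-((r - r_s) ** 2) / sigma_s ** 2) * cosine_cutoff(r, rc)


def atom_features(nbrs, types, atom_types, filters):
    """Atom type convolution + radial pooling: for every atom, neighbors are masked by
    element type and each radial filter's response is summed over neighbors of that type."""
    feats = []
    for nb in nbrs:
        row = []
        for a in atom_types:
            for r_s, sigma_s in filters:
                row.append(sum(radial_filter(r, r_s, sigma_s) for j, r in nb if types[j] == a))
        feats.append(row)
    return feats


def molecular_energy(coords, types, atom_types, filters, weights):
    """Sum of per-atom energies; one linear map stands in for the stacked atomistic layers."""
    feats = atom_features(neighbor_list(coords), types, atom_types, filters)
    return sum(sum(w * x for w, x in zip(weights, f)) for f in feats)


def binding_energy(complex_, protein, ligand, **params):
    """Thermodynamic cycle with one shared parameter set: G_complex - G_protein - G_ligand."""
    return (molecular_energy(*complex_, **params)
            - molecular_energy(*protein, **params)
            - molecular_energy(*ligand, **params))
```

Sharing one parameter set across the three calls is what makes non-interacting atoms cancel: a ligand placed beyond the cutoff yields a binding energy of zero, and only contacts formed at the interface contribute. The sign and scale of those contributions come from learned weights in the real model; the toy weights in a test are arbitrary.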

Experimental Results

Research Questions

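RQ1: Can a 3D, end-to-end differentiable neural network learn atom-level interactions directly from coordinates and predict the binding free energy ΔG of protein-ligand complexes?
RQ2: Across data splits (random, stratified, scaffold, temporal), how does ACNN's performance on PDBBind compare with state-of-the-art structure-based and ligand-based methods?
RQ3: Does ACNN generalize to larger systems and cope with noise in crystal-structure data while maintaining chemical accuracy?
RQ4: How do regularization (e.g., dropout) and dataset quality affect generalization across train/test splits?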
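RQ2 and RQ4 hinge on how the train and test sets are carved out. The following plain-Python sketch shows one simple way to implement each of the four split types; it is not the paper's procedure. Assumptions: each complex comes with an id plus a precomputed affinity, release year, or scaffold key (computing scaffolds, e.g. Bemis-Murcko, would need a cheminformatics toolkit), and all function names are hypothetical.

```python
import random


def random_split(ids, frac_test=0.2, seed=0):
    """Shuffle ids with a fixed seed and hold out the last fraction as test."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * frac_test)
    return ids[: len(ids) - n_test], ids[len(ids) - n_test:]


def stratified_split(records, frac_test=0.2):
    """records: (id, affinity) pairs. After sorting by affinity, every k-th complex goes
    to test, so both sets span the full affinity range."""
    k = round(1 / frac_test)
    ordered = [rid for rid, _ in sorted(records, key=lambda r: r[1])]
    test = ordered[k - 1::k]
    train = [rid for i, rid in enumerate(ordered) if i % k != k - 1]
    return train, test


def temporal_split(records, frac_test=0.2):
    """records: (id, year) pairs. The newest complexes become the test set."""
    ordered = [rid for rid, _ in sorted(records, key=lambda r: r[1])]
    n_test = round(len(ordered) * frac_test)
    return ordered[: len(ordered) - n_test], ordered[len(ordered) - n_test:]


def scaffold_split(records, frac_test=0.2):
    """records: (id, scaffold_key) pairs. Each scaffold group lands entirely in train or
    test; larger groups are placed first (greedy)."""
    groups = {}
    for rid, scaffold in records:
        groups.setdefault(scaffold, []).append(rid)
    n_train = len(records) - round(len(records) * frac_test)
    train, test = [], []
    for group in sorted(groups.values(), key=len, reverse=True):
        (train if len(train) + len(group) <= n_train else test).extend(group)
    return train, test
```

The scaffold and temporal splits are the harder ones because the test set contains chemotypes or later structures the model has not seen, which is why generalization gaps tend to show up there rather than under a random split.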

Key Findings

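On the core-set test data, ACNN reaches a mean absolute error below 1 kcal/mol, and its Pearson R² matches or exceeds that of GRID-RF across splits.
On the refined set, ACNN performs comparably to the GRID models, with end-to-end training generalizing well; dropout improves test performance.
Ligand-based baselines (GCNN, ECFP-based) cannot use protein structure information and generalize worse on structure-aware splits, with one exception on the scaffold split.
ACNN demonstrates the potential of fully differentiable, end-to-end learned representations for structure-based bioactivity prediction and can scale to larger systems.
The authors acknowledge sensitivity to data quality and regularization: they observe overfitting on the core set and degraded performance when training on the lower-quality full PDBBind set.
They suggest adding higher-quality structures and multiple ligand conformations to improve robustness across datasets.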
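The findings are stated in terms of mean absolute error (in kcal/mol) and Pearson R². For readers comparing numbers, here is a minimal plain-Python sketch of both metrics in their standard textbook form; the function names are ours, and the exact evaluation code used in the paper is not described in this review.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the units of the labels (e.g. kcal/mol)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation coefficient between labels and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)
```

The two metrics answer different questions: R² rewards getting the ranking and linear trend right even if predictions are offset, while MAE penalizes that offset directly, which is why the results report both.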

This review was generated by AI and reviewed by a human editor.