QUICK REVIEW

[Paper Review] DeepGS: Deep Representation Learning of Graphs and Sequences for Drug-Target Binding Affinity Prediction

Xuan Lin|arXiv (Cornell University)|Mar 31, 2020

Computational Drug Discovery Methods | References: 34 | Cited by: 32

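One-Sentence Summary

DeepGS jointly models the local chemical context and molecular topology of drugs, together with protein sequences, to predict drug-target binding affinity without requiring 3D structures, and it outperforms several baselines.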

ABSTRACT

Accurately predicting drug-target binding affinity (DTA) in silico is a key task in drug discovery. Most of the conventional DTA prediction methods are simulation-based, which rely heavily on domain knowledge or the assumption of having the 3D structure of the targets, which are often difficult to obtain. Meanwhile, traditional machine learning-based methods apply various features and descriptors, and simply depend on the similarities between drug-target pairs. Recently, with the increasing amount of affinity data available and the success of deep representation learning models on various domains, deep learning techniques have been applied to DTA prediction. However, these methods consider either label/one-hot encodings or the topological structure of molecules, without considering the local chemical context of amino acids and SMILES sequences. Motivated by this, we propose a novel end-to-end learning framework, called DeepGS, which uses deep neural networks to extract the local chemical context from amino acids and SMILES sequences, as well as the molecular structure from the drugs. To assist the operations on the symbolic data, we propose to use advanced embedding techniques (i.e., Smi2Vec and Prot2Vec) to encode the amino acids and SMILES sequences to a distributed representation. Meanwhile, we suggest a new molecular structure modeling approach that works well under our framework. We have conducted extensive experiments to compare our proposed method with state-of-the-art models including KronRLS, SimBoost, DeepDTA and DeepCPI. Extensive experimental results demonstrate the superiorities and competitiveness of DeepGS.

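Motivation and Objectives

Advance accurate in silico DTA prediction without relying on 3D structures or extensive domain knowledge.
Propose an end-to-end framework that combines the local chemical context and topological information of drugs with target sequences.
Use advanced embedding techniques (Smi2Vec and Prot2Vec) to represent SMILES and amino acid sequences.
Integrate a CNN over target protein sequences, a GAT over drug topology, and a BiGRU over local drug context to predict binding affinity.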

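Proposed Method

Encode SMILES sequences with Smi2Vec to obtain distributed atom representations.
Model the drug's local chemical context with a BiGRU over the SMILES embedding matrix.
Represent r-radius subgraphs with a graph attention network (GAT) and aggregate them into a molecule vector.
Encode target protein sequences with Prot2Vec and process them with a CNN to capture local context.
Concatenate the drug and target representations and predict binding affinity with a multi-layer fully connected network.
Optimize with a mean squared error (MSE) loss over drug-target pairs.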
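
To make the r-radius subgraph step more concrete, here is a minimal sketch in plain Python. It assumes a molecule is given as an adjacency list over atom indices and treats the r-radius subgraph of an atom as the set of atoms reachable within r bonds. The function name `r_radius_subgraph`, the toy molecule, and its atom indexing are hypothetical; this review does not show how DeepGS itself extracts or indexes subgraphs, so the paper's exact procedure may differ.

```python
from collections import deque


def r_radius_subgraph(adjacency, center, r):
    """Return the sorted atom indices within r bonds of `center`.

    `adjacency` maps each atom index to a list of bonded atom indices.
    This is an illustrative breadth-first search, not the authors' code.
    """
    distance = {center: 0}          # atom index -> hop distance from center
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if distance[atom] == r:     # do not expand beyond radius r
            continue
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return sorted(distance)


# Toy example: heavy atoms of ethanol as a chain C0 - C1 - O2 (hypothetical indexing).
ethanol = {0: [1], 1: [0, 2], 2: [1]}
subgraphs = {atom: r_radius_subgraph(ethanol, atom, 1) for atom in ethanol}
```

In a pipeline like the one summarized above, each such subgraph would presumably be mapped to an identifier and an embedding before the GAT aggregates them into a molecule vector; that mapping is left out here because the review does not specify it.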

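Experimental Results

Research Questions

RQ1: Does jointly modeling local chemical context and topological structure improve DTA prediction over methods that use only one kind of information?
RQ2: Do embedding-based representations (Smi2Vec/Prot2Vec) better capture the functional context in SMILES and amino acid sequences, and thereby improve DTA prediction?
RQ3: How does DeepGS perform against state-of-the-art baselines on the standard DTA benchmarks (Davis and KIBA) across multiple evaluation metrics?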

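Key Findings

On the Davis dataset, DeepGS outperforms KronRLS, SimBoost, DeepCPI, and DeepDTA on the CI, MSE, r_m^2, and AUPR metrics.
On the KIBA dataset, DeepGS is competitive on CI and outperforms the baselines on MSE, r_m^2, and AUPR.
Ablation studies show that removing the local chemical context (Smi2Vec/Prot2Vec) degrades performance, confirming the importance of the contextual embeddings.
The experiments indicate that combining local context with topological drug information yields consistent performance gains across datasets.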
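
For readers unfamiliar with two of the metrics named above, the following sketch computes the concordance index (CI) and the mean squared error (MSE) in plain Python. It uses the definition of CI that is common in the DTA literature: the fraction of pairs with different true affinities whose predictions are ordered the same way, with prediction ties counted as one half. The function names and toy values are illustrative assumptions; the paper's own evaluation code is not shown in this review and may handle ties differently.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    A pair (i, j) is comparable when y_true[i] > y_true[j].
    Tied predictions contribute 0.5, a common convention (assumed here).
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy affinities (made-up numbers, not results from the paper).
true_affinity = [1.0, 2.0, 3.0]
predicted = [1.0, 3.0, 2.0]
ci = concordance_index(true_affinity, predicted)
mse = mean_squared_error(true_affinity, predicted)
```

Higher CI is better (1.0 means every comparable pair is ranked correctly), while lower MSE is better, which is why a model can be merely competitive on one and ahead on the other.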

This review was generated by AI and reviewed by human editors.