QUICK REVIEW

[Paper Review] Deep Learning Based Regression and Multi-class Models for Acute Oral Toxicity Prediction with Automatic Chemical Feature Extraction

Youjun Xu, Jianfeng Pei|arXiv (Cornell University)|Apr 16, 2017

Computational Drug Discovery Methods | References: 37 | Cited by: 23

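One-Sentence Summary

This study proposes MGE-CNN, a deep learning framework that predicts acute oral toxicity (AOT) from end-to-end molecular graph encodings. Its regression model reaches state-of-the-art performance on external test set I (R² = 0.864, MAE = 0.195) while learning chemical features automatically, and interpretable toxic pharmacophore fragments can be extracted from the learned representations.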
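The headline metrics can be made concrete with a small sketch. The functions below compute MAE, R² (1 - SS_res/SS_tot), and PCC² (the squared Pearson correlation reported in the key findings). The function names are hypothetical and the input values are toy numbers for illustration only, not data from the paper.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_i - yhat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pcc2(y_true, y_pred):
    """Squared Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


# Toy values for illustration only; these are NOT data from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(mae(y_true, y_pred), 3), round(r2(y_true, y_pred), 3), round(pcc2(y_true, y_pred), 3))
# prints: 0.15 0.98 0.982
```

PCC² computed this way can never be smaller than R², because it effectively forgives any linear rescaling or offset of the predictions, whereas R² penalizes them; the two metrics are therefore not interchangeable when comparing models.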

ABSTRACT

For quantitative structure-property relationship (QSPR) studies in chemoinformatics, it is important to obtain an interpretable relationship between chemical properties and chemical features. However, the predictive power and interpretability of QSPR models are usually two different objectives that are difficult to achieve simultaneously. A deep learning architecture using molecular graph encoding convolutional neural networks (MGE-CNN) provides a universal strategy for constructing interpretable QSPR models with high predictive power. Instead of relying on application-specific preset molecular descriptors or fingerprints, the models can be built from raw and pertinent features without manual intervention or selection. In this study, we developed acute oral toxicity (AOT) models of compounds using the MGE-CNN architecture as a case study. Three types of high-level predictive models for AOT evaluation were constructed: a regression model (deepAOT-R), a multi-classification model (deepAOT-C), and a multi-task model (deepAOT-CR). These models substantially outperformed previously reported models. For the two external datasets containing 1673 (test set I) and 375 (test set II) compounds, the R² and mean absolute error (MAE) of deepAOT-R on test set I were 0.864 and 0.195, respectively, and the prediction accuracy of deepAOT-C was 95.5% and 96.3% on test sets I and II, respectively. The external prediction accuracies of deepAOT-CR were 95.0% and 94.1% on test sets I and II, while its R² and MAE on test set I were 0.861 and 0.204, respectively.
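The multi-task model deepAOT-CR predicts a toxicity class and a regression value at the same time. A common way to train such a model is to add a classification loss and a regression loss. The sketch below assumes softmax cross-entropy plus a weighted squared error, with a hypothetical `weight` hyperparameter; the abstract does not state deepAOT-CR's actual loss, so this illustrates the general technique rather than the authors' implementation.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of the true class under a softmax over the logits."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(class_logits, class_label, reg_pred, reg_target, weight=1.0):
    """Joint loss for one compound: classification CE + weight * squared error.

    `weight` is a hypothetical balancing hyperparameter, not taken from the paper.
    """
    ce = softmax_cross_entropy(class_logits, class_label)
    se = (reg_pred - reg_target) ** 2
    return ce + weight * se


def batch_loss(batch, weight=1.0):
    """Average the joint loss over (logits, label, reg_pred, reg_target) tuples."""
    return sum(multitask_loss(*item, weight=weight) for item in batch) / len(batch)


# Toy example: uniform logits over 4 hypothetical toxicity classes and a
# perfect regression output, so only the classification term contributes.
print(round(multitask_loss([0.0, 0.0, 0.0, 0.0], 2, 1.5, 1.5), 4))  # prints 1.3863 (ln 4)
```

Sharing one encoder between both heads is what lets a single network report both an accuracy and an R², as deepAOT-CR does; the relative weight of the two terms is a tuning choice.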

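Motivation and Objectives

Develop a high-performance deep learning model for acute oral toxicity (AOT) prediction based on end-to-end molecular representations.
Overcome the limitations of traditional molecular descriptors by enabling automatic chemical feature learning.
Improve the interpretability of black-box deep learning models by reverse-mining activation patterns to identify chemically meaningful fragments.
Demonstrate that the framework generalizes to other toxicity and physicochemical property endpoints.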

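Proposed Method

A molecular graph encoding convolutional neural network (MGE-CNN) is proposed that treats a 2D molecular structure as an undirected graph, with atoms as nodes and chemical bonds as edges.
A sink-based graph encoding strategy converts each molecular graph into a fixed-size vector that serves as input to the deep learning model.
Three models are trained: a regression model (deepAOT-R), a multiclass classification model (deepAOT-C), and a multi-task model (deepAOT-CR) that predicts both outputs simultaneously.
Automatic feature learning is applied, and the learned filters are reverse-mined to map neuron activations to chemical substructures.
Deep fingerprints extracted from the trained models are used to support shallow machine learning systems, where they give better predictive power than conventional fingerprints.
Model performance is validated on two external datasets, and interpretability is assessed by mapping learned features onto known structural alerts (TAs).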
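The sink-based encoding idea can be illustrated with a toy, untrained version in the style of undirected-graph recursive networks (UG-RNN): each atom in turn serves as the sink, bonds are oriented toward it by shortest-path distance, states flow from the farthest atoms to the sink, and the sink states for all roots are summed into a vector whose length does not depend on molecule size. The element set, the `tanh` update, the `decay` constant, and the choice to drop bonds between equidistant atoms are assumptions standing in for learned components; this is not the MGE-CNN implementation.

```python
import math
from collections import deque

# Hypothetical atom features: one-hot over a tiny element set.
ELEMENTS = ["C", "O", "N"]


def one_hot(symbol):
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def bfs_distances(adj, root):
    """Shortest-path distance (in bonds) from root to every reachable atom."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def encode_with_sink(atoms, adj, root, decay=0.5):
    """Propagate states along bonds oriented toward `root` (the sink).

    A bond u-v is directed u -> v when v is one bond closer to the root;
    bonds between equidistant atoms (possible in rings) are ignored here.
    Atoms are processed farthest-first, so predecessors are always ready.
    `decay` is a hypothetical stand-in for learned weights.
    """
    dist = bfs_distances(adj, root)
    order = sorted(dist, key=lambda a: -dist[a])  # farthest atoms first
    state = {}
    for v in order:
        preds = [u for u in adj[v] if dist[u] == dist[v] + 1]
        incoming = [sum(state[u][k] for u in preds) for k in range(len(ELEMENTS))]
        state[v] = [math.tanh(x + decay * s) for x, s in zip(one_hot(atoms[v]), incoming)]
    return state[root]


def encode_molecule(atoms, adj):
    """Sum the sink states over every choice of root -> fixed-size vector."""
    vec = [0.0] * len(ELEMENTS)
    for root in range(len(atoms)):
        vec = [a + b for a, b in zip(vec, encode_with_sink(atoms, adj, root))]
    return vec


# Ethanol heavy atoms C-C-O as an undirected graph (hydrogens omitted).
atoms = ["C", "C", "O"]
adj = {0: [1], 1: [0, 2], 2: [1]}
print(len(encode_molecule(atoms, adj)))  # prints 3, regardless of molecule size
```

Because every atom takes a turn as the sink and the results are summed, the output does not depend on how the atoms are numbered, which is what makes a fixed-size, order-independent input for the downstream network.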

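Experimental Results

Research Questions

RQ1: Can a deep learning model based on end-to-end molecular graph encoding outperform existing in silico AOT prediction methods?
RQ2: To what extent can automatic feature learning in deep neural networks extract chemically interpretable, toxicity-related substructures?
RQ3: How do deep fingerprints extracted from the learned representations compare with conventional molecular fingerprints in supporting downstream machine learning tasks?
RQ4: Can the model's internal representations be mapped back to known toxic pharmacophores or structural alerts (TAs) with high consistency?
RQ5: Does the MGE-CNN framework generalize to predicting chemical endpoints other than acute oral toxicity?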

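Key Findings

The deepAOT-R model achieved R² = 0.864 and MAE = 0.195 on test set I (1,673 compounds), substantially outperforming previous models.
The deepAOT-C model reached prediction accuracies of 95.5% on test set I and 96.3% on test set II, demonstrating strong generalization.
The multi-task deepAOT-CR model achieved R² = 0.861 and MAE = 0.204 on test set I, with classification accuracies of 95.0% (test set I) and 94.1% (test set II).
Deep fingerprints extracted from the models enabled a consensus multiple linear regression (MLR) model to reach PCC² = 0.696 and MAE = 0.348 on a large external dataset of 3,718 compounds.
Reverse mining showed that the most strongly activated features in the model align with known toxic pharmacophores: all 8 identified fragments matched reported structural alerts (TAs).
The framework successfully maps model activations to atom-level fragments, showing that a deep learning model can be both highly predictive and interpretable without prior chemical knowledge.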
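Reverse mining can be pictured with a small sketch, under the assumption that each atom-level activation summarizes the fragment around that atom: rank one neuron's activations across molecules, then read off the atoms within a fixed bond radius of the strongest hits. The molecules, activation values, `k`, and `radius` below are hypothetical toy inputs, not results from the paper, and the functions are illustrative rather than the authors' procedure.

```python
from collections import deque


def fragment_atoms(adj, center, radius):
    """Atom indices within `radius` bonds of `center` (breadth-first search)."""
    seen = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if seen[u] == radius:
            continue  # do not expand beyond the requested radius
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return sorted(seen)


def top_fragments(molecules, activations, k=2, radius=1):
    """Map one neuron's k strongest atom-level activations to fragments.

    molecules:   list of (atom_symbols, adjacency) pairs
    activations: per molecule, one activation value per atom (hypothetical)
    Returns (activation, molecule index, fragment atom symbols) tuples.
    """
    scored = [
        (act, m, atom)
        for m, acts in enumerate(activations)
        for atom, act in enumerate(acts)
    ]
    scored.sort(reverse=True)  # strongest activations first
    hits = []
    for act, m, atom in scored[:k]:
        symbols, adj = molecules[m]
        frag = [symbols[i] for i in fragment_atoms(adj, atom, radius)]
        hits.append((act, m, frag))
    return hits


# Toy heavy-atom graphs: ethanol (C-C-O) and a nitromethane-like C-N(O)O.
molecules = [
    (["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]}),
    (["C", "N", "O", "O"], {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}),
]
activations = [[0.1, 0.2, 0.3], [0.2, 0.9, 0.5, 0.4]]  # made-up values
print(top_fragments(molecules, activations))
# prints [(0.9, 1, ['C', 'N', 'O', 'O']), (0.5, 1, ['N', 'O'])]
```

Fragments recovered this way could then be compared against a list of known structural alerts, which is the kind of check the findings above describe.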

This review was generated by AI and checked by human editors.