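[Paper Digest] ADCNet: a unified framework for predicting the activity of antibody-drug conjugates
ADCNet is a unified deep learning framework that combines a protein language model (ESM-2) with a small-molecule model (FG-BERT) to predict ADC activity from protein sequences, linker/payload SMILES strings, and the DAR value. On the test set it outperforms baseline machine learning models, reaching 87.12% accuracy and an AUC of 0.9293.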
Antibody-drug conjugates (ADCs) have revolutionized the field of cancer treatment in the era of precision medicine due to their ability to precisely target cancer cells and release highly effective drugs. Nevertheless, the rational design of ADCs remains very difficult because the relationship between their structures and activities is hard to understand. In the present study, we introduce a unified deep learning framework called ADCNet to help design potential ADCs. ADCNet tightly integrates the protein representation learning language model ESM-2 and the small-molecule representation learning language model FG-BERT to achieve activity prediction by learning meaningful features from the antigen and antibody protein sequences of an ADC, the SMILES strings of its linker and payload, and its drug-antibody ratio (DAR) value. Based on a carefully designed and manually tailored ADC data set, extensive evaluation results reveal that ADCNet performs best on the test set compared to baseline machine learning models across all evaluation metrics. For example, it achieves an average prediction accuracy of 87.12%, a balanced accuracy of 0.8689, and an area under the receiver operating characteristic curve of 0.9293 on the test set. In addition, cross-validation, ablation experiments, and external independent testing results further demonstrate the stability, advancement, and robustness of the ADCNet architecture. For the convenience of the community, we develop the first online platform (https://ADCNet.idruglab.cn) for the prediction of ADC activity based on the optimal ADCNet model, and the source code is publicly available at https://github.com/idrugLab/ADCNet.
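Motivation and Objectives
- Advance the rational design of antibody-drug conjugates (ADCs) by better linking structure to activity.
- Develop a unified model that integrates protein-sequence and small-molecule representations to predict ADC activity.
- Provide an online platform and open-source code to support community access and reproducibility.
Proposed Method
- Integrate ESM-2 for protein representation learning with FG-BERT for small-molecule representation learning.
- Take as input the antigen and antibody protein sequences, the SMILES strings of the linker and payload, and the drug-antibody ratio (DAR) value.
- Train and evaluate on a carefully designed ADC data set, including ablation studies and external testing.
- Measure test-set performance with accuracy, balanced accuracy, and AUC.
- Compare ADCNet against baseline machine learning models across multiple metrics.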
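As a rough illustration of how the five inputs listed above could be combined into one feature vector, the following is a minimal sketch, not the authors' implementation. It assumes that each protein sequence and each SMILES string has already been turned into a fixed-length embedding, and that the pieces are joined by plain concatenation; the function name `fuse_features`, the toy vector sizes, and the concatenation strategy itself are all hypothetical. Real ESM-2 and FG-BERT embeddings are much larger than the placeholders used here.

```python
from typing import List, Sequence


def fuse_features(
    antibody_emb: Sequence[float],
    antigen_emb: Sequence[float],
    linker_emb: Sequence[float],
    payload_emb: Sequence[float],
    dar: float,
) -> List[float]:
    """Concatenate four embeddings and the scalar DAR into one vector.

    Hypothetical helper: simple concatenation is an assumption made for
    illustration, not something confirmed by the summary above.
    """
    fused: List[float] = []
    for part in (antibody_emb, antigen_emb, linker_emb, payload_emb):
        fused.extend(float(x) for x in part)
    fused.append(float(dar))  # DAR enters as a single numeric feature
    return fused


# Toy placeholder vectors standing in for model outputs (assumed values).
antibody_emb = [0.1, 0.2, 0.3, 0.4]
antigen_emb = [0.5, 0.6, 0.7, 0.8]
linker_emb = [0.9, 1.0]
payload_emb = [1.1, 1.2]

features = fuse_features(antibody_emb, antigen_emb, linker_emb, payload_emb, dar=4.0)
print(len(features))
```

A downstream classifier would then take `features` as its input; which classifier head ADCNet actually uses is not stated in this digest.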
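Experimental Results
Research Questions
- RQ1: Can a unified framework that combines protein and small-molecule embeddings improve ADC activity prediction beyond baseline models?
- RQ2: How well does the model generalize across different test settings and to external data?
- RQ3: How much does each input modality (protein, SMILES, DAR) contribute to predictive performance?
Key Findings
- ADCNet achieves an average accuracy of 87.12% on the test set.
- ADCNet achieves a balanced accuracy of 0.8689 on the test set.
- ADCNet achieves an area under the ROC curve of 0.9293 on the test set.
- Cross-validation, ablation studies, and external testing support the robustness and stability of ADCNet.
- An online platform running the optimal ADCNet model is available for ADC activity prediction, and the source code is publicly available.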
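For readers less familiar with the three metrics reported above, the sketch below shows one way to compute them in plain Python. It assumes binary labels (1 = active, 0 = inactive) and a 0.5 decision threshold, neither of which is specified in this digest; the labels and scores are made-up toy values, not data from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))
print(balanced_accuracy(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Balanced accuracy is useful alongside plain accuracy when the active and inactive classes differ in size, because it weights the two classes equally regardless of how many samples each has.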
This digest was generated by AI and reviewed by a human editor.