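[Paper Summary] Evaluating Protein Transfer Learning with TAPE
This paper benchmarks self-supervised pretraining for protein sequence representations on five downstream tasks. Most models improve with pretraining, but no single model wins across the board, and alignment-based features remain stronger on some structure tasks.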
Machine learning applied to protein sequences is an increasingly popular area of research. Semi-supervised learning for proteins has emerged as an important paradigm due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. We benchmark a range of approaches to semi-supervised protein representation learning, which span recent work as well as canonical sequence learning techniques. We find that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases. Despite this increase, in several cases features learned by self-supervised pretraining still lag behind features extracted by state-of-the-art non-neural techniques. This gap in performance suggests a huge opportunity for innovative architecture design and improved modeling paradigms that better capture the signal in biological sequences. TAPE will help the machine learning community focus effort on scientifically relevant problems. Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape.
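Motivation and Goals
- Make the case for a standardized, multi-task benchmark for protein representation learning.
- Create Tasks Assessing Protein Embeddings (TAPE), a set of five diverse, biologically relevant downstream tasks.
- Evaluate multiple neural architectures and self-supervised losses on unified data splits.
- Quantify when self-supervised pretraining helps and where traditional alignment-based features outperform learned representations.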
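Proposed Method
- Select five downstream tasks covering structure prediction, evolutionary understanding, and protein engineering.
- Pretrain with self-supervision on unlabeled Pfam sequences, using next-token and masked-token objectives (plus one protein-specific variant).
- Evaluate three architectures (LSTM, Transformer, ResNet), two previously published pretrained models (Bepler; Alley, listed as UniRep in the results), and two baselines (one-hot, alignment features).
- Fine-tune the pretrained representations on each downstream task using a standardized supervised architecture.
- Compare performance with task-appropriate metrics (accuracy, precision, Spearman's rho) and analyze out-of-distribution generalization.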
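The two pretraining objectives named above can be illustrated with a small data-preparation sketch. This is not TAPE's code: the `<mask>` symbol, the function names, and the 15% masking rate (borrowed from common masked-language-model practice) are all assumptions made for illustration. It only shows how inputs and targets might be built, not how a model is trained.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not TAPE's actual vocabulary entry


def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Build (corrupted_tokens, targets) for masked-token prediction.

    targets[i] holds the original residue at masked positions and None
    elsewhere, so a loss would only be computed where a residue was hidden.
    The 15% default rate is an assumption, not a value from the summary.
    """
    tokens = list(seq)
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Mask at least one position so short sequences still yield a target.
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    corrupted = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    targets = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return corrupted, targets


def next_token_pairs(seq):
    """Build (context, target) pairs for next-token prediction."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]
```

In the masked-token case the model sees the whole corrupted sequence and predicts the hidden residues; in the next-token case it sees only the prefix and predicts the residue that follows.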
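Experimental Results
Research Questions
- RQ1: Does self-supervised pretraining improve the quality of protein representations across multiple downstream tasks?
- RQ2: How do different architectures (Transformer, LSTM, ResNet) differ in transfer performance across tasks?
- RQ3: Do alignment-based features still outperform learned representations on some structure-related tasks?
- RQ4: Is there a single model that wins consistently on all tasks, or is multi-task benchmarking indispensable?
- RQ5: How does pretraining affect out-of-distribution generalization (held-out families)?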
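To make the held-out-family idea in RQ5 concrete, here is a minimal sketch of a family-level split. It is not TAPE's actual split procedure: the record layout `(sequence_id, family)`, the function name, and the 20% holdout fraction are hypothetical. It only demonstrates the property that matters for out-of-distribution evaluation, namely that no family in the test set appears in training.

```python
import random


def split_by_family(records, holdout_fraction=0.2, seed=0):
    """Split (sequence_id, family) records by whole families.

    Every sequence from a held-out family goes to the test set, so the
    model is evaluated on families it never saw during training.
    """
    families = sorted({family for _, family in records})
    rng = random.Random(seed)
    rng.shuffle(families)
    # Hold out at least one family so the test set is never empty.
    n_holdout = max(1, round(holdout_fraction * len(families)))
    heldout = set(families[:n_holdout])
    train = [r for r in records if r[1] not in heldout]
    test = [r for r in records if r[1] in heldout]
    return train, test
```

A random per-sequence split would instead let close relatives of test sequences leak into training, which tends to overstate how well a representation generalizes.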
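Main Findings
| Method | SS (accuracy) | Contact (precision) | Homology (accuracy) | Fluorescence (Spearman's rho) | Stability (Spearman's rho) |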
|---|---|---|---|---|---|
| Transformer No Pretrain | 0.70 | 0.32 | 0.09 | 0.22 | -0.06 |
| LSTM No Pretrain | 0.71 | 0.19 | 0.12 | 0.21 | 0.28 |
| ResNet No Pretrain | 0.70 | 0.20 | 0.10 | -0.28 | 0.61 |
| Transformer Pretrain | 0.73 | 0.36 | 0.21 | 0.68 | 0.73 |
| LSTM Pretrain | 0.75 | 0.39 | 0.26 | 0.67 | 0.69 |
| ResNet Pretrain | 0.75 | 0.29 | 0.17 | 0.21 | 0.73 |
| Supervised Bepler LSTM | 0.73 | 0.40 | 0.17 | 0.33 | 0.64 |
| UniRep mLSTM | 0.73 | 0.34 | 0.23 | 0.67 | 0.73 |
| Baseline One-hot | 0.69 | 0.29 | 0.09 | 0.14 | 0.19 |
| Alignment | 0.80 | 0.64 | 0.09 | N/A | N/A |
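The fluorescence and stability scores are rank correlations (Spearman's rho, as in the TAPE paper), which is why they can be negative. The sketch below shows one way to compute the metric in plain Python: rank both lists, averaging ranks for ties, then take the Pearson correlation of the ranks. The function names are illustrative, and in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(predicted, measured):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(measured))
```

A value near 1 means the model orders proteins almost exactly as the measurements do, a value near 0 means the ordering is uninformative, and a negative value means the predicted ordering is inverted.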
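- Self-supervised pretraining improves performance for almost all models on almost all tasks.
- Architecture performance is task-dependent; no single model dominates every task.
- Non-deep, alignment-based features beat learned representations on secondary structure (0.80) and contact prediction (0.64), whereas learned representations do better on remote homology detection (up to 0.26, versus 0.09 for alignment).
- On fluorescence and stability, pretrained models show large gains (for example, the Transformer rises from 0.22 to 0.68 on fluorescence and from -0.06 to 0.73 on stability), but alignment-based signals can still prevail on some structure tasks.
- The results underscore the value of multi-task benchmarks such as TAPE and the importance of continued improvements in architectures and training.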
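The first two findings can be checked directly against the table. The sketch below copies the scores of the three architectures trained with and without pretraining, computes the per-task gain from pretraining, and lists the best pretrained architecture for each task. The variable and function names are illustrative, and the comparison deliberately leaves out the Bepler, UniRep, one-hot, and alignment rows.

```python
TASKS = ["SS", "Contact", "Homology", "Fluorescence", "Stability"]

# Scores copied from the results table (same column order as TASKS).
NO_PRETRAIN = {
    "Transformer": [0.70, 0.32, 0.09, 0.22, -0.06],
    "LSTM": [0.71, 0.19, 0.12, 0.21, 0.28],
    "ResNet": [0.70, 0.20, 0.10, -0.28, 0.61],
}
PRETRAIN = {
    "Transformer": [0.73, 0.36, 0.21, 0.68, 0.73],
    "LSTM": [0.75, 0.39, 0.26, 0.67, 0.69],
    "ResNet": [0.75, 0.29, 0.17, 0.21, 0.73],
}


def pretraining_gains():
    """Per-model list of (pretrained - not pretrained) scores, one per task."""
    return {
        model: [round(p - n, 2) for p, n in zip(PRETRAIN[model], NO_PRETRAIN[model])]
        for model in PRETRAIN
    }


def best_pretrained_per_task():
    """Map each task to the pretrained architecture(s) with the top score."""
    best = {}
    for i, task in enumerate(TASKS):
        top = max(scores[i] for scores in PRETRAIN.values())
        best[task] = sorted(m for m, scores in PRETRAIN.items() if scores[i] == top)
    return best


if __name__ == "__main__":
    for model, gains in pretraining_gains().items():
        print(model, dict(zip(TASKS, gains)))
    print(best_pretrained_per_task())
```

For these three architectures every gain is positive, and the top pretrained architecture changes from task to task, which matches the "no single winner" conclusion.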
This summary was generated by AI and reviewed by human editors.