QUICK REVIEW

[Paper Review] Inverse folding for antibody sequence design using deep learning

Frédéric A. Dreyer, Daniel Cutting | arXiv (Cornell University) | Oct 30, 2023

Monoclonal and Polyclonal Antibodies Research | Cited by 19

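One-Sentence Summary

The authors fine-tune ProteinMPNN to produce AbMPNN, an antibody-specific inverse folding model. Trained on data derived from SAbDab and OAS, and evaluated with structure prediction and Rosetta interface energy, it improves sequence recovery and designability, particularly for CDR-H3.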

ABSTRACT

We consider the problem of antibody sequence design given 3D structural information. Building on previous work, we propose a fine-tuned inverse folding model that is specifically optimised for antibody structures and outperforms generic protein models on sequence recovery and structure robustness when applied on antibodies, with notable improvement on the hypervariable CDR-H3 loop. We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters. Finally, we consider the applications of our model to drug discovery and binder design and evaluate the quality of proposed sequences using physics-based methods.

Research Motivation and Goals

Address the problem of antibody sequence design given 3D backbone structures.
Develop an antibody-specific inverse folding model by fine-tuning ProteinMPNN on antibody data.
Assess improvements in residue recovery, designability, and interface stability for antibodies.
Examine canonical CDR loop encoding and germline distribution compatibility.
Provide model weights for downstream antibody design and drug discovery applications.

Proposed Method

Adapt ProteinMPNN architecture with structured graph neural networks and order-agnostic decoding for antibodies.
Fine-tune on two antibody datasets: SAbDab antigen-binding fragments and OAS paired heavy/light chains predicted by ABodyBuilder2.
Use CD-HIT clustering to create non-redundant training/validation/test splits and ensure dataset separation.
Train with the Adam optimizer; reduce the learning rate by a factor of 10 when validation loss stalls for 10 epochs (the same schedule is reported for both the OAS and SAbDab stages).
Evaluate designability via self-consistency RMSD after structure prediction with ABodyBuilder2; assess interface energy via Rosetta; measure residue recovery across CDRs; analyze canonical clustering with SCALOP.
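
The learning-rate rule in the training step can be sketched as follows. This is a minimal illustration, not the authors' training code: it assumes "stalls" means no improvement in validation loss for `patience` consecutive epochs, and `initial_lr` and the function name are hypothetical. Only the factor of 10 and the 10-epoch window come from the summary above.

```python
def reduce_on_plateau(val_losses, initial_lr=1e-3, factor=10.0, patience=10):
    """Return the learning rate in effect at each epoch.

    Assumption: the rate is divided by `factor` once the validation loss
    has failed to improve for `patience` consecutive epochs.
    `initial_lr` is a placeholder value, not taken from the paper.
    """
    lr = initial_lr
    best = float("inf")
    stalled = 0
    schedule = []
    for loss in val_losses:
        schedule.append(lr)          # rate used during this epoch
        if loss < best:
            best = loss              # new best loss resets the counter
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:  # plateau detected: shrink the rate
                lr /= factor
                stalled = 0
    return schedule


if __name__ == "__main__":
    # Toy losses with a short patience so the drop is visible.
    print(reduce_on_plateau([1.0, 0.9, 0.95, 0.97, 0.99],
                            initial_lr=1.0, patience=2))
```

In a real training loop the same behaviour would normally come from the framework's own plateau scheduler rather than a hand-written loop; the sketch only makes the rule explicit.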

Figure 1: Overview of an antibody structure and its domains.

Experimental Results

Research Questions

RQ1: Does antibody-specific fine-tuning improve amino acid sequence recovery in antibody CDR loops compared to generic ProteinMPNN?
RQ2: How does AbMPNN affect designability as measured by structure self-consistency and Rosetta interface energy?
RQ3: Can AbMPNN better recover canonical CDR loop conformations and align with germline distributions in antibodies?
RQ4: What is the impact of antibody-focused training on framework and CDR-region validity and annotation with ANARCI?
RQ5: How well can the model support antibody drug discovery and binder design applications?

Main Findings

AbMPNN shows about a 20% improvement in median CDR-H3 RMSD self-consistency compared to ProteinMPNN.
40% of AbMPNN sequences are within 5 kcal/mol interface energy of the native sequence, vs 20.5% for ProteinMPNN.
Sequence recovery across CDR loops improves to ~60% with AbMPNN, versus ~40% with ProteinMPNN.
AbMPNN achieves a large improvement in recovering the canonical cluster for non-H3 CDRs.
All AbMPNN predictions are recognized as antibodies by ANARCI, while ProteinMPNN predictions include 16.8% non-annotatable sequences due to framework errors.
Germline matching distribution improves with AbMPNN, reducing ANARCI annotation failures observed with ProteinMPNN.
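
Two of the metrics quoted above, sequence recovery and the share of designs within 5 kcal/mol of the native interface energy, can be made concrete with a small sketch. The function names, the toy sequences, and the toy energies are invented for illustration, and "within 5 kcal/mol" is interpreted here as an absolute difference, which is an assumption rather than something the summary states.

```python
def sequence_recovery(native, designed):
    """Fraction of positions where the designed residue matches the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned and of equal length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def fraction_within(native_energy, designed_energies, tolerance=5.0):
    """Fraction of designs whose interface energy lies within `tolerance`
    of the native value (assumption: absolute difference, same units)."""
    if not designed_energies:
        raise ValueError("need at least one designed energy")
    close = sum(abs(e - native_energy) <= tolerance for e in designed_energies)
    return close / len(designed_energies)


if __name__ == "__main__":
    # Toy CDR-like strings, not real data.
    print(sequence_recovery("ARDYW", "ARDFW"))
    # Toy interface energies, not real Rosetta output.
    print(fraction_within(-30.0, [-28.0, -36.0, -31.5, -20.0]))
```

Per-loop recovery figures such as those reported above would be obtained by applying `sequence_recovery` to each CDR segment separately and aggregating over the test set.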

Figure 2: Schematic representation of the data processing steps and model architecture.

This review was generated by AI and reviewed by human editors.