QUICK REVIEW

[Paper Review] Multi-Scale Representation Learning on Proteins

Vignesh Ram Somnath, Charlotte Bunne|arXiv (Cornell University)|Apr 4, 2022

Machine Learning in Materials Science | Cited by 22

One-Sentence Summary

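HoloProt builds a two-layer, multi-scale protein graph (surface and structure), connects the two scales, and learns an integrated representation. It shows high parameter efficiency on protein-ligand binding affinity regression and enzyme classification, and uses molecular superpixels to save memory.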

ABSTRACT

Proteins are fundamental biological entities mediating key roles in cellular function and disease. This paper introduces a multi-scale graph construction of a protein -- HoloProt -- connecting surface to structure and sequence. The surface captures coarser details of the protein, while sequence as primary component and structure -- comprising secondary and tertiary components -- capture finer details. Our graph encoder then learns a multi-scale representation by allowing each level to integrate the encoding from level(s) below with the graph at that level. We test the learned representation on different tasks, (i.) ligand binding affinity (regression), and (ii.) protein function prediction (classification). On the regression task, contrary to previous methods, our model performs consistently and reliably across different dataset splits, outperforming all baselines on most splits. On the classification task, it achieves a performance close to the top-performing model while using 10x fewer parameters. To improve the memory efficiency of our construction, we segment the multiplex protein surface manifold into molecular superpixels and substitute the surface with these superpixels at little to no performance loss.

Motivation and Objectives

Motivate robust protein representations that capture sequence, structure, and surface information across scales.
Propose a multi-scale graph construction (surface and structure) linked by residue correspondences.
Develop a multi-scale encoder that propagates information from lower to higher scales.
Evaluate on protein-ligand binding affinity regression and enzyme-catalyzed reaction classification.
Demonstrate memory-efficient variants using molecular superpixels without significant performance loss.

Proposed Method

Construct a two-layer protein graph: a surface graph G_S and a backbone/structure graph G_B.
Link surface and structure nodes via residue-aligned edges to enable cross-scale information flow.
Apply a separate Message Passing Neural Network (MPN) at each layer with inputs crafted per layer (surface features; residue-based structure features with averaged surface embeddings).
Aggregate structure-layer node representations to form a protein graph representation c_GP.
For ligands, use an MPN to obtain c_G Ligand and predict binding affinity via an MLP from concatenated protein and ligand representations.
For enzyme classification, predict enzyme class by feeding c_GP into an MLP for multi-class classification.
Introduce molecular superpixels on the protein surface to summarize features and reduce memory usage with minimal performance loss.
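
The two-layer encoding described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the mean-aggregation update stands in for the MPN, sum pooling is assumed for the aggregation into c_GP, and the function names, parameter names, feature dimensions, and toy graph are all hypothetical.

```python
import numpy as np


def message_passing(h, adj, w_self, w_nbr, steps=2):
    """Mean-aggregation message passing; a simple stand-in for the MPN."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # avoid divide-by-zero
    for _ in range(steps):
        nbr_mean = (adj @ h) / deg  # average of neighbour states
        h = np.tanh(h @ w_self + nbr_mean @ w_nbr)
    return h


def average_surface_per_residue(h_surf, surf_to_res, n_res):
    """Average the surface-node embeddings assigned to each residue."""
    sums = np.zeros((n_res, h_surf.shape[1]))
    np.add.at(sums, surf_to_res, h_surf)  # scatter-add by residue index
    counts = np.bincount(surf_to_res, minlength=n_res)[:, None]
    return sums / np.maximum(counts, 1)  # residues with no surface node stay zero


def init_params(d_surf, d_res, d_hid, seed=0):
    """Random weights for illustration only (hypothetical shapes)."""
    rng = np.random.default_rng(seed)
    shapes = {
        "in_s": (d_surf, d_hid), "s_self": (d_hid, d_hid), "s_nbr": (d_hid, d_hid),
        "in_b": (d_res + d_hid, d_hid), "b_self": (d_hid, d_hid), "b_nbr": (d_hid, d_hid),
    }
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}


def encode_protein(x_surf, adj_surf, surf_to_res, x_res, adj_res, p):
    # Layer 1: message passing on the surface graph G_S.
    h_surf = message_passing(x_surf @ p["in_s"], adj_surf, p["s_self"], p["s_nbr"])
    # Cross-scale link: average surface embeddings per residue.
    surf_avg = average_surface_per_residue(h_surf, surf_to_res, x_res.shape[0])
    # Layer 2: residue features plus averaged surface embeddings on G_B.
    h_in = np.concatenate([x_res, surf_avg], axis=1) @ p["in_b"]
    h_res = message_passing(h_in, adj_res, p["b_self"], p["b_nbr"])
    # Protein representation c_GP (sum pooling is an assumption here).
    return h_res.sum(axis=0)


# Toy protein: 6 surface nodes on a ring, 3 residues on a chain.
rng = np.random.default_rng(1)
x_surf = rng.normal(size=(6, 4))
adj_surf = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
surf_to_res = np.array([0, 0, 1, 1, 2, 2])  # residue index of each surface node
x_res = rng.normal(size=(3, 5))
adj_res = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])

params = init_params(d_surf=4, d_res=5, d_hid=8)
c_gp = encode_protein(x_surf, adj_surf, surf_to_res, x_res, adj_res, params)
print(c_gp.shape)
```

In this sketch the resulting vector plays the role of c_GP. For affinity regression it would be concatenated with a ligand vector produced by a separate message-passing network and passed to an MLP; for enzyme classification it would be fed to an MLP classifier directly. A superpixel variant would change only the surface input, replacing per-node surface features with one summarized feature per segment.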

Experimental Results

Research Questions

RQ1: Can a multi-scale graph combining protein surface and structure improve predictive power over single-scale representations?
RQ2: Do cross-scale connections enable residue-level encodings to reflect higher-level geometric and chemical properties?
RQ3: Are molecular superpixels an effective memory-efficient surrogate for rich surface representations without sacrificing performance?
RQ4: How does HoloProt perform on protein-ligand binding affinity regression across diverse dataset splits?
RQ5: How does HoloProt fare in enzyme-catalyzed reaction classification compared to state-of-the-art methods?

Key Findings

HoloProt performs consistently and reliably on protein-ligand binding affinity prediction across scaffold and high-identity splits, outperforming all baselines on most splits.
On binding affinity, HoloProt with full surface input matches or exceeds baselines while using fewer parameters than many competitors.
On enzyme-catalyzed reaction classification, HoloProt attains accuracy close to that of the top-performing model while using 10x fewer parameters, and it remains much smaller than sequence-based or larger structure-based models.
Using molecular superpixels maintains similar performance to full-surface variants, indicating effective motif capture with memory savings.
Ablation studies show multi-scale integration generally improves over single-scale (structure or surface) representations, and that the contribution of scales varies by task.


This review was generated by AI and reviewed by a human editor.