QUICK REVIEW

[Paper Review] RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine

Hasi Hays, William J. Richardson|arXiv (Cornell University)|Jan 31, 2026

Bioinformatics and Genomic Networks | Cited by 0

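One-Sentence Summary

RAG-GNN combines graph neural networks with retrieval-augmented knowledge from the biomedical literature to improve functional interpretation and therapeutic target identification in precision medicine, offering strengths complementary to topology-only methods.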

ABSTRACT

Network topology excels at structural predictions but fails to capture functional semantics encoded in biomedical literature. We present a retrieval-augmented generation (RAG) embedding framework that integrates graph neural network representations with dynamically retrieved literature-derived knowledge through contrastive learning. Benchmarking against ten embedding methods reveals task-specific complementarity: topology-focused methods achieve near-perfect link prediction (GCN: 0.983 AUROC), while RAG-GNN is the only method achieving positive silhouette scores for functional clustering (0.001 vs. negative scores for all baselines). Information-theoretic decomposition shows network topology contributes 77.3% of predictive information, while retrieved documents provide 8.6% unique information. Applied to cancer signaling networks (379 proteins, 3,498 interactions), the framework identifies DDR1 as a therapeutic target based on retrieved evidence of synthetic lethality with KRAS mutations. These results establish that topology-only and retrieval-augmented approaches serve complementary purposes: structural prediction tasks are solved by network topology alone, while functional interpretation uniquely benefits from retrieved knowledge.

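Motivation and Objectives

Integrate network topology with retrieved literature to bridge the structure-function gap in biomedical prediction.
Develop a retrieval-augmented GNN (RAG-GNN) framework that jointly encodes topology and external knowledge.
Quantify the unique predictive contribution of retrieved literature relative to topology.
Demonstrate the framework on cancer signaling networks and validate one therapeutic target (DDR1) with literature support.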

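Proposed Method

Build a joint embedding framework that fuses GNN-based topological embeddings with retrieved-document embeddings in a shared semantic space.
Use a dense retriever to select the top-k documents for each node, ranked by semantic similarity and a quality-weighted relevance score.
Compute a contextualized knowledge vector from the retrieved documents by applying attention over their embeddings.
Fuse the structural and semantic embeddings by concatenation or gated fusion to obtain the final node representation.
Optimize a multi-objective loss comprising a task loss, a retrieval loss, and a contrastive alignment loss (L_total = L_task + lambda1 L_retrieval + lambda2 L_contrastive).
Benchmark against ten embedding methods on link prediction, functional clustering, and node classification; run information-theoretic and counterfactual analyses to verify that the content of the retrieved knowledge matters.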
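The retrieve, attend, and fuse steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the multiplicative form of the quality-weighted score, the use of the structural embedding as the retrieval query, the gate parameterization, and the default lambda values are all hypothetical choices made for the example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_top_k(node_query, doc_embs, doc_quality, k=3):
    """Return indices and scores of the k best documents for one node.

    Assumed scoring rule: cosine similarity multiplied by a per-document
    quality weight. The source only says "quality-weighted relevance score".
    """
    sims = l2_normalize(doc_embs) @ l2_normalize(node_query)
    scores = sims * doc_quality
    top = np.argsort(-scores)[:k]  # descending order of score
    return top, scores[top]

def attention_pool(node_query, retrieved_embs):
    """Scaled dot-product attention over the retrieved document embeddings."""
    d = retrieved_embs.shape[1]
    logits = retrieved_embs @ node_query / np.sqrt(d)
    logits = logits - logits.max()  # numerical stability for the softmax
    weights = np.exp(logits) / np.exp(logits).sum()
    return weights @ retrieved_embs, weights

def gated_fusion(h_struct, h_know, W_gate, b_gate):
    """Elementwise gate: z = g * h_struct + (1 - g) * h_know (assumed form)."""
    gate_in = np.concatenate([h_struct, h_know])
    g = 1.0 / (1.0 + np.exp(-(W_gate @ gate_in + b_gate)))
    return g * h_struct + (1.0 - g) * h_know

def total_loss(l_task, l_retrieval, l_contrastive, lambda1=0.1, lambda2=0.1):
    """L_total = L_task + lambda1 * L_retrieval + lambda2 * L_contrastive.

    The lambda defaults are placeholders, not values from the paper.
    """
    return l_task + lambda1 * l_retrieval + lambda2 * l_contrastive

# Toy data: one node embedding and 20 random "document" embeddings.
rng = np.random.default_rng(0)
d = 8
h_struct = rng.normal(size=d)             # stand-in for a GNN node embedding
doc_embs = rng.normal(size=(20, d))       # stand-in for literature embeddings
doc_quality = rng.uniform(0.5, 1.0, size=20)
W_gate = rng.normal(scale=0.1, size=(d, 2 * d))
b_gate = np.zeros(d)

top_idx, top_scores = retrieve_top_k(h_struct, doc_embs, doc_quality, k=3)
h_know, attn = attention_pool(h_struct, doc_embs[top_idx])
z = gated_fusion(h_struct, h_know, W_gate, b_gate)
```

In a trained system the gate weights, the retriever, and the GNN would be learned jointly through the combined loss; here they are random so that the data flow stays visible.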

Figure 1: RAG-GNN framework for precision medicine: Architecture overview. The framework integrates six interconnected components for knowledge-augmented biomedical prediction. (1) Biological network input: heterogeneous molecular interaction networks representing protein-protein interactions, signa

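Experimental Results

Research Questions

RQ1: Do topology-only and retrieval-augmented embeddings show complementary strengths across prediction tasks?
RQ2: How much does retrieved knowledge contribute to predictive performance, quantitatively, relative to network topology?
RQ3: Do retrieved documents provide unique, non-redundant information that improves functional interpretation and target identification?
RQ4: How does RAG-GNN perform under temporally separated validation that simulates real-world deployment?
RQ5: At matched capacity, is retrieved knowledge still necessary, or can a topology-only model reach the same performance?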

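Key Findings

Topology contributes 77.3% of the predictive information; retrieved documents contribute 8.6% unique information.
RAG-GNN achieves a positive silhouette score (0.001) on functional clustering, whereas the topology-only methods do not.
In the cancer signaling network (379 proteins, 3,498 interactions), DDR1 is identified as a therapeutic target, supported by retrieved evidence of synthetic lethality with KRAS mutations.
Temporal validation yields an AUROC of 0.891, close to the full-corpus AUROC of 0.912, indicating good generalization to novel targets.
A capacity-matched topology-only GNN (no retrieval) reaches an AUROC of 0.847 versus 0.912 for RAG-GNN, supporting the value of retrieved knowledge beyond added capacity.
Counterfactual retrieval experiments show a marked performance drop when retrieval is degraded, indicating that the retrieved content itself provides a substantive benefit.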
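To make the silhouette result concrete, the sketch below computes a mean silhouette score from scratch on toy data. It is an illustrative implementation of the standard definition, s = (b - a) / max(a, b), with Euclidean distance assumed; the paper's exact distance metric and functional labels are not given in this review, and the toy points are invented for the example.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette over all points, using Euclidean distance (assumed).

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster.
    Requires at least two distinct labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = dists[i, same].mean()
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Toy embeddings: two tight groups far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])

# Labels that match the geometry give a positive score ...
aligned = silhouette_score(X, [0, 0, 1, 1])
# ... while labels that cut across the groups give a negative one.
misaligned = silhouette_score(X, [0, 1, 0, 1])
```

A negative score means points sit closer, on average, to another functional class than to their own, so a score just above zero, as reported for RAG-GNN, indicates only marginal alignment between the embedding geometry and the functional labels.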

Figure 2: RAG-GNN architecture for precision medicine. The complete system integrates network topology encoding, knowledge retrieval, and context fusion through six main components. The forward pass (solid arrows) begins with the input network $\mathcal{G}^{(p)}=(\mathbf{A},\mathbf{X})$ representing


This review was generated by AI and reviewed by a human editor.