QUICK REVIEW

[Paper Review] Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation

Junde Wu, Jiayuan Zhu | arXiv (Cornell University) | Aug 8, 2024
Topic Modeling | Cited by 28
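One-Sentence Summary

MedGraphRAG introduces a graph-based retrieval-augmented generation pipeline for medical language models. By building a hierarchical medical knowledge graph and applying the U-retrieve retrieval strategy, it produces answers that are evidence-grounded, source-cited, and safer.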

ABSTRACT

We introduce a novel graph-based Retrieval-Augmented Generation (RAG) framework specifically designed for the medical domain, called MedGraphRAG, aimed at enhancing Large Language Model (LLM) capabilities for generating evidence-based medical responses, thereby improving safety and reliability when handling private medical data. Graph-based RAG (GraphRAG) leverages LLMs to organize RAG data into graphs, showing strong potential for gaining holistic insights from long-form documents. However, its standard implementation is overly complex for general use and lacks the ability to generate evidence-based responses, limiting its effectiveness in the medical field. To extend the capabilities of GraphRAG to the medical domain, we propose unique Triple Graph Construction and U-Retrieval techniques over it. In our graph construction, we create a triple-linked structure that connects user documents to credible medical sources and controlled vocabularies. In the retrieval process, we propose U-Retrieval, which combines Top-down Precise Retrieval with Bottom-up Response Refinement to balance global context awareness with precise indexing. These efforts enable both source information retrieval and comprehensive response generation. Our approach is validated on 9 medical Q&A benchmarks, 2 health fact-checking benchmarks, and one collected dataset testing long-form generation. The results show that MedGraphRAG consistently outperforms state-of-the-art models across all benchmarks, while also ensuring that responses include credible source documentation and definitions. Our code is released at: https://github.com/MedicineToken/Medical-Graph-RAG.

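Motivation and Objectives

  • Address the risk of hallucination in medical language models by grounding answers in verifiable medical sources.
  • Develop a three-level hierarchical graph that integrates user data, medical literature, and UMLS vocabulary.
  • Implement a retrieval mechanism (U-retrieve) that balances global context with efficient access to the relevant graph segments.
  • Demonstrate that MedGraphRAG improves performance on medical QA benchmarks without additional model fine-tuning.
  • Show the ability to provide evidence-based, source-backed explanations for clinical questions.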

Proposed Method

  • Segment medical documents using a hybrid static-semantic chunking approach to capture context.
  • Extract entities from chunks and build a three-level graph: user documents, foundational medical books/papers, and UMLS-based terms.
  • Construct a meta-graph for each data chunk by linking its entities, then merge the meta-graphs into a global graph based on semantic similarity.
  • Use U-retrieve to retrieve top-down and refine the response bottom-up, generating answers with source citations.
Figure 1: MedGraphRAG framework.
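
The merge and retrieval steps listed above can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: `Node`, `build_hierarchy`, and `u_retrieve` are hypothetical names, word-overlap (Jaccard) similarity stands in for whatever semantic similarity the paper uses, each meta-graph is reduced to a short text summary, and string concatenation stands in for LLM-based response refinement.

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets (assumed proxy for semantic similarity)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class Node:
    summary: str                                        # text summary of this graph
    children: list["Node"] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)   # evidence, leaf nodes only


def build_hierarchy(meta_graphs: list[Node]) -> Node:
    """Greedily merge the two most similar nodes until a single root remains."""
    nodes = list(meta_graphs)
    while len(nodes) > 1:
        i, j = max(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda p: similarity(nodes[p[0]].summary, nodes[p[1]].summary),
        )
        merged = Node(
            summary=nodes[i].summary + " " + nodes[j].summary,
            children=[nodes[i], nodes[j]],
        )
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def u_retrieve(root: Node, question: str) -> tuple[str, list[str]]:
    """Top-down descent to one meta-graph, then bottom-up response refinement."""
    # Top-down: at each level follow the child whose summary best matches.
    path = [root]
    while path[-1].children:
        best = max(path[-1].children, key=lambda c: similarity(c.summary, question))
        path.append(best)
    leaf = path[-1]
    # Bottom-up: start from the leaf's evidence, then fold in each ancestor's
    # summary (a real system would ask an LLM to refine the answer here).
    answer = "Evidence: " + "; ".join(leaf.entities)
    for ancestor in reversed(path[:-1]):
        answer += " | Context: " + ancestor.summary
    return answer, leaf.entities


if __name__ == "__main__":
    # Toy meta-graphs; the summaries and sources are made up for illustration.
    graphs = [
        Node("diabetes insulin glucose", entities=["insulin lowers blood glucose [doc A]"]),
        Node("diabetes metformin glucose", entities=["metformin is a diabetes drug [doc B]"]),
        Node("asthma inhaler bronchodilator", entities=["albuterol relieves bronchospasm [doc C]"]),
    ]
    answer, sources = u_retrieve(build_hierarchy(graphs), "Which inhaler helps asthma?")
    print(answer)
    print(sources)
```

The design point the sketch tries to capture is the "U" shape: the descent narrows the search to one small meta-graph so the cited evidence stays precise, while the ascent reintroduces the broader summaries so the final answer keeps global context.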

Experimental Results

Research Questions

  • RQ1: Can a three-tier hierarchical medical graph improve accuracy and reliability of LLM-generated medical answers without fine-tuning?
  • RQ2: Does integrating user data, medical literature, and UMLS-grounded terms reduce hallucinations and improve grounding of medical assertions?
  • RQ3: How does U-retrieve compare to other retrieval strategies in terms of retrieval accuracy and response quality?
  • RQ4: What is the impact of hierarchical graph construction and advanced chunking on medical QA benchmarks?
  • RQ5: Do MedGraphRAG outputs provide verifiable source-based explanations suitable for clinical use?

Key Findings

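| Model               | Size | Open source | MedQA | MedMCQA | PubMedQA |
|---------------------|------|-------------|-------|---------|----------|
| LLaMA2              | 13B  | yes         | 42.7  | 37.4    | 68.0     |
| LLaMA2-MedGraphRAG  | 13B  | yes         | 65.5  | 51.4    | 73.2     |
| LLaMA2              | 70B  | yes         | 43.7  | 35.0    | 74.3     |
| LLaMA2-MedGraphRAG  | 70B  | yes         | 69.2  | 58.7    | 76.0     |
| LLaMA3              | 8B   | yes         | 59.8  | 57.3    | 75.2     |
| LLaMA3-MedGraphRAG  | 8B   | yes         | 74.2  | 61.6    | 77.8     |
| LLaMA3              | 70B  | yes         | 72.1  | 65.5    | 77.5     |
| LLaMA3-MedGraphRAG  | 70B  | yes         | 88.4  | 79.1    | 83.8     |
| Gemini-pro          | -    | no          | 59.0  | 54.8    | 69.8     |
| Gemini-MedGraphRAG  | -    | no          | 72.6  | 62.0    | 76.2     |
| GPT-4               | -    | no          | 81.7  | 72.4    | 75.2     |
| GPT-4 MedGraphRAG   | -    | no          | 91.3  | 81.5    | 83.3     |
| Human (expert)      | -    | -           | 87.0  | 90.0    | 78.0     |
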
  • MedGraphRAG improves every listed model on all three medical QA benchmarks in the table (MedQA, MedMCQA, PubMedQA).
  • Smaller LLMs (e.g., LLaMA2-13B, LLaMA3-8B) also see notable gains, extending applicability beyond large models.
  • With GPT-4, MedGraphRAG achieves state-of-the-art results on MedQA (91.3) and surpasses several fine-tuned baselines.
  • Responses include grounded citations and explanations of medical terms, enhancing reliability and interpretability.
  • Ablation studies show hybrid semantic chunking, hierarchical graph construction, and U-retrieve contribute to performance gains.
Figure 2: Comparison with SOTA medical LLMs on the MedQA benchmark.
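
To make the size of these improvements concrete, the per-model gains can be recomputed from the results table above. The snippet below is only a small arithmetic aid: the numbers are copied from the table, while the names `SCORES`, `BENCHMARKS`, and `gains` are illustrative, and it assumes that higher scores are better on all three benchmarks.

```python
BENCHMARKS = ("MedQA", "MedMCQA", "PubMedQA")

# (baseline scores, scores with MedGraphRAG), copied from the table above.
SCORES = {
    "LLaMA2 13B": ((42.7, 37.4, 68.0), (65.5, 51.4, 73.2)),
    "LLaMA2 70B": ((43.7, 35.0, 74.3), (69.2, 58.7, 76.0)),
    "LLaMA3 8B": ((59.8, 57.3, 75.2), (74.2, 61.6, 77.8)),
    "LLaMA3 70B": ((72.1, 65.5, 77.5), (88.4, 79.1, 83.8)),
    "Gemini-pro": ((59.0, 54.8, 69.8), (72.6, 62.0, 76.2)),
    "GPT-4": ((81.7, 72.4, 75.2), (91.3, 81.5, 83.3)),
}


def gains(scores: dict) -> dict:
    """Absolute point gain per benchmark for each model, rounded to 0.1."""
    return {
        model: {
            name: round(after - before, 1)
            for name, before, after in zip(BENCHMARKS, base, augmented)
        }
        for model, (base, augmented) in scores.items()
    }


if __name__ == "__main__":
    for model, gain in gains(SCORES).items():
        row = ", ".join(f"{name} +{value}" for name, value in gain.items())
        print(f"{model}: {row}")
```

For example, LLaMA2 13B gains 22.8 points on MedQA, while GPT-4 gains 9.6; in this table the smallest gains are on PubMedQA for the LLaMA2 70B and LLaMA3 8B baselines.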


This review was generated by AI and reviewed by human editors.