QUICK REVIEW

[Paper Review] Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation

Junde Wu, Jiayuan Zhu|arXiv (Cornell University)|Aug 8, 2024

Topic Modeling | Cited by 28

One-Sentence Summary

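MedGraphRAG introduces a graph-based retrieval-augmented generation pipeline for medical language models. It builds a hierarchical medical knowledge graph and uses a U-retrieve retrieval strategy to produce safer, evidence-based answers with source citations.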

ABSTRACT

We introduce a novel graph-based Retrieval-Augmented Generation (RAG) framework specifically designed for the medical domain, called MedGraphRAG, aimed at enhancing Large Language Model (LLM) capabilities for generating evidence-based medical responses, thereby improving safety and reliability when handling private medical data. Graph-based RAG (GraphRAG) leverages LLMs to organize RAG data into graphs, showing strong potential for gaining holistic insights from long-form documents. However, its standard implementation is overly complex for general use and lacks the ability to generate evidence-based responses, limiting its effectiveness in the medical field. To extend the capabilities of GraphRAG to the medical domain, we propose unique Triple Graph Construction and U-Retrieval techniques over it. In our graph construction, we create a triple-linked structure that connects user documents to credible medical sources and controlled vocabularies. In the retrieval process, we propose U-Retrieval, which combines Top-down Precise Retrieval with Bottom-up Response Refinement to balance global context awareness with precise indexing. These efforts enable both source information retrieval and comprehensive response generation. Our approach is validated on 9 medical Q&A benchmarks, 2 health fact-checking benchmarks, and one collected dataset testing long-form generation. The results show that MedGraphRAG consistently outperforms state-of-the-art models across all benchmarks, while also ensuring that responses include credible source documentation and definitions. Our code is released at: https://github.com/MedicineToken/Medical-Graph-RAG.
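
The triple-linked idea can be illustrated with a minimal sketch, assuming hand-written entities and a bag-of-words cosine similarity in place of the LLM-based entity extraction and learned similarity a real system would use. The class `Entity`, the functions `link_tiers` and `evidence_chains`, the 0.3 threshold, and the toy data are all hypothetical; the sketch only shows how a user-document entity can be traced to a supporting source and then to a controlled-vocabulary definition.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    description: str
    tier: str  # hypothetical tier labels: "user", "source", or "vocab"
    links: list = field(default_factory=list)


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity, a stand-in for embedding similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def link_tiers(upper: list, lower: list, threshold: float = 0.3) -> None:
    """Link each entity in `upper` to every sufficiently similar entity in `lower`."""
    for u in upper:
        for candidate in lower:
            if cosine(u.description, candidate.description) >= threshold:
                u.links.append(candidate)


def evidence_chains(entity: Entity) -> list:
    """Follow links downward: user entity -> source entity -> vocabulary term."""
    chains = []
    for src in entity.links:
        for term in src.links or [None]:
            chains.append((entity.name, src.name, term.name if term else None))
    return chains


# Toy data (hypothetical); a real pipeline would extract these from documents.
user = [
    Entity("patient_bp", "elevated blood pressure noted in patient", "user"),
    Entity("wrist_injury", "broken left wrist", "user"),
]
sources = [Entity("bp_guideline", "guideline on elevated blood pressure treatment", "source")]
vocab = [Entity("Hypertension", "persistently elevated blood pressure", "vocab")]

link_tiers(user, sources)
link_tiers(sources, vocab)
print(evidence_chains(user[0]))  # [('patient_bp', 'bp_guideline', 'Hypertension')]
print(evidence_chains(user[1]))  # []
```

An entity with no similar source (here `wrist_injury`) yields no chain, which is the kind of gap a grounded system can surface instead of answering without evidence.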

Motivation and Objectives

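Address the risk of hallucination in medical language models by grounding answers in verifiable medical sources.
Develop a three-tier hierarchical graph that integrates user data, medical literature, and UMLS vocabulary.
Implement a retrieval mechanism (U-retrieve) that balances global context against efficient access to relevant graph segments.
Show that MedGraphRAG improves results on medical QA benchmarks without additional model fine-tuning.
Demonstrate the ability to provide evidence-based, source-grounded explanations for clinical questions.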

Proposed Method

Segment medical documents using a hybrid static-semantic chunking approach to capture context.
Extract entities from chunks and build a three-level graph: user documents, foundational medical books/papers, and UMLS-based terms.
Link the entities extracted from each chunk into a per-chunk meta-graph, then merge the meta-graphs into a global graph based on semantic similarity.
Answer queries with U-retrieve: retrieve top-down through the graph hierarchy, then refine the response bottom-up, attaching source citations.
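
The U-retrieve step can be sketched as a two-phase walk over a tag hierarchy. This is a simplified illustration, not the authors' implementation: `TagNode`, `top_down`, `bottom_up`, and `u_retrieve` are hypothetical names, Jaccard word overlap stands in for semantic similarity, and the bottom-up phase only appends each ancestor's tag where a real system would presumably ask an LLM to refine the draft answer.

```python
from dataclasses import dataclass, field


@dataclass
class TagNode:
    tag: str  # summary tags describing this node's content (hypothetical)
    children: list = field(default_factory=list)
    facts: list = field(default_factory=list)  # (statement, source) pairs at leaf graphs


def overlap(query: str, tag: str) -> float:
    """Jaccard word overlap, a stand-in for embedding similarity."""
    q, t = set(query.lower().split()), set(tag.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def top_down(root: TagNode, query: str) -> list:
    """Descend the tag hierarchy, taking the most similar child at each level."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: overlap(query, child.tag))
        path.append(node)
    return path


def bottom_up(path: list) -> tuple:
    """Draft an answer from the leaf's facts, then revisit each ancestor on the way up."""
    leaf = path[-1]
    answer = " ".join(statement for statement, _ in leaf.facts)
    citations = [source for _, source in leaf.facts]
    for ancestor in reversed(path[:-1]):
        # Placeholder for an LLM refinement call; here we only append the ancestor's tag.
        answer += f" [context: {ancestor.tag}]"
    return answer, citations


def u_retrieve(root: TagNode, query: str) -> tuple:
    return bottom_up(top_down(root, query))


# Toy hierarchy (hypothetical sources and tags).
root = TagNode("medical knowledge", children=[
    TagNode("cardiology heart blood pressure", children=[
        TagNode("hypertension treatment blood pressure",
                facts=[("ACE inhibitors lower blood pressure.", "Guideline A")]),
        TagNode("arrhythmia heart rhythm",
                facts=[("Beta blockers slow the heart rate.", "Textbook B")]),
    ]),
    TagNode("orthopedics bone fracture",
            facts=[("Casts immobilize fractures.", "Textbook C")]),
])

answer, cites = u_retrieve(root, "how to treat high blood pressure")
print(answer)
print(cites)  # ['Guideline A']
```

Greedy descent keeps the top-down phase cheap (one comparison per child per level), while the upward pass is where broader context from higher-level summaries can re-enter the answer; citations are carried up from the leaf graph so each statement stays traceable to a source.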

Experimental Results

Research Questions

RQ1: Can a three-tier hierarchical medical graph improve accuracy and reliability of LLM-generated medical answers without fine-tuning?
RQ2: Does integrating user data, medical literature, and UMLS-grounded terms reduce hallucinations and improve grounding of medical assertions?
RQ3: How does U-retrieve compare to other retrieval strategies in terms of retrieval accuracy and response quality?
RQ4: What is the impact of hierarchical graph construction and advanced chunking on medical QA benchmarks?
RQ5: Do MedGraphRAG outputs provide verifiable source-based explanations suitable for clinical use?

Key Findings

Model	Size	Open source	MedQA	MedMCQA	PubMedQA
LLaMA2	13B	yes	42.7	37.4	68.0
LLaMA2-MedGraphRAG	13B	yes	65.5	51.4	73.2
LLaMA2	70B	yes	43.7	35.0	74.3
LLaMA2-MedGraphRAG	70B	yes	69.2	58.7	76.0
LLaMA3	8B	yes	59.8	57.3	75.2
LLaMA3-MedGraphRAG	8B	yes	74.2	61.6	77.8
LLaMA3	70B	yes	72.1	65.5	77.5
LLaMA3-MedGraphRAG	70B	yes	88.4	79.1	83.8
Gemini-pro	-	no	59.0	54.8	69.8
Gemini-MedGraphRAG	-	no	72.6	62.0	76.2
GPT-4	-	no	81.7	72.4	75.2
GPT-4-MedGraphRAG	-	no	91.3	81.5	83.3
Human (expert)	-	-	87.0	90.0	78.0

MedGraphRAG raises scores on all three benchmarks (MedQA, MedMCQA, PubMedQA) for every base model evaluated.
Smaller LLMs (e.g., LLaMA2-13B, LLaMA3-8B) also gain substantially (LLaMA2-13B rises from 42.7 to 65.5 on MedQA), so the benefit is not limited to large models.
With GPT-4, MedGraphRAG achieves state-of-the-art results on MedQA (91.3, above the 87.0 human-expert score) and surpasses several fine-tuned baselines.
Responses include grounded citations and explanations of medical terms, enhancing reliability and interpretability.
Ablation studies show hybrid semantic chunking, hierarchical graph construction, and U-retrieve contribute to performance gains.
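
The improvement claims can be checked directly against the results table. This short script uses only the numbers in that table and computes the absolute difference between each base model and its MedGraphRAG variant; the variable names are illustrative.

```python
# Scores copied from the results table: (MedQA, MedMCQA, PubMedQA).
BENCHMARKS = ("MedQA", "MedMCQA", "PubMedQA")
scores = {
    "LLaMA2-13B": ((42.7, 37.4, 68.0), (65.5, 51.4, 73.2)),
    "LLaMA2-70B": ((43.7, 35.0, 74.3), (69.2, 58.7, 76.0)),
    "LLaMA3-8B": ((59.8, 57.3, 75.2), (74.2, 61.6, 77.8)),
    "LLaMA3-70B": ((72.1, 65.5, 77.5), (88.4, 79.1, 83.8)),
    "Gemini-pro": ((59.0, 54.8, 69.8), (72.6, 62.0, 76.2)),
    "GPT-4": ((81.7, 72.4, 75.2), (91.3, 81.5, 83.3)),
}

# Absolute gain of the MedGraphRAG variant over its base model, rounded to 0.1.
gains = {
    model: tuple(round(after - before, 1) for before, after in zip(base, rag))
    for model, (base, rag) in scores.items()
}

for model, g in gains.items():
    print(model, dict(zip(BENCHMARKS, g)))
```

Every gain is positive; the largest is LLaMA2-70B on MedQA (+25.5) and the smallest is the same model on PubMedQA (+1.7).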

Figure 2: Comparison with SOTA medical LLMs on the MedQA benchmark.

This review was generated by AI and reviewed by human editors.