QUICK REVIEW

[Paper Review] Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge

Mohammad R. Rezaei, Reza Saadati Fard|ArXiv.org|Feb 18, 2025

Topic Modeling | Citations: 3

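One-sentence summary

AMG-RAG automates the construction and continuous updating of medical knowledge graphs to strengthen LLM-based medical QA. It achieves strong results on MEDQA and MEDMCQA while using a comparatively small model.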

ABSTRACT

Large Language Models (LLMs) have significantly advanced medical question-answering by leveraging extensive clinical data and medical literature. However, the rapid evolution of medical knowledge and the labor-intensive process of manually updating domain-specific resources pose challenges to the reliability of these systems. To address this, we introduce Agentic Medical Graph-RAG (AMG-RAG), a comprehensive framework that automates the construction and continuous updating of medical knowledge graphs, integrates reasoning, and retrieves current external evidence, such as PubMed and WikiSearch. By dynamically linking new findings and complex medical concepts, AMG-RAG not only improves accuracy but also enhances interpretability in medical queries. Evaluations on the MEDQA and MEDMCQA benchmarks demonstrate the effectiveness of AMG-RAG, achieving an F1 score of 74.1 percent on MEDQA and an accuracy of 66.34 percent on MEDMCQA, outperforming both comparable models and those 10 to 100 times larger. Notably, these improvements are achieved without increasing computational overhead, highlighting the critical role of automated knowledge graph generation and external evidence retrieval in delivering up-to-date, trustworthy medical insights.

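Motivation and objectives

Address the challenge of keeping medical QA up to date as medical knowledge evolves rapidly.
Automate the construction and continuous updating of Medical Knowledge Graphs (MKGs).
Integrate MKGs with RAG and Chain-of-Thought (CoT) reasoning to strengthen medical QA.
Evaluate the framework on the MEDQA and MEDMCQA benchmarks.
Demonstrate efficiency without adding inference overhead.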

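Proposed method

Propose AMG-RAG: an iterative pipeline that builds an MKG using LLM agents and medical search tools.
Represent medical terms as KG nodes and infer the relations between them, each with a confidence score.
Traverse the KG with BFS/DFS, subject to a confidence threshold, and generate a chain of thought for each entity.
Integrate the MKG-derived reasoning with RAG and external evidence retrieval such as PubMedSearch and WikiSearch.
Evaluate on MEDQA (F1) and MEDMCQA (accuracy) against larger models.
Store the MKG in Neo4j, with bidirectional relations that carry confidence scores.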
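
The node, relation, and traversal steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses an in-memory dictionary instead of Neo4j, and the class `MedicalKG`, the function `explore`, the threshold of 0.7, the depth limit, and every edge and confidence value are hypothetical names and numbers chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MedicalKG:
    """Toy in-memory graph: node -> list of (neighbor, relation, confidence)."""
    edges: dict = field(default_factory=dict)

    def add_relation(self, src, relation, dst, confidence, inverse=None):
        # Store the forward edge and, when an inverse name is given, the
        # reverse edge too, so the relation can be followed both ways.
        self.edges.setdefault(src, []).append((dst, relation, confidence))
        if inverse is not None:
            self.edges.setdefault(dst, []).append((src, inverse, confidence))


def explore(kg, start, threshold=0.7, max_depth=2):
    """Breadth-first search that only follows edges with confidence >= threshold."""
    visited = {start}
    queue = deque([(start, 0)])
    facts = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand nodes at the depth limit
        for neighbor, relation, conf in kg.edges.get(node, []):
            if conf < threshold or neighbor in visited:
                continue
            visited.add(neighbor)
            facts.append((node, relation, neighbor, conf))
            queue.append((neighbor, depth + 1))
    return facts


# Edges and scores below are invented for illustration only.
kg = MedicalKG()
kg.add_relation("metformin", "treats", "type 2 diabetes", 0.95, inverse="treated_by")
kg.add_relation("type 2 diabetes", "risk_factor_for", "neuropathy", 0.80, inverse="has_risk_factor")
kg.add_relation("metformin", "associated_with", "hair loss", 0.30, inverse="associated_with")

facts = explore(kg, "metformin")
for src, rel, dst, conf in facts:
    print(f"{src} --{rel}--> {dst} (confidence {conf:.2f})")
```

The low-confidence edge is pruned by the threshold, so only the two higher-confidence facts are returned. In a pipeline like the one described, such facts would then be passed, together with retrieved evidence, to the LLM as context for its chain of thought.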

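Experimental results

Research questions

RQ1: How do automated MKG construction and dynamic updating improve the accuracy and reliability of medical QA?
RQ2: How does integrating CoT reasoning and external search with KG-based QA affect performance on standard medical QA benchmarks?
RQ3: By leveraging an MKG and search tools, can a small model of about 8B parameters outperform larger models on MEDQA and MEDMCQA?
RQ4: How do confidence scores and graph traversal strategies affect answer quality and interpretability?

Key findings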

Model	Model Size	Acc. (%)	F1 (%)	Fine-Tuned	Uses CoT	Uses Search
Med-Gemini	≈1800B	91.1	89.5	✓	✓	✓
GPT-4	≈1760B	90.2	88.7	✓	✓	✓
Med-PaLM 2	≈340B	85.4	82.1	✓	✓	✗
Med-PaLM 2 (5-shot)	≈340B	79.7	75.3	✗	✓	✗
AMG-RAG	≈8B	73.9	74.1	✗	✓	✓
Meerkat	≈7B	74.3	70.4	✓	✓	✗
Meditron	≈70B	70.2	68.3	✓	✓	✓
Flan-PaLM	≈540B	67.6	65.0	✓	✓	✗
LLAMA-2	≈70B	61.5	60.2	✓	✓	✗
Shakti-LLM	≈2.5B	60.3	58.9	✓	✗	✗
Codex 5-shot CoT	–	60.2	57.7	✗	✓	✓
BioMedGPT	≈10B	50.4	48.7	✓	✗	✗
BioLinkBERT (base)	–	40.0	38.4	✓	✗	✗

(Table 2) MedMCQA models
Model	Model Size	Acc. (%)	F1 (%)	Fine-Tuned	Uses CoT	Uses Search
AMG-RAG	≈8B	66.34	–	–	–	–
Meditron (70B)	≈70B	66.0	–	–	–	–
Codex 5-shot	–	59.7	–	–	–	–
VOD	–	58.3	–	–	–	–
Flan-PaLM	≈540B	57.6	–	–	–	–
PaLM	≈540B	54.5	–	–	–	–
GAL	≈120B	52.9	–	–	–	–
PubMedBERT	–	40.0	–	–	–	–
SciBERT	–	39.0	–	–	–	–
BioBERT	–	38.0	–	–	–	–
BERT	–	35.0	–	–	–	–

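AMG-RAG achieves an F1 of 74.1% on MEDQA and an accuracy of 66.34% on MEDMCQA, outperforming similarly sized models and several that are roughly 10 to 100 times larger (for example Meditron at ≈70B and Flan-PaLM at ≈540B), though not the largest systems such as Med-Gemini and GPT-4.
At about 8B parameters, AMG-RAG matches or exceeds several larger baselines without fine-tuning or high inference cost.
Adding PubMedSearch and WikiSearch improves performance; in the MEDQA experiments, PubMedSearch outperformed WikiSearch.
Removing CoT or KG integration causes a substantial drop in accuracy and F1, indicating the importance of structured reasoning and domain-specific retrieval.
The MKG is built dynamically from the query and updated with external evidence, enabling up-to-date, domain-specific reasoning.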
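
The size comparison in the first finding can be checked against the MEDQA table with simple arithmetic. The sketch below copies a few approximate parameter counts and F1 scores from that table. The helper `size_ratio` and the choice of rows are illustrative assumptions, and the "≈" sizes are treated as exact.

```python
# Approximate sizes (billions of parameters) and F1 scores from the MEDQA table.
amg_rag_size_b = 8
amg_rag_f1 = 74.1
baselines = {
    "Meditron": (70, 68.3),
    "Flan-PaLM": (540, 65.0),
    "LLAMA-2": (70, 60.2),
    "Med-PaLM 2 (5-shot)": (340, 75.3),
}


def size_ratio(size_b, reference_b=amg_rag_size_b):
    """How many times larger a model is than the reference model."""
    return size_b / reference_b


for name, (size_b, f1) in baselines.items():
    verdict = "below" if f1 < amg_rag_f1 else "above"
    print(f"{name}: {size_ratio(size_b):.1f}x the parameters, F1 {f1} ({verdict} AMG-RAG)")
```

On these rows the ratios fall between roughly 9x and 68x. Some of the larger models score below AMG-RAG's F1 and some above it, so the claim holds for several larger baselines rather than for all of them.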

This review was generated by AI and checked by a human editor.