QUICK REVIEW

[Paper Review] Knowledge Graph Prompting for Multi-Document Question Answering

Yu Wang, Nedim Lipka | arXiv (Cornell University) | 2023-08-22

Topic Modeling | Citations: 13

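One-Line Summary

Summary: This paper introduces Knowledge Graph Prompting (KGP) for multi-document question answering (MD-QA), which builds a knowledge graph over passages and document structures and uses an LLM-based traversal agent to retrieve contextual evidence for questions that span documents.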

ABSTRACT

The 'pre-train, prompt, predict' paradigm of large language models (LLMs) has achieved remarkable success in open-domain question answering (OD-QA). However, few works explore this paradigm in the scenario of multi-document question answering (MD-QA), a task demanding a thorough understanding of the logical associations among the contents and structures of different documents. To fill this crucial gap, we propose a Knowledge Graph Prompting (KGP) method to formulate the right context in prompting LLMs for MD-QA, which consists of a graph construction module and a graph traversal module. For graph construction, we create a knowledge graph (KG) over multiple documents with nodes symbolizing passages or document structures (e.g., pages/tables), and edges denoting the semantic/lexical similarity between passages or intra-document structural relations. For graph traversal, we design an LLM-based graph traversal agent that navigates across nodes and gathers supporting passages assisting LLMs in MD-QA. The constructed graph serves as the global ruler that regulates the transitional space among passages and reduces retrieval latency. Concurrently, the graph traversal agent acts as a local navigator that gathers pertinent context to progressively approach the question and guarantee retrieval quality. Extensive experiments underscore the efficacy of KGP for MD-QA, signifying the potential of leveraging graphs in enhancing the prompt design for LLMs. Our code: https://github.com/YuWVandy/KG-LLM-MDQA.

Research Motivation and Goals

Motivate MD-QA as a task that goes beyond open-domain QA by requiring cross-document reasoning and an understanding of document structure.
Propose a generally applicable KG construction method that encodes lexical/semantic similarity and document structure relations.
Develop an LLM-guided graph traversal agent to adaptively retrieve relevant contexts.
Demonstrate that graph-based prompting improves MD-QA performance and retrieval efficiency across multiple datasets.

Proposed Method

Construct knowledge graphs where nodes are passages or document structures (pages/tables) and edges encode lexical/semantic similarity or structural relations.
Augment graphs with structural nodes (pages, tables) and use markdown content for tables to aid LLM understanding.
Train or fine-tune an LLM-based graph traversal agent that, given visited passages, selects the next best neighbor to visit to approach the answer.
Employ instruction fine-tuning to strengthen the traversal agent's reasoning and mitigate hallucinations.
Explore multiple KG construction strategies (TF-IDF, KNN-MDR, KNN-ST, TAGME) and compare their effectiveness and trade-offs.
Integrate the traversal process with a prompt design that uses the retrieved passages to answer MD-QA questions.
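
The construction and traversal steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a TF-IDF cosine-similarity threshold for adding edges, and it swaps the LLM traversal agent for a simple lexical-overlap scorer (`lexical_score`) so the example runs without a model. All function names, the stopword list, and the `threshold`, `hops`, and `branching` parameters are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "is", "was", "are", "in", "of", "and", "to"}


def tokenize(text):
    """Lowercase, keep alphanumeric tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(passages):
    """Return one sparse TF-IDF vector (dict) per passage, using smoothed IDF."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in doc.items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_passage_graph(passages, threshold=0.1):
    """Connect two passages when their TF-IDF cosine similarity reaches threshold."""
    vecs = tfidf_vectors(passages)
    graph = {i: set() for i in range(len(passages))}
    for i in range(len(passages)):
        for j in range(i + 1, len(passages)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def lexical_score(question, visited_texts, candidate):
    """Stand-in for the LLM agent: fraction of candidate tokens already
    present in the question or in the visited passages."""
    context = set(tokenize(question))
    for text in visited_texts:
        context |= set(tokenize(text))
    cand = set(tokenize(candidate))
    return len(context & cand) / len(cand) if cand else 0.0


def traverse(passages, graph, question, seeds, hops=2, branching=2, score=lexical_score):
    """From the seed passages, repeatedly move to the best-scoring unvisited neighbors."""
    visited = list(seeds)
    frontier = list(seeds)
    for _ in range(hops):
        candidates = {n for node in frontier for n in graph[node]} - set(visited)
        if not candidates:
            break
        texts = [passages[i] for i in visited]
        ranked = sorted(candidates, key=lambda i: (-score(question, texts, passages[i]), i))
        frontier = ranked[:branching]
        visited.extend(frontier)
    return visited


passages = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Bananas are rich in potassium.",
]
graph = build_passage_graph(passages)
question = "In which country's capital was Marie Curie born?"
print(traverse(passages, graph, question, seeds=[0], hops=2, branching=1))  # prints [0, 1]
```

In this toy corpus the shared term "Warsaw" links the first two passages, so the traversal reaches the bridging passage while the unrelated one is never visited. The `branching` parameter caps how many neighbors are kept per hop, which is the knob that trades context size against coverage.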

Experimental Results

Research Questions

RQ1: How can a knowledge graph over documents improve MD-QA prompting and retrieval compared to baseline methods?
RQ2: What KG construction strategies best capture the necessary cross-document reasoning for MD-QA?
RQ3: Can an LLM-guided KG traversal agent effectively navigate the graph to retrieve relevant context for answering questions?
RQ4: How does incorporating document structures (pages/tables) influence MD-QA performance?
RQ5: What are the performance and efficiency trade-offs as KG density and traversal strategies vary?

Key Results

Method	HotpotQA Acc	HotpotQA EM	HotpotQA F1	IIRC Acc	IIRC EM	IIRC F1	2WikiMQA Acc	2WikiMQA EM	2WikiMQA F1	MuSiQue Acc	MuSiQue EM	MuSiQue F1	PDFTriage Struct-EM	Rank (w/ PDFTriage)	Rank (w/o PDFTriage)
None	41.80	19.00	30.50	19.50	8.60	13.17	44.40	18.60	25.07	30.40	4.60	10.58	0.00	8.53	9.00
KNN	71.57	40.73	57.97	43.82	25.15	37.24	52.40	31.20	42.13	44.70	18.86	30.04	–	7.00	7.33
TF-IDF	76.64	45.97	64.64	47.47	27.22	40.80	58.40	34.60	44.50	44.40	21.59	32.50	–	4.85	5.00
BM25	71.95	41.46	59.73	41.93	23.48	35.55	55.80	30.80	40.55	44.47	21.11	31.15	–	6.92	7.25
DPR	73.43	43.61	62.11	48.11	26.89	41.85	62.40	35.60	51.10	44.27	20.32	31.64	–	5.31	5.50
MDR	75.30	45.55	65.16	50.84	27.52	43.47	63.00	36.00	52.44	48.39	23.49	37.03	–	3.07	3.08
IRCoT	74.36	45.29	64.12	49.78	27.73	41.65	61.81	37.75	50.17	45.14	22.46	34.21	–	4.00	4.08
KGP-T5	76.53	46.51	66.77	48.28	26.94	41.54	63.50	39.80	53.50	50.92	27.90	41.19	67.00	2.69	2.75
Golden	82.19	50.20	71.06	62.68	35.64	54.76	72.60	40.20	59.69	57.00	30.60	47.75	100.00	1.00	1.00
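
The table reports EM and F1 without defining them. As a rough guide, the sketch below assumes the SQuAD-style definitions commonly used for these benchmarks: exact match after normalization, and token-level F1 between the predicted and gold answers. The review does not state the exact normalization the authors used, so the `normalize` rules here are an assumption, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace.
    (Assumed normalization; the paper's exact rules may differ.)"""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))  # prints 1.0
print(exact_match("Eiffel Tower in Paris", "Eiffel Tower"))  # prints 0.0
```

Under these definitions a prediction such as "Eiffel Tower in Paris" against the gold answer "Eiffel Tower" scores 0 on EM but about 0.67 on F1 (precision 0.5, recall 1.0), which is why F1 is consistently higher than EM in every row.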

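KGP-T5 is the strongest retriever overall: it leads on every 2WikiMQA and MuSiQue metric and on HotpotQA EM and F1, while TF-IDF edges it out on HotpotQA accuracy and MDR and IRCoT lead on IIRC. Only the Golden context scores higher across the board.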
MDR-based traversals and KGs tuned with domain-specific pretraining yield stronger results than generic embedding-based methods (DPR).
KGs incorporating structural nodes enable handling structural questions (e.g., differences between Page 1 and Page 2) with substantial Struct-EM gains (67% reported in Table 1).
GPT/LLM-based traversal agents significantly outperform random traversal and can surpass several baseline retrievers in accuracy and F1 across HotpotQA, 2WikiMQA, MuSiQue, and IIRC.
Trade-offs exist between KG density and retrieval latency: higher density improves EM/F1 but increases latency; a well-tuned branching factor is crucial for maximizing performance under a fixed context budget.

This review was generated by AI and reviewed by a human editor.