QUICK REVIEW

[Paper Review] Knowledge Graph Prompting for Multi-Document Question Answering

Yu Wang, Nedim Lipka | arXiv (Cornell University) | 2023. 08. 22.
Topic Modeling | Citations: 13
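One-Line Summary

Short summary: This paper introduces Knowledge Graph Prompting (KGP) for multi-document question answering (MD-QA). It builds a knowledge graph over passages and document structures, then uses an LLM-based traversal agent to retrieve the contextual evidence needed to answer cross-document questions.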

ABSTRACT

The 'pre-train, prompt, predict' paradigm of large language models (LLMs) has achieved remarkable success in open-domain question answering (OD-QA). However, few works explore this paradigm in the scenario of multi-document question answering (MD-QA), a task demanding a thorough understanding of the logical associations among the contents and structures of different documents. To fill this crucial gap, we propose a Knowledge Graph Prompting (KGP) method to formulate the right context in prompting LLMs for MD-QA, which consists of a graph construction module and a graph traversal module. For graph construction, we create a knowledge graph (KG) over multiple documents with nodes symbolizing passages or document structures (e.g., pages/tables), and edges denoting the semantic/lexical similarity between passages or intra-document structural relations. For graph traversal, we design an LLM-based graph traversal agent that navigates across nodes and gathers supporting passages assisting LLMs in MD-QA. The constructed graph serves as the global ruler that regulates the transitional space among passages and reduces retrieval latency. Concurrently, the graph traversal agent acts as a local navigator that gathers pertinent context to progressively approach the question and guarantee retrieval quality. Extensive experiments underscore the efficacy of KGP for MD-QA, signifying the potential of leveraging graphs in enhancing the prompt design for LLMs. Our code: https://github.com/YuWVandy/KG-LLM-MDQA.

Motivation and Objectives

  • Motivate MD-QA beyond open-domain QA by requiring cross-document reasoning and structured content understanding.
  • Propose a generally applicable KG construction method that encodes lexical/semantic similarity and document structure relations.
  • Develop an LLM-guided graph traversal agent to adaptively retrieve relevant contexts.
  • Demonstrate that graph-based prompting improves MD-QA performance and retrieval efficiency across multiple datasets.

Proposed Method

  • Construct knowledge graphs where nodes are passages or document structures (pages/tables) and edges encode lexical/semantic similarity or structural relations.
  • Augment graphs with structural nodes (pages, tables) and use markdown content for tables to aid LLM understanding.
  • Train or fine-tune an LLM-based graph traversal agent that, given visited passages, selects the next best neighbor to visit to approach the answer.
  • Employ instruction-finetuning to enhance the reasoning capability of the traversal agent to mitigate hallucinations.
  • Explore multiple KG construction strategies (TF-IDF, KNN-MDR, KNN-ST, TAGME) and compare their effectiveness and trade-offs.
  • Integrate the traversal process with a prompt design that uses the retrieved passages to answer MD-QA questions.
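
The graph construction step above can be illustrated with a minimal sketch. It assumes the TF-IDF variant only: passages become nodes, and each passage is linked to its top-`k` most similar passages by cosine similarity. The function names, the node ids (`p0`, `page:1`, `table:1`), the stopword list, and the thresholds `k` and `min_sim` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "in", "of", "and"}

def tfidf_vectors(passages):
    """Return one sparse TF-IDF vector (term -> weight) per passage."""
    docs = [
        Counter(t for t in re.findall(r"[a-z0-9]+", p.lower()) if t not in STOPWORDS)
        for p in passages
    ]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_passage_graph(passages, k=2, min_sim=0.1):
    """Link each passage to its top-k most similar passages (undirected edges)."""
    vecs = tfidf_vectors(passages)
    ids = [f"p{i}" for i in range(len(passages))]
    graph = {pid: set() for pid in ids}
    content = dict(zip(ids, passages))
    for i, u in enumerate(vecs):
        scored = sorted(
            ((cosine(u, v), j) for j, v in enumerate(vecs) if j != i), reverse=True
        )
        for sim, j in scored[:k]:
            if sim >= min_sim:
                graph[ids[i]].add(ids[j])
                graph[ids[j]].add(ids[i])
    return graph, content

def table_to_markdown(header, rows):
    """Render a table as markdown text so an LLM can read it as node content."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)

def add_structure_node(graph, content, node_id, text, neighbors):
    """Attach a structural node (page or table) to existing nodes."""
    graph[node_id] = set()
    content[node_id] = text
    for nb in neighbors:
        graph[node_id].add(nb)
        graph[nb].add(node_id)

passages = [
    "Marie Curie was born in Warsaw and studied physics in Paris.",
    "Warsaw is the capital of Poland.",
    "The Eiffel Tower is a landmark in Paris.",
    "Bananas are rich in potassium.",
]
graph, content = build_passage_graph(passages, k=2, min_sim=0.1)

# Structural nodes: a page that contains two passages, and a table on that page.
add_structure_node(graph, content, "page:1", "Page 1", ["p0", "p1"])
table_md = table_to_markdown(["Prize", "Year"], [["Physics", 1903], ["Chemistry", 1911]])
add_structure_node(graph, content, "table:1", table_md, ["page:1"])

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy corpus the unrelated passage `p3` ends up with no neighbors. That is the intended effect of the similarity threshold: the graph limits which passages a traversal can reach from a given starting point.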

Experimental Results

Research Questions

  • RQ1: How can a knowledge graph over documents improve MD-QA prompting and retrieval compared to baseline methods?
  • RQ2: What KG construction strategies best capture the necessary cross-document reasoning for MD-QA?
  • RQ3: Can an LLM-guided KG traversal agent effectively navigate the graph to retrieve relevant context for answering questions?
  • RQ4: How does incorporating document structures (pages/tables) influence MD-QA performance?
  • RQ5: What are the performance and efficiency trade-offs as KG density and traversal strategies vary?

Key Results

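| Method | HotpotQA Acc | HotpotQA EM | HotpotQA F1 | IIRC Acc | IIRC EM | IIRC F1 | 2WikiMQA Acc | 2WikiMQA EM | 2WikiMQA F1 | MuSiQue Acc | MuSiQue EM | MuSiQue F1 | PDFTriage Struct-EM | Rank (w/ PDFTriage) | Rank (w/o PDFTriage) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| None | 41.80 | 19.00 | 30.50 | 19.50 | 8.60 | 13.17 | 44.40 | 18.60 | 25.07 | 30.40 | 4.60 | 10.58 | 0.00 | 8.53 | 9.00 |
| KNN | 71.57 | 40.73 | 57.97 | 43.82 | 25.15 | 37.24 | 52.40 | 31.20 | 42.13 | 44.70 | 18.86 | 30.04 | - | 7.00 | 7.33 |
| TF-IDF | 76.64 | 45.97 | 64.64 | 47.47 | 27.22 | 40.80 | 58.40 | 34.60 | 44.50 | 44.40 | 21.59 | 32.50 | - | 4.85 | 5.00 |
| BM25 | 71.95 | 41.46 | 59.73 | 41.93 | 23.48 | 35.55 | 55.80 | 30.80 | 40.55 | 44.47 | 21.11 | 31.15 | - | 6.92 | 7.25 |
| DPR | 73.43 | 43.61 | 62.11 | 48.11 | 26.89 | 41.85 | 62.40 | 35.60 | 51.10 | 44.27 | 20.32 | 31.64 | - | 5.31 | 5.50 |
| MDR | 75.30 | 45.55 | 65.16 | 50.84 | 27.52 | 43.47 | 63.00 | 36.00 | 52.44 | 48.39 | 23.49 | 37.03 | - | 3.07 | 3.08 |
| IRCoT | 74.36 | 45.29 | 64.12 | 49.78 | 27.73 | 41.65 | 61.81 | 37.75 | 50.17 | 45.14 | 22.46 | 34.21 | - | 4.00 | 4.08 |
| KGP-T5 | 76.53 | 46.51 | 66.77 | 48.28 | 26.94 | 41.54 | 63.50 | 39.80 | 53.50 | 50.92 | 27.90 | 41.19 | 67.00 | 2.69 | 2.75 |
| Golden | 82.19 | 50.20 | 71.06 | 62.68 | 35.64 | 54.76 | 72.60 | 40.20 | 59.69 | 57.00 | 30.60 | 47.75 | 100.00 | 1.00 | 1.00 |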
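
For readers unfamiliar with the EM and F1 columns, the sketch below shows how exact match and token-level F1 are commonly computed for extractive QA. It assumes SQuAD-style answer normalization (lowercasing, dropping punctuation and English articles). The review does not state which normalization the paper uses, and the accuracy and Struct-EM columns are left out because their definitions are not given here.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

em = exact_match("The Eiffel Tower.", "eiffel tower")
f1 = f1_score("Marie Curie", "Curie")
print(em, round(f1, 4))
```

A prediction that contains the gold answer plus extra tokens scores zero on EM but receives partial credit on F1, which is why the two columns can diverge substantially for the same method.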
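  • KGP-T5 attains the best average rank among the retrieval methods (2.69 with PDFTriage, 2.75 without), behind only the Golden-context upper bound. It leads on every 2WikiMQA and MuSiQue metric, while TF-IDF is slightly ahead on HotpotQA accuracy and MDR and IRCoT lead on IIRC.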
  • MDR-based traversals and KGs tuned with domain-specific pretraining yield stronger results than generic embedding-based methods (DPR).
  • KGs incorporating structural nodes enable handling structural questions (e.g., differences between Page 1 and Page 2) with substantial Struct-EM gains (67% reported in Table 1).
  • GPT/LLM-based traversal agents significantly outperform random traversal and can surpass several baseline retrievers in accuracy and F1 across HotpotQA, 2WikiMQA, MuSiQue, and IIRC.
  • Trade-offs exist between KG density and retrieval latency: higher density improves EM/F1 but increases latency; a well-tuned branching factor is crucial for maximizing performance under a fixed context budget.
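
The interaction between branching factor and context budget can be sketched as a breadth-first traversal that keeps only the top-scoring neighbors of each visited node. This is a toy illustration under stated assumptions: `overlap_score` is a word-overlap stand-in for the LLM traversal agent, and the graph, passages, and the parameter names `branching` and `budget` are hypothetical.

```python
import re

def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(context, passage):
    """Stand-in for the LLM agent: Jaccard overlap between context and candidate."""
    c, p = tokens(context), tokens(passage)
    return len(c & p) / len(c | p) if c | p else 0.0

def traverse(graph, content, question, seeds, branching=2, budget=4,
             score=overlap_score):
    """Visit nodes level by level, expanding at most `branching` neighbors per
    node and stopping once `budget` passages have been collected."""
    visited = []
    frontier = list(seeds)
    while frontier and len(visited) < budget:
        next_frontier = []
        for node in frontier:
            if node in visited:
                continue
            if len(visited) >= budget:
                break
            visited.append(node)
            # The agent conditions on the question plus the passage just read.
            context = question + " " + content[node]
            candidates = [n for n in sorted(graph[node]) if n not in visited]
            candidates.sort(key=lambda n: score(context, content[n]), reverse=True)
            next_frontier.extend(candidates[:branching])
        frontier = next_frontier
    return visited

content = {
    "p0": "Marie Curie was born in Warsaw.",
    "p1": "Warsaw is the capital city of Poland, a country in Europe.",
    "p2": "The Nobel Prize in Physics was awarded in 1903.",
    "p3": "Poland joined the European Union in 2004.",
}
graph = {"p0": {"p1", "p2"}, "p1": {"p0", "p3"}, "p2": {"p0"}, "p3": {"p1"}}
question = "In which country is the city where Marie Curie was born?"

narrow = traverse(graph, content, question, seeds=["p0"], branching=1, budget=2)
wide = traverse(graph, content, question, seeds=["p0"], branching=2, budget=4)
print(narrow)
print(wide)
```

With `branching=1` and `budget=2` the traversal spends its whole budget on the two passages that bridge the question, whereas the wider setting also pulls in less relevant neighbors. In a real system each extra candidate costs an agent call and context tokens, which is the latency side of the trade-off described above.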

This review was generated by AI and reviewed by a human editor.