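[Paper Summary] Retrieval-Augmented Generation with Graphs (GraphRAG)
This survey reviews retrieval-augmented generation over graph-structured data. It proposes a holistic GraphRAG framework and domain-specific designs to guide retrieval, organization, and generation on graphs.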
Retrieval-augmented generation (RAG) is a powerful technique that enhances downstream task execution by retrieving additional information, such as knowledge, skills, and tools from external sources. Graph, by its intrinsic "nodes connected by edges" nature, encodes massive heterogeneous and relational information, making it a valuable resource for RAG across a wide range of real-world applications. As a result, we have recently witnessed increasing attention to equipping RAG with graphs, i.e., GraphRAG. However, unlike conventional RAG, where the retriever, generator, and external data sources can be uniformly designed in the neural-embedding space, the uniqueness of graph-structured data, such as diverse-formatted and domain-specific relational knowledge, poses unique and significant challenges when designing GraphRAG for different domains. Given the broad applicability, the associated design challenges, and the recent surge in GraphRAG, a systematic and up-to-date survey of its key concepts and techniques is urgently desired. Following this motivation, we present a comprehensive and up-to-date survey on GraphRAG. Our survey first proposes a holistic GraphRAG framework by defining its key components, including query processor, retriever, organizer, generator, and data source. Furthermore, recognizing that graphs in different domains exhibit distinct relational patterns and require dedicated designs, we review GraphRAG techniques uniquely tailored to each domain. Finally, we discuss research challenges and brainstorm directions to inspire cross-disciplinary opportunities. Our survey repository is publicly maintained at https://github.com/Graph-RAG/GraphRAG/.
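Research Motivation and Objectives
- Define a comprehensive GraphRAG framework comprising a query processor, retriever, organizer, generator, and graph data source.
- Review GraphRAG techniques tailored to different domains and graph formats.
- Summarize the graph construction methods, benchmark datasets, and tools used in each domain.
- Highlight open challenges and propose directions to inspire interdisciplinary research and industry opportunities.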
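The five components named in the first objective can be wired together as a toy pipeline. What follows is a minimal sketch and not the survey's implementation. It makes several assumptions:
- The graph data source is a small in-memory list of (head, relation, tail) triples.
- The query processor uses naive string matching.
- The retriever does a one-hop lookup.
- The organizer only deduplicates and sorts.
- A prompt-building stub replaces the LLM generator.

All class and function names (`GraphDataSource`, `query_processor`, `retriever`, `organizer`, `generator`, `graph_rag`) are hypothetical.

```python
from dataclasses import dataclass

# A triple is (head entity, relation, tail entity).
Triple = tuple[str, str, str]


@dataclass
class GraphDataSource:
    """Hypothetical in-memory graph data source backed by a list of triples."""
    triples: list[Triple]

    def entities(self) -> set[str]:
        return {h for h, _, _ in self.triples} | {t for _, _, t in self.triples}

    def neighbors(self, entity: str) -> list[Triple]:
        # Every triple in which the entity appears as head or tail.
        return [tr for tr in self.triples if entity in (tr[0], tr[2])]


def query_processor(query: str, source: GraphDataSource) -> list[str]:
    # Naive entity recognition: graph entities mentioned verbatim (case-insensitive).
    q = query.lower()
    return sorted(e for e in source.entities() if e.lower() in q)


def retriever(entities: list[str], source: GraphDataSource) -> list[Triple]:
    # One-hop retrieval around each recognized entity.
    return [tr for e in entities for tr in source.neighbors(e)]


def organizer(triples: list[Triple]) -> list[Triple]:
    # Deduplicate and sort so the context passed on is compact and deterministic.
    return sorted(set(triples))


def generator(query: str, triples: list[Triple]) -> str:
    # Stand-in for an LLM call: return the prompt a generator would receive.
    facts = "\n".join(f"{h} --{r}--> {t}" for h, r, t in triples)
    return f"Facts:\n{facts}\nQuestion: {query}"


def graph_rag(query: str, source: GraphDataSource) -> str:
    entities = query_processor(query, source)
    context = organizer(retriever(entities, source))
    return generator(query, context)


kg = GraphDataSource([
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Pierre Curie", "spouse_of", "Marie Curie"),
    ("Warsaw", "capital_of", "Poland"),
])
print(graph_rag("Where was Marie Curie born?", kg))
```

Each stage is a separate function, mirroring the framework's decomposition. Any one stage, such as the retriever, can then be swapped for a graph-traversal or GNN-based variant without touching the others.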
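Proposed Approach
- Propose a five-component GraphRAG framework (query processor, retriever, organizer, generator, graph data source).
- Review domain-specific graph construction and relational patterns to tailor retriever and generator designs.
- Categorize GraphRAG techniques and adapt retrieval methods to graph-structured data (e.g., graph traversal, GNNs, relational matching).
- Discuss five graph-oriented query processing techniques: named entity recognition (NER), relation extraction, query structuring, query decomposition, and query expansion.
- Provide a domain-based taxonomy (knowledge graphs, documents, science, social, planning, tabular, infrastructure, biology, scenes, and random graphs) and summarize the datasets and tools for each.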
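Graph traversal is one of the retrieval strategies listed above. It can be illustrated with a breadth-first k-hop expansion from seed entities. This is a minimal sketch under assumed conventions, not a method taken from the survey. Edges are treated as undirected, and the hypothetical `allowed_relations` parameter stands in for simple relation matching by restricting which edge types may be followed.

```python
from collections import deque
from typing import Optional

Triple = tuple[str, str, str]


def k_hop_subgraph(
    triples: list[Triple],
    seeds: list[str],
    k: int,
    allowed_relations: Optional[set[str]] = None,
) -> list[Triple]:
    """Collect triples reachable within k hops of the seed entities via BFS.

    allowed_relations is a hypothetical filter: when given, only edges whose
    relation is in the set may be traversed (a simple form of relation matching).
    """
    # Undirected adjacency list; each entry keeps the original triple.
    adj: dict[str, list[tuple[str, Triple]]] = {}
    for h, r, t in triples:
        if allowed_relations is not None and r not in allowed_relations:
            continue
        adj.setdefault(h, []).append((t, (h, r, t)))
        adj.setdefault(t, []).append((h, (h, r, t)))

    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    collected: set[Triple] = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # nodes at distance k are reached but not expanded further
        for nbr, triple in adj.get(node, []):
            collected.add(triple)
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return sorted(collected)


kg = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Poland", "member_of", "European Union"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
]
print(k_hop_subgraph(kg, ["Marie Curie"], k=2))
print(k_hop_subgraph(kg, ["Marie Curie"], k=2, allowed_relations={"born_in", "capital_of"}))
```

Raising `k` widens the retrieved context. On dense graphs, however, the context grows quickly, and much of it is the kind of loosely related material an organizer stage would then need to prune.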
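Two of the five query processing techniques listed above are named entity recognition and query expansion. Both can be sketched with a dictionary lookup over the graph's entity names. This is an illustrative assumption rather than the survey's method, since practical systems often use trained NER models or LLM prompting instead. The `aliases` mapping is a hypothetical stand-in for alias or synonym edges stored in the graph.

```python
import re


def recognize_entities(query: str, entity_names: list[str]) -> list[str]:
    """Dictionary-based NER: return graph entity names mentioned in the query.

    Longer names are tried first, and each match is blanked out, so
    "New York City" is not also reported as "New York".
    """
    text = query.lower()
    found = []
    for name in sorted(entity_names, key=len, reverse=True):
        # \b anchors assume names start and end with word characters.
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, text):
            found.append(name)
            text = re.sub(pattern, " ", text)
    return found


def expand_query(query: str, entities: list[str], aliases: dict[str, list[str]]) -> str:
    """Query expansion: append known aliases of the recognized entities."""
    extra = [a for e in entities for a in aliases.get(e, [])]
    return f"{query} ({' OR '.join(extra)})" if extra else query


names = ["New York", "New York City", "Statue of Liberty"]
aliases = {"New York City": ["NYC", "Big Apple"]}  # hypothetical alias table
q = "How far is the Statue of Liberty from New York City?"
ents = recognize_entities(q, names)
print(ents)
print(expand_query(q, ents, aliases))
```

Only two of the five techniques are shown here. Relation extraction, query structuring, and query decomposition would sit in the same stage and likewise feed their output to the retriever.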
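Results
Research Questions
- RQ1: What constitutes a unified GraphRAG framework across graph domains?
- RQ2: How should retrievers and generators be designed to exploit graph structure rather than only textual or semantic signals?
- RQ3: Which domain-specific considerations are critical in GraphRAG design and deployment?
- RQ4: What are the main challenges and most promising directions in GraphRAG research and applications?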
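Key Findings
- A comprehensive GraphRAG framework with five core components is proposed and analyzed.
- GraphRAG techniques are specialized per domain to handle diverse relational patterns and graph formats.
- A taxonomy covering ten domains guides the choice of graph construction, retrieval, organization, and generation strategies.
- The survey catalogs the benchmark datasets and tools used across domains.
- Challenges and future directions are discussed to foster interdisciplinary opportunities and industrial deployment.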
This summary was generated by AI and reviewed by human editors.