QUICK REVIEW

[Paper Review] Financial Report Chunking for Effective Retrieval Augmented Generation

Antonio Jimeno Yepes, Yao You|arXiv (Cornell University)|Feb 5, 2024

Cloud Computing and Resource Management | Cited by 16

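One-Sentence Summary

This paper evaluates how much chunking by the structural elements of a document, rather than only by paragraphs or tokens, improves Retrieval Augmented Generation (RAG) for financial question answering. Element-based chunking gives the best retrieval and question-answering performance while requiring fewer chunks.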

ABSTRACT

Chunking information is a key step in Retrieval Augmented Generation (RAG). Current research primarily centers on paragraph-level chunking. This approach treats all texts as equal and neglects the information contained in the structure of documents. We propose an expanded approach to chunk documents by moving beyond mere paragraph-level chunking to chunk primary by structural element components of documents. Dissecting documents into these constituent elements creates a new way to chunk documents that yields the best chunk size without tuning. We introduce a novel framework that evaluates how chunking based on element types annotated by document understanding models contributes to the overall context and accuracy of the information retrieved. We also demonstrate how this approach impacts RAG assisted Question & Answer task performance. Our research includes a comprehensive analysis of various element types, their role in effective information retrieval, and the impact they have on the quality of RAG outputs. Findings support that element type based chunking largely improve RAG results on financial reporting. Through this research, we are also able to answer how to uncover highly accurate RAG.

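Research Motivation and Objectives

Improve preprocessing and chunking for RAG over financial documents by exploiting document structure.
Propose and evaluate chunking based on element types annotated by document understanding models.
Analyze how different chunking strategies affect retrieval quality and question-answering accuracy on FinanceBench.
Show that element-based chunking improves the quality of the context supplied to the LLM while reducing indexing requirements.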

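Proposed Method

Describes a RAG pipeline that indexes chunks using the Weaviate vector database and a sentence-transformer encoder.
Compares baseline token-based chunking (128, 256, and 512 tokens) with element-based chunking derived from document structure via Chipper.
Uses GPT-4 to generate metadata and to generate answers from the retrieved chunks under a fixed prompt.
Evaluates chunking strategies on FinanceBench using page-level retrieval accuracy and ROUGE/BLEU for paragraph-level retrieval.
Measures question-answering accuracy through GPT-4 automatic evaluation and manual inspection.
Reports token counts and indexing efficiency for each strategy.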
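The two chunking families compared above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Element` class, the element labels ("Title", "NarrativeText", "Table"), whitespace tokenization, and the merging rule (start a new chunk at each title or table, otherwise merge up to a token budget) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one element emitted by a document understanding model."""
    kind: str  # illustrative labels: "Title", "NarrativeText", "Table"
    text: str
    page: int


def token_chunks(text: str, size: int) -> list[str]:
    """Baseline: consecutive windows of `size` whitespace tokens (assumed tokenizer)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def element_chunks(elements: list[Element], max_tokens: int = 512) -> list[str]:
    """Assumed rule: titles and tables open a new chunk; other elements are merged
    into the current chunk until `max_tokens` would be exceeded."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for el in elements:
        n = len(el.text.split())
        starts_new = el.kind in {"Title", "Table"} or count + n > max_tokens
        if current and starts_new:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(el.text)
        count += n
        if el.kind == "Table":  # keep each table whole, as a chunk of its own
            chunks.append("\n".join(current))
            current, count = [], 0
    if current:
        chunks.append("\n".join(current))
    return chunks
```

With the token baseline the chunk size is a hyperparameter (128, 256, or 512 in the comparison), whereas in the element-based sketch the chunk boundaries follow from the document structure itself.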

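Experimental Results

Research Questions

RQ1: Does chunking by document structure outperform token-based chunking for retrieval and question answering on financial reports?
RQ2: How does element-based chunking affect retrieval metrics (page-level and paragraph-level) and question-answering accuracy?
RQ3: Can combining multiple chunking methods improve performance without tuning the chunk size?
RQ4: How does element-based chunking affect indexing efficiency and token usage in a vector database pipeline?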

Key Findings

Chunking strategy	Total Chunks	Page Accuracy (%)	ROUGE	BLEU
Base 128	64,058	72.34	0.383	0.181
Base 256	32,051	73.05	0.433	0.231
Base 512	16,046	68.09	0.455	0.250
Base Aggregation	112,155	83.69	0.536	0.277
Keywords Chipper	20,843	46.10	0.444	0.315
Summary Chipper	20,843	62.41	0.473	0.350
Prefix & Table Description Chipper	20,843	67.38	0.514	0.400
Chipper Aggregation	62,529	84.40	0.568	0.452
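The two aggregation rows are consistent with simply summing the chunk counts of their component strategies. The short check below uses only the numbers in the table; reading the aggregation totals as sums is an inference from the arithmetic, not something the table states.

```python
# Chunk counts copied from the table above.
base = {"Base 128": 64_058, "Base 256": 32_051, "Base 512": 16_046}
chipper = {
    "Keywords Chipper": 20_843,
    "Summary Chipper": 20_843,
    "Prefix & Table Description Chipper": 20_843,
}

base_total = sum(base.values())        # matches the Base Aggregation row
chipper_total = sum(chipper.values())  # matches the Chipper Aggregation row

# Relative reduction in indexed chunks when using the element-based aggregation.
reduction = 1 - chipper_total / base_total

print(base_total, chipper_total, f"{reduction:.1%}")
```

On these figures the element-based aggregation indexes roughly 44% fewer chunks than the token-based aggregation.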

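Among the methods evaluated, element-based chunking achieves the highest retrieval scores and the strongest question-answering accuracy.
Combining multiple chunking methods yields the highest page retrieval accuracy (84.4%) along with strong ROUGE (0.568) and BLEU (0.452) scores.
Element-based chunking generalizes without tuning the chunk-size hyperparameter and reduces the total number of chunks needed (62,529, versus 112,155 for the unstructured methods).
Basic 512-token chunking is close to some element-based chunking variants in context length, but performs worse on ROUGE/BLEU and question-answering quality.
GPT-4 automatic evaluation agrees with human assessment of the answers in most cases, which supports the reliability of automatic evaluation for RAG results.
The study confirms that structural information and element types improve RAG performance on financial reports.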
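For readers unfamiliar with the page-level metric, one plausible way to compute it is sketched below. The definition used here (a question counts as a hit when the gold evidence page is among the pages of the retrieved chunks) is an assumption for illustration; the summary does not spell out the exact formula, and the function and argument names are hypothetical.

```python
def page_accuracy(retrieved_pages: list[list[int]], gold_pages: list[int]) -> float:
    """Percentage of questions whose gold page appears among the retrieved chunks' pages.

    retrieved_pages[i] holds the page numbers of the chunks retrieved for question i;
    gold_pages[i] is the page containing that question's reference evidence.
    """
    if not gold_pages or len(retrieved_pages) != len(gold_pages):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(gold in pages for pages, gold in zip(retrieved_pages, gold_pages))
    return 100.0 * hits / len(gold_pages)


# Three questions; the second one misses its gold page.
score = page_accuracy([[3, 7], [1], [5, 9]], [7, 2, 5])
print(f"{score:.2f}")
```

Under this definition, a strategy that retrieves more or larger chunks has more chances to cover the gold page, which is one reason to read page accuracy alongside the total chunk counts.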

This review was generated by AI and reviewed by human editors.