QUICK REVIEW

[Paper Review] SF-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Question Answering

Rui Yu, Tianyi Wang | arXiv (Cornell University) | Feb 14, 2026

Topic Modeling | Cited by 0

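One-Sentence Summary

SF-RAG preserves the native hierarchical structure of academic papers when performing structure-fidelity retrieval for question answering, which reduces retrieval fragmentation and improves evidence allocation under a fixed token budget.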

ABSTRACT

Efficient question-answering (QA) over extensive scientific literature is essential for evidence-based engineering decision-making. Retrieval-augmented generation (RAG) is increasingly applied to question-answering over long academic papers, where accurate evidence allocation under a fixed token budget is critical. However, existing approaches flatten papers into unstructured chunks, destroying the native hierarchical structure and forcing retrieval to operate in a disordered space. This produces fragmented contexts, misallocates tokens to non-evidential regions, and increases the reasoning burden for downstream language models. To address these issues, we propose SF-RAG, an RAG framework that treats the native hierarchical structure of academic papers as a low-entropy retrieval prior. SF-RAG first inherits the native hierarchy to construct a structure-fidelity index, which prevents entropy increase at the source. It then designs a path-guided retrieval mechanism that aligns query semantics to relevant sections and selects high relevance root-to-leaf paths under a fixed token budget, yielding compact, coherent, and low-entropy retrieval contexts. In contrast to existing RAG approaches, SF-RAG avoids entropy increase caused by destructive preprocessing and provides a native low-entropy structural basis for subsequent retrieval. We further introduce entropy-based structural diagnostics to quantify retrieval fragmentation and evidence allocation accuracy. Evaluations across three QA benchmarks show that SF-RAG significantly reduces retrieval fragmentation and improves evidence allocation. These structural benefits drive superior answer quality, establishing a scalable foundation for intelligent engineering document systems and future applications in technical specifications.

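Research Motivation and Objectives

Enable efficient question answering over long scientific literature to support evidence-based engineering decision-making.
Identify the limitations of flattening papers into unstructured chunks and the effect this has on evidence allocation.
Propose a retrieval-augmented generation framework that preserves structure in order to reduce entropy in retrieval.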

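Proposed Method

Inherit the paper's native hierarchy to build a structure-fidelity index.
Design a path-guided retrieval mechanism that aligns the query with relevant sections and root-to-leaf paths within a token budget.
Achieve low-entropy retrieval by avoiding destructive preprocessing and preserving structural context.
Introduce entropy-based structural diagnostics to quantify fragmentation and evidence allocation.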
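
The indexing and path-guided retrieval steps above can be illustrated with a minimal sketch. This summary does not give the paper's actual scoring function or selection procedure, so everything here is an assumption made for illustration: the `Node` class, the whitespace tokenizer, the lexical-overlap `relevance` score (a stand-in for a semantic scorer), and the greedy selection in `select_paths` are all hypothetical names and choices, not SF-RAG's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of a paper; children are its subsections (hypothetical type)."""
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def tokens(s: str) -> list[str]:
    # Crude whitespace tokenizer standing in for a real one (assumption).
    return s.lower().split()


def relevance(query: str, node: Node) -> float:
    # Toy lexical overlap in [0, 1]; a real system would use a semantic scorer.
    q = set(tokens(query))
    d = set(tokens(node.title + " " + node.text))
    return len(q & d) / len(q) if q else 0.0


def root_to_leaf_paths(node: Node, prefix: tuple = ()):
    """Yield every root-to-leaf path as a tuple of nodes, in document order."""
    path = prefix + (node,)
    if not node.children:
        yield path
    for child in node.children:
        yield from root_to_leaf_paths(child, path)


def select_paths(root: Node, query: str, budget: int):
    """Greedily keep the highest-scoring paths whose new tokens fit the budget."""
    scored = []
    for path in root_to_leaf_paths(root):
        score = sum(relevance(query, n) for n in path) / len(path)
        scored.append((score, path))
    # Stable sort: ties keep their original document order.
    scored.sort(key=lambda sp: sp[0], reverse=True)

    chosen, seen, used = [], set(), 0
    for score, path in scored:
        if score <= 0:
            continue  # nothing on this path matches the query
        # Shared ancestors are only paid for once.
        new_nodes = [n for n in path if id(n) not in seen]
        cost = sum(len(tokens(n.title + " " + n.text)) for n in new_nodes)
        if used + cost > budget:
            continue
        chosen.append(path)
        seen.update(id(n) for n in new_nodes)
        used += cost
    return chosen, used
```

Because each selected unit is a whole root-to-leaf path, the returned context keeps every chunk together with its enclosing section headings, which is the property the structure-fidelity index is meant to preserve. Charging shared ancestors only once is a design choice of this sketch, made so that sibling subsections under the same heading are cheaper to add than unrelated ones.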

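Experimental Results

Research Questions

RQ1: In RAG-based question answering, does preserving the native hierarchical structure of academic papers reduce retrieval fragmentation?
RQ2: Under a fixed token budget, does structure-fidelity retrieval improve evidence allocation and answer quality?
RQ3: How do entropy-based structural diagnostics reflect retrieval performance in academic question answering?
RQ4: What effect does path-guided retrieval have on aligning queries with relevant sections?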
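
To make the idea of an entropy-based diagnostic concrete, here is a small sketch. This summary does not state the paper's exact definitions, so both functions are assumed, simplified stand-ins: `section_entropy` measures fragmentation as the Shannon entropy of retrieved chunks over section labels, and `evidence_allocation` measures the share of retrieved tokens that land in sections known to contain the evidence. The function names and input formats are hypothetical.

```python
import math
from collections import Counter


def section_entropy(chunk_sections: list[str]) -> float:
    """Shannon entropy (bits) of retrieved chunks over the sections they came from.

    0.0 means every chunk comes from one section; higher values mean the
    retrieved context is scattered across more sections.
    """
    counts = Counter(chunk_sections)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p), written with total/c to avoid a negative zero
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def evidence_allocation(tokens_per_section: dict[str, int],
                        evidence_sections: set[str]) -> float:
    """Fraction of retrieved tokens that fall inside evidence-bearing sections."""
    total = sum(tokens_per_section.values())
    if total == 0:
        return 0.0
    hit = sum(n for sec, n in tokens_per_section.items() if sec in evidence_sections)
    return hit / total
```

Under these assumed definitions, a retriever that returns four chunks from the same subsection scores lower entropy than one that returns four chunks from four different sections, which matches the intuition behind "retrieval fragmentation" in RQ1 and RQ3.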

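Key Findings

Compared with baselines, SF-RAG significantly reduces retrieval fragmentation.
SF-RAG improves evidence allocation by maintaining structural consistency in the retrieved context.
Under a fixed token constraint, the structure-aware approach achieves higher answer quality on academic QA benchmarks.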

This review was generated by AI and checked by a human editor.