QUICK REVIEW

[Paper Review] SF-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Question Answering

Rui Yu, Tianyi Wang | arXiv (Cornell University) | February 14, 2026

Topic Modeling | Citations: 0

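One-Line Summary

SF-RAG preserves the native hierarchical structure of academic papers to perform structure-fidelity retrieval, reducing retrieval fragmentation and improving evidence allocation under a fixed token budget.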

ABSTRACT

Efficient question-answering (QA) over extensive scientific literature is essential for evidence-based engineering decision-making. Retrieval-augmented generation (RAG) is increasingly applied to question-answering over long academic papers, where accurate evidence allocation under a fixed token budget is critical. However, existing approaches flatten papers into unstructured chunks, destroying the native hierarchical structure and forcing retrieval to operate in a disordered space. This produces fragmented contexts, misallocates tokens to non-evidential regions, and increases the reasoning burden for downstream language models. To address these issues, we propose SF-RAG, an RAG framework that treats the native hierarchical structure of academic papers as a low-entropy retrieval prior. SF-RAG first inherits the native hierarchy to construct a structure-fidelity index, which prevents entropy increase at the source. It then designs a path-guided retrieval mechanism that aligns query semantics to relevant sections and selects high-relevance root-to-leaf paths under a fixed token budget, yielding compact, coherent, and low-entropy retrieval contexts. In contrast to existing RAG approaches, SF-RAG avoids entropy increase caused by destructive preprocessing and provides a native low-entropy structural basis for subsequent retrieval. We further introduce entropy-based structural diagnostics to quantify retrieval fragmentation and evidence allocation accuracy. Evaluations across three QA benchmarks show that SF-RAG significantly reduces retrieval fragmentation and improves evidence allocation. These structural benefits drive superior answer quality, establishing a scalable foundation for intelligent engineering document systems and future applications in technical specifications.
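The abstract describes the index and the path-guided retrieval only at a high level. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the paper is kept as a section tree (standing in for the structure-fidelity index), every root-to-leaf path is scored against the query, and paths are greedily packed into a fixed token budget, with shared ancestors charged only once. The `Node` class, the word-overlap `relevance` scorer (a stand-in for embedding similarity), whitespace token counting, the 0.5 weight on section-level relevance, and the greedy packing rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One unit of the paper hierarchy: the paper, a section, or a leaf passage."""
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)

    @property
    def tokens(self) -> int:
        # Assumption: whitespace tokens stand in for a real tokenizer.
        return len(f"{self.title} {self.text}".split())


def words(s: str) -> set[str]:
    # Lowercased bag of words with simple punctuation stripping.
    return {w.strip(".,:;()?").lower() for w in s.split() if w.strip(".,:;()?")}


def relevance(query: str, node: Node) -> float:
    # Hypothetical scorer: fraction of query words that appear in the node.
    # An embedding similarity would be the more realistic choice.
    q = words(query)
    return len(q & words(f"{node.title} {node.text}")) / len(q) if q else 0.0


def root_to_leaf_paths(node: Node, prefix: tuple = ()):
    # The index keeps the native hierarchy, so every leaf has a unique path.
    path = prefix + (node,)
    if not node.children:
        yield path
    for child in node.children:
        yield from root_to_leaf_paths(child, path)


def path_guided_retrieve(query: str, root: Node, budget: int):
    # Score each path by leaf relevance plus half the best section-level match.
    scored = []
    for path in root_to_leaf_paths(root):
        sections = path[1:-1]
        section_score = max((relevance(query, n) for n in sections), default=0.0)
        scored.append((relevance(query, path[-1]) + 0.5 * section_score, path))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable for ties

    selected, used, seen = [], 0, set()
    for score, path in scored:
        if score <= 0:
            break  # remaining paths are irrelevant
        # Shared ancestors are charged once, so sibling leaves are cheap to add.
        new_nodes = [n for n in path if id(n) not in seen]
        cost = sum(n.tokens for n in new_nodes)
        if used + cost > budget:
            continue  # this path does not fit; try the next one
        used += cost
        seen.update(id(n) for n in new_nodes)
        selected.append(path)
    return selected, used


# Toy paper tree (hypothetical content).
paper = Node("SF-RAG paper", children=[
    Node("Method", children=[
        Node("Index", "we build a structure-fidelity index from the section hierarchy"),
        Node("Retrieval", "paths are selected under a fixed token budget"),
    ]),
    Node("Experiments", children=[
        Node("Setup", "three QA benchmarks are used"),
    ]),
])

for budget in (20, 25):
    paths, used = path_guided_retrieve("structure-fidelity index and token budget", paper, budget)
    print(budget, [" > ".join(n.title for n in p) for p in paths], used)
```

With a 20-token budget only the Index path fits (13 tokens). Raising the budget to 25 also admits the Retrieval path for just 9 more tokens, because the shared root and Method nodes are already paid for; this is one way a structure-preserving index can keep retrieved context compact and coherent.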

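Motivation and Objectives

Enable efficient QA over long scientific literature to support evidence-based engineering decision-making.
Identify the limitations of flattening papers into unstructured chunks and how this flattening harms evidence allocation.
Propose a retrieval-augmented generation framework that lowers retrieval entropy by preserving document structure.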

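Proposed Method

Inherit the paper's native hierarchy to build a structure-fidelity index.
Design a path-guided retrieval mechanism that aligns query semantics with relevant sections and selects high-relevance root-to-leaf paths under a fixed token budget.
Preserve structural context without destructive preprocessing, enabling low-entropy retrieval.
Introduce entropy-based structural diagnostics that quantify retrieval fragmentation and measure evidence allocation accuracy.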
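The review does not give the formulas behind these diagnostics. One plausible reading, sketched below in Python as an assumption rather than the paper's definition: measure fragmentation as the Shannon entropy of how retrieved tokens are distributed across sections, and evidence allocation as the share of the token budget that falls inside the sections holding the gold evidence. The function names `section_entropy` and `evidence_allocation`, the (section id, token count) input format, and the toy contexts are hypothetical.

```python
import math
from collections import Counter


def section_entropy(retrieved: list[tuple[str, int]]) -> float:
    """Shannon entropy (bits) of how retrieved tokens spread over sections.

    `retrieved` holds one (section_id, token_count) pair per retrieved chunk.
    Lower values mean a context concentrated in fewer sections.
    """
    per_section = Counter()
    for section, n_tokens in retrieved:
        per_section[section] += n_tokens
    total = sum(per_section.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in per_section.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def evidence_allocation(retrieved: list[tuple[str, int]], gold: set[str]) -> float:
    """Share of retrieved tokens that fall inside gold evidence sections."""
    total = sum(n for _, n in retrieved)
    hits = sum(n for section, n in retrieved if section in gold)
    return hits / total if total else 0.0


# Hypothetical 400-token contexts for a question whose evidence is in section 2.1.
flat = [("2.1", 100), ("4.3", 100), ("1", 100), ("5.2", 100)]
structured = [("2.1", 300), ("2.2", 100)]
for name, ctx in (("flat", flat), ("structured", structured)):
    print(name, round(section_entropy(ctx), 3), evidence_allocation(ctx, {"2.1"}))
```

Under this reading, the toy flat context scattered over four sections has 2 bits of entropy and spends 25% of its tokens on evidence, while the same 400 tokens drawn from two sibling sections drop to about 0.81 bits and spend 75% on evidence.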

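Experimental Results

Research Questions

RQ1: Does preserving the native hierarchical structure of academic papers reduce retrieval fragmentation in RAG-based QA?
RQ2: Does structure-fidelity retrieval improve evidence allocation and answer quality under a fixed token budget?
RQ3: How well do entropy-based structural diagnostics reflect retrieval performance in academic QA?
RQ4: How does path-guided retrieval affect the alignment of queries with relevant sections?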

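Key Findings

SF-RAG significantly reduces retrieval fragmentation compared with baseline methods.
SF-RAG improves evidence allocation by keeping retrieved contexts structurally coherent.
The structure-aware approach achieves higher answer quality on three academic QA benchmarks under token constraints.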

This review was generated by AI and reviewed by a human editor.