QUICK REVIEW

[Paper Review] RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Parth Sarthi, Salman Abdullah|arXiv (Cornell University)|2024. 01. 31.

Semantic Web and Ontologies|Citations: 23

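One-Line Summary

RAPTOR builds a bottom-up tree of recursively clustered text chunks and their summaries, enabling multi-scale, context-rich retrieval over long documents, and improves QA performance on NarrativeQA, QASPER, and QuALITY, particularly when paired with GPT-4.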

ABSTRACT

Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.

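Motivation and Goals

Address the limitation of retrieving only short contiguous chunks by representing text at multiple levels of abstraction in a tree structure.
Develop a scalable pipeline that recursively clusters, summarizes, and embeds text to form a retrieval tree.
Enable retrieval at multiple levels of abstraction at inference time to support questions of varying types and lengths.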

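Proposed Method

Split the corpus into 100-token chunks while keeping sentences intact at chunk boundaries.
Embed the chunks with SBERT (multi-qa-mpnet-base-cos-v1) to form the leaf nodes.
Cluster the embeddings with Gaussian Mixture Models (GMM), using UMAP for dimensionality reduction; the number of clusters is chosen with the Bayesian Information Criterion (BIC).
Summarize each cluster with a language model (GPT-3.5-turbo) and re-embed the summaries to form the next level of the tree.
Repeat embedding, clustering, and summarization until further clustering becomes infeasible, producing a bottom-up tree of text and summaries.
Queries use one of two strategies: tree traversal (layer-by-layer pruning by cosine similarity) or the collapsed tree (a flat search over all nodes at once); the collapsed tree is generally preferred for its performance and flexibility.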
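
The first step above, sentence-preserving chunking, can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the function name `split_into_chunks`, the regex-based sentence splitter, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for the example.

```python
import re

def split_into_chunks(text, max_tokens=100):
    """Greedy chunking that never cuts a sentence in half.

    Assumption: tokens are approximated by whitespace-separated words;
    a real tokenizer would produce different counts.
    """
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # If this sentence would overflow the budget, close the current chunk
        # and move the whole sentence into the next one.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks

if __name__ == "__main__":
    demo = "One two three. Four five six. Seven eight."
    print(split_into_chunks(demo, max_tokens=6))
```

In this sketch a single sentence longer than `max_tokens` simply becomes its own oversized chunk; how to handle that case is a design choice the review does not specify.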

Figure 1: Tree construction process: RAPTOR recursively clusters chunks of text based on their vector embeddings and generates text summaries of those clusters, constructing a tree from the bottom up. Nodes clustered together are siblings; a parent node contains the text summary of that cluster.
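
The bottom-up construction in Figure 1 can be sketched as a loop that repeatedly clusters the current layer and summarizes each cluster into a parent node. The `Node` class, `build_tree`, and the two toy helpers are hypothetical names for illustration. The stand-ins deliberately replace the real components: `pair_cluster` groups neighbors in fixed pairs instead of UMAP plus GMM soft clustering, and `join_summary` concatenates text instead of calling a language model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    level: int
    children: list = field(default_factory=list)

def build_tree(chunks, cluster_fn, summarize_fn, max_levels=10):
    """Return a list of layers; layers[0] holds the leaves, layers[-1] the top."""
    layer = [Node(text=c, level=0) for c in chunks]
    layers = [layer]
    for level in range(1, max_levels + 1):
        if len(layer) <= 1:
            break  # nothing left to cluster
        clusters = cluster_fn(layer)
        if len(clusters) >= len(layer):
            break  # clustering no longer reduces the node count
        # Each cluster becomes one parent whose text summarizes its children.
        layer = [
            Node(text=summarize_fn([n.text for n in group]),
                 level=level, children=list(group))
            for group in clusters
        ]
        layers.append(layer)
    return layers

# Toy stand-in for clustering (assumption): fixed pairs of adjacent nodes.
def pair_cluster(nodes):
    return [nodes[i:i + 2] for i in range(0, len(nodes), 2)]

# Toy stand-in for LLM summarization (assumption): plain concatenation.
def join_summary(texts):
    return " + ".join(texts)

if __name__ == "__main__":
    for depth, nodes in enumerate(build_tree(["a", "b", "c", "d"], pair_cluster, join_summary)):
        print(depth, [n.text for n in nodes])
```

Passing the clustering and summarization steps in as functions keeps the recursion itself visible, and real embedding, clustering, and summarization components could be swapped in without changing the loop.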

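Experimental Results

Research Questions

RQ1: Can a hierarchical, recursively summarized text representation improve retrieval quality on long documents compared with conventional chunk-based retrieval?
RQ2: Does multi-level abstraction better support multi-hop and thematic reasoning in QA tasks?
RQ3: How do the different querying strategies (tree traversal vs. collapsed tree) affect retrieval efficiency?
RQ4: How do the clustering choices (GMM and UMAP) and summarization affect overall QA performance and the hallucination rate?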

Key Results

QASPER F-1 Match (%) by retriever and reader model:
Retriever	GPT-3 F-1 Match	GPT-4 F-1 Match	UnifiedQA F-1 Match
Title + Abstract	25.2	22.2	17.5
BM25	46.6	50.2	26.4
DPR	51.3	53.0	32.1
RAPTOR	53.1	55.7	36.6
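
To make the margins in the table easier to read, the short script below computes RAPTOR's absolute gain over BM25 and DPR for each reader model. The numbers are copied from the table; the dictionary layout and the helper name `gain_over` are just illustrative choices.

```python
# F-1 Match scores copied from the table above: (GPT-3, GPT-4, UnifiedQA).
scores = {
    "Title + Abstract": (25.2, 22.2, 17.5),
    "BM25": (46.6, 50.2, 26.4),
    "DPR": (51.3, 53.0, 32.1),
    "RAPTOR": (53.1, 55.7, 36.6),
}
readers = ("GPT-3", "GPT-4", "UnifiedQA")

def gain_over(baseline):
    """Absolute F-1 gain of RAPTOR over the given baseline, per reader."""
    return {
        reader: round(r - b, 1)
        for reader, r, b in zip(readers, scores["RAPTOR"], scores[baseline])
    }

if __name__ == "__main__":
    for baseline in ("BM25", "DPR"):
        print(baseline, gain_over(baseline))
```

RAPTOR leads DPR by 1.8, 2.7, and 4.5 points and BM25 by 6.5, 5.5, and 10.2 points for GPT-3, GPT-4, and UnifiedQA respectively, so the gap is widest with the smallest reader.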

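RAPTOR consistently improves over the BM25 and DPR baselines across the NarrativeQA, QASPER, and QuALITY datasets.
On QASPER with GPT-4, RAPTOR achieves an F-1 Match of 55.7%, surpassing CoLT5 XL and earlier baselines.
On QuALITY with GPT-4, RAPTOR reaches 82.6% accuracy, surpassing the previous best result, and it also outperforms the baselines on QuALITY-HARD.
On NarrativeQA, RAPTOR combined with UnifiedQA sets a new state-of-the-art METEOR score, with strong ROUGE and BLEU scores as well.
Collapsed-tree retrieval with a budget of about 2000 tokens (roughly the top 20 nodes) performs best on the evaluated datasets.
Retrieving over the full tree (drawing on multiple layers) generally outperforms strategies that restrict attention to a subset of layers.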
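
Collapsed-tree retrieval with a token budget can be sketched as follows: every node from every layer goes into one flat pool, the pool is ranked by cosine similarity to the query, and nodes are added until the budget is reached. The function names and the `(text, embedding, token_count)` tuple layout are assumptions for illustration, and stopping at the first node that would exceed the budget is one possible reading of the cutoff rule.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def collapsed_tree_retrieve(query_vec, nodes, max_tokens=2000):
    """nodes: flat list of (text, embedding, token_count) from all tree layers."""
    ranked = sorted(nodes, key=lambda n: cosine(query_vec, n[1]), reverse=True)
    selected, used = [], 0
    for text, _, n_tokens in ranked:
        # Assumption: stop as soon as the next node would exceed the budget.
        if used + n_tokens > max_tokens:
            break
        selected.append(text)
        used += n_tokens
    return selected

if __name__ == "__main__":
    pool = [
        ("leaf A", [1.0, 0.0], 3),
        ("leaf B", [0.0, 1.0], 3),
        ("summary AB", [0.7, 0.7], 4),
    ]
    print(collapsed_tree_retrieve([1.0, 0.2], pool, max_tokens=7))
```

Because leaves and summaries compete in the same ranking, the retrieved context can mix fine-grained detail with higher-level summaries depending on the question.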

Figure 2: Illustration of the tree traversal and collapsed tree retrieval mechanisms. Tree traversal starts at the root level of the tree and retrieves the top-$k$ (here, top-$1$) node(s) based on cosine similarity to the query vector. At each level, it retrieves the top-$k$ node(s) from the chi
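
The tree traversal strategy from the Figure 2 caption can be sketched as a layer-by-layer descent: pick the top-k nodes at the root level, then repeat the selection among the children of the nodes just chosen. The dictionary-based node layout and the name `tree_traverse` are assumptions for illustration, and restricting each step to the children of the selected nodes is this sketch's reading of the layer-wise pruning described in the method section.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def tree_traverse(query_vec, roots, k=1):
    """Each node is a dict with 'text', 'emb', and 'children' (a list of nodes)."""
    selected, frontier = [], roots
    while frontier:
        # Keep the k nodes in the current frontier most similar to the query.
        top = sorted(frontier, key=lambda n: cosine(query_vec, n["emb"]), reverse=True)[:k]
        selected.extend(n["text"] for n in top)
        # Descend only into the children of the nodes that were kept.
        frontier = [child for n in top for child in n["children"]]
    return selected

if __name__ == "__main__":
    leaf_a = {"text": "leaf a", "emb": [1.0, 0.0], "children": []}
    leaf_b = {"text": "leaf b", "emb": [0.0, 1.0], "children": []}
    root_1 = {"text": "root 1", "emb": [0.9, 0.1], "children": [leaf_a, leaf_b]}
    root_2 = {"text": "root 2", "emb": [0.0, 1.0], "children": []}
    print(tree_traverse([1.0, 0.0], [root_1, root_2], k=1))
```

Unlike the collapsed tree, this approach returns a fixed mix of one selection per level, so the ratio of summary to detail does not adapt to the question.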

This review was generated by AI and checked by a human editor.