QUICK REVIEW

[Paper Review] RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

Zihao Wang, Anji Liu|arXiv (Cornell University)|2024. 03. 08.

Context-Aware Activity Recognition Systems | Citations: 10

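One-Line Summary

RAT retrieves relevant information to iteratively revise each chain-of-thought step, improving long-horizon reasoning and raising factuality and performance across code generation, mathematical reasoning, embodied task planning, and creative writing.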

ABSTRACT

We explore how iterative revising a chain of thoughts with the help of information retrieval significantly improves large language models' reasoning and generation ability in long-horizon generation tasks, while hugely mitigating hallucination. In particular, the proposed method -- *retrieval-augmented thoughts* (RAT) -- revises each thought step one by one with retrieved information relevant to the task query, the current and the past thought steps, after the initial zero-shot CoT is generated. Applying RAT to GPT-3.5, GPT-4, and CodeLLaMA-7b substantially improves their performances on various long-horizon generation tasks; on average of relatively increasing rating scores by 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning. The demo page can be found at https://craftjarvis.github.io/RAT

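Motivation and Objectives

Reduce hallucination in long-horizon generation by integrating retrieval with iterative thought revision.
Develop a zero-shot prompting pipeline (RAT) that revises each thought step using retrieved information.
Evaluate RAT across diverse tasks (code generation, mathematical reasoning, embodied task planning, creative writing) and several base LLMs.
Run ablations to understand how the retrieval strategy and causal versus non-causal reasoning affect performance.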

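Proposed Method

Generate initial zero-shot step-by-step thoughts from the task prompt.
Iteratively revise each thought step using passages retrieved from an external knowledge base.
Build the retrieval query from the task prompt, the current thought step, and the previously revised thought steps.
Revise the current thought, then append the next thought step, and continue until every step has been revised.
Support retrieval with task-specific knowledge sources, such as code datasets, the Minecraft Wiki, and web search, along with embeddings (text-embedding-ada-002).
Proceed causally, one thought at a time, so that accuracy improves without extensively rewriting earlier steps.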
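
The steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable mapping a prompt string to a completion string, the prompt wording is invented for the example, thoughts are assumed to be one per line, and a toy bag-of-words cosine similarity stands in for retrieval with text-embedding-ada-002.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List

# Hypothetical type: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def _bow(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return the k corpus passages most similar to the query."""
    q = _bow(query)
    ranked = sorted(corpus, key=lambda p: _cosine(q, _bow(p)), reverse=True)
    return ranked[:k]


def rat(task: str, llm: LLM, corpus: List[str], k: int = 2) -> List[str]:
    """Revise a zero-shot chain of thought step by step with retrieved passages."""
    # 1. Zero-shot draft; assume one thought per non-empty line.
    draft = llm(f"{task}\nLet's think step by step.")
    thoughts = [line.strip() for line in draft.splitlines() if line.strip()]

    revised: List[str] = []
    for thought in thoughts:
        # 2. Query = task prompt + already revised steps + current draft step.
        query = "\n".join([task, *revised, thought])
        passages = retrieve(query, corpus, k)
        # 3. Revise only the current step; earlier revised steps stay fixed.
        prompt = "\n".join([
            f"Task: {task}",
            "Previous steps:",
            *revised,
            "Retrieved context:",
            *passages,
            f"Revise this step using the retrieved context: {thought}",
        ])
        revised.append(llm(prompt).strip())
    return revised
```

Calling `rat(task, llm, corpus)` returns the list of revised thoughts in order. Keeping earlier revised steps fixed while revising the current one mirrors the causal, one-step-at-a-time behavior described above; a real system would swap `retrieve` for an embedding-based or web-search retriever.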

Figure 1: Pipeline of RAT. Given a task prompt (denoted as $\mathit{I}$ in the figure), RAT starts from initial step-by-step thoughts ($T_{1},T_{2},\cdots,T_{n}$) produced by an LLM in zero-shot (“let’s think step by step”). Some thought steps (such as $T_{1}$ in the figure) may be flawed due to

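Experimental Results

Research Questions

RQ1: Do retrieval-augmented thoughts improve factuality and reduce hallucination in long-horizon generation?
RQ2: How does iterative, step-by-step retrieval affect the quality of intermediate reasoning and of the final output?
RQ3: Are RAT's gains consistent across code generation, mathematical reasoning, embodied task planning, creative writing, and different base LLMs?
RQ4: How does causal versus non-causal retrieval-guided reasoning affect RAT?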

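Key Results

RAT yields substantial average relative gains across tasks: 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning.
RAT outperforms the standard CoT and RAG baselines on several benchmarks, reaching a new state of the art.
Ablation studies show that both iterative retrieval and causal reasoning contribute to the performance gains.
RAT is robust across models (GPT-3.5, GPT-4, CodeLLaMA-7b) and tasks, with especially large gains on GPT-4.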
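
The abstract describes these percentages as relative increases in rating scores. Assuming the usual definition of relative improvement (the review does not spell out the formula), and writing $s_{\text{base}}$ for the baseline score and $s_{\text{RAT}}$ for the score with RAT, both symbols introduced here for illustration only:

$$
\Delta_{\text{rel}} = \frac{s_{\text{RAT}} - s_{\text{base}}}{s_{\text{base}}} \times 100\%
$$

For instance, a hypothetical baseline score of 50.0 rising to 56.8 would correspond to $\Delta_{\text{rel}} = 13.6\%$. These numbers are illustrative and are not taken from the paper.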

Figure 2: Top: An example of different LLM reasoning methods on creative generation tasks. Red text indicates errors or illusions in the text generated by LLM, while green text represents correct generation. Methods without RAG often generate incorrect information with hallucination, classical RAG


This review was generated by AI and reviewed by a human editor.