QUICK REVIEW

[Paper Review] RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

Zihao Wang, Anji Liu|arXiv (Cornell University)|Mar 8, 2024

Context-Aware Activity Recognition Systems|Cited by 10

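One-Line Summary

RAT iteratively refines long-horizon reasoning by retrieving relevant information and revising each thought step, improving factuality and performance on code generation, mathematical reasoning, embodied planning, and creative writing.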

ABSTRACT

We explore how iterative revising a chain of thoughts with the help of information retrieval significantly improves large language models' reasoning and generation ability in long-horizon generation tasks, while hugely mitigating hallucination. In particular, the proposed method -- *retrieval-augmented thoughts* (RAT) -- revises each thought step one by one with retrieved information relevant to the task query, the current and the past thought steps, after the initial zero-shot CoT is generated. Applying RAT to GPT-3.5, GPT-4, and CodeLLaMA-7b substantially improves their performances on various long-horizon generation tasks; on average of relatively increasing rating scores by 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning. The demo page can be found at https://craftjarvis.github.io/RAT

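Motivation and Objectives

Motivate combining retrieval with iterative thought revision as a way to reduce hallucination in long-horizon generation.
Develop a zero-shot prompting pipeline (RAT) that revises each thought step using retrieved information.
Evaluate RAT on diverse tasks (code generation, mathematical reasoning, embodied planning, creative writing) and on multiple base LLMs.
Analyze ablations to understand how the retrieval strategy and causal versus non-causal reasoning affect performance.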

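Proposed Method

Generate an initial zero-shot chain of thought from the task prompt.
Iteratively revise each thought step using passages retrieved from an external knowledge base.
Build the retrieval query from the task prompt, the current thought, and the previously revised thoughts.
Revise the current thought with the retrieved information, then append the next thought step, and continue until every step has been revised.
Support retrieval with task-specific knowledge sources (e.g., code datasets, the Minecraft Wiki, web search) and embeddings (text-embedding-ada-002).
Operate causally and sequentially: revise thoughts one at a time to improve accuracy, moving forward without wholesale re-examination of earlier steps.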
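The steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `rat` function, the `LLM` and `Retriever` type aliases, the prompt wording, and the choice to split the draft into steps on newlines are all assumptions made here for the example. Any real model and retriever would be passed in as plain callables.

```python
from typing import Callable, List

# Hypothetical type aliases: neither name comes from the paper.
LLM = Callable[[str], str]              # prompt -> completion
Retriever = Callable[[str], List[str]]  # query -> retrieved passages


def rat(task: str, llm: LLM, retrieve: Retriever) -> List[str]:
    """Revise a zero-shot chain of thought one step at a time (sketch)."""
    # 1. Draft the initial step-by-step thoughts in zero-shot.
    draft = llm(f"{task}\nLet's think step by step.")
    # Assumption: one thought step per non-empty line of the draft.
    steps = [s.strip() for s in draft.split("\n") if s.strip()]

    revised: List[str] = []
    for step in steps:
        # 2. Build the query from the task, the already revised steps,
        #    and the current draft step.
        query = "\n".join([task, *revised, step])
        passages = retrieve(query)
        # 3. Revise only the current step; earlier revised steps are
        #    kept as they are (causal, sequential order).
        prompt = (
            f"Task: {task}\n"
            f"Previous steps: {' '.join(revised)}\n"
            f"Retrieved: {' '.join(passages)}\n"
            f"Revise this step: {step}"
        )
        revised.append(llm(prompt))
    return revised
```

Passing the model and retriever in as callables keeps the loop independent of any particular API, so the same sketch could sit on top of a web search, a code corpus, or an embedding index.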

Figure 1: Pipeline of RAT. Given a task prompt (denoted as $\mathit{I}$ in the figure), RAT starts from initial step-by-step thoughts ($T_{1},T_{2},\cdots,T_{n}$) produced by an LLM in zero-shot (“let’s think step by step”). Some thought steps (such as $T_{1}$ in the figure) may be flawed due to

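Experimental Results

Research Questions

RQ1: Do retrieval-augmented thoughts improve factuality and reduce hallucination in long-horizon generation?
RQ2: How does iterative, step-by-step retrieval affect the quality of intermediate reasoning and of the final output?
RQ3: Are RAT's gains consistent across code generation, mathematical reasoning, embodied planning, creative writing, and different base LLMs?
RQ4: How does RAT's performance differ when retrieval guides causal versus non-causal reasoning?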

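Key Findings

RAT delivers notable average relative improvements across tasks: 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning.
RAT reaches a new state-of-the-art level on several benchmarks, outperforming standard CoT and standard RAG baselines.
Ablation studies show that both iterative retrieval and causal reasoning contribute to the performance gains.
RAT is robust across models (GPT-3.5, GPT-4, CodeLLaMA-7b) and tasks, with particularly large gains on GPT-4.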
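The percentages in the findings are reported as relative increases in rating scores. As a reading aid, the sketch below shows how a relative gain over a baseline is conventionally computed. The `relative_gain` helper and the example scores are hypothetical and are not taken from the paper. Whether the paper averages per-benchmark gains in exactly this way is an assumption.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) * 100.0 / baseline


# Hypothetical scores for illustration only, not results from the paper.
print(relative_gain(50.0, 60.0))  # 20.0
```

Under this definition, a 20% relative gain on a baseline of 50 is an absolute gain of 10 points, so relative figures from tasks with low baselines can look large even when the absolute change is modest.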

Figure 2: Top: An example of different LLM reasoning methods on creative generation tasks. Red text indicates errors or illusions in the text generated by LLM, while green text represents correct generation. Methods without RAG often generate incorrect information with hallucination, classical RAG


This review was generated by AI and checked by a human editor.