QUICK REVIEW

[Paper Review] Multi-Step Semantic Reasoning in Generative Retrieval

Steven Dong, Yubao Tang|arXiv (Cornell University)|Mar 12, 2026

Information Retrieval and Search Behavior | Citations: 0

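One-Line Summary

ReasonGR strengthens multi-step semantic reasoning in generative retrieval through structured prompting and a reasoning adapter, improving retrieval accuracy and training efficiency on FinQA.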

ABSTRACT

Generative retrieval (GR) models encode a corpus within model parameters and generate relevant document identifiers directly for a given query. While this paradigm shows promise in retrieval tasks, existing GR models struggle with complex queries in numerical contexts, such as those involving semantic reasoning over financial reports, due to limited reasoning capabilities. This limitation leads to suboptimal retrieval accuracy and hinders practical applicability. We propose ReasonGR, a framework designed to enhance multi-step semantic reasoning in numerical contexts within GR. ReasonGR employs a structured prompting strategy combining task-specific instructions with stepwise reasoning guidance to better address complex retrieval queries. Additionally, it integrates a reasoning-focused adaptation module to improve the learning of reasoning-related parameters. Experiments on the FinQA dataset, which contains financial queries over complex documents, demonstrate that ReasonGR improves retrieval accuracy and consistency, indicating its potential for advancing GR models in reasoning-intensive retrieval scenarios.

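Motivation and Objectives

Improve retrieval for queries that require multi-step numerical reasoning over complex documents.
Propose ReasonGR, a framework that combines structured prompting with stepwise reasoning guidance.
Introduce a reasoning-focused adaptation module that learns reasoning-related parameters efficiently.
Demonstrate improvements over baseline generative retrieval methods on the FinQA dataset.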

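Proposed Method

Uses a Transformer-based encoder-decoder with a LoRA-based reasoning adapter for generative retrieval.
Applies 4-bit QLoRA, which quantizes the frozen backbone to reduce memory usage.
Designs reasoning-guided training with prompts that combine a task template with Chain-of-Thought (CoT) instructions.
Trains on two tasks: memorizing docids via maximum likelihood estimation (MLE), and learning multi-step relevance from reasoning traces.
Uses an adaptive penalty-scaling loss that combines EM, PM, SM, and S-Score signals to supervise token-level predictions.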
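The LoRA adapter mentioned above can be illustrated with a minimal pure-Python sketch. It shows only the general LoRA idea (a frozen weight plus a scaled low-rank update), not the ReasonGR implementation; the class name `LoRALinear`, the rank, the scaling factor, and the toy weights are all assumptions made for illustration, and the 4-bit quantization of the backbone used by QLoRA is not modeled here.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical linear layer: frozen W plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=1, alpha=2.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen backbone weight, never updated during fine-tuning
        # A gets small random values and B starts at zero,
        # so the adapter contributes nothing before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank adapter path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Count only the adapter parameters (A and B)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy 4x4 weight, chosen arbitrarily for the example.
W = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 0.0, 0.5],
]
layer = LoRALinear(W, r=1, alpha=2.0)
x = [1.0, 2.0, 3.0, 4.0]
print(layer.forward(x))          # equals W @ x because B is still zero
print(layer.trainable_params())  # 8 adapter parameters vs. 16 frozen ones
```

Only `A` and `B` would receive gradient updates, which is why the adapter approach trains far fewer parameters than full fine-tuning.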

Figure 1: ReasonGR performing multi-step semantic reasoning on a FinQA query. The model extracts key info and locates relevant report sections to generate the docid, formed by the company name and report year.
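The caption states that a docid is formed from the company name and the report year. A minimal sketch of such an identifier scheme follows; the exact string format (lowercase slug, underscore separator), the helper names `make_docid` and `parse_docid`, and the company name "Example Corp." are hypothetical and are not taken from the paper.

```python
import re


def make_docid(company, year):
    """Build a docid from a company name and a report year (assumed format)."""
    # Lowercase the name and collapse every run of non-alphanumerics to "_".
    slug = re.sub(r"[^a-z0-9]+", "_", company.lower()).strip("_")
    return f"{slug}_{year}"


def parse_docid(docid):
    """Split a docid produced by make_docid back into (slug, year)."""
    slug, year = docid.rsplit("_", 1)
    return slug, int(year)


docid = make_docid("Example Corp.", 2019)
print(docid)               # example_corp_2019
print(parse_docid(docid))  # ('example_corp', 2019)
```

Identifiers built from meaningful tokens like these give a generative model semantically structured targets rather than arbitrary integer ids.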

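Experimental Results

Research Questions

RQ1: Can structured prompting (including few-shot and CoT) improve multi-step reasoning in generative retrieval over financial documents?
RQ2: Does a LoRA/QLoRA-based reasoning adapter improve retrieval accuracy and training efficiency on reasoning-intensive tasks?
RQ3: How well does ReasonGR perform on the FinQA dataset compared with traditional retrieval and vanilla GR baselines?
RQ4: How much does prompt design (Zero vs. CoT vs. full ReasonGR) affect performance and efficiency?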
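The three prompt configurations compared in RQ4 can be pictured with a short sketch. Everything here is an assumption made for illustration: the review does not give the actual templates, so the instruction text, the stepwise guidance, the example query, and the function `build_prompt` are hypothetical, as is the reading of "Zero" as a bare query and "full" as task instruction plus few-shot examples plus CoT guidance.

```python
# Hypothetical wording; the paper's real templates are not given in this review.
TASK_INSTRUCTION = (
    "Given a financial question, generate the identifier of the report that answers it."
)
COT_GUIDANCE = (
    "Think step by step: (1) extract the key entities and figures, "
    "(2) locate the relevant report section, (3) output the docid."
)


def build_prompt(query, variant="full", examples=()):
    """Assemble a prompt for one of three assumed variants: zero, cot, full."""
    if variant == "zero":
        parts = [f"Query: {query}"]
    elif variant == "cot":
        parts = [TASK_INSTRUCTION, COT_GUIDANCE, f"Query: {query}"]
    elif variant == "full":
        # Few-shot demonstrations are (query, docid) pairs.
        shots = [f"Query: {q}\nDocid: {d}" for q, d in examples]
        parts = [TASK_INSTRUCTION, *shots, COT_GUIDANCE, f"Query: {query}"]
    else:
        raise ValueError(f"unknown variant: {variant}")
    return "\n\n".join(parts) + "\nDocid:"


demo = [("What was net revenue in 2018?", "example_corp_2018")]
question = "What was the percentage change in net revenue from 2018 to 2019?"
for name in ("zero", "cot", "full"):
    print(f"--- {name} ---")
    print(build_prompt(question, name, demo))
```

Keeping the variants behind one function makes an ablation like the one in RQ4 a matter of changing a single argument.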

Key Findings

Model	EM (Eval)	PM (Eval)	SM (Eval)	EM (Test)	PM (Test)	SM (Test)
BM25	0.623	-	-	0.625	-	-
DSI	0.563	0.646	0.651	0.578	0.654	0.659
ReasonGR (Zero)	0.572	0.732	0.748	0.601	0.750	0.767
ReasonGR (CoT)	0.571	0.728	0.748	0.612	0.755	0.774
ReasonGR	0.607	0.751	0.765	0.626	0.762	0.779

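All ReasonGR variants outperform DSI on EM, PM, and SM on both the FinQA evaluation and test sets; on EM, the Zero and CoT variants remain below BM25 (PM and SM are not reported for BM25).
Full ReasonGR achieves the best PM and SM scores on both sets. Its EM is marginally above BM25 on the test set (0.626 vs. 0.625) but below it on the evaluation set (0.607 vs. 0.623).
Training with prompts (few-shot + CoT) helps: the Zero variant, trained without them, scores lower on every metric (for example, evaluation EM of 0.572 vs. 0.607).
CoT-only prompting yields intermediate gains on the test set (EM 0.612, between 0.601 for Zero and 0.626 for the full model) but is roughly level with Zero on the evaluation set; it benefits from being combined with few-shot prompting.
Depending on the prompt design, ReasonGR can shorten training time while keeping memory usage low.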
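The size of the gaps in the table can be checked with a few lines of arithmetic. The sketch below copies the test-set scores from the table above and computes absolute and relative gains; the helper name `gain` and the dictionary layout are illustrative choices, not part of the paper.

```python
# Test-set scores copied from the results table above.
RESULTS = {
    "BM25": {"EM": 0.625},  # PM and SM are not reported for BM25
    "DSI": {"EM": 0.578, "PM": 0.654, "SM": 0.659},
    "ReasonGR (Zero)": {"EM": 0.601, "PM": 0.750, "SM": 0.767},
    "ReasonGR (CoT)": {"EM": 0.612, "PM": 0.755, "SM": 0.774},
    "ReasonGR": {"EM": 0.626, "PM": 0.762, "SM": 0.779},
}


def gain(model, baseline, metric):
    """Return (absolute gain, relative gain in %) or None if a score is missing."""
    a = RESULTS[model].get(metric)
    b = RESULTS[baseline].get(metric)
    if a is None or b is None:
        return None
    return round(a - b, 3), round(100 * (a - b) / b, 1)


print(gain("ReasonGR", "DSI", "EM"))   # (0.048, 8.3)
print(gain("ReasonGR", "BM25", "EM"))  # (0.001, 0.2)
print(gain("ReasonGR", "BM25", "PM"))  # None
```

On the test set, the full model's EM gain over DSI is about 8.3% relative, while its margin over BM25 is 0.001 absolute, which is why the BM25 comparison should be read as near parity.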

This review was generated by AI and checked by a human editor.