QUICK REVIEW

[Paper Review] Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering

Keheng Wang, Feiyu Duan|arXiv (Cornell University)|Aug 25, 2023

Topic Modeling | Citations: 10

One-Line Summary

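KD-CoT introduces a knowledge-driven chain-of-thought framework that works with an external QA system to verify and revise intermediate reasoning steps, improving knowledge-intensive KBQA performance on WebQSP and CWQ.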

ABSTRACT

Equipped with Chain-of-Thought (CoT), Large language models (LLMs) have shown impressive reasoning ability in various downstream tasks. Even so, suffering from hallucinations and the inability to access external knowledge, LLMs often come with incorrect or unfaithful intermediate reasoning steps, especially in the context of answering knowledge-intensive tasks such as KBQA. To alleviate this issue, we propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge, and thus overcome the hallucinations and error propagation. Concretely, we formulate the CoT rationale process of LLMs into a structured multi-round QA format. In each round, LLMs interact with a QA system that retrieves external knowledge and produce faithful reasoning traces based on retrieved precise answers. The structured CoT reasoning of LLMs is facilitated by our developed KBQA CoT collection, which serves as in-context learning demonstrations and can also be utilized as feedback augmentation to train a robust retriever. Extensive experiments on WebQSP and ComplexWebQuestion datasets demonstrate the effectiveness of proposed KD-CoT in task-solving reasoning generation, which outperforms the vanilla CoT ICL with an absolute success rate of 8.0% and 5.1%. Furthermore, our proposed feedback-augmented retriever outperforms the state-of-the-art baselines for retrieving knowledge, achieving significant improvement in Hit and recall performance. Our code and data are released on https://github.com/AdelWang/KD-CoT/tree/main.
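The multi-round QA format described in the abstract can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `llm_step`, `qa_system`, and `verifier` are hypothetical callables standing in for the LLM, the retrieve-then-read QA system, and the trained verifier, and the toy verifier simply prefers a non-empty retrieved answer (the paper's verifier is a trained model).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One LLM output: a sub-question with a tentative sub-answer, or a final answer."""
    sub_question: str = ""
    sub_answer: str = ""
    final_answer: Optional[str] = None


@dataclass
class Round:
    """Record of one QA round in the structured CoT."""
    sub_question: str
    llm_answer: str        # what the LLM proposed on its own
    retrieved_answer: str  # what the (hypothetical) QA system returned
    kept_answer: str       # what the verifier decided to keep


def kd_cot_loop(question: str,
                llm_step: Callable[[str, List[Round]], Step],
                qa_system: Callable[[str], str],
                verifier: Callable[[str, str, str], str],
                max_rounds: int = 5) -> Tuple[Optional[str], List[Round]]:
    """Alternate LLM sub-questions with external QA + verification (sketch)."""
    history: List[Round] = []
    for _ in range(max_rounds):
        step = llm_step(question, history)
        if step.final_answer is not None:
            return step.final_answer, history
        retrieved = qa_system(step.sub_question)
        kept = verifier(step.sub_question, step.sub_answer, retrieved)
        history.append(Round(step.sub_question, step.sub_answer, retrieved, kept))
    return None, history  # no final answer within the round budget


# --- Toy stand-ins (all hypothetical) ---
KB = {"director of Film X": "Bob", "spouse of Bob": "Dana"}


def toy_llm(question: str, history: List[Round]) -> Step:
    if not history:
        return Step("director of Film X", "Alice")  # hallucinated sub-answer
    if len(history) == 1:
        # The next sub-question is built from the verified answer of round 1.
        return Step(f"spouse of {history[0].kept_answer}", "Carol")
    return Step(final_answer=history[-1].kept_answer)


def toy_qa(sub_question: str) -> str:
    return KB.get(sub_question, "")


def toy_verifier(sub_question: str, llm_answer: str, retrieved: str) -> str:
    # Assumption: trust retrieval when it returns something, else keep the LLM answer.
    return retrieved if retrieved else llm_answer


answer, rounds = kd_cot_loop("Who is the spouse of the director of Film X?",
                             toy_llm, toy_qa, toy_verifier)
print(answer)  # Dana
```

Because the second sub-question is built from the verified answer ("Bob") rather than the hallucinated one ("Alice"), the correction keeps the error from propagating into later rounds.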

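Motivation and Objectives

Address LLM hallucinations and inaccurate intermediate reasoning on knowledge-intensive KBQA tasks.
Use an external QA system to retrieve precise knowledge and verify or correct intermediate sub-answers.
Build a KBQA CoT collection that supports in-context learning and retriever training.
Show that a feedback-augmented retriever and a verifier improve knowledge access and reasoning quality.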
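Selecting in-context demonstrations from a CoT collection by similarity can be sketched as follows. This assumes a plain bag-of-words cosine similarity as a stand-in for whatever similarity the authors use, and the collection entries and the "Sub-question / Answer" layout are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Demo = Tuple[str, str]  # (question, structured CoT text)


def bow(text: str) -> Counter:
    """Lower-cased whitespace bag of words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_demos(question: str, collection: List[Demo], k: int = 2) -> List[Demo]:
    """Return the k collection entries most similar to the test question."""
    q = bow(question)
    ranked = sorted(collection, key=lambda d: cosine(q, bow(d[0])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, demos: List[Demo]) -> str:
    """Concatenate demonstrations, then leave the test question open."""
    parts = [f"Question: {q}\n{cot}" for q, cot in demos]
    parts.append(f"Question: {question}\n")
    return "\n\n".join(parts)


# Hypothetical toy collection.
COLLECTION: List[Demo] = [
    ("who directed the film titanic",
     "Sub-question 1: Who directed Titanic? Answer 1: James Cameron."),
    ("what is the capital of france",
     "Sub-question 1: What is the capital of France? Answer 1: Paris."),
    ("who is the spouse of the director of titanic",
     "Sub-question 1: Who directed Titanic? Answer 1: James Cameron.\n"
     "Sub-question 2: Who is James Cameron's spouse? Answer 2: Suzy Amis."),
]

query = "who is the wife of the director of avatar"
demos = select_demos(query, COLLECTION, k=2)
print(build_prompt(query, demos))
```

With bag-of-words scores, function words such as "the" and "of" weigh as much as content words, which is why a learned embedding is the more common choice for this kind of retrieval.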

Proposed Method

Build the KBQA CoT collection by prompting ChatGPT with iterative, similarity-based demonstrations so that it generates structured CoT reasoning.
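Propose KD-CoT: an interactive loop in which the LLM generates sub-questions and passes them to a retrieve-then-read QA system and a verifier. The process is repeated to refine the CoT until the final answer no longer changes.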
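Linearize the knowledge base (e.g., one-hop subgraphs from Freebase) into unstructured text so that it can be retrieved alongside Wikipedia passages.
Using the CoT collection, train a DPR-based feedback-augmented retriever that identifies relevant passages containing both the query entity and the answer entity.
Use a Fusion-in-Decoder (FiD) reader that encodes the top-N passages together with the question to generate candidate answers, and train a PEFT-based verifier that decides whether to adopt the LLM's sub-answer or the retrieved answer.
Show that structured CoT demonstrations outperform unstructured CoT and other ICL baselines on knowledge-intensive multi-hop KBQA.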
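The linearization step and the feedback-based positive selection can be illustrated together. This is a minimal sketch assuming a simple "head relation tail." template, a readable relation name taken from the last segment of a Freebase-style relation ID, and exact substring matching of entity names; the helper names and toy triples are hypothetical, and the actual templates and matching rules in KD-CoT may differ.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def linearize_one_hop(entity: str, triples: Sequence[Triple],
                      max_triples: int = 3) -> List[str]:
    """Turn the entity's outgoing one-hop triples into short text passages."""
    facts = [t for t in triples if t[0] == entity]  # keep one-hop edges from `entity`
    passages = []
    for i in range(0, len(facts), max_triples):
        chunk = facts[i:i + max_triples]
        sentences = [
            # "film.film.directed_by" -> "directed by"
            f"{h} {r.split('.')[-1].replace('_', ' ')} {t}."
            for h, r, t in chunk
        ]
        passages.append(" ".join(sentences))
    return passages


def label_passages(passages: Sequence[str], query_entity: str,
                   answer_entities: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Positives contain both the query entity and at least one answer entity."""
    positives, negatives = [], []
    for p in passages:
        if query_entity in p and any(a in p for a in answer_entities):
            positives.append(p)
        else:
            negatives.append(p)
    return positives, negatives


# Hypothetical toy subgraph.
TRIPLES: List[Triple] = [
    ("Titanic", "film.film.directed_by", "James Cameron"),
    ("Titanic", "film.film.release_date", "1997"),
    ("James Cameron", "people.person.spouse", "Suzy Amis"),  # not one hop from Titanic
]

passages = linearize_one_hop("Titanic", TRIPLES, max_triples=1)
pos, neg = label_passages(passages, "Titanic", ["James Cameron"])
print(pos)  # ['Titanic directed by James Cameron.']
```

Passages labeled as positives in this way could serve as training positives for a DPR-style dual encoder, with the remaining passages available as negatives.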

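Experimental Results

Research Questions

RQ1: Do external knowledge retrieval and intermediate verification improve the faithfulness of LLM reasoning in KBQA?
RQ2: Does a structured CoT collection enable better in-context learning and retriever augmentation for knowledge-intensive questions?
RQ3: How do the feedback-augmented retriever and the verifier affect retrieval quality and answer accuracy on WebQSP and CWQ?
RQ4: What is the trade-off in the number of iterations for multi-hop KBQA reasoning with the KD-CoT framework?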

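Key Findings

KD-CoT outperforms vanilla CoT ICL on Hits@1 by 8.0 absolute points on WebQSP and 5.1 points on CWQ.
The feedback-augmented retriever (FBA-DPR) significantly improves Hit and Recall over the top-100 passages compared with prior work on both WebQSP and CWQ.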
The verifier retains the LLM's sub-answer in most cases, but it still corrects sub-answers often enough to improve the final answers.
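Iterative interaction yields larger gains on CWQ (multi-hop) than on WebQSP (mostly single-hop), suggesting that KD-CoT is more effective on complex questions.
Fine-tuning a smaller model on CoT reasoning brings slight gains on simple questions but can hurt performance on complex multi-hop KBQA.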
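The metrics in the findings above can be made concrete with small helper functions. The review does not define them, so the definitions below are assumptions based on common KBQA usage: Hits@1 counts a question as correct when the top-ranked answer is in the gold set, Hit@N asks whether any gold answer string appears in the top-N passages, and Recall@N is the fraction of gold answers covered by those passages. Function names and toy data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def hits_at_1(predictions: Sequence[List[str]], gold: Sequence[Set[str]]) -> float:
    """Fraction of questions whose top-ranked prediction is a gold answer."""
    if not gold:
        return 0.0
    hits = sum(1 for pred, g in zip(predictions, gold) if pred and pred[0] in g)
    return hits / len(gold)


def retrieval_hit_and_recall(passages: Sequence[str], answers: Sequence[str],
                             n: int) -> Tuple[float, float]:
    """Hit@N and answer Recall@N over the top-n retrieved passages (substring match)."""
    top = passages[:n]
    covered = {a for a in answers if any(a in p for p in top)}
    hit = 1.0 if covered else 0.0
    recall = len(covered) / len(set(answers)) if answers else 0.0
    return hit, recall


# Hypothetical toy data.
PASSAGES = [
    "Titanic directed by James Cameron.",
    "Avatar directed by James Cameron.",
    "Titanic release date 1997.",
]
print(retrieval_hit_and_recall(PASSAGES, ["James Cameron", "1997"], n=2))  # (1.0, 0.5)
```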

This review was generated by AI and checked by a human editor.