QUICK REVIEW

[Paper Review] LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

Subbarao Kambhampati, Karthik Valmeekam|arXiv (Cornell University)|Feb 2, 2024

Semantic Web and Ontologies | Cited by 10

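One-Line Summary

This paper argues that LLMs cannot plan or verify plans autonomously, but that they can serve as useful knowledge sources and candidate generators inside an LLM-Modulo framework, where external verifiers make the resulting planning sound.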

ABSTRACT

There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the problem specification from one syntactic format to another, and ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is after all a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of LLM-Modulo Frameworks that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs. We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem and preference specifications.

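Research Motivation and Objectives

Clarify the misconceptions surrounding the planning and verification abilities of LLMs.
Propose a framework that pairs LLMs with model-based verifiers to enable sound planning.
Show how LLMs can assist with knowledge acquisition, plan generation, and specification refinement.
Advocate a neuro-symbolic approach that avoids purely sequential LLM-to-symbolic pipelines.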

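Proposed Method

Presents a Generate-Test-Critique loop architecture in which LLMs generate candidate plans and critics evaluate them.
Introduces a reformulator module that translates candidate plans into the representations required by the various verifiers.
Uses a Backprompt Controller to aggregate the critiques and re-prompt the LLM.
Describes a path for fine-tuning and data generation that uses externally verified solutions.
Outlines semi-automated specification refinement and model acquisition, with human input at the domain and problem levels.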
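
The Generate-Test-Critique loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Action`, `make_model_based_verifier`, `make_stub_generator`, `llm_modulo`) is hypothetical, the "LLM" is a stub that replays canned guesses instead of calling a real model, the domain is a toy two-block stacking problem, and the reformulator step is omitted because the plan is already in the verifier's format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional

Plan = List[str]
# A critic returns a list of critiques; an empty list means "no objection".
Critic = Callable[[Plan], List[str]]


@dataclass(frozen=True)
class Action:
    """STRIPS-style action (hypothetical toy model, not from the paper)."""
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def make_model_based_verifier(
    actions: Dict[str, Action], init: FrozenSet[str], goal: FrozenSet[str]
) -> Critic:
    """Build a critic that simulates a plan against the domain model."""
    def verify(plan: Plan) -> List[str]:
        state = set(init)
        for step, name in enumerate(plan, start=1):
            act = actions.get(name)
            if act is None:
                return [f"step {step}: unknown action '{name}'"]
            missing = act.pre - state
            if missing:
                return [f"step {step}: '{name}' needs {sorted(missing)}"]
            state = (state - act.delete) | act.add
        unmet = goal - state
        return [f"goal not reached, missing {sorted(unmet)}"] if unmet else []
    return verify


def make_stub_generator(candidates: List[Plan]) -> Callable[[str], Plan]:
    """Stand-in for an LLM call: ignores the prompt and replays canned guesses."""
    guesses = iter(candidates)
    def generate(backprompt: str) -> Plan:
        return next(guesses, [])  # empty plan once the guesses run out
    return generate


def llm_modulo(
    generate: Callable[[str], Plan], critics: List[Critic], max_rounds: int = 5
) -> Optional[Plan]:
    """Return a plan only if every critic accepts it; otherwise None."""
    backprompt = ""
    for _ in range(max_rounds):
        plan = generate(backprompt)
        critiques = [msg for critic in critics for msg in critic(plan)]
        if not critiques:
            return plan
        # Backprompt controller: pool all critiques into the next prompt.
        backprompt = "Fix these issues: " + "; ".join(critiques)
    return None


ACTIONS = {
    "pickup_a": Action(
        pre=frozenset({"clear_a", "hand_empty"}),
        add=frozenset({"holding_a"}),
        delete=frozenset({"hand_empty"}),
    ),
    "stack_a_on_b": Action(
        pre=frozenset({"holding_a", "clear_b"}),
        add=frozenset({"a_on_b", "hand_empty"}),
        delete=frozenset({"holding_a", "clear_b"}),
    ),
}
verifier = make_model_based_verifier(
    ACTIONS,
    init=frozenset({"clear_a", "clear_b", "hand_empty"}),
    goal=frozenset({"a_on_b"}),
)
# First guess skips the pickup and is rejected; second guess is accepted.
stub_llm = make_stub_generator([["stack_a_on_b"], ["pickup_a", "stack_a_on_b"]])
result = llm_modulo(stub_llm, [verifier])
print(result)
```

The point of the design is that the loop never returns a plan the verifier has not accepted, so the soundness of the output depends on the verifier and its domain model rather than on the generator. In this sketch the generator only ever contributes guesses, which mirrors the role the review attributes to LLMs.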

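Experimental Results

Research Questions

RQ1: Can LLMs autonomously generate executable plans or verify their own plans?
RQ2: How can LLMs contribute to planning without any claim of autonomous planning or verification ability?
RQ3: What role do external critics and verifiers play in ensuring plan soundness within the LLM-Modulo framework?
RQ4: How can domain models and problem specifications be acquired and refined with LLM assistance while preserving formal guarantees?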

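Key Findings

Empirical evidence shows that plans produced autonomously by LLMs are often not executable: on the IPC-like domains tested, the best LLM (GPT-4) achieves only about 12% executable plans.
Iterative self-critique by LLMs does not reliably improve plan quality without an external verifier.
The lack of guarantees in LLM-only planning motivates coupling LLMs with model-based critics to ensure correctness.
LLMs are valuable as knowledge sources and candidate-plan generators within a Generate-Test-Critique loop integrated with external verifiers.
The framework supports plans whose formal correctness is certified by the verifiers, while still allowing LLMs to provide flexible, broad knowledge acquisition.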


This review was generated by AI and checked by a human editor.