QUICK REVIEW

[Paper Review] Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents

Zergham Ahmed, Kazuki Irie | arXiv (Cornell University) | Jan 31, 2026

AI-based Problem Solving and Planning | Citations: 0

ONE-LINE SUMMARY

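TheoryCoder-2 uses in-context learning to automatically learn reusable abstractions and uses them for hierarchical planning. This improves sample efficiency and generalization and makes the approach applicable across a wider range of environments.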

ABSTRACT

Humans learn abstractions and use them to plan efficiently to quickly generalize across tasks -- an ability that remains challenging for state-of-the-art large language model (LLM) agents and deep reinforcement learning (RL) systems. Inspired by the cognitive science of how people form abstractions and intuitive theories of their world knowledge, Theory-Based RL (TBRL) systems, such as TheoryCoder, exhibit strong generalization through effective use of abstractions. However, they heavily rely on human-provided abstractions and sidestep the abstraction-learning problem. We introduce TheoryCoder-2, a new TBRL agent that leverages LLMs' in-context learning ability to actively learn reusable abstractions rather than relying on hand-specified ones, by synthesizing abstractions from experience and integrating them into a hierarchical planning process. We conduct experiments on diverse environments, including BabyAI, Minihack and VGDL games like Sokoban. We find that TheoryCoder-2 is significantly more sample-efficient than baseline LLM agents augmented with classical planning domain construction, reasoning-based planning, and prior program-synthesis agents such as WorldCoder. TheoryCoder-2 is able to solve complex tasks that the baselines fail to solve, while requiring only minimal human prompts, unlike prior TBRL systems.

MOTIVATION AND OBJECTIVES

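Motivates the need for human-like abstraction learning to improve planning efficiency and generalization.
Develops TheoryCoder-2, which synthesizes high-level abstractions (PDDL operators) on its own and grounds them in a low-level world model.
Enables reuse of learned abstractions across multiple environments through curriculum-driven learning.
Shows improvements in sample efficiency and task success rate over baselines, including LLM-assisted planning and WorldCoder.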

PROPOSED METHOD

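Extends TheoryCoder to synthesize high-level abstractions automatically using the LLM's in-context learning.
Represents abstractions as PDDL domain and problem files for Fast Downward, the classical planner used at the high level of the hierarchy, and grounds them in a Python-based world model.
Uses a bi-level planning loop: the high-level planner selects abstract operators, and the low-level planner uses the learned transition function to produce and execute grounded action sequences.
Grounds predicate semantics with Python predicate classifiers learned from environment data.
Iteratively refines the world model and the abstractions by prompting the LLM with prediction errors and planning outcomes.
Groups similar environments through an episodic curriculum and grows an abstraction library that encourages reuse of learned operators and predicates.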
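The bi-level loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TheoryCoder-2's implementation: the 4x4 grid, the transition function standing in for a synthesized world model, the at predicate classifier, and all function names are hypothetical; the abstract plan is hard-coded rather than produced by Fast Downward; and breadth-first search is a swapped-in low-level planner, since this review does not say how the low-level search is done.

```python
from collections import deque
from typing import List, Optional, Tuple

State = Tuple[int, int]  # hypothetical state: the agent's (row, col) on a grid

# Hypothetical toy environment: a 4x4 grid with two wall cells.
SIZE = 4
WALLS = {(1, 1), (1, 2)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def transition(state: State, action: str) -> State:
    """Stand-in for a synthesized Python world model: moves blocked by walls and edges."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in WALLS:
        return state  # blocked: the agent stays in place
    return (r, c)


def at(state: State, target: State) -> bool:
    """Hypothetical predicate classifier grounding the abstract predicate (at ?pos)."""
    return state == target


def low_level_search(start: State, target: State) -> Optional[List[str]]:
    """Breadth-first search over primitive actions, using the world model to simulate."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if at(state, target):
            return plan
        for action in ACTIONS:
            nxt = transition(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None  # target unreachable under the current world model


def bi_level_plan(start: State, abstract_plan) -> Optional[List[str]]:
    """Refine each high-level operator (name, args) into grounded primitive actions."""
    state, actions = start, []
    for op, args in abstract_plan:
        if op != "move_to":
            raise ValueError(f"no grounding for operator {op!r}")
        segment = low_level_search(state, args[0])
        if segment is None:
            return None  # refinement failed; the agent would replan or revise its model
        for action in segment:
            state = transition(state, action)
        actions.extend(segment)
    return actions


# Hard-coded high-level plan; in TheoryCoder-2 this would come from Fast Downward.
plan = bi_level_plan((0, 0), [("move_to", [(2, 1)]), ("move_to", [(3, 3)])])
print(plan)  # ['down', 'down', 'right', 'down', 'right', 'right']
```

Each move_to operator is refined into primitive actions only when the high-level plan reaches it, so the high-level planner never has to reason about individual grid steps. When the world model makes a target unreachable, the sketch returns None, which is the point where the refinement step described above would take over.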

Figure 1 : Comparison of agent–environment interaction between methods. WorldCoder and LLM + P both fall under the LLM + Planner category.
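Part of TheoryCoder-2's interaction with the environment is the refinement step from the method list: observed transitions are checked against the synthesized world model, and mispredictions become feedback for the LLM. The sketch below covers only the prediction-error half (planning outcomes are left out) and is an assumption-laden illustration, not the paper's code: the toy environment, naive_model, refined_model, and the prompt wording are all hypothetical, and the LLM call itself is not implemented.

```python
from typing import Callable, List, Tuple

State = Tuple[int, int]
Observed = Tuple[State, str, State]  # (state, action, observed next state)

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
WALLS = {(1, 1)}  # hypothetical environment detail the first-draft model misses


def naive_model(state: State, action: str) -> State:
    """Hypothetical first-draft world model: ignores walls entirely."""
    dr, dc = MOVES[action]
    return (state[0] + dr, state[1] + dc)


def refined_model(state: State, action: str) -> State:
    """What a revised model might look like after feedback: walls block movement."""
    nxt = naive_model(state, action)
    return state if nxt in WALLS else nxt


def prediction_errors(model: Callable[[State, str], State], data: List[Observed]):
    """Collect (state, action, observed, predicted) for every mispredicted transition."""
    errors = []
    for state, action, actual in data:
        predicted = model(state, action)
        if predicted != actual:
            errors.append((state, action, actual, predicted))
    return errors


def feedback_prompt(errors) -> str:
    """Hypothetical feedback text that would be sent to the LLM to revise its model."""
    lines = ["Your transition function mispredicted these transitions:"]
    for state, action, actual, predicted in errors:
        lines.append(f"- state={state}, action={action}: predicted {predicted}, observed {actual}")
    lines.append("Please revise the transition function.")
    return "\n".join(lines)


# Hypothetical transitions logged while acting in an environment with a wall at (1, 1).
observed = [
    ((0, 0), "right", (0, 1)),
    ((0, 1), "down", (0, 1)),   # blocked by the wall
    ((1, 0), "right", (1, 0)),  # blocked by the wall
]

errors = prediction_errors(naive_model, observed)
if errors:
    print(feedback_prompt(errors))
```

The loop ends when a revised model explains every logged transition; here refined_model produces no errors on the same data.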

EXPERIMENTAL RESULTS

Research Questions

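RQ1: Can TheoryCoder-2 autonomously learn abstract states and actions and transfer them across environments?
RQ2: Does reusing learned abstractions improve sample efficiency on new tasks?
RQ3: How does TheoryCoder-2 compare with the baselines in token cost, computation time, and solve rate across the diverse VGDL, BabyAI, and Minihack domains?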

Key Findings

Task (Game)	Full	TC - P	TC - C	LLM + π	LLM + P	WorldCoder
Labyrinth	21378	24510	21378	5173	28931	56360
Maze	19737	23186	21236	3518	24396	56085
Sokoban	7171	10373	8441	2608	25919	19684
BabyAI Pickup	8588	6660	8588	2405	20589	18013
BabyAI Unlock	33116	41734	33116	5705	50071	97938
BabyAI Combined Skills 1	1961	54277	44725	40960	41515	119330
BabyAI Combined Skills 2	102528	53376	45175	49973	55078	120375
BabyAI Combined Skills 3	2454	53064	45017	29791	55078	120375
Minihack-5x5	5163	7671	5163	1115	12595	8144
Minihack-15x15	0	9815	4837	1402	12124	0
Minihack-Traps	0	14326	5007	9110	29712	0
Minihack-Monster	0	21189	6125	1290	30940	0
Minihack-WoD	19433	21932	19433	4376	52434	62165
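The short script below recomputes one column of the table relative to the Full column, which makes the spread between methods easier to see. This review does not label the table's metric or units; RQ3 mentions token cost, computation time, and solve rate, so the numbers may be token counts, but that is an assumption. Rows where either value is 0 are skipped because the review does not explain what a 0 entry means. The function name ratio_to_full is hypothetical.

```python
from typing import Dict, List

# Values copied from the table above; columns in the same order.
COLUMNS = ["Full", "TC - P", "TC - C", "LLM + π", "LLM + P", "WorldCoder"]
ROWS: Dict[str, List[int]] = {
    "Labyrinth": [21378, 24510, 21378, 5173, 28931, 56360],
    "Maze": [19737, 23186, 21236, 3518, 24396, 56085],
    "Sokoban": [7171, 10373, 8441, 2608, 25919, 19684],
    "BabyAI Pickup": [8588, 6660, 8588, 2405, 20589, 18013],
    "BabyAI Unlock": [33116, 41734, 33116, 5705, 50071, 97938],
    "BabyAI Combined Skills 1": [1961, 54277, 44725, 40960, 41515, 119330],
    "BabyAI Combined Skills 2": [102528, 53376, 45175, 49973, 55078, 120375],
    "BabyAI Combined Skills 3": [2454, 53064, 45017, 29791, 55078, 120375],
    "Minihack-5x5": [5163, 7671, 5163, 1115, 12595, 8144],
    "Minihack-15x15": [0, 9815, 4837, 1402, 12124, 0],
    "Minihack-Traps": [0, 14326, 5007, 9110, 29712, 0],
    "Minihack-Monster": [0, 21189, 6125, 1290, 30940, 0],
    "Minihack-WoD": [19433, 21932, 19433, 4376, 52434, 62165],
}


def ratio_to_full(column: str) -> Dict[str, float]:
    """column / Full for each task, skipping rows where either value is 0."""
    j = COLUMNS.index(column)
    return {task: v[j] / v[0] for task, v in ROWS.items() if v[0] > 0 and v[j] > 0}


ratios = ratio_to_full("WorldCoder")
for task, r in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{task:<26} {r:6.2f}x")
```

Passing another column name, such as "LLM + P", gives the same comparison for that baseline.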

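TheoryCoder-2 learns core abstractions such as move_to and reuses them in Labyrinth, Maze, and Sokoban.
Abstractions learned by TheoryCoder-2 transfer to harder tasks (e.g., BabyAI Boss), making it possible to solve problems the baselines struggle with.
Compared with the LLM-based baselines and WorldCoder, TheoryCoder-2 is more sample-efficient, achieves competitive or better solve rates, and plans faster while keeping token cost low.
The abstractions learned by TheoryCoder-2 approach the performance of hand-designed Oracle abstractions.
Curriculum-based learning and grounded, code-based abstractions lead to faster synthesis and planning than prompting-based abstractions.
Minihack shows zero-shot transfer: the learned move_to abstraction lets later tasks be solved quickly.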

Figure 2 : An illustration of the curriculum used in our experiments. A curriculum is a sequence of episodes in which each episode contains one or more environments/games. The sequence of the first episode (Labyrinth) and the second one (Maze, and Sokoban) is studied in Experiment 1 (Sec. 4.1), whi…
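The curriculum in Figure 2 and the abstraction library from the method list fit together simply: each game in each episode either reuses an operator already in the library or requests a new one. The sketch below is a minimal illustration, not the paper's code. It hard-codes the curriculum from the caption (Labyrinth, then Maze and Sokoban); the per-game operator requirements, the push_box operator, the placeholder PDDL bodies, and every function name are hypothetical, and synthesize_operator stands in for the LLM call.

```python
from typing import Dict, List, Tuple

# Curriculum from Figure 2: a sequence of episodes, each containing one or more games.
CURRICULUM: List[List[str]] = [["Labyrinth"], ["Maze", "Sokoban"]]

# Hypothetical: which operators each game needs. move_to matches the findings above;
# push_box is an invented example of a game-specific operator.
REQUIRED_OPS: Dict[str, List[str]] = {
    "Labyrinth": ["move_to"],
    "Maze": ["move_to"],
    "Sokoban": ["move_to", "push_box"],
}


def synthesize_operator(name: str) -> str:
    """Stand-in for the LLM synthesis call; returns a placeholder PDDL action."""
    return (
        f"  (:action {name}\n"
        "    :parameters (?from ?to)\n"
        "    :precondition (at ?from)\n"
        "    :effect (and (at ?to) (not (at ?from))))"
    )


def run_curriculum(curriculum, required_ops):
    """Grow an operator library, reusing any operator learned in an earlier game."""
    library: Dict[str, str] = {}
    log: List[Tuple[str, str, str]] = []
    for episode in curriculum:
        for game in episode:
            for op in required_ops[game]:
                if op in library:
                    log.append((game, op, "reused"))
                else:
                    library[op] = synthesize_operator(op)
                    log.append((game, op, "synthesized"))
    return library, log


def domain_file(name: str, library: Dict[str, str]) -> str:
    """Render the library as a STRIPS PDDL domain string, the input format Fast Downward reads."""
    actions = "\n".join(library[op] for op in sorted(library))
    return (
        f"(define (domain {name})\n"
        "  (:requirements :strips)\n"
        "  (:predicates (at ?pos))\n"
        f"{actions})"
    )


library, log = run_curriculum(CURRICULUM, REQUIRED_OPS)
for entry in log:
    print(entry)
print(domain_file("curriculum", library))
```

Under these assumptions, move_to is synthesized once in Labyrinth and reused in Maze and Sokoban, mirroring the reuse reported in the findings; only the hypothetical push_box triggers a second synthesis.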


This review was generated by AI and checked by a human editor.