QUICK REVIEW

[Paper Review] Towards Adaptive Environment Generation for Training Embodied Agents

Teresa Yeo, Dulaj Sanjaya Weerakoon|arXiv (Cornell University)|Feb 6, 2026

Multimodal Machine Learning Applications | Citations: 0

One-Line Summary

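The paper proposes a closed-loop framework that adaptively modifies training environments based on analysis of agent trajectories, enabling a progressively harder, feedback-driven curriculum for embodied navigation tasks.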

ABSTRACT

Embodied agents struggle to generalize to new environments, even when those environments share similar underlying structures to their training settings. Most current approaches to generating these training environments follow an open-loop paradigm, without considering the agent's current performance. While procedural generation methods can produce diverse scenes, diversity without feedback from the agent is inefficient. The generated environments may be trivially easy, providing limited learning signal. To address this, we present a proof-of-concept for closed-loop environment generation that adapts difficulty to the agent's current capabilities. Our system employs a controllable environment representation, extracts fine-grained performance feedback beyond binary success or failure, and implements a closed-loop adaptation mechanism that translates this feedback into environment modifications. This feedback-driven approach generates training environments that are more challenging in the ways the agent needs to improve, enabling more efficient learning and better generalization to novel settings.
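
The abstract's "fine-grained performance feedback beyond binary success or failure" can be made concrete with a few scalar signals computed from a single trajectory. The sketch below is not the paper's method: the paper feeds top-down trajectory images to an LLM analysis model. The metric choices (an SPL-style path efficiency, obstacle clearance, a near-collision count), the circular obstacle model, and every name here are hypothetical, chosen only to illustrate the idea.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]
Obstacle = tuple[Point, float]  # (center, radius); circular footprints are an assumption


@dataclass
class TrajectoryFeedback:
    success: bool
    path_length: float
    path_efficiency: float  # SPL-style ratio: shortest / max(actual, shortest)
    min_clearance: float    # smallest waypoint-to-obstacle-surface distance
    near_collisions: int    # waypoints closer than `near_margin` to any obstacle


def path_length(path: list[Point]) -> float:
    """Sum of Euclidean segment lengths along the trajectory."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def analyze_trajectory(
    path: list[Point],
    goal: Point,
    obstacles: list[Obstacle],
    shortest_len: float,
    success_radius: float = 0.5,
    near_margin: float = 0.2,
) -> TrajectoryFeedback:
    """Summarize a trajectory with signals richer than a binary success flag.

    Clearance is measured only at the recorded waypoints, not along segments.
    """
    length = path_length(path)
    success = math.dist(path[-1], goal) <= success_radius
    denom = max(length, shortest_len)
    efficiency = shortest_len / denom if denom > 0 else 1.0
    # One row per waypoint: signed distance to each obstacle's surface.
    clearances = [[math.dist(p, c) - r for c, r in obstacles] for p in path]
    min_clearance = min((d for row in clearances for d in row), default=math.inf)
    near = sum(1 for row in clearances if any(d < near_margin for d in row))
    return TrajectoryFeedback(success, length, efficiency, min_clearance, near)
```

For example, an L-shaped path (0, 0) → (3, 0) → (3, 4) to a goal at (3, 4), where the straight-line optimum is 5 units, counts as a success but scores an efficiency of 5/7. That flags a detour that a binary success signal would hide.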

Motivation and Objectives

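Motivates the need to improve generalization to unseen environments.
Proposes a closed-loop system that adapts environment difficulty to the agent's performance.
Uses a structured environment representation and fine-grained trajectory feedback to design a targeted curriculum.
Demonstrates feasibility with a proof of concept that uses LLMs for trajectory analysis and environment modification.
Highlights the remaining limitations in evaluation and extension, and outlines future directions.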

Proposed Method

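Represents the environment as a structured scene graph (O, A, R) so that modifications are controllable.
Uses an analysis model F (e.g., GPT-4.1-mini) to extract successes, points of concern along the trajectory, and high-level modification suggestions from agent trajectories.
Uses a generator G (e.g., GPT-4.1-mini) to translate F's analysis into concrete environment edits while ensuring validity and solvability.
Implements collision-aware placement so that modifications are realized without objects intersecting.
Renders the updated environment and iterates the loop to produce a progressive curriculum.
Optionally, discusses how gradient-based and model-based (LLM) approaches compare for generating configuration deltas.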
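
The loop described in the list above can be sketched as follows. This is a minimal, hypothetical stand-in, not the authors' implementation. The paper uses GPT-4.1-mini for both F and G, whereas here F (`analyze`) and G (`generate`) are small rule-based functions. Reading (O, A, R) as objects, attributes, and relations, the 2-D axis-aligned footprints, the `run_agent` callable, and the "narrow_passage" edit are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable

Vec2 = tuple[float, float]


@dataclass(frozen=True)
class SceneObject:
    name: str
    position: Vec2  # (x, y) centre of the top-down footprint
    size: Vec2      # (width, depth) of an axis-aligned footprint


@dataclass
class SceneGraph:
    objects: dict[str, SceneObject]        # O: objects (assumed reading)
    attributes: dict[str, dict[str, str]]  # A: per-object attributes
    relations: list[tuple[str, str, str]]  # R: (subject, predicate, object)


def overlaps(a: SceneObject, b: SceneObject) -> bool:
    """Axis-aligned footprint intersection test (touching edges do not count)."""
    (ax, ay), (aw, ad) = a.position, a.size
    (bx, by), (bw, bd) = b.position, b.size
    return 2 * abs(ax - bx) < aw + bw and 2 * abs(ay - by) < ad + bd


def analyze(feedback: dict) -> list[str]:
    """Hypothetical rule-based stand-in for the analysis model F."""
    if feedback["success"] and feedback["path_efficiency"] > 0.8:
        return ["narrow_passage"]  # too easy: tighten the corridor
    if not feedback["success"]:
        return ["widen_passage"]   # too hard: relax the corridor
    return []


def generate(env: SceneGraph, suggestions: list[str], step: float = 0.25) -> dict[str, Vec2]:
    """Hypothetical stand-in for generator G: suggestions -> per-object position deltas."""
    delta: dict[str, Vec2] = {}
    for s in suggestions:
        if s not in ("narrow_passage", "widen_passage"):
            continue
        for name, attrs in env.attributes.items():
            if attrs.get("role") != "obstacle":
                continue
            x = env.objects[name].position[0]
            inward = -step if x > 0 else step  # toward the corridor axis x = 0
            delta[name] = (inward if s == "narrow_passage" else -inward, 0.0)
    return delta


def apply_delta(env: SceneGraph, delta: dict[str, Vec2]) -> SceneGraph:
    """Collision-aware placement: skip any move that would intersect another object."""
    objects = dict(env.objects)
    for name, (dx, dy) in delta.items():
        obj = objects[name]
        moved = replace(obj, position=(obj.position[0] + dx, obj.position[1] + dy))
        if any(overlaps(moved, other) for key, other in objects.items() if key != name):
            continue
        objects[name] = moved
    return SceneGraph(objects, env.attributes, env.relations)


def closed_loop(env: SceneGraph, run_agent: Callable[[SceneGraph], dict], iterations: int = 3):
    """Deploy -> analyze (F) -> edit (G), repeated to form a curriculum; rendering is omitted."""
    history = []
    for _ in range(iterations):
        feedback = run_agent(env)  # e.g. {"success": True, "path_efficiency": 0.95}
        env = apply_delta(env, generate(env, analyze(feedback)))
        history.append(env)
    return env, history
```

With two unit-sized obstacles at x = ±1 and an agent that always succeeds efficiently, each iteration moves them 0.25 inward until their footprints touch at x = ±0.5. The next requested move would make them overlap, so `apply_delta` rejects it and the layout stays valid.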

Figure 1: Embodied navigation performance is sensitive to object perturbations. Top-down view of agent trajectories (yellow to orange path) for an object navigation task with the fridge as the target object. In the training environment (left), the agent successfully navigates to the target, while in

Experimental Results

Research Questions

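RQ1: Can a closed-loop, feedback-driven environment generation loop improve the learning efficiency of embodied agents?
RQ2: How can fine-grained trajectory analysis guide meaningful, realistic environment modifications?
RQ3: What are the trade-offs between LLM-based editing and gradient-based prediction of environment deltas?
RQ4: Do generated environments remain solvable and physically plausible after successive edits?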

Key Findings

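A closed-loop proof-of-concept pipeline that uses trajectory analysis to guide environment perturbations is feasible.
Rather than applying random perturbations, the environment modifications aim to create more challenging yet realistic situations (e.g., narrow passages).
Incorporating collision avoidance and validity constraints keeps the modified environments executable.
The approach suggests that adaptive environment design could improve training efficiency and generalization, but a comprehensive evaluation is left for future work.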
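
One simple way to enforce the solvability constraint behind RQ4 and the third finding is a reachability test on a rasterized top-down occupancy grid, with any edit that disconnects start and goal rolled back. The review does not say how the paper checks solvability. The grid encoding ('#' blocked, '.' free), the 4-connected breadth-first search, and the names `is_solvable` and `accept_edit` are hypothetical.

```python
from collections import deque

Grid = list[str]  # one string per row: '#' = blocked cell, '.' = free cell
Cell = tuple[int, int]  # (row, col)


def is_solvable(grid: Grid, start: Cell, goal: Cell) -> bool:
    """Return True if goal is reachable from start through 4-connected free cells."""
    rows, cols = len(grid), len(grid[0])

    def free(r: int, c: int) -> bool:
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#"

    if not (free(*start) and free(*goal)):
        return False
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over free cells
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if free(*nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def accept_edit(before: Grid, after: Grid, start: Cell, goal: Cell) -> Grid:
    """Keep the edited layout only if it stays solvable; otherwise roll back."""
    return after if is_solvable(after, start, goal) else before
```

An edit that leaves a one-cell gap in a wall (a "narrow passage") is accepted, while one that closes the wall completely is rolled back to the previous layout.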

Figure 2: Overview of the proposed adaptive environment generation framework. Starting from an original environment $e_{t}$, an agent with policy $\pi_{t}$ is deployed to perform embodied tasks (e.g., object navigation), producing a top-down trajectory visualization $\tau^{e_{t}}$. An analysis mod


This review was generated by AI and checked by a human editor.