QUICK REVIEW

[Paper Review] PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes

Xinru Cui, Linxi Feng|arXiv (Cornell University)|Jan 24, 2026

Robotic Locomotion and Control|Citations: 0

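One-Line Summary

PILOT is a unified, perceptive, single-stage reinforcement learning controller that couples locomotion and manipulation. It combines a cross-modal encoder with a Mixture-of-Experts policy and is demonstrated on a Unitree G1 over complex terrain.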

ABSTRACT

Humanoid robots hold great potential for diverse interactions and daily service tasks within human-centered environments, necessitating controllers that seamlessly integrate precise locomotion with dexterous manipulation. However, most existing whole-body controllers lack exteroceptive awareness of the surrounding environment, rendering them insufficient for stable task execution in complex, unstructured scenarios. To address this challenge, we propose PILOT, a unified single-stage reinforcement learning (RL) framework tailored for perceptive loco-manipulation, which synergizes perceptive locomotion and expansive whole-body control within a single policy. To enhance terrain awareness and ensure precise foot placement, we design a cross-modal context encoder that fuses prediction-based proprioceptive features with attention-based perceptive representations. Furthermore, we introduce a Mixture-of-Experts (MoE) policy architecture to coordinate diverse motor skills, facilitating better specialization across distinct motion patterns. Extensive experiments in both simulation and on the physical Unitree G1 humanoid robot validate the efficacy of our framework. PILOT demonstrates superior stability, command tracking precision, and terrain traversability compared to existing baselines. These results highlight its potential to serve as a robust, foundational low-level controller for loco-manipulation in unstructured scenes.

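Motivation and Objectives

Address the lack of unified, terrain-aware loco-manipulation controllers for unstructured environments.
Develop a perceptive single-stage RL framework that integrates locomotion with whole-body manipulation.
Design an attention-based perceptive encoder and a Mixture-of-Experts policy for robust coordinated control.
Validate the approach through sim-to-real experiments with the Unitree G1 humanoid robot on stairs, slopes, and rough terrain.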

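Proposed Method

Formulates the task as a goal-conditioned MDP trained with PPO; the policy outputs a 29-dimensional action that a PD controller maps to joint torques.
Incorporates a robot-centric LiDAR elevation map as the exteroceptive perception input.
Proposes a cross-modal context encoder that combines a prediction-based proprioceptive history with an attention mechanism.
Adopts a Mixture-of-Experts (MoE) policy that integrates multiple motion skills to achieve unified whole-body control.
Refines the target configuration through residual learning for upper-body control, rather than regressing the full kinematics.
Implements an adaptive command curriculum that progressively covers base height, torso orientation, and upper-body manipulation targets.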
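To make the Mixture-of-Experts idea concrete, the following is a minimal sketch of a dense (soft) MoE action head in plain Python. It is not the authors' architecture: the number of experts, the context dimension, the linear experts, the random weights, and every name in the code (`MoEPolicy`, `random_layer`, and so on) are assumptions made for illustration. Only the 29-dimensional action size comes from the review.

```python
import math
import random

ACTION_DIM = 29  # action dimension stated in the review


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in, scale=0.1):
    """Hypothetical random initialisation standing in for trained weights."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MoEPolicy:
    """Soft mixture of linear experts; all sizes are assumptions."""

    def __init__(self, context_dim, num_experts=4, seed=0):
        rng = random.Random(seed)
        self.gate = random_layer(rng, num_experts, context_dim)
        self.experts = [random_layer(rng, ACTION_DIM, context_dim)
                        for _ in range(num_experts)]

    def act(self, context):
        # Gating network: one non-negative weight per expert, summing to 1.
        gate_weights = softmax(linear(*self.gate, context))
        # Every expert proposes a full 29-dimensional action.
        proposals = [linear(w, b, context) for w, b in self.experts]
        # The final action is the gate-weighted sum of the proposals.
        action = [sum(g * p[i] for g, p in zip(gate_weights, proposals))
                  for i in range(ACTION_DIM)]
        return action, gate_weights


policy = MoEPolicy(context_dim=16)
context = [0.1 * i for i in range(16)]  # placeholder for the encoder output
action, gate_weights = policy.act(context)
print(len(action), [round(g, 3) for g in gate_weights])
```

In a trained policy, the gate weights are the quantity one would inspect to see which expert dominates for a given motion pattern. Here they are close to uniform because the weights are random.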

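Experimental Results

Research Questions

RQ1: Does PILOT outperform the baselines in whole-body coordination and command tracking on complex terrain?
RQ2: How do the attention-based perceptive encoder and the MoE policy contribute to PILOT's performance?
RQ3: How well does the policy transfer from simulation to real-world experiments on the Unitree G1?
RQ4: Does a unified, perception-enabled policy improve stability during loco-manipulation compared with decoupled approaches?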

Key Findings

Variant	E_v	E_ω	E_h	E_r	E_p	E_y	E_arm	E_stumble
HOMIE	0.386	0.384	0.022	-	-	0.020	0.340	-
FALCON	0.272	0.263	0.083	-	-	0.143	0.305	-
AMO	0.357	-	0.032	0.089	0.156	0.115	0.206	-
PILOT w/o vision	0.145	0.099	0.010	0.057	0.053	0.055	0.206	-
PILOT	0.148	0.102	0.009	0.055	0.056	0.068	0.218	0.006
PILOT w/o vision (full terrains)	0.201	0.137	0.013	0.062	0.070	0.071	0.225	0.087
PILOT w/o attention-based encoder	0.167	0.128	0.016	0.065	0.075	0.073	0.223	0.066
PILOT w/o MoE	0.179	0.121	0.015	0.067	0.080	0.078	0.246	0.017
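The gaps in the table above are easier to read as ratios. The short script below copies the E_v and E_stumble columns and computes how much lower PILOT's linear-velocity error is than each baseline's, and how many times higher the stumble metric is in each ablation. It assumes the rows being compared share an evaluation setting, which the table itself does not state; the dictionary and function names are illustrative.

```python
# Values copied from the table above.
E_V = {"HOMIE": 0.386, "FALCON": 0.272, "AMO": 0.357, "PILOT": 0.148}
E_STUMBLE = {
    "PILOT": 0.006,
    "PILOT w/o vision (full terrains)": 0.087,
    "PILOT w/o attention-based encoder": 0.066,
    "PILOT w/o MoE": 0.017,
}


def relative_reduction(baseline, ours):
    """Fraction by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline


# Reduction in linear-velocity tracking error relative to each baseline.
ev_reduction = {
    name: relative_reduction(value, E_V["PILOT"])
    for name, value in E_V.items() if name != "PILOT"
}

# Stumble metric of each ablation as a multiple of the full model's value.
stumble_ratio = {
    name: value / E_STUMBLE["PILOT"]
    for name, value in E_STUMBLE.items() if name != "PILOT"
}

for name, r in ev_reduction.items():
    print(f"E_v vs {name}: {r:.1%} lower")
for name, r in stumble_ratio.items():
    print(f"E_stumble, {name}: {r:.1f}x the full model")
```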

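On simple terrain, PILOT shows lower mean tracking errors than the baselines on most metrics (linear velocity, angular velocity, base height, torso roll/pitch/yaw, arm joints). In the table, HOMIE reports a lower torso yaw error (0.020) and AMO a slightly lower arm-joint error (0.206) than the PILOT row (0.068 and 0.218).
Across all terrains, PILOT with perception and MoE shows an advantage in base-height tracking and torso orientation control (E_h = 0.009–0.010).
The ablations indicate that vision is essential: removing perception or the attention mechanism degrades tracking and markedly raises the stumble metric (E_stumble = 0.087 without vision and 0.066 without the attention-based encoder, versus 0.006 for the full model).
The MoE policy produces specialized, coordinated motion across motion primitives, with distinct expert activation patterns.
Real-world tests on the Unitree G1 confirm zero-shot sim-to-real transfer, with the policy updated at 50 Hz and PD-based torque commands issued at 500 Hz.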
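The two rates reported for the real robot imply that the PD loop runs ten times per policy step. The toy loop below illustrates that structure for a single joint: the policy sets a new position target every tenth tick, and the PD law converts it to a torque on every tick. The gains, the unit-inertia plant, the constant-target policy, and all names are hypothetical; only the 50 Hz and 500 Hz figures come from the review, and the real system applies the same law to each of the 29 actuated dimensions.

```python
POLICY_HZ = 50   # policy update rate stated in the review
PD_HZ = 500      # PD torque command rate stated in the review
DECIMATION = PD_HZ // POLICY_HZ  # PD ticks per policy action


def pd_torque(q_target, q, qd, kp, kd):
    """Joint-space PD law: tau = kp * (q_target - q) - kd * qd."""
    return kp * (q_target - q) - kd * qd


def run(policy, seconds, kp=40.0, kd=12.0, inertia=1.0):
    """Drive one joint, modelled as a pure inertia (toy plant), with PD."""
    dt = 1.0 / PD_HZ
    q, qd, q_target = 0.0, 0.0, 0.0
    policy_calls = 0
    for tick in range(int(seconds * PD_HZ)):
        if tick % DECIMATION == 0:
            q_target = policy(q, qd)  # new target at the policy rate
            policy_calls += 1
        tau = pd_torque(q_target, q, qd, kp, kd)  # torque at the PD rate
        qd += (tau / inertia) * dt  # semi-implicit Euler integration
        q += qd * dt
    return q, policy_calls


# Hypothetical policy that always requests a joint angle of 0.5 rad.
final_q, calls = run(lambda q, qd: 0.5, seconds=2.0)
print(DECIMATION, calls, round(final_q, 3))
```

Holding the target fixed between policy steps is one common design choice; it lets the inner loop reject disturbances faster than the policy could on its own.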


This review was generated by AI and checked by a human editor.