QUICK REVIEW

[Paper Review] REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering

Jialong Liu, Dehan Shen|arXiv (Cornell University)|Mar 18, 2026

Robotic Locomotion and Control|Citations: 0

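One-Sentence Summary

REAL is an end-to-end framework for robust quadruped parkour under perceptual corruption. It combines spatio-temporal policy learning, FiLM-based cross-modal fusion, physics-guided filtering with an EKF, and a consistency-aware loss gate to enable zero-shot sim-to-real transfer.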

ABSTRACT

Extreme legged parkour demands rapid terrain assessment and precise foot placement under highly dynamic conditions. While recent learning-based systems achieve impressive agility, they remain fundamentally fragile to perceptual degradation, where even brief visual noise or latency can cause catastrophic failure. To overcome this, we propose Robust Extreme Agility Learning (REAL), an end-to-end framework for reliable parkour under sensory corruption. Instead of relying on perfectly clean perception, REAL tightly couples vision, proprioceptive history, and temporal memory. We distill a cross-modal teacher policy into a deployable student equipped with a FiLM-modulated Mamba backbone to actively filter visual noise and build short-term terrain memory actively. Furthermore, a physics-guided Bayesian state estimator enforces rigid-body consistency during high-impact maneuvers. Validated on a Unitree Go2 quadruped, REAL successfully traverses extreme obstacles even with a 1-meter visual blind zone, while strictly satisfying real-time control constraints with a bounded 13.1 ms inference time.

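Motivation and Objectives

Achieve robust quadruped parkour under perceptual degradation and visual noise.
Develop a two-stage teacher-student policy pipeline for cross-modal terrain reasoning.
Incorporate FiLM-based visual-proprioceptive fusion and a Mamba temporal backbone for memory.
Introduce a physics-guided Bayesian estimator (EKF) that enforces rigid-body consistency.
Propose a consistency-aware loss-gating mechanism that stabilizes sim-to-real transfer.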

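Proposed Method

Distillation from a privileged teacher to a deployable student, using cross-modal attention for proprioception-terrain association.
FiLM-modulated visual features are combined with a Mamba temporal backbone to maintain short-term terrain memory under perceptual noise.
An extended Kalman filter (EKF) integrates rigid-body dynamics with uncertainty-aware velocity estimation to produce physically consistent state estimates.
A Huber-Gaussian loss for velocity estimation models the value and its uncertainty jointly.
Consistency-aware loss gating adaptively balances imitation learning and reinforcement learning during distillation.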
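
The FiLM modulation named in the method list can be illustrated with a minimal sketch. FiLM (feature-wise linear modulation) scales and shifts each feature channel using parameters predicted from a conditioning input. This review does not say which signal conditions which, so the sketch assumes that proprioception generates the scale and shift applied to the visual features. The function names, shapes, and single linear layers are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> List[float]:
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(
    visual: Sequence[float],
    proprio: Sequence[float],
    w_gamma: Matrix,
    b_gamma: Sequence[float],
    w_beta: Matrix,
    b_beta: Sequence[float],
) -> List[float]:
    """Apply FiLM: out[c] = gamma[c] * visual[c] + beta[c].

    Assumption: gamma and beta are predicted from proprioception by one
    linear layer each. A real model would likely use deeper generators.
    """
    gamma = linear(proprio, w_gamma, b_gamma)  # per-channel scale
    beta = linear(proprio, w_beta, b_beta)     # per-channel shift
    return [g * v + b for g, v, b in zip(gamma, visual, beta)]
```

With a scale of 1 and a shift of 0 the visual features pass through unchanged, while a scale near 0 suppresses a channel. That is one way a conditioning signal could down-weight visual channels it judges unreliable.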

Figure 1: Robust extreme parkour with proposed REAL framework. The robot successfully chains highly dynamic maneuvers across complex terrains with nominal vision (green box), and maintains stable locomotion even under severe visual degradation (red box).

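Experimental Results

Research Questions

RQ1 Can a privileged teacher policy that exploits proprioception-terrain association improve robust quadruped locomotion under perceptual degradation?
RQ2 Can a FiLM-modulated cross-modal student with a Mamba backbone maintain robust real-time performance when exteroceptive inputs are corrupted?
RQ3 Does physics-guided EKF fusion improve velocity and state estimation during highly dynamic motion?
RQ4 Does adaptive loss gating stabilize sim-to-real transfer and improve robustness to perceptual noise?
RQ5 Is zero-shot sim-to-real transfer to a real Unitree Go2 feasible on extreme terrain and with visual blind zones?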

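Key Findings

REAL achieves reliable extreme parkour on a Unitree Go2, including with a 1-meter visual blind zone. Inference takes about 13.1 ms per step, which the abstract reports as a bounded figure.
Combined with physics-guided filtering, the FiLM-Mamba student stays stable under perceptual degradation and outperforms baselines on extreme terrain.
EKF-based fusion reduces velocity-estimation drift and strengthens rigid-body consistency during impacts and slips.
Consistency-aware loss gating accelerates training convergence and improves sim-to-real robustness compared with fixed-weight baselines.
Broad domain randomization enables zero-shot transfer to real hardware without additional fine-tuning.
In ablation studies, removing either Mamba or FiLM degrades performance substantially, underscoring the importance of spatio-temporal memory and cross-modal fusion.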
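
The EKF-based fusion in the findings above rests on a predict-and-correct cycle. The sketch below is a scalar linear Kalman filter, a deliberate simplification of an EKF. It propagates a velocity estimate with a measured acceleration, then corrects it with a learned velocity estimate weighted by that estimate's reported variance. All names and noise values are assumptions for illustration. The paper's actual state vector, dynamics model, and measurements are not given in this review.

```python
from typing import Tuple


def predict(v: float, p: float, accel: float, dt: float, q: float) -> Tuple[float, float]:
    """Propagate velocity v and its variance p over one time step.

    accel is an assumed acceleration measurement (e.g. from an IMU),
    and q is the process-noise variance added per step.
    """
    return v + accel * dt, p + q


def update(v: float, p: float, z: float, r: float) -> Tuple[float, float]:
    """Correct the prediction with a velocity measurement z of variance r.

    Assumption: z and r come from a learned estimator that outputs both
    a value and its uncertainty.
    """
    k = p / (p + r)            # Kalman gain: trust in the measurement
    v_new = v + k * (z - v)    # blend prediction and measurement
    p_new = (1.0 - k) * p      # variance shrinks after the correction
    return v_new, p_new
```

When the measurement variance `r` is large, for example if the estimator reports low confidence during an impact, the gain approaches zero and the filter leans on the dynamics prediction instead.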

Figure 2: System architecture of REAL. Stage 1(Privileged Teacher Policy Learning) trains a privileged teacher policy via Proprioception-Terrain Associated Reasoning. Stage 2(Distillation Student Policy Learning) distills a deployable student policy using an onboard Mamba-FiLM spatial-temporal backb

This review was generated by AI and checked by a human editor.