QUICK REVIEW

[Paper Review] REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering

Jialong Liu, Dehan Shen|arXiv (Cornell University)|Mar 18, 2026

Robotic Locomotion and Control|Cited by 0

One-Sentence Summary

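REAL is an end-to-end framework for robust quadruped parkour under degraded perception that achieves zero-shot sim-to-real transfer through spatio-temporal policy learning, FiLM-based cross-modal fusion, physics-guided filtering with an extended Kalman filter (EKF), and consistency-aware loss gating.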

ABSTRACT

Extreme legged parkour demands rapid terrain assessment and precise foot placement under highly dynamic conditions. While recent learning-based systems achieve impressive agility, they remain fundamentally fragile to perceptual degradation, where even brief visual noise or latency can cause catastrophic failure. To overcome this, we propose Robust Extreme Agility Learning (REAL), an end-to-end framework for reliable parkour under sensory corruption. Instead of relying on perfectly clean perception, REAL tightly couples vision, proprioceptive history, and temporal memory. We distill a cross-modal teacher policy into a deployable student equipped with a FiLM-modulated Mamba backbone to actively filter visual noise and build short-term terrain memory. Furthermore, a physics-guided Bayesian state estimator enforces rigid-body consistency during high-impact maneuvers. Validated on a Unitree Go2 quadruped, REAL successfully traverses extreme obstacles even with a 1-meter visual blind zone, while strictly satisfying real-time control constraints with a bounded 13.1 ms inference time.

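Motivation and Objectives

Advance robust quadruped parkour under perceptual degradation and visual noise.
Develop a two-stage teacher-student policy pipeline for cross-modal terrain reasoning.
Combine FiLM-modulated visual-proprioceptive fusion with a Mamba temporal backbone to maintain memory.
Introduce physics-guided Bayesian estimation (EKF) to enforce rigid-body consistency.
Propose a consistency-aware loss gating mechanism to stabilize sim-to-real transfer.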

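Proposed Method

A privileged teacher is distilled into a deployable student; the teacher uses cross-modal attention to associate proprioception with terrain.
FiLM-modulated visual features are combined with a Mamba temporal backbone to maintain short-term terrain memory under perceptual noise.
An uncertainty-aware neural velocity estimate is fused with rigid-body dynamics through an extended Kalman filter (EKF) for physically consistent state estimation.
A Huber-Gaussian loss for velocity estimation jointly models the value and its uncertainty.
Consistency-aware loss gating adaptively balances imitation learning and reinforcement learning during distillation to stabilize training.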
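
The velocity-estimation items in this list can be illustrated with a minimal Python sketch. The summary gives no formulas, so everything here is an assumption: `huber_gaussian_loss` is one plausible reading of a "Huber-Gaussian" loss (a Huber penalty on the variance-normalized residual plus a log-variance term), and `ekf_step` uses a 1-D velocity state with a linear motion model, for which the EKF reduces to an ordinary Kalman filter. All function and parameter names are hypothetical.

```python
import math

def huber(r, delta=1.0):
    """Huber penalty: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def huber_gaussian_loss(v_true, v_pred, log_var, delta=1.0):
    """Assumed form, not taken from the paper: Huber penalty on the
    variance-normalized residual plus a log-variance term. For small
    residuals this matches the Gaussian negative log-likelihood up to
    a constant."""
    sigma = math.exp(0.5 * log_var)
    return huber((v_true - v_pred) / sigma, delta) + 0.5 * log_var

def ekf_step(v, P, accel, dt, q, z, r):
    """One predict/update cycle for a 1-D velocity state (assumed model).
    v, P: prior mean and variance; accel: acceleration from the dynamics
    model; q: process noise variance; z, r: the network's velocity
    estimate and its predicted variance."""
    # Predict: propagate the state through the motion model.
    v_pred = v + accel * dt
    P_pred = P + q
    # Update: weight the measurement by its relative confidence.
    K = P_pred / (P_pred + r)
    v_new = v_pred + K * (z - v_pred)
    P_new = (1.0 - K) * P_pred
    return v_new, P_new

# Equal prior and measurement variances: the update lands halfway.
v, P = ekf_step(v=1.0, P=1.0, accel=0.0, dt=0.02, q=0.0, z=2.0, r=1.0)
print(v, P)
```

The point of the sketch is the coupling: a network trained to report its own variance can pass that variance to the filter as measurement noise `r`, so an uncertain estimate (for example during an impact) is automatically down-weighted in favor of the dynamics prediction.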

Figure 1: Robust extreme parkour with proposed REAL framework. The robot successfully chains highly dynamic maneuvers across complex terrains with nominal vision (green box), and maintains stable locomotion even under severe visual degradation (red box).

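Experimental Results

Research Questions

RQ1: Can a privileged teacher policy that exploits proprioception-terrain association improve robust quadruped locomotion under perceptual degradation?
RQ2: Can a FiLM-modulated cross-modal student with a Mamba backbone maintain robust real-time performance when exteroceptive inputs are corrupted?
RQ3: Does physics-guided EKF fusion improve velocity and state estimation during highly dynamic maneuvers?
RQ4: Can adaptive loss gating stabilize sim-to-real transfer and improve robustness to perceptual noise?
RQ5: Is zero-shot sim-to-real transfer feasible on a real quadruped (Unitree Go2) under extreme terrain and blind zones?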

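Key Findings

REAL achieves reliable extreme parkour on the Unitree Go2, including with a 1-meter visual blind zone, at a per-step inference time of about 13.1 ms.
The FiLM-Mamba student combined with physics-guided filtering remains stable under perceptual degradation and outperforms baselines on extreme terrain.
EKF-based fusion reduces velocity-estimate drift and enforces rigid-body consistency during impacts and slips.
Consistency-aware loss gating accelerates training convergence and improves sim-to-real robustness relative to a fixed-weight baseline.
Extensive domain randomization enables zero-shot transfer to real hardware without additional fine-tuning.
Ablation studies show that removing Mamba or FiLM significantly degrades performance, highlighting the importance of spatio-temporal memory and cross-modal fusion.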
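
To make the contrast between loss gating and a fixed-weight baseline concrete, here is a purely illustrative Python sketch. The summary does not state how consistency is measured or in which direction the gate shifts the weights, so both choices below are assumptions: consistency is taken as agreement between student and teacher actions, and higher consistency moves weight from the imitation loss to the RL loss. All names are hypothetical.

```python
import math

def gate_weight(student_action, teacher_action, temperature=1.0):
    """Hypothetical consistency score in (0, 1]: 1.0 when the actions
    match, decaying with their squared distance."""
    sq_dist = sum((s - t) ** 2 for s, t in zip(student_action, teacher_action))
    return math.exp(-sq_dist / temperature)

def gated_loss(imitation_loss, rl_loss, consistency):
    """Assumed gating rule: lean on imitation while the student disagrees
    with the teacher, and shift toward the RL objective as they agree."""
    return (1.0 - consistency) * imitation_loss + consistency * rl_loss

def fixed_loss(imitation_loss, rl_loss, weight=0.5):
    """Fixed-weight baseline: the mix never adapts during training."""
    return weight * imitation_loss + (1.0 - weight) * rl_loss

# A student far from the teacher is driven almost entirely by imitation.
c = gate_weight([0.0, 0.0], [2.0, 2.0])
print(c, gated_loss(2.0, 4.0, c), fixed_loss(2.0, 4.0))
```

Whatever the actual rule in the paper, the structural difference is the one shown: the gated mix is a function of a per-sample signal, while the baseline uses a constant.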

Figure 2: System architecture of REAL. Stage 1 (Privileged Teacher Policy Learning) trains a privileged teacher policy via Proprioception-Terrain Associated Reasoning. Stage 2 (Distillation Student Policy Learning) distills a deployable student policy using an onboard Mamba-FiLM spatial-temporal backb
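
The FiLM part of the student architecture can be sketched in a few lines of Python. FiLM (feature-wise linear modulation) scales and shifts each feature channel with parameters predicted from a conditioning input. The choice of proprioception as the conditioning signal, the single linear layers, and all names below are assumptions for illustration; the summary only says that the visual features are FiLM-modulated.

```python
def linear(x, weights, bias):
    """Dense layer: `weights` is a list of rows, one per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def film(visual_feat, cond, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM: per-channel scale (gamma) and shift (beta), both predicted
    from the conditioning vector (assumed here to be proprioception)."""
    gamma = linear(cond, w_gamma, b_gamma)
    beta = linear(cond, w_beta, b_beta)
    return [g * f + b for g, f, b in zip(gamma, visual_feat, beta)]

# Two visual channels, one conditioning input. The second channel gets
# gamma = 0, so its visual content is suppressed and replaced by beta.
out = film(
    visual_feat=[1.0, 2.0],
    cond=[1.0],
    w_gamma=[[0.5], [0.0]], b_gamma=[0.0, 0.0],
    w_beta=[[0.0], [1.0]], b_beta=[0.0, 0.0],
)
print(out)  # [0.5, 1.0]
```

A gamma near zero is how such a layer can mute a visual channel the conditioning signal deems unreliable, which is one way to read the abstract's claim that the backbone filters visual noise.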

This review was generated by AI and reviewed by human editors.