QUICK REVIEW

[Paper Review] Task-Level Decisions to Gait Level Control: A Hierarchical Policy Approach for Quadruped Navigation

Sijia Li, Haoyu Wang|arXiv (Cornell University)|Mar 6, 2026

Robotic Locomotion and Control|Cited by 0

One-Sentence Summary

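TDGC, a hierarchical framework, couples a high-level task policy with a gait-based low-level controller to achieve robust quadruped navigation on mixed and out-of-distribution terrains, supported by a performance-driven curriculum.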
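The coupling described above can be pictured as a two-rate control loop. The sketch below is a minimal illustration, not the paper's implementation: it assumes the high-level action is four gait logits plus three continuous parameters in [-1, 1], that a hypothetical `decode` function turns that action into a low-level command, and that the high-level policy acts once every `HL_PERIOD` low-level steps. `high_level_policy` and `low_level_policy` are hypothetical stand-ins for the learned networks.

```python
from dataclasses import dataclass

import numpy as np

GAITS = ("trot", "pronk", "pace", "bound")  # gait set named in the summary


@dataclass
class LowLevelCommand:
    gait: str
    vx: float         # forward velocity target, m/s (hypothetical range)
    yaw_rate: float   # turning rate target, rad/s (hypothetical range)
    step_freq: float  # stepping frequency, Hz (hypothetical range)


def decode(action: np.ndarray) -> LowLevelCommand:
    """Hypothetical decoder: 4 gait logits followed by 3 params in [-1, 1]."""
    gait = GAITS[int(np.argmax(action[:4]))]
    p = np.clip(action[4:7], -1.0, 1.0)
    return LowLevelCommand(
        gait=gait,
        vx=float(0.5 * (p[0] + 1.0)),                # [-1, 1] -> [0, 1] m/s
        yaw_rate=float(p[1]),                        # [-1, 1] rad/s
        step_freq=float(2.0 + 0.5 * (p[2] + 1.0)),   # [-1, 1] -> [2, 3] Hz
    )


def high_level_policy(terrain_cue: str) -> np.ndarray:
    """Stand-in for the learned high-level policy, driven by a sparse cue."""
    action = np.zeros(7)
    action[GAITS.index("bound" if terrain_cue == "gap" else "trot")] = 1.0
    action[4] = 0.2  # moderate forward speed
    return action


def low_level_policy(cmd: LowLevelCommand, obs: np.ndarray) -> np.ndarray:
    """Stand-in for the frozen gait-conditioned controller.

    Returns 12 joint targets (assumed: 3 joints per leg); zeros here.
    """
    return np.zeros(12)


HL_PERIOD = 10  # assumed: one high-level decision per 10 low-level steps


def rollout(cues, n_steps=30):
    """Run the two-rate loop and return the decoded high-level commands."""
    cmds = []
    cmd = None
    for t in range(n_steps):
        if t % HL_PERIOD == 0:
            cmd = decode(high_level_policy(cues[t // HL_PERIOD]))
            cmds.append(cmd)
        _ = low_level_policy(cmd, obs=np.zeros(48))  # obs size is hypothetical
    return cmds
```

Calling `rollout(["flat", "gap", "stairs"])` returns three commands whose gaits are trot, bound, and trot. Keeping `decode` as a separate, inspectable function is one way to make the cross-layer interface explicit: the commands it emits can be logged or overridden at deployment without touching either policy.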

ABSTRACT

Real-world quadruped navigation is constrained by a scale mismatch between high-level navigation decisions and low-level gait execution, as well as by instabilities under out-of-distribution environmental changes. Such variations challenge sim-to-real transfer and can trigger falls when policies lack explicit interfaces for adaptation. In this paper, we present a hierarchical policy architecture for quadrupedal navigation, termed Task-level Decision to Gait Control (TDGC). A low-level policy, trained with reinforcement learning in simulation, delivers gait-conditioned locomotion and maps task requirements to a compact set of controllable behavior parameters, enabling robust mode generation and smooth switching. A high-level policy makes task-centric decisions from sparse semantic or geometric terrain cues and translates them into low-level targets, forming a traceable decision pipeline without dense maps or high-resolution terrain reconstruction. Different from end-to-end approaches, our architecture provides explicit interfaces for deployment-time tuning, fault diagnosis, and policy refinement. We introduce a structured curriculum with performance-driven progression that expands environmental difficulty and disturbance ranges. Experiments show higher task success rates on mixed terrains and out-of-distribution tests.
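A common way to give a gait-conditioned controller a compact, switchable interface is to encode each gait as per-leg phase offsets and move the commanded parameters toward a new target gradually instead of jumping. The sketch below assumes that encoding (leg order FL, FR, RL, RR) and a hypothetical per-step limit `max_delta`; the summary does not state which behavior parameters TDGC actually exposes or how its switching is smoothed.

```python
import numpy as np

# Per-leg phase offsets as a fraction of the gait cycle, leg order FL, FR, RL, RR.
# Assumed encoding for illustration; not taken from the paper.
GAIT_OFFSETS = {
    "trot":  np.array([0.0, 0.5, 0.5, 0.0]),  # diagonal pairs in phase
    "pace":  np.array([0.0, 0.5, 0.0, 0.5]),  # lateral pairs in phase
    "bound": np.array([0.0, 0.0, 0.5, 0.5]),  # front pair vs. rear pair
    "pronk": np.array([0.0, 0.0, 0.0, 0.0]),  # all legs together
}


def step_toward(current, target, max_delta=0.1):
    """Move `current` toward `target` by at most `max_delta` per element."""
    return current + np.clip(target - current, -max_delta, max_delta)


def switch_gait(current_offsets, new_gait, max_delta=0.1):
    """Return the offset vectors visited while rate-limiting a gait switch."""
    target = GAIT_OFFSETS[new_gait]
    trajectory = [current_offsets]
    while not np.allclose(trajectory[-1], target):
        trajectory.append(step_toward(trajectory[-1], target, max_delta))
    return trajectory
```

For example, `switch_gait(GAIT_OFFSETS["trot"], "bound")` reaches the bound offsets in a handful of small steps rather than one jump. Rate-limiting is only one simple reading of "smooth switching"; a learned controller could instead be trained directly on gait transitions.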

Motivation and Goals

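Mitigate the scale mismatch between high-level navigation decisions and low-level gait execution in real-world quadruped navigation.
Provide explicit interfaces for deployment-time tuning, fault diagnosis, and policy refinement.
Achieve robust long-horizon navigation without relying on dense maps or high-resolution terrain reconstruction.
Improve training efficiency and generalization on mixed and out-of-distribution terrains through a structured curriculum.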

Proposed Method

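Propose a synchronized hierarchical policy system that couples task-level decisions with gait-level execution through explicit cross-layer interfaces.
Develop a gait-conditioned low-level controller that maps compact behavior parameters to executable joint-level targets across multiple gaits (trot, pronk, pace, bound).
Design a high-level policy that uses sparse terrain cues to output a compact behavior-parameter vector, which a decoder translates into executable low-level commands.
Train the low-level policy in simulation to learn gait-conditioned locomotion and robust command tracking, then train the high-level policy with reinforcement learning on top of the frozen low-level executor.
Use a structured curriculum whose performance-driven progression expands environmental difficulty and disturbance ranges, improving robustness across terrains.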
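The performance-driven curriculum can be sketched as per-terrain-family difficulty levels that advance when the policy succeeds often enough and step back when it struggles, with the disturbance range widening as the level rises. Everything concrete here is an assumption for illustration: the thresholds `promote_at` and `demote_at`, the level range 1 to 10, the linear push-force schedule with `max_force`, and the family names used in the usage note are hypothetical, not values reported for TDGC.

```python
from dataclasses import dataclass, field


@dataclass
class TerrainCurriculum:
    """Per-terrain-family difficulty levels with performance-driven moves.

    Thresholds, level range, and the disturbance schedule are illustrative
    assumptions, not values reported for TDGC.
    """
    families: tuple
    min_level: int = 1
    max_level: int = 10
    promote_at: float = 0.8  # success rate needed to advance one level
    demote_at: float = 0.4   # success rate at or below which we step back
    levels: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every family starts at the easiest level.
        for fam in self.families:
            self.levels.setdefault(fam, self.min_level)

    def update(self, family, success_rate):
        """Adjust one family's level from its recent success rate."""
        lvl = self.levels[family]
        if success_rate >= self.promote_at:
            lvl = min(lvl + 1, self.max_level)
        elif success_rate <= self.demote_at:
            lvl = max(lvl - 1, self.min_level)
        self.levels[family] = lvl
        return lvl

    def push_force_range(self, family, max_force=50.0):
        """Disturbance range in newtons, widening linearly with the level."""
        span = self.max_level - self.min_level
        frac = (self.levels[family] - self.min_level) / span
        return (0.0, frac * max_force)
```

With `cur = TerrainCurriculum(families=("stairs", "gaps"))`, a success rate of 0.9 on stairs moves that family to level 2, while a rate between the two thresholds leaves the level unchanged. The band between `demote_at` and `promote_at` acts as hysteresis so levels do not oscillate on noisy success estimates.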

Experimental Results

Research Questions

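RQ1: Can a hierarchical policy with explicit cross-layer interfaces improve long-horizon navigation on mixed terrains without dense terrain reconstruction?
RQ2: Does gait-conditioned low-level control enable smoother mode switching and better disturbance rejection while remaining learnable at the task level?
RQ3: How does performance-driven curriculum training affect training efficiency and cross-terrain generalization in quadruped navigation?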

Key Findings

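Average success rate of 87.4% on the hardest terrain levels (levels 6–10) across five terrain families.
On difficult terrains, TDGC produces smoother, more consistent trajectories and more goal-directed behavior than the baseline GP policy.
The hierarchical controller makes interpretable task-to-gait decisions, such as trot for climbing stairs and bound for crossing gaps, which makes its behavior diagnosable and deployable.
With structured curriculum training and cross-layer interfaces, the framework remains robust on out-of-distribution terrains.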
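Because each high-level decision is an explicit gait choice, it can be logged and audited. As a hypothetical example of the diagnosis this enables, the sketch below tallies logged (terrain cue, gait) pairs and flags cues whose most frequent gait differs from an expected mapping. The log format, the cue names, and the `EXPECTED` table (built from the stair-climbing/trot and gap-crossing/bound examples in the findings) are assumptions, not a tool described in the paper.

```python
from collections import Counter, defaultdict

# Expected task-to-gait choices, taken from the examples in the findings.
# Cue names are hypothetical labels for illustration.
EXPECTED = {"stairs_up": "trot", "gap": "bound"}


def dominant_gaits(log):
    """Map each terrain cue to its most frequently chosen gait."""
    counts = defaultdict(Counter)
    for cue, gait in log:
        counts[cue][gait] += 1
    return {cue: c.most_common(1)[0][0] for cue, c in counts.items()}


def flag_mismatches(log, expected=EXPECTED):
    """Return {cue: (observed_gait, expected_gait)} for disagreeing cues."""
    dominant = dominant_gaits(log)
    return {
        cue: (dominant[cue], want)
        for cue, want in expected.items()
        if cue in dominant and dominant[cue] != want
    }
```

A log in which gap crossings were mostly handled with trot would return `{"gap": ("trot", "bound")}`, pointing the investigation at the high-level decision rather than at the gait controller.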


This review was generated by AI and checked by human editors.