QUICK REVIEW

[Paper Review] SmoothTurn: Learning to Turn Smoothly for Agile Navigation with Quadrupedal Robots

Zunzhi You, Haolan Guo|arXiv (Cornell University)|Mar 13, 2026

Robotic Path Planning Algorithms|Citations: 0

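One-Sentence Summary

SmoothTurn combines a sequential goal-reaching reward, lookahead observations, and a curriculum over goal sequences to learn fast navigation and smooth turning while limiting momentum changes between consecutive local goals; in simulation and on a real quadruped it moves faster and more smoothly than a single-goal baseline.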

ABSTRACT

Quadrupedal robots show great potential for valuable real-world applications such as fire rescue and industrial inspection. Such applications often require urgency and the ability to navigate agilely, which in turn demands the capability to change directions smoothly while running in high speed. Existing approaches for agile navigation typically learn a single-goal reaching policy by encouraging the robot to stay at the target position after reaching there. As a result, when the policy is used to reach sequential goals that require changing directions, it cannot anticipate upcoming maneuvers or maintain momentum across the switch of goals, thereby preventing the robot from fully exploiting its agility potential. In this work, we formulate the task as sequential local navigation, extending the single-goal-conditioned local navigation formulation in prior work. We then introduce SmoothTurn, a learning-based control framework that learns to turn smoothly while running rapidly for agile sequential local navigation. The framework adopts a novel sequential goal-reaching reward, an expanded observation space with a lookahead window for future goals, and an automatic goal curriculum that progressively expands the difficulty of sampled goal sequences based on the goal-reaching performance. The trained policy can be directly deployed on real quadrupedal robots with onboard sensors and computation. Both simulation and real-world empirical results show that SmoothTurn learns an agile locomotion policy that performs smooth turning across goals, with emergent behaviors such as controlling momentum when switching goals, facing towards the future goal in advance, and planning efficient paths. We have provided video demos of the learned motions in the supplementary materials. The source code and trained policies will be made available upon acceptance.

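Motivation and Objectives

Enable agile navigation of quadrupedal robots in cluttered environments, with smooth turning across sequences of local goals.
Formulate sequential local navigation to handle momentum and direction changes between consecutive local goals.
Develop a reinforcement learning framework with a sequential reward, lookahead observations, and an automatic curriculum for learning smooth turning motions.
Demonstrate the approach through experiments in simulation and on real hardware (Unitree Go2), comparing against a single-goal baseline.
Provide insight into emergent behaviors such as momentum control at goal transitions and aligning the heading in advance.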

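Proposed Method

Formulate sequential local navigation with an ordered sequence of local goals and relaxed, multi-threshold reaching conditions, which allows continuous motion between goals.
Introduce a novel sequential goal-reaching reward that credits gradual progress through the entire goal sequence and discourages stopping and restarting.
Add a lookahead window of future goals to the observation, enabling trajectory-aware control and momentum management.
Implement an automatic goal curriculum that expands goal distance and turning difficulty based on a rolling success rate, which stabilizes training.
Train the RL policy with PPO in Isaac Gym; its input combines a 47-dimensional proprioceptive observation with a lookahead window of n goals (n = 2 in the main setting), and a PD controller handles actuation.
Evaluate against a single-goal baseline on four sequential turning tasks, both in simulation and on a real Unitree Go2.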
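
The goal-switching, lookahead, and curriculum steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `CommandUpdater` and `GoalCurriculum`, the choice to repeat the last goal when the window runs past the end of the sequence, and every numeric value in the curriculum (window size, promotion threshold, distance and turn ranges) are hypothetical. The form of the sequential goal-reaching reward is not given in this review, so it is not sketched here.

```python
import math
from collections import deque


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def goal_reached(pose, goal, eps_xy, eps_theta):
    """True when both position and heading errors are within tolerance."""
    dist = math.hypot(goal[0] - pose[0], goal[1] - pose[1])
    return dist <= eps_xy and abs(wrap_angle(goal[2] - pose[2])) <= eps_theta


def to_base_frame(pose, goal):
    """Express a world-frame goal (x, y, yaw) in the robot base frame."""
    dx, dy = goal[0] - pose[0], goal[1] - pose[1]
    c, s = math.cos(pose[2]), math.sin(pose[2])
    return (c * dx + s * dy, -s * dx + c * dy, wrap_angle(goal[2] - pose[2]))


class CommandUpdater:
    """Hypothetical updater: tracks the active goal and builds the lookahead."""

    def __init__(self, goals, thresholds, n_lookahead=2):
        self.goals = list(goals)            # world-frame (x, y, yaw) tuples
        self.thresholds = list(thresholds)  # one (eps_xy, eps_theta) per goal
        self.n = n_lookahead
        self.index = 0

    def done(self):
        return self.index >= len(self.goals)

    def step(self, pose):
        """Advance past a reached goal, then return the flat lookahead window."""
        if not self.done():
            eps_xy, eps_theta = self.thresholds[self.index]
            if goal_reached(pose, self.goals[self.index], eps_xy, eps_theta):
                self.index += 1
        window = []
        for k in range(self.n):
            # Assumption: slots past the end of the sequence repeat the last goal.
            j = min(self.index + k, len(self.goals) - 1)
            window.extend(to_base_frame(pose, self.goals[j]))
        return window


class GoalCurriculum:
    """Hypothetical curriculum: raise difficulty when rolling success is high."""

    def __init__(self, window=100, promote_at=0.8, max_level=5):
        self.results = deque(maxlen=window)
        self.promote_at = promote_at
        self.max_level = max_level
        self.level = 0

    def record(self, success):
        """Log one episode outcome and promote once the window is full."""
        self.results.append(bool(success))
        if len(self.results) < self.results.maxlen or self.level >= self.max_level:
            return
        if sum(self.results) / len(self.results) >= self.promote_at:
            self.level += 1
            self.results.clear()

    def ranges(self):
        """Return (max goal distance in m, max turn angle in rad); made-up scales."""
        return 1.0 + 0.5 * self.level, min(math.pi, (self.level + 1) * math.pi / 6)
```

With two goals and n = 2, `CommandUpdater.step` returns six numbers (two base-frame poses), which would be concatenated with the proprioceptive observation before being passed to the policy. Per-goal thresholds are kept as a list so that a looser tolerance can be used for some goals than for others.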

Figure 1: Composited images of SmoothTurn deployed on a Unitree Go2 performing agile navigation in an indoor office environment. The learned policy enables the robot to maintain momentum and high speed while executing turns rapidly through corridors and corners.

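Experimental Results

Research Questions

RQ1: How should sequential local navigation be formulated to achieve fast, smooth turning across a sequence of local goals?
RQ2: Does combining the sequential goal-reaching reward with lookahead observations yield smoother turning and faster travel than a single-goal policy?
RQ3: How do the automatic curriculum and the lookahead window affect learning efficiency and emergent turning behaviors?
RQ4: Does the learned policy transfer from simulation to real hardware and outperform the baseline on real-world navigation tasks?

Key Findings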

(εxy, εθ) (εxy, εθ)	Policy	FR(%)	SR(%)	Time(s)
(0.5, π/3) (0.5, π/3)	Baseline	18.0	82.0	4.16±0.13
(0.5, π/3) (0.5, π/3)	SmoothTurn	0.4	99.6	3.74±0.08
(0.1, π/36) (0.5, π/3)	Baseline	42.6	57.4	4.57±0.10
(0.1, π/36) (0.5, π/3)	SmoothTurn	0.2	99.8	4.03±0.08
(0.2, π/6) (0.2, π/6)	Baseline	93.9	6.1	5.06±0.29
(0.2, π/6) (0.2, π/6)	SmoothTurn	6.8	93.2	4.50±0.12
(0.2, π/6) (0.5, π/3)	Baseline	13.9	86.1	5.87±0.28
(0.2, π/6) (0.5, π/3)	SmoothTurn	0.2	99.8	4.43±0.15
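
As a quick reading aid, the mean completion times in the table above can be converted into relative reductions. The snippet below is only arithmetic on the reported means: it ignores the ± spread, and the helper name `time_reduction_pct` is made up for this illustration.

```python
# (threshold setting, baseline mean time in s, SmoothTurn mean time in s),
# copied from the table above.
ROWS = [
    ("(0.5, π/3) (0.5, π/3)", 4.16, 3.74),
    ("(0.1, π/36) (0.5, π/3)", 4.57, 4.03),
    ("(0.2, π/6) (0.2, π/6)", 5.06, 4.50),
    ("(0.2, π/6) (0.5, π/3)", 5.87, 4.43),
]


def time_reduction_pct(baseline_s, smoothturn_s):
    """Relative reduction in completion time, as a percentage of the baseline."""
    return 100.0 * (baseline_s - smoothturn_s) / baseline_s


for setting, baseline_s, smoothturn_s in ROWS:
    pct = time_reduction_pct(baseline_s, smoothturn_s)
    print(f"{setting}: {pct:.1f}% shorter than baseline")
```

On these means the reductions come out to about 10.1%, 11.8%, 11.1%, and 24.5%, so the largest gain appears in the last threshold setting.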

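In simulation, SmoothTurn outperforms the single-goal baseline across multiple turning sequences: its failure rate (FR) stays at or below 6.8% versus 13.9% to 93.9% for the baseline, its success rate (SR) is 93.2% to 99.8% versus 6.1% to 86.1%, and it maintains speed.
With suitable threshold settings, SmoothTurn maintains momentum and completes goal sequences faster than the baseline, especially on tight or sharp turns.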
SmoothTurn variants maintain high success rates even under relaxed goal-reaching conditions, and anticipatory heading toward upcoming goals further shortens completion time.
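A small lookahead window of 2 goals, combined with training on 2 goals per episode, is enough for near-optimal performance; a longer lookahead or more training goals gives diminishing returns.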
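Real-robot experiments on the Unitree Go2 corroborate the simulation results, with shorter travel times than the baseline on all four turning tasks.
Key emergent behaviors include maintaining momentum through turns, facing upcoming goals in advance, and planning efficient paths that exploit the goal tolerances for smooth transitions.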

Figure 2: Overview of the SmoothTurn framework. The Goal Sampler generates a sequence of segment goals based on the curriculum. The Command Updater advances the goal index upon goal reaching and provides the pose of current and future goals in the robot base frame. The Policy takes the commands and

This review was generated by AI and checked by a human editor.