QUICK REVIEW

[Paper Review] SmoothTurn: Learning to Turn Smoothly for Agile Navigation with Quadrupedal Robots

Zunzhi You, Haolan Guo | arXiv (Cornell University) | 2026. 03. 13.

Robotic Path Planning Algorithms | Citations: 0

One-Line Summary

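SmoothTurn formulates sequential local navigation using a sequential goal-reaching reward, lookahead observations, and an automatic curriculum. By learning to turn smoothly at high speed, it achieves faster traversal and smoother transitions than single-goal baselines, both in simulation and on a real quadrupedal robot.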

ABSTRACT

Quadrupedal robots show great potential for valuable real-world applications such as fire rescue and industrial inspection. Such applications often require urgency and the ability to navigate agilely, which in turn demands the capability to change directions smoothly while running in high speed. Existing approaches for agile navigation typically learn a single-goal reaching policy by encouraging the robot to stay at the target position after reaching there. As a result, when the policy is used to reach sequential goals that require changing directions, it cannot anticipate upcoming maneuvers or maintain momentum across the switch of goals, thereby preventing the robot from fully exploiting its agility potential. In this work, we formulate the task as sequential local navigation, extending the single-goal-conditioned local navigation formulation in prior work. We then introduce SmoothTurn, a learning-based control framework that learns to turn smoothly while running rapidly for agile sequential local navigation. The framework adopts a novel sequential goal-reaching reward, an expanded observation space with a lookahead window for future goals, and an automatic goal curriculum that progressively expands the difficulty of sampled goal sequences based on the goal-reaching performance. The trained policy can be directly deployed on real quadrupedal robots with onboard sensors and computation. Both simulation and real-world empirical results show that SmoothTurn learns an agile locomotion policy that performs smooth turning across goals, with emergent behaviors such as controlling momentum when switching goals, facing towards the future goal in advance, and planning efficient paths. We have provided video demos of the learned motions in the supplementary materials. The source code and trained policies will be made available upon acceptance.

Motivation and Objectives

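Enable agile navigation for quadrupedal robots in complex environments by achieving smooth turning across consecutive local goals.
Formulate sequential local navigation to address momentum and direction changes between successive goals.
Develop a reinforcement learning framework that trains smooth turning behavior with a sequential reward, lookahead observations, and an automatic curriculum.
Compare against single-goal baselines in simulation and in experiments on a real Unitree Go2 robot.
Provide insight into emergent behaviors such as controlling momentum during goal switches and aligning heading in advance.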

Proposed Method

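Formulate sequential local navigation with an ordered sequence of local goals and relaxed, multi-threshold reaching conditions, so that motion can continue across goals.
Introduce a new sequential goal-reaching reward that credits incremental progress along the whole goal sequence and discourages stop-and-go behavior.
Augment the observation with a lookahead window of future goals, enabling trajectory-aware control and momentum management.
Implement an automatic goal curriculum that expands goal distance and turning difficulty based on a rolling success rate, which stabilizes training.
Train the RL policy with PPO in Isaac Gym. Its input combines a 47-dimensional proprioceptive observation with n lookahead goals (n=2 in the main setting), and actuation uses a PD controller.
Evaluate against single-goal baselines on four sequential turning tasks in simulation and on a real Unitree Go2.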
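
The automatic goal curriculum described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `GoalCurriculum`, the growth factor, the promotion threshold, the window length, and all ranges are hypothetical values chosen for illustration. The review only states that goal distance and turning difficulty expand based on a rolling success rate.

```python
import math
import random
from collections import deque
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class GoalCurriculum:
    """Hypothetical curriculum that widens goal sampling ranges as success improves."""
    min_distance: float = 0.5        # metres (assumed)
    max_distance: float = 1.0        # metres, current upper range (assumed start)
    max_turn: float = math.pi / 6    # radians, current upper range (assumed start)
    distance_cap: float = 5.0        # assumed hard limit
    turn_cap: float = math.pi        # assumed hard limit
    growth: float = 1.25             # assumed expansion factor
    promote_at: float = 0.8          # assumed rolling success-rate threshold
    window: int = 100                # assumed rolling-window length
    results: deque = field(default_factory=deque)

    def success_rate(self) -> float:
        """Fraction of successful episodes in the current rolling window."""
        return sum(self.results) / len(self.results) if self.results else 0.0

    def record(self, success: bool) -> None:
        """Store one episode outcome and expand the ranges once the window is full and good."""
        self.results.append(success)
        if len(self.results) > self.window:
            self.results.popleft()
        if len(self.results) == self.window and self.success_rate() >= self.promote_at:
            self.max_distance = min(self.max_distance * self.growth, self.distance_cap)
            self.max_turn = min(self.max_turn * self.growth, self.turn_cap)
            self.results.clear()  # re-measure success at the new difficulty

    def sample_segment(self, rng: random.Random) -> Tuple[float, float]:
        """Sample (distance, signed turn angle) for the next goal segment."""
        distance = rng.uniform(self.min_distance, self.max_distance)
        turn = rng.uniform(-self.max_turn, self.max_turn)
        return distance, turn
```

In a training loop, `record` would be called once per finished episode and `sample_segment` once per goal segment when building a new goal sequence. Clearing the window after each promotion is a design choice of this sketch, so that difficulty only grows again after the policy has proven itself at the new level.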

Figure 1: Composited images of SmoothTurn deployed on a Unitree Go2 performing agile navigation in an indoor office environment. The learned policy enables the robot to maintain momentum and high speed while executing turns rapidly through corridors and corners.

Experimental Results

Research Questions

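RQ1: How can sequential local navigation be formulated so that the robot turns smoothly across a sequence of local goals while maintaining high speed?
RQ2: Does the sequential goal-reaching reward, combined with lookahead observations, yield smoother turns and faster traversal than single-goal policies?
RQ3: How do the automatic curriculum and the lookahead window affect learning efficiency and emergent turning behaviors?
RQ4: Can the policy transfer from simulation to a real quadrupedal robot and outperform the baselines on real navigation tasks?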

Key Results

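In simulation, SmoothTurn outperforms the single-goal baselines across multiple turning sequences, maintaining speed while lowering the fall rate and raising the success rate.
With appropriate threshold settings, SmoothTurn tends to preserve momentum and complete goal sequences faster, especially on narrow or sharp turns.
SmoothTurn variants with the -relaxed goal-reaching condition also achieve high success rates, showing that anticipatory heading toward the upcoming goal can further reduce completion time.
Training with a small lookahead window (n=2) and 2 goals per episode is sufficient for near-optimal performance; larger lookahead windows or more training goals show diminishing returns.
Real-world experiments on the Unitree Go2 confirm the simulation results, with shorter traversal times than the baseline on all four turning tasks.
Key emergent behaviors include maintaining momentum through turns, facing the upcoming goal in advance, and planning efficient paths that exploit the reaching tolerance for smooth transitions.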
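
To make the interplay of reaching thresholds and the lookahead window concrete, the following Python sketch shows one way a command updater could advance the goal index and express the next n goals in the robot base frame. It is an illustration under stated assumptions, not the paper's code: the function names, the tolerance values `pos_tol` and `yaw_tol`, and the choice to count the current goal as the first of the n window entries are all hypothetical.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, yaw) in the world frame


def wrap_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def to_base_frame(robot: Pose, goal: Pose) -> Pose:
    """Express a world-frame goal pose relative to the robot base."""
    rx, ry, ryaw = robot
    dx, dy = goal[0] - rx, goal[1] - ry
    c, s = math.cos(ryaw), math.sin(ryaw)
    # Rotate the world-frame offset by -ryaw.
    return (c * dx + s * dy, -s * dx + c * dy, wrap_angle(goal[2] - ryaw))


def update_commands(
    robot: Pose,
    goals: List[Pose],
    index: int,
    n_lookahead: int = 2,      # n=2 as in the main setting
    pos_tol: float = 0.3,      # metres (assumed tolerance)
    yaw_tol: float = math.pi,  # radians (assumed; pi leaves heading unconstrained)
) -> Tuple[int, List[Pose]]:
    """Advance the goal index when the current goal is reached, then
    return the lookahead window of goals in the robot base frame."""
    if index < len(goals):
        gx, gy, gyaw = to_base_frame(robot, goals[index])
        if math.hypot(gx, gy) <= pos_tol and abs(gyaw) <= yaw_tol:
            index += 1  # switch goals without requiring the robot to stop
    # Pad the window by repeating the final goal near the end of the sequence.
    last = len(goals) - 1
    window = [goals[min(index + k, last)] for k in range(n_lookahead)]
    return index, [to_base_frame(robot, g) for g in window]
```

Because the goal switches as soon as the robot enters the tolerance region, a policy that already observes the following goal can begin turning before the switch, which matches the anticipatory heading reported above.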

Figure 2: Overview of the SmoothTurn framework. The Goal Sampler generates a sequence of segment goals based on the curriculum. The Command Updater advances the goal index upon goal reaching and provides the pose of current and future goals in the robot base frame. The Policy takes the commands and

This review was generated by AI and checked by a human editor.