QUICK REVIEW

[Paper Review] ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation

Fei Xia, Chengshu Li | arXiv (Cornell University) | 2020-08-18

Reinforcement Learning in Robotics | References: 58 | Citations: 38

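One-Line Summary

ReLMoGen combines a motion generator with reinforcement learning by lifting the action space to subgoals for motion planning. It efficiently solves long-horizon mobile manipulation tasks and demonstrates strong transferability across motion planners.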

ABSTRACT

Many Reinforcement Learning (RL) approaches use joint control signals (positions, velocities, torques) as action space for continuous control tasks. We propose to lift the action space to a higher level in the form of subgoals for a motion generator (a combination of motion planner and trajectory executor). We argue that, by lifting the action space and by leveraging sampling-based motion planners, we can efficiently use RL to solve complex, long-horizon tasks that could not be solved with existing RL methods in the original action space. We propose ReLMoGen -- a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals. To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, manipulation tasks that require moving the robot base. These problems are challenging because they are usually long-horizon, hard to explore during training, and comprise alternating phases of navigation and interaction. Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments. In all settings, ReLMoGen outperforms state-of-the-art Reinforcement Learning and Hierarchical Reinforcement Learning baselines. ReLMoGen also shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.

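Motivation and Objectives

Motivates and addresses the exploration and long-horizon challenges of mobile manipulation tasks.
Proposes a framework that, inside the RL loop, lifts the action space to subgoals for a motion generator.
Demonstrates improved performance and sample efficiency across navigation, interactive navigation, and mobile manipulation tasks.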

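Proposed Method

Introduces a lifted MDP in which a subgoal a' guides the motion generator (MG) to produce low-level actions.
Introduces two variants of the subgoal generation policy (SGP), continuous (SGP-R) and discrete (SGP-D), trained with SAC and DQN respectively.
Presents a motion generator that combines a planner (RRT-Connect or PRM) with a trajectory controller to reach each subgoal.
Defines the lifted transition and reward functions: the MG outputs a sequence of low-level actions, and R' accumulates the rewards over that sequence.
Trains the SGP to predict subgoals from RGB-D, LiDAR, and task information, enabling base subgoals and arm subgoals for navigation and interaction.
Demonstrates transferability by swapping the motion planner at test time without retraining.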
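
The lifted step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D environment, the reward values, and the names `straight_line_generator`, `coarse_generator`, `env_step`, and `lifted_step` are all hypothetical, and the plain undiscounted sum used for R' is an assumption. The two toy generators stand in for different planners, so the same subgoal can be executed by either one without changing the policy side.

```python
from typing import Callable, List, Tuple

# A motion generator maps (state, subgoal) to a sequence of low-level actions.
MotionGenerator = Callable[[float, float], List[float]]


def straight_line_generator(state: float, subgoal: float, step: float = 0.25) -> List[float]:
    """Toy stand-in for a planner plus trajectory executor on a 1-D line."""
    actions: List[float] = []
    remaining = subgoal - state
    while abs(remaining) > 1e-9:
        move = max(-step, min(step, remaining))  # clip each move to the step size
        actions.append(move)
        remaining -= move
    return actions


def coarse_generator(state: float, subgoal: float) -> List[float]:
    """Second toy generator with larger steps, mimicking a swapped planner."""
    return straight_line_generator(state, subgoal, step=0.5)


def env_step(state: float, action: float, goal: float = 1.0) -> Tuple[float, float, bool]:
    """Hypothetical low-level environment: time penalty per step, bonus at the goal."""
    next_state = state + action
    done = abs(next_state - goal) < 1e-6
    reward = 10.0 if done else -0.1
    return next_state, reward, done


def lifted_step(state: float, subgoal: float,
                motion_generator: MotionGenerator) -> Tuple[float, float, bool]:
    """One lifted-MDP step: run the generated action sequence and sum its rewards (R')."""
    total_reward = 0.0
    done = False
    for action in motion_generator(state, subgoal):
        state, reward, done = env_step(state, action)
        total_reward += reward  # assumption: undiscounted accumulation
        if done:
            break
    return state, total_reward, done


# The same subgoal executed by two different generators.
fine_result = lifted_step(0.0, 1.0, straight_line_generator)
coarse_result = lifted_step(0.0, 1.0, coarse_generator)
```

In this toy setting both generators reach the same final state and differ only in how many low-level steps, and therefore how much accumulated penalty, the lifted step contains. That is the property that lets a planner be swapped behind a fixed subgoal policy.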

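Experimental Results

Research Questions

RQ1: Can ReLMoGen solve a wide range of robotics tasks involving navigation and manipulation?
RQ2: Does lifting the action space to subgoals improve exploration and sample efficiency on long-horizon mobile manipulation tasks?
RQ3: Is the learned subgoal generation policy robust to changes in the motion planner at test time?
RQ4: How do continuous and discrete subgoal mappings compare on tasks with different manipulation demands?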

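Key Results

ReLMoGen achieves higher task completion rates than state-of-the-art RL and HRL baselines on seven tasks.
ReLMoGen converges faster and is more sample efficient; because it needs fewer gradient updates, training is often about 7x faster in wall-clock time.
The approach yields interpretable subgoal maps whose high-value regions align with useful interactions (e.g., buttons, cabinet doors).
ReLMoGen shows minimal performance degradation when the motion planner is swapped at test time, indicating robustness and practicality for real robots.
SGP-D (discrete subgoal maps) excels on tasks that require fine-grained control, while SGP-R (continuous subgoal regression) performs well across broader navigation and interaction scenarios.
In the exploration analysis, ReLMoGen explores meaningful interactions and covers a larger region of the physical and latent state space than RL baselines that act directly in the low-level action space.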
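
To make the idea of a discrete subgoal map concrete, the following minimal sketch shows greedy selection over per-pixel Q-values. It is an illustration under assumptions, not the paper's code: the function `select_subgoal`, the dictionary layout with `"base"` and `"arm"` maps, and the example values are hypothetical. Exploration (for example, epsilon-greedy) and the conversion from a pixel to a 3-D target are omitted.

```python
from typing import Dict, List, Optional, Tuple

QMap = List[List[float]]  # one Q-value per pixel, indexed [row][col]


def select_subgoal(q_maps: Dict[str, QMap]) -> Tuple[str, int, int]:
    """Return (subgoal_type, row, col) of the highest Q-value across all maps."""
    best: Optional[Tuple[float, str, int, int]] = None
    for kind, q_map in q_maps.items():
        for r, row in enumerate(q_map):
            for c, q in enumerate(row):
                if best is None or q > best[0]:
                    best = (q, kind, r, c)
    if best is None:
        raise ValueError("q_maps must contain at least one value")
    return best[1], best[2], best[3]


# Hypothetical 2x3 maps: the arm map peaks at the top-right pixel,
# e.g. where a button would appear in the image.
q_maps = {
    "base": [[0.1, 0.2, 0.0],
             [0.0, 0.3, 0.1]],
    "arm": [[0.0, 0.0, 0.9],
            [0.2, 0.1, 0.0]],
}
choice = select_subgoal(q_maps)  # ("arm", 0, 2)
```

Because every candidate subgoal corresponds to a pixel, the Q-values can be rendered as a heatmap over the observation, which is what makes this kind of policy inspectable.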


This review was generated by AI and reviewed by a human editor.