QUICK REVIEW

[Paper Review] Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios

Tingxiang Fan, Pinxin Long|arXiv (Cornell University)|2018. 08. 11.

Reinforcement Learning in Robotics · References: 11 · Citations: 69

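One-line summary

This paper presents a fully decentralized, sensor-level collision avoidance policy for multi-robot systems, trained with multi-scenario, multi-stage deep reinforcement learning. The policy is integrated into a hybrid control framework and validated in simulation and in the real world, including dense human crowds and large robot teams.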

ABSTRACT

In this paper, we present a decentralized sensor-level collision avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent's steering commands in terms of the movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots in rich, complex environments simultaneously using a policy gradient based reinforcement learning algorithm. The learning algorithm is also integrated into a hybrid control framework to further improve the policy's robustness and effectiveness. We validate the learned sensor-level collision avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots with shapes and dynamics characteristics that are different from the simulated agents, in order to demonstrate the controller's robustness against the sim-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks provides an excellent solution to the safe and effective autonomous navigation for a single robot working in a dense real human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. Videos are available at https://sites.google.com/view/hybridmrca
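The abstract says only that the policy is trained with "a policy gradient based reinforcement learning algorithm." As a hedged illustration, assuming a PPO-style clipped surrogate objective (a common policy-gradient variant; this review does not name the exact algorithm), the update shared by all robots could be written as below. Here $o_t$ is one robot's local observation, $a_t$ its sampled velocity command, $\hat{A}_t$ an advantage estimate, and $\epsilon$ the clipping range; these symbols are introduced for illustration.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid o_t)}{\pi_{\theta_{\text{old}}}(a_t \mid o_t)},
\qquad
L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right]
```

Because every robot runs the same policy $\pi_\theta$, the expectation can pool transitions collected by all robots in all training scenarios, which is what lets many robots train one policy simultaneously.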

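Motivation and goals

Address the challenge of safe and efficient collision avoidance in decentralized multi-robot systems under partial observability.
Develop a policy that maps raw sensor data directly to velocity commands without inter-robot communication.
Improve the robustness of the learned policy and its transferability to real robots and complex scenarios.
Narrow the performance gap between decentralized and centralized navigation methods.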

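Proposed method

A fully decentralized policy, shared by all robots, maps each robot's onboard sensor measurements to velocity commands.
The policy is trained in simulation with a multi-scenario, multi-stage reinforcement learning framework using policy-gradient updates.
A hybrid control architecture combines the learned policy with traditional controllers to improve performance in simple scenarios and in emergency situations.
The policy network takes a 2D laser scan, the relative goal position, and the current velocity as input and outputs the mean velocity from which actions are sampled.
The policy is trained and validated in simulated and real-world experiments, including different robot types and large-scale deployments of up to 100 robots.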
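To make the input/output interface concrete, here is a minimal Python sketch of one shared stochastic policy queried independently by each robot. It assumes a Gaussian policy over (linear, angular) velocity whose mean comes from the network and whose log standard deviation is a separate parameter. The single linear layer is a toy stand-in for the paper's network, and the names `Observation`, `policy_mean`, `sample_action`, `V_MAX`, `W_MAX` and the velocity limits are all hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical velocity limits (not taken from the paper).
V_MAX = 1.0  # max linear velocity, m/s
W_MAX = 1.0  # max angular velocity, rad/s


@dataclass
class Observation:
    """One robot's local observation; nothing is received from other robots."""
    scan: List[float]              # 2D laser ranges in metres
    goal: Tuple[float, float]      # goal position in the robot's own frame
    velocity: Tuple[float, float]  # current (linear, angular) velocity


def policy_mean(obs: Observation, weights: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Toy stand-in for the policy network: one linear layer plus squashing.

    `weights` has two rows (linear, angular), each as long as the feature vector.
    """
    features = list(obs.scan) + list(obs.goal) + list(obs.velocity)
    raw = [sum(w * f for w, f in zip(row, features)) for row in weights]
    v_mean = V_MAX / (1.0 + math.exp(-raw[0]))  # sigmoid -> (0, V_MAX)
    w_mean = W_MAX * math.tanh(raw[1])          # tanh -> (-W_MAX, W_MAX)
    return v_mean, w_mean


def sample_action(obs: Observation,
                  weights: Sequence[Sequence[float]],
                  rng: random.Random,
                  log_std: Tuple[float, float] = (-1.0, -1.0)) -> Tuple[float, float]:
    """Sample a velocity command from a Gaussian centred on the network's mean."""
    v_mean, w_mean = policy_mean(obs, weights)
    v = rng.gauss(v_mean, math.exp(log_std[0]))
    w = rng.gauss(w_mean, math.exp(log_std[1]))
    # Clip the sample to the robot's feasible velocity range.
    return min(max(v, 0.0), V_MAX), min(max(w, -W_MAX), W_MAX)


if __name__ == "__main__":
    n_beams = 8
    # Every robot uses the SAME weights: the policy is shared, not per-robot.
    shared_weights = [[0.0] * (n_beams + 4) for _ in range(2)]
    rng = random.Random(0)
    robots = [
        Observation(scan=[3.0] * n_beams, goal=(4.0, 1.0), velocity=(0.5, 0.0)),
        Observation(scan=[0.8] * n_beams, goal=(2.0, -1.0), velocity=(0.2, 0.1)),
    ]
    # Each robot queries the policy with only its own observation.
    commands = [sample_action(o, shared_weights, rng) for o in robots]
    print(commands)
```

Sharing one set of weights is what keeps the scheme decentralized yet scalable: adding a robot adds one more independent policy query rather than enlarging a joint model. At deployment one would typically use the mean command instead of a random sample.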

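Experimental results

Research questions

RQ1: Can a fully decentralized, sensor-level policy achieve safe and efficient navigation without inter-robot communication?
RQ2: How well does a policy learned in rich simulated environments generalize to unseen real-world and large-scale scenarios?
RQ3: Does hybrid control, which integrates the learned policy with traditional controllers, improve safety and robustness?
RQ4: How do partial observability and sensor noise affect collision avoidance performance in multi-robot systems?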

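Key findings

The decentralized, sensor-level collision avoidance policy maps raw sensor measurements directly to velocity commands and operates without inter-robot communication.
The multi-scenario, multi-stage training framework yields a policy that generalizes to different scenarios, including heterogeneous robots and large robot groups.
Hybrid control, which combines the learned policy with traditional controllers, improves robustness and safety in complex tasks.
The learned policy can be deployed on physical robots without extensive parameter tuning and also transfers to navigation in dense human crowds.
Experiments demonstrate scalability to large robot swarms and effectiveness in warehouse-like environments without pre-built infrastructure.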
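The hybrid control idea can be illustrated with a minimal switching sketch. Only the idea of handing simple and emergency cases to traditional controllers comes from the review; the three modes, the distance thresholds `CLEAR_RADIUS` and `DANGER_RADIUS`, the proportional `go_to_goal` controller, and stopping as the emergency fallback are hypothetical choices, not the paper's exact rules.

```python
import math
from typing import Callable, List, Tuple

Velocity = Tuple[float, float]  # (linear m/s, angular rad/s)

# Hypothetical thresholds in metres; the review gives no values.
CLEAR_RADIUS = 2.0   # nothing closer than this -> treat as a simple scenario
DANGER_RADIUS = 0.3  # anything closer than this -> treat as an emergency


def go_to_goal(goal: Tuple[float, float], v_max: float = 1.0, k_w: float = 1.0) -> Velocity:
    """Classical proportional controller that steers straight at the goal (robot frame)."""
    dist = math.hypot(goal[0], goal[1])
    heading = math.atan2(goal[1], goal[0])
    # Slow down when the goal is off to the side; never drive backwards.
    v = min(v_max, dist) * max(0.0, math.cos(heading))
    return v, k_w * heading


def hybrid_control(scan: List[float],
                   goal: Tuple[float, float],
                   learned_policy: Callable[[List[float], Tuple[float, float]], Velocity]
                   ) -> Tuple[str, Velocity]:
    """Pick a controller from the nearest laser return, then compute the command."""
    nearest = min(scan)
    if nearest < DANGER_RADIUS:
        return "emergency", (0.0, 0.0)  # hypothetical conservative fallback: stop
    if nearest > CLEAR_RADIUS:
        return "simple", go_to_goal(goal)
    return "learned", learned_policy(scan, goal)


if __name__ == "__main__":
    def dummy_policy(scan, goal):
        return 0.4, 0.1  # placeholder for the trained network

    for scan in ([5.0] * 10, [1.0] * 10, [0.1] + [5.0] * 9):
        print(hybrid_control(scan, (3.0, 0.0), dummy_policy))
```

The learned policy is consulted only in the cluttered middle band, so its behavior in open space and in near-collision states no longer depends on how well those rare cases were covered during training.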

This review was generated by AI and reviewed by a human editor.