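[Paper Explained] Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios
This paper proposes a fully decentralized, sensor-level collision avoidance policy for multi-robot systems. The policy is trained with multi-scenario, multi-stage deep reinforcement learning, integrated into a hybrid control framework, and validated in simulated and real-world scenarios, including dense human crowds and large robot teams.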
In this paper, we present a decentralized sensor-level collision avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent's steering commands in terms of the movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots in rich, complex environments simultaneously using a policy gradient based reinforcement learning algorithm. The learning algorithm is also integrated into a hybrid control framework to further improve the policy's robustness and effectiveness. We validate the learned sensor-level collision avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots with shapes and dynamics characteristics that are different from the simulated agents, in order to demonstrate the controller's robustness against the sim-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks provides an excellent solution to the safe and effective autonomous navigation for a single robot working in a dense real human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. Videos are available at https://sites.google.com/view/hybridmrca
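Research Motivation and Goals
- Address the challenge of safe, efficient collision avoidance in decentralized multi-robot systems under partial observability.
- Develop a policy that maps raw sensor data to velocity commands without communication between robots.
- Improve the robustness and transferability of the learned policy to real robots and complex scenarios.
- Narrow the performance gap between decentralized and centralized navigation methods.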
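Proposed Method
- Propose a fully decentralized policy, shared by all robots, that maps each robot's onboard sensor measurements to velocity commands.
- Train the policy in simulation using a multi-scenario, multi-stage reinforcement learning framework with policy gradient updates.
- Integrate the learned policy with conventional controllers in a hybrid control architecture to handle simple or emergency situations.
- Feed 2D laser scans, the relative goal position, and the current velocity into a neural network that outputs the velocity mean from which the action is sampled.
- Train the policy in simulation only, then validate it in simulated and real-world experiments, including heterogeneous robots and large-scale scenarios with up to 100 robots.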
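The input-to-action mapping described in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Observation` type, the `placeholder_mean_velocity` function (a hand-written stand-in for the trained network), the standard deviations, and the velocity bounds are all assumptions introduced here. Only the overall shape comes from the text: laser scan, relative goal, and current velocity go in, a velocity mean comes out, and the action is sampled around that mean.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    laser_scan: List[float]             # 2D laser ranges, in meters
    relative_goal: Tuple[float, float]  # goal position in the robot frame
    velocity: Tuple[float, float]       # current (linear, angular) velocity


def placeholder_mean_velocity(obs: Observation) -> Tuple[float, float]:
    """Hypothetical stand-in for the trained network: returns a velocity mean.

    It turns toward the goal and slows down near obstacles. The real policy
    is a neural network; this only gives the sampler something to work with.
    """
    gx, gy = obs.relative_goal
    heading_error = math.atan2(gy, gx)
    nearest = min(obs.laser_scan)
    linear = min(1.0, nearest)                    # assumed: slower when close
    angular = max(-1.0, min(1.0, heading_error))  # assumed angular limits
    return linear, angular


def sample_action(mean: Tuple[float, float],
                  std: Tuple[float, float],
                  rng: random.Random,
                  v_bounds: Tuple[float, float] = (0.0, 1.0),
                  w_bounds: Tuple[float, float] = (-1.0, 1.0)) -> Tuple[float, float]:
    """Sample (linear, angular) velocity from a Gaussian around `mean`, then clip.

    The bounds are assumed values chosen for this sketch.
    """
    v = rng.gauss(mean[0], std[0])
    w = rng.gauss(mean[1], std[1])
    v = max(v_bounds[0], min(v_bounds[1], v))
    w = max(w_bounds[0], min(w_bounds[1], w))
    return v, w
```

Sampling around the mean gives exploration during training. At deployment one would typically act on the mean itself, which here corresponds to passing a standard deviation of `(0.0, 0.0)`.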
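Experimental Results
Research Questions
- RQ1: Can a fully decentralized sensor-level policy achieve safe and efficient navigation without inter-robot communication?
- RQ2: How well does a policy trained in rich simulated environments generalize to unseen real-world and large-scale scenarios?
- RQ3: Does combining the learned policy with conventional control (hybrid control) improve safety and robustness?
- RQ4: How do partial observability and sensor noise affect collision avoidance performance in multi-robot systems?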
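Key Findings
- A decentralized sensor-level collision avoidance policy can map raw sensor data directly to steering commands without communication between robots.
- The multi-scenario, multi-stage training framework yields a policy that generalizes to unseen scenarios, including heterogeneous robots and large-scale groups.
- Hybrid control, which combines the learned policy with conventional controllers, improves robustness and safety in complex tasks.
- The learned policy can be deployed on physical robots without extensive parameter tuning and transfers to dense human crowd scenarios.
- Experiments show that the policy scales to large robot teams and warehouse-like environments and works without pre-built infrastructure.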
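The hybrid control idea, in which conventional controllers handle simple or emergency situations and the learned policy handles the rest, can be sketched as a mode switch. This is an assumed illustration only: the switching rule based on the nearest laser return, the thresholds `emergency_dist` and `clear_dist`, the mode names, and the three stand-in controllers are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Velocity = Tuple[float, float]  # (linear, angular)
Controller = Callable[[List[float]], Velocity]


def select_mode(laser_scan: List[float],
                emergency_dist: float = 0.3,
                clear_dist: float = 3.0) -> str:
    """Pick a controller from the nearest laser return (thresholds are assumed)."""
    nearest = min(laser_scan)
    if nearest < emergency_dist:
        return "emergency"  # obstacle very close: use a conservative controller
    if nearest > clear_dist:
        return "simple"     # nothing nearby: a conventional controller suffices
    return "learned"        # everything in between: use the learned policy


def hybrid_step(laser_scan: List[float],
                controllers: Dict[str, Controller]) -> Velocity:
    """Dispatch one control step to the controller chosen by select_mode."""
    return controllers[select_mode(laser_scan)](laser_scan)


# Hypothetical stand-ins for the three controllers.
controllers: Dict[str, Controller] = {
    "emergency": lambda scan: (0.0, 0.0),  # stop
    "simple": lambda scan: (1.0, 0.0),     # drive straight at full speed
    "learned": lambda scan: (0.5, 0.2),    # placeholder for the learned policy
}
```

Keeping the switch separate from the controllers means the learned policy can be replaced or retrained without touching the fallback logic.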
This explainer was generated by AI and reviewed by a human editor.