QUICK REVIEW

[Paper Review] Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

Nico Gürtler, Sebastian Blaes|arXiv (Cornell University)|Jul 28, 2023

Reinforcement Learning in Robotics | Cited by 11

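One-Sentence Summary

This paper introduces real-robot dexterous manipulation datasets (Push and Lift), collected on the TriFinger platform, for benchmarking offline RL. It highlights the performance gap between simulated and real data and the impact of suboptimal trajectories.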

ABSTRACT

Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.

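Motivation and Objectives

Provide real-robot dexterous manipulation datasets for benchmarking offline RL.
Compare offline RL algorithms on simulated and real TriFinger data.
Analyze how data quality and suboptimal trajectories affect offline RL performance, and how large the sim-to-real gap is.
Offer an accessible remote evaluation setup for future research.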

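Proposed Method

For two tasks (Push and Lift) on the TriFinger platform, expert policies are trained with online RL in a domain-randomized simulation and then used to collect data.
The expert policies are trained in GPU-accelerated parallel simulation with domain randomization to enable sim-to-real transfer.
Several dataset variants (Expert, Half-Expert, Weak&Expert, Mixed) are created, and both the real-robot and the simulated datasets are released.
Open-source offline RL algorithms from d3rlpy (BC, CRR, AWAC, CQL, IQL) are evaluated as baselines on these datasets, with fixed hyperparameters and evaluation over multiple random seeds.
An evaluation protocol is provided, including remote access to a cluster of real robots and a PyBullet-based simulator for reproducible testing.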
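The dataset variants listed above can be illustrated with a minimal sketch. It assumes, rather than takes from the paper, that Half-Expert is a random half of the expert episodes and that Weak&Expert simply concatenates episodes from a weaker policy with expert episodes. `Episode`, `half_expert`, `weak_and_expert`, and `success_rate` are hypothetical names, and the Mixed variant is not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical record of one episode: which policy produced it and whether it succeeded."""
    source: str   # e.g. "expert" or "weak"
    success: bool


def half_expert(expert: list[Episode], seed: int = 0) -> list[Episode]:
    """Assumed construction: a reproducible random half of the expert episodes."""
    rng = random.Random(seed)
    return rng.sample(expert, len(expert) // 2)


def weak_and_expert(weak: list[Episode], expert: list[Episode]) -> list[Episode]:
    """Assumed construction: weak-policy episodes followed by expert episodes."""
    return weak + expert


def success_rate(episodes: list[Episode]) -> float:
    """Fraction of successful episodes in a dataset."""
    return sum(e.success for e in episodes) / len(episodes)


if __name__ == "__main__":
    # Toy numbers, not from the paper: 19/20 expert successes, 1/20 weak successes.
    expert = [Episode("expert", i < 19) for i in range(20)]
    weak = [Episode("weak", i < 1) for i in range(20)]
    print(success_rate(expert))                         # 0.95
    print(success_rate(weak_and_expert(weak, expert)))  # 0.5
    print(len(half_expert(expert)))                     # 10
```

Under these assumptions, halving the expert data leaves its success rate roughly unchanged while mixing in weak episodes pulls the dataset's own score down, which is the kind of contrast the variants are meant to expose.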

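Experimental Results

Research Questions

RQ1: How do state-of-the-art offline RL algorithms perform on real-robot dexterous manipulation data compared with simulated data?
RQ2: How does data quality (expert vs. mixed vs. weak data) affect offline RL performance on the Push and Lift tasks?
RQ3: How does the presence of suboptimal trajectories affect offline RL learning and policy quality on real versus simulated data?
RQ4: To what extent do latency, noise, and real-world contact dynamics explain the gap between simulated and real performance?
RQ5: Can policies trained offline on simulated data generalize to unseen real hardware instances?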

Key Findings

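Push results (success rates; ± is the spread over training seeds; the Data column is the score of the dataset itself):
Dataset	Data	BC	CRR	AWAC	CQL	IQL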
Push-Sim-Expert	0.95	0.83±0.02	0.94±0.04	0.92±0.03	0.03±0.01	0.88±0.04
Push-Sim-Half-Expert	0.95	0.71±0.05	0.79±0.05	0.79±0.02	0.05±0.02	0.70±0.06
Push-Sim-Weak&Expert	0.53	0.53±0.09	0.88±0.03	0.83±0.05	0.17±0.03	0.66±0.14
Push-Sim-Mixed	0.76	0.53±0.04	0.09±0.10	0.84±0.06	0.02±0.01	0.69±0.07
Push-Real-Expert	0.92	0.74±0.05	0.87±0.07	0.80±0.03	0.54±0.13	0.75±0.08
Push-Real-Half-Expert	0.92	0.66±0.08	0.78±0.04	0.76±0.10	0.48±0.08	0.70±0.08
Push-Real-Weak&Expert	0.51	0.48±0.10	0.84±0.06	0.69±0.06	0.14±0.04	0.68±0.05
Push-Real-Mixed	0.49	0.29±0.06	0.30±0.06	0.61±0.09	0.02±0.02	0.66±0.08
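The sim-to-real gap in the Push table can be quantified directly. The sketch below copies the mean success rates from the table (spreads dropped) and computes real minus sim per algorithm; the dictionary layout and the names `sim_to_real_gap` and `mean_gap` are illustrative, not part of the benchmark's tooling.

```python
ALGOS = ("BC", "CRR", "AWAC", "CQL", "IQL")

# Mean Push success rates copied from the table above (± spreads omitted).
PUSH = {
    "Expert":      {"sim": (0.83, 0.94, 0.92, 0.03, 0.88), "real": (0.74, 0.87, 0.80, 0.54, 0.75)},
    "Half-Expert": {"sim": (0.71, 0.79, 0.79, 0.05, 0.70), "real": (0.66, 0.78, 0.76, 0.48, 0.70)},
    "Weak&Expert": {"sim": (0.53, 0.88, 0.83, 0.17, 0.66), "real": (0.48, 0.84, 0.69, 0.14, 0.68)},
    "Mixed":       {"sim": (0.53, 0.09, 0.84, 0.02, 0.69), "real": (0.29, 0.30, 0.61, 0.02, 0.66)},
}


def sim_to_real_gap(variant: str) -> dict[str, float]:
    """Real-minus-sim success rate per algorithm (negative means worse on real data)."""
    row = PUSH[variant]
    return {a: round(r - s, 2) for a, s, r in zip(ALGOS, row["sim"], row["real"])}


def mean_gap(algo: str) -> float:
    """Average real-minus-sim gap of one algorithm over the four Push variants."""
    i = ALGOS.index(algo)
    gaps = [v["real"][i] - v["sim"][i] for v in PUSH.values()]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    for variant in PUSH:
        print(variant, sim_to_real_gap(variant))
    for algo in ALGOS:
        print(algo, round(mean_gap(algo), 3))
```

For Push-Expert this gives about -0.09 (BC), -0.07 (CRR), -0.12 (AWAC), +0.51 (CQL), and -0.13 (IQL): in that variant CQL is the only method whose score rises on real data.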

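Offline RL methods generally perform well on the Push datasets, but there is a performance gap between real-robot data and simulated data.
CQL performs poorly on simulated Push data but improves on real data, suggesting that the real-world data distribution is broader.
CRR and AWAC usually outperform the other methods on these datasets, and IQL becomes competitive after hyperparameter tuning.
On Lift, CQL struggles to learn effectively even after optimization, and the gap to expert-level performance is larger for real-robot data than for simulated data.
Suboptimal trajectories distract offline RL algorithms and lower success rates, especially on the Lift Weak&Expert data.
Policies trained on real data fall further short of the expert data's performance than policies trained on simulated data, indicating that real-world dynamics are a key challenge.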
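One common explanation for why BC suffers on Weak&Expert data while CRR and AWAC hold up is how each weights training samples. BC imitates every transition equally, whereas CRR and AWAC weight the policy update by a function of an estimated advantage A(s, a), such as exp(A/β) or, in CRR's binary variant, 1[A > 0]. The sketch below uses made-up advantage values in place of a learned critic; the clipping constant `max_w` and the helper `weak_share` are assumptions for illustration, not settings or code from the paper.

```python
import math


def bc_weights(advantages: list[float]) -> list[float]:
    """Behavior cloning ignores quality: every transition gets weight 1."""
    return [1.0 for _ in advantages]


def exp_weights(advantages: list[float], beta: float = 1.0, max_w: float = 20.0) -> list[float]:
    """Exponential advantage weights exp(A / beta), clipped at max_w (assumed value) for stability."""
    return [min(math.exp(a / beta), max_w) for a in advantages]


def binary_weights(advantages: list[float]) -> list[float]:
    """Binary filter 1[A > 0]: only transitions that beat the value baseline are kept."""
    return [1.0 if a > 0 else 0.0 for a in advantages]


def weak_share(weights: list[float], is_weak: list[bool]) -> float:
    """Fraction of the total training weight that falls on weak-policy transitions."""
    total = sum(weights)
    return sum(w for w, weak in zip(weights, is_weak) if weak) / total


if __name__ == "__main__":
    # Hypothetical Weak&Expert batch: expert steps have A = +1, weak steps have A = -1.
    adv = [1.0, 1.0, -1.0, -1.0]
    weak = [False, False, True, True]
    print(weak_share(bc_weights(adv), weak))      # 0.5
    print(weak_share(exp_weights(adv), weak))     # about 0.119
    print(weak_share(binary_weights(adv), weak))  # 0.0
```

With these toy numbers, BC spends half of its training signal on weak behavior, exponential weighting cuts that to about 12%, and the binary filter removes it entirely. `weak_share` is only an analysis helper (it fails if every weight is zero); during training such weights would multiply the per-sample log-likelihood in the policy loss.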


This review was generated by AI and checked by a human editor.