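[Paper Digest] UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data
The paper presents the UltraDexGrasp data-generation pipeline, which produces the UltraDexGrasp-20M dataset for universal dexterous bimanual grasping, together with a point-cloud-based policy that achieves robust sim-to-real transfer and an 81.2% grasp success rate in the real world.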
Grasping is a fundamental capability for robots to interact with the physical world. Humans, equipped with two hands, autonomously select appropriate grasp strategies based on the shape, size, and weight of objects, enabling robust grasping and subsequent manipulation. In contrast, current robotic grasping remains limited, particularly in multi-strategy settings. Although substantial efforts have targeted parallel-gripper and single-hand grasping, dexterous grasping for bimanual robots remains underexplored, with data being a primary bottleneck. Achieving physically plausible and geometrically conforming grasps that can withstand external wrenches poses significant challenges. To address these issues, we introduce UltraDexGrasp, a framework for universal dexterous grasping with bimanual robots. The proposed data-generation pipeline integrates optimization-based grasp synthesis with planning-based demonstration generation, yielding high-quality and diverse trajectories across multiple grasp strategies. With this framework, we curate UltraDexGrasp-20M, a large-scale, multi-strategy grasp dataset comprising 20 million frames across 1,000 objects. Based on UltraDexGrasp-20M, we further develop a simple yet effective grasp policy that takes point clouds as input, aggregates scene features via unidirectional attention, and predicts control commands. Trained exclusively on synthetic data, the policy achieves robust zero-shot sim-to-real transfer and consistently succeeds on novel objects with varied shapes, sizes, and weights, attaining an average success rate of 81.2% in real-world universal dexterous grasping. To facilitate future research on grasping with bimanual robots, we open-source the data generation pipeline at https://github.com/InternRobotics/UltraDexGrasp.
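Motivation and Goals
- Enable universal dexterous grasping for bimanual robots across a wide range of object sizes and shapes
- Build a large-scale, multi-strategy dataset by integrating optimization-based grasp synthesis with planning-based demonstration generation
- Develop a simple yet robust policy that generalizes to unseen objects using only synthetic training data
- Demonstrate strong sim-to-real transfer and real-world robustness across diverse grasp strategies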
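Proposed Method
- Combine optimization-based grasp synthesis with planning-based demonstration generation to produce high-quality, diverse bimanual grasps
- Generate UltraDexGrasp-20M: 20 million frames across 1,000 objects and multiple grasp strategies (bimanual, whole-hand, two-finger pinch, three-finger tripod)
- Propose a point-cloud-based universal grasp policy that encodes the scene with PointNet++-style features and uses a decoder-only Transformer with unidirectional attention to predict a bounded Gaussian action distribution
- Formulate grasp synthesis as a nonlinear bilevel optimization problem: the lower level solves a QP for contact forces, while the upper level updates the hand pose by gradient steps; the pipeline relies on cuRobo and GPU-accelerated solvers
- Use a four-stage demonstration-generation pipeline (pre-grasp, grasp, squeeze, lift) to produce coordinated, collision-free trajectories for bimanual manipulation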
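The digest says the policy predicts a bounded Gaussian action distribution but does not say how the bound is enforced. As a minimal sketch, assuming a tanh-squashed Gaussian (a common choice in continuous-control policies, not confirmed as the authors' construction), sampling and log-probability could look like the following. The function name `sample_bounded_gaussian` and its parameters are hypothetical.

```python
import math
import random


def sample_bounded_gaussian(mean, log_std, low, high, rng=random):
    """Sample an action from a tanh-squashed Gaussian (assumed bounding scheme).

    mean, log_std: unbounded per-dimension policy outputs (hypothetical names).
    low, high:     per-dimension action limits.
    Returns (action, log_prob) with low[i] <= action[i] <= high[i].
    """
    action, log_prob = [], 0.0
    for mu, ls, lo, hi in zip(mean, log_std, low, high):
        std = math.exp(ls)
        u = rng.gauss(mu, std)            # unbounded Gaussian sample
        t = math.tanh(u)                  # squash into [-1, 1]
        half_range = 0.5 * (hi - lo)
        action.append(lo + (t + 1.0) * half_range)
        # Log-density of u under N(mu, std^2).
        log_prob += -0.5 * ((u - mu) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
        # Change of variables: da/du = half_range * (1 - tanh(u)^2).
        # The small constant keeps the log finite when tanh saturates.
        log_prob -= math.log(half_range * (1.0 - t * t) + 1e-12)
    return action, log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    a, lp = sample_bounded_gaussian([0.0, 0.5], [-1.0, -1.0], [-1.0, 0.0], [1.0, 0.2], rng)
    print(a, lp)
```

Under this assumption, every sampled command stays inside the limits regardless of how large the predicted mean or variance is, which is one plausible reason a bounded distribution helps when the outputs are joint or pose targets with hard ranges.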
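Experimental Results
Research Questions
- RQ1: Does UltraDexGrasp-20M enable a universal dexterous grasp policy that generalizes across object shapes, sizes, and weights?
- RQ2: How does the proposed policy compare with the DP3 and DexGraspNet baselines in simulation and in the real world?
- RQ3: How does the amount of training data affect policy performance and generalization to unseen objects?
- RQ4: Do the policy's key design choices (bounded Gaussian action distribution, unidirectional attention) improve grasp success rate?
- RQ5: How well does a policy trained on synthetic data transfer to real-world scenes without task-specific fine-tuning?
Key Findings
| Benchmark | Object size | DP3 (%) | DexGraspNet (%) | Ours (%) |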
|---|---|---|---|---|
| Seen Object | Small | 41.7 | 45.6 | 78.8 |
| Seen Object | Medium | 54.3 | 72.0 | 84.3 |
| Seen Object | Large | 48.5 | - | 90.4 |
| Unseen Object | Small | 37.4 | 45.6 | 76.9 |
| Unseen Object | Medium | 50.1 | 72.0 | 85.8 |
| Unseen Object | Large | 48.1 | - | 87.5 |
| Average | - | 46.7 | 58.8 | 84.0 |
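- In simulation, the policy trained on UltraDexGrasp-20M achieves an average success rate of 84.0% over seen and unseen objects (600 objects in total).
- On unseen objects the average grasp success rate is 83.4%, indicating strong generalization to new shapes and weights.
- In simulation, the proposed policy outperforms DP3 by 37.3 percentage points on average (84.0% vs. 46.7%) and also exceeds the DexGraspNet average (58.8%, with no results reported for large objects).
- In real-world deployment, the average success rate on the test objects is 81.2%, demonstrating robust zero-shot sim-to-real transfer.
- Ablations show that bounded Gaussian action prediction and unidirectional attention each improve performance substantially (relative gains above 10%).
- Performance improves with the amount of training data; beyond 1M frames, the learned policy surpasses the data-generation baseline.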
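The averages quoted in these findings can be re-derived from the simulation table. The sketch below assumes that the Average row is an unweighted mean over the listed rows and that the "-" entries are simply skipped; the digest does not state how the paper computed it. It also makes explicit the difference between a percentage-point gap and a relative gain.

```python
# Success rates (%) copied from the simulation table; None marks the "-" entries.
rows = [
    ("Seen", "Small", 41.7, 45.6, 78.8),
    ("Seen", "Medium", 54.3, 72.0, 84.3),
    ("Seen", "Large", 48.5, None, 90.4),
    ("Unseen", "Small", 37.4, 45.6, 76.9),
    ("Unseen", "Medium", 50.1, 72.0, 85.8),
    ("Unseen", "Large", 48.1, None, 87.5),
]


def mean(values):
    """Unweighted mean that skips missing ("-") entries."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


dp3_avg = mean(r[2] for r in rows)
dexgraspnet_avg = mean(r[3] for r in rows)
ours_avg = mean(r[4] for r in rows)
ours_unseen_avg = mean(r[4] for r in rows if r[0] == "Unseen")

gap_vs_dp3 = ours_avg - dp3_avg        # absolute gap, in percentage points
relative_gain = gap_vs_dp3 / dp3_avg   # gap as a fraction of the DP3 average

if __name__ == "__main__":
    print(f"DP3 {dp3_avg:.2f}, DexGraspNet {dexgraspnet_avg:.2f}, Ours {ours_avg:.2f}")
    print(f"Ours on unseen objects: {ours_unseen_avg:.2f}")
    print(f"Gap vs DP3: {gap_vs_dp3:.2f} points, relative gain {relative_gain:.1%}")
```

Under that assumption the unrounded means are about 46.68 for DP3 and 83.95 for the proposed policy, consistent with the reported 46.7 and 84.0, and the 37.3-point gap corresponds to a relative gain of roughly 80% over DP3.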
This digest was generated by AI and reviewed by a human editor.