QUICK REVIEW

[Paper Review] UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data

Sizhe Yang, Yiman Xie | arXiv (Cornell University) | Mar 5, 2026

Robot Manipulation and Learning | Citations: 0

One-Sentence Summary

The paper proposes UltraDexGrasp, a framework for universal dexterous bimanual grasping whose data-generation pipeline produces the UltraDexGrasp-20M dataset, and a point-cloud-based policy that transfers zero-shot from simulation to the real world with an 81.2% average success rate.

ABSTRACT

Grasping is a fundamental capability for robots to interact with the physical world. Humans, equipped with two hands, autonomously select appropriate grasp strategies based on the shape, size, and weight of objects, enabling robust grasping and subsequent manipulation. In contrast, current robotic grasping remains limited, particularly in multi-strategy settings. Although substantial efforts have targeted parallel-gripper and single-hand grasping, dexterous grasping for bimanual robots remains underexplored, with data being a primary bottleneck. Achieving physically plausible and geometrically conforming grasps that can withstand external wrenches poses significant challenges. To address these issues, we introduce UltraDexGrasp, a framework for universal dexterous grasping with bimanual robots. The proposed data-generation pipeline integrates optimization-based grasp synthesis with planning-based demonstration generation, yielding high-quality and diverse trajectories across multiple grasp strategies. With this framework, we curate UltraDexGrasp-20M, a large-scale, multi-strategy grasp dataset comprising 20 million frames across 1,000 objects. Based on UltraDexGrasp-20M, we further develop a simple yet effective grasp policy that takes point clouds as input, aggregates scene features via unidirectional attention, and predicts control commands. Trained exclusively on synthetic data, the policy achieves robust zero-shot sim-to-real transfer and consistently succeeds on novel objects with varied shapes, sizes, and weights, attaining an average success rate of 81.2% in real-world universal dexterous grasping. To facilitate future research on grasping with bimanual robots, we open-source the data generation pipeline at https://github.com/InternRobotics/UltraDexGrasp.

Motivation and Objectives

Enable universal dexterous grasping for bimanual robots across objects of varied shapes, sizes, and weights.
Create a large-scale, multi-strategy dataset by integrating optimization-based grasp synthesis with planning-based demonstration generation.
Develop a simple, robust policy that generalizes to unseen objects using synthetic data only.
Demonstrate strong sim-to-real transfer and real-world robustness for diverse grasp strategies.

Proposed Method

Integrates optimization-based grasp synthesis with planning-based demonstration generation to produce high-quality, diverse bimanual grasps.
Generates UltraDexGrasp-20M: 20 million frames over 1,000 objects across multiple grasp strategies (two-handed, whole-hand, two-finger pinch, three-finger tripod).
Proposes a point-cloud-based universal grasp policy that encodes scenes with PointNet++-style features and uses a decoder-only transformer with unidirectional attention to predict bounded Gaussian action distributions.
Solves a nonlinear bilevel grasp synthesis problem with a lower-level QP for contact forces and an upper-level gradient-based update of hand poses, using cuRobo and GPU-accelerated solvers.
Uses a four-stage demonstration generation pipeline (pregrasp, grasp, squeeze, lift) to produce coordinated, collision-free trajectories for dual-arm manipulation.
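
The unidirectional attention named in the policy description can be illustrated with a small sketch. This review does not specify the paper's exact masking pattern, so the code assumes the common causal form, in which token i attends only to tokens 0..i. The function names, scores, and values below are hypothetical and are not taken from the authors' code.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax with a causal mask (assumption: token i sees tokens 0..i)."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                      # drop future positions
        m = max(visible)                            # subtract max for stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        # Masked (future) positions receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights

def attend(scores, values):
    """Weighted sum of value vectors under the causal attention weights."""
    weights = causal_attention_weights(scores)
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]

# Hypothetical 3-token example with raw (unnormalized) attention scores.
scores = [[0.0, 1.0, 2.0],
          [0.0, 0.0, 5.0],
          [1.0, 1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = causal_attention_weights(scores)
outputs = attend(scores, values)
```

In this sketch the first token can only attend to itself, so its output equals its own value vector regardless of the scores assigned to later tokens. A real policy would apply the same mask inside batched multi-head attention rather than with Python lists.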

Experimental Results

Research Questions

RQ1: Can UltraDexGrasp-20M enable a universal dexterous grasp policy to generalize across diverse object shapes, sizes, and weights?
RQ2: How does the proposed policy compare to baselines DP3 and DexGraspNet in simulation and real-world settings?
RQ3: What is the impact of training data volume on policy performance and generalization to unseen objects?
RQ4: Do key design choices in the policy (bounded Gaussian action distribution, unidirectional attention) improve grasp success?
RQ5: How well does the policy trained on synthetic data transfer to real-world scenarios without task-specific fine-tuning?

Key Findings

Benchmark	Object Size	DP3 (%)	DexGraspNet (%)	Ours (%)
Seen Object	Small	41.7	45.6	78.8
Seen Object	Medium	54.3	72.0	84.3
Seen Object	Large	48.5	-	90.4
Unseen Object	Small	37.4	45.6	76.9
Unseen Object	Medium	50.1	72.0	85.8
Unseen Object	Large	48.1	-	87.5
Average	-	46.7	58.8	84.0

Policy trained on UltraDexGrasp-20M achieves 84.0% average success in simulation across seen and unseen objects (600 objects total).
Unseen objects are grasped with 83.4% success on average, indicating strong generalization to novel shapes and weights.
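In simulation, the proposed policy outperforms DP3 by 37.3 percentage points on average (84.0% vs 46.7%) and DexGraspNet by 25.2 percentage points (84.0% vs 58.8%); the table lists no DexGraspNet result for large objects, so its average covers small and medium objects only.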
Real-world deployment yields 81.2% average success across tested objects, demonstrating robust zero-shot sim-to-real transfer.
Ablation studies show that bounded Gaussian action prediction and unidirectional attention each yield a relative performance gain of over 10%.
Performance scales with more training data; beyond 1M frames, the learned policy surpasses data-generation baselines.
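
The averages quoted in these findings can be re-derived from the table above. The review does not state how the averages were formed, so this sketch assumes an unweighted mean over the available table cells, rounded half-up to one decimal place. The names `ROWS`, `COLUMN`, and `mean_rate` are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Simulation success rates (%) copied from the table; None marks a missing entry.
# Values are kept as strings so Decimal stores them exactly.
ROWS = [
    # benchmark, size, DP3, DexGraspNet, Ours
    ("Seen", "Small", "41.7", "45.6", "78.8"),
    ("Seen", "Medium", "54.3", "72.0", "84.3"),
    ("Seen", "Large", "48.5", None, "90.4"),
    ("Unseen", "Small", "37.4", "45.6", "76.9"),
    ("Unseen", "Medium", "50.1", "72.0", "85.8"),
    ("Unseen", "Large", "48.1", None, "87.5"),
]
COLUMN = {"DP3": 2, "DexGraspNet": 3, "Ours": 4}

def mean_rate(method, benchmark=None):
    """Unweighted mean over available cells, rounded half-up to 0.1."""
    idx = COLUMN[method]
    cells = [
        Decimal(row[idx])
        for row in ROWS
        if row[idx] is not None and benchmark in (None, row[0])
    ]
    mean = sum(cells) / len(cells)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

overall = {method: mean_rate(method) for method in COLUMN}
unseen_ours = mean_rate("Ours", "Unseen")
gap_vs_dp3 = overall["Ours"] - overall["DP3"]
gap_vs_dexgraspnet = overall["Ours"] - overall["DexGraspNet"]

print(overall, unseen_ours, gap_vs_dp3, gap_vs_dexgraspnet)
```

Under these assumptions the script reproduces the table's Average row, the 83.4% unseen-object figure, and the 37.3-point gap over DP3. The DexGraspNet mean is taken over four cells rather than six because its large-object entries are missing.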

This review was generated by AI and checked by a human editor.