QUICK REVIEW

[Paper Review] Task-oriented grasping for dexterous robots using postural synergies and reinforcement learning

Dimitrios Dimou, José Santos-Victor|arXiv (Cornell University)|Feb 24, 2026
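Robot Manipulation and Learning | Cited by 0
One-Sentence Summary

This paper proposes a reinforcement-learning approach to task-oriented grasping that uses a postural synergy model learned from human grasps with a VAE, so that a single policy can grasp multiple objects conditioned on the post-grasp intention, with an improved success rate.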

ABSTRACT

In this paper, we address the problem of task-oriented grasping for humanoid robots, emphasizing the need to align with human social norms and task-specific objectives. Existing methods employ a variety of open-loop and closed-loop approaches but lack an end-to-end solution that can grasp several objects while taking into account the downstream task's constraints. Our proposed approach employs reinforcement learning to enhance task-oriented grasping, prioritizing the post-grasp intention of the agent. We extract human grasp preferences from the ContactPose dataset, and train a hand synergy model based on the Variational Autoencoder (VAE) to imitate the participant's grasping actions. Based on this data, we train an agent able to grasp multiple objects while taking into account distinct post-grasp intentions that are task-specific. By combining data-driven insights from human grasping behavior with learning by exploration provided by reinforcement learning, we can develop humanoid robots capable of context-aware manipulation actions, facilitating collaboration in human-centered environments.

Motivation and Objectives

  • Motivate humanoid grasping that aligns with human social norms and downstream task constraints.
  • Leverage human grasp data to inform robot hand postures through a synergy-based representation.
  • Train a single policy to generalize across objects and post-grasp intentions using reinforcement learning.
  • Demonstrate improved grasp success and human-like grasp configurations compared to baseline methods.

Proposed Method

  • Retarget human grasps from the ContactPose dataset to the robotic hand via a fixed kinematic mapping.
  • Train a Variational Autoencoder (VAE) to learn a low-dimensional hand synergy space from retargeted grasps.
  • Train a single policy with PPO that outputs a hand synergy latent and an arm end-effector motion conditioned on a post-grasp intention.
  • Decode the synergy latent to finger joint values through the VAE to realize dexterous grasps.
  • Use a reward function combining proximity to a target grasp location, successful lifting, and rotation alignment to guide learning.
  • Evaluate against baselines including a policy using direct joint-space actions and a PCA-based synergy space.
Figure 3: Proposed agent structure for task-oriented grasping.
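
The action decoding and reward shaping listed above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the action layout (arm command first, synergy latent second), the joint count, the toy decoder with random weights standing in for the trained VAE decoder, and the reward weights are all assumptions made for this example.

```python
import numpy as np

LATENT_DIM = 2          # assumed synergy dimensionality (the ablation reports 2-5 perform comparably)
NUM_FINGER_JOINTS = 16  # hypothetical joint count; depends on the robot hand
ARM_DIM = 6             # hypothetical: 3 translation + 3 rotation deltas for the end effector


class ToyDecoder:
    """Stand-in for a trained VAE decoder: synergy latent -> finger joint values."""

    def __init__(self, latent_dim, num_joints, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights are placeholders for learned parameters.
        self.w1 = rng.normal(scale=0.5, size=(latent_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.5, size=(hidden, num_joints))
        self.b2 = np.zeros(num_joints)

    def __call__(self, z):
        h = np.tanh(z @ self.w1 + self.b1)
        # Sigmoid keeps outputs in [0, 1], read here as normalized joint positions.
        return 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))


def split_action(action, decoder):
    """Split a policy output into an arm command and decoded finger joints."""
    action = np.asarray(action, dtype=float)
    arm_delta = action[:ARM_DIM]
    synergy_latent = action[ARM_DIM:ARM_DIM + LATENT_DIM]
    return arm_delta, decoder(synergy_latent)


def grasp_reward(hand_pos, target_pos, lifted, rot_error,
                 w_dist=1.0, w_lift=5.0, w_rot=0.5):
    """Illustrative reward: proximity term + lift bonus + rotation alignment term."""
    dist = np.linalg.norm(np.asarray(hand_pos, dtype=float)
                          - np.asarray(target_pos, dtype=float))
    proximity = -w_dist * dist            # closer to the target grasp location is better
    lift_bonus = w_lift if lifted else 0.0
    rotation = -w_rot * abs(rot_error)    # smaller misalignment is better
    return float(proximity + lift_bonus + rotation)
```

In this sketch the policy only has to output `ARM_DIM + LATENT_DIM` numbers per step instead of one value per finger joint, which is the practical appeal of acting in a synergy space.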

Experimental Results

Research Questions

  • RQ1: Can a single policy learn to grasp multiple objects while conditioning on different post-grasp intentions?
  • RQ2: Does a VAE-based synergy space yield more human-like and task-appropriate grasps than direct joint-space control or PCA-based synergies?
  • RQ3: How does post-grasp intention influence grasp target selection and final hand-object positioning during execution?

Key Findings

  • A VAE-based synergy space yields the highest grasp success rate among tested methods (83%).
  • The joint-action space policy achieves faster learning with higher interim rewards but lower final success than the VAE-based policy.
  • PCA-based synergy space reaches 71% success, showing degradation relative to the VAE approach.
  • Qualitatively, grasps produced via the VAE synergy space resemble human-like power grasps, unlike those from direct joint-space control.
  • Using object category as an observation does not reduce average success but is important for correct grasp targeting aligned with post-grasp intention.
  • In ablation, reducing latent dimensions below two significantly harms grasp success, while 2–5 latent dimensions perform comparably.
Figure 4: Rewards for training policies with 1) full joint control, 2) PCA synergy space, and 3) VAE synergy space. The thick line is the average among the two seeds and the shaded part denotes the standard deviation.
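
For readers unfamiliar with the PCA baseline, a linear synergy space can be sketched as below. This is a generic PCA-via-SVD example, not the paper's code: the synthetic postures stand in for retargeted ContactPose grasps, and the 16-joint hand and function names are hypothetical. Unlike the VAE, the decoder here is a fixed linear map from the latent to joint values.

```python
import numpy as np


def fit_pca_synergies(grasps, num_synergies):
    """Fit a linear synergy space to an (n_grasps, n_joints) array of postures."""
    grasps = np.asarray(grasps, dtype=float)
    mean = grasps.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(grasps - mean, full_matrices=False)
    components = vt[:num_synergies]
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, components, explained[:num_synergies]


def encode(posture, mean, components):
    """Project joint values onto the synergy axes."""
    return (np.asarray(posture, dtype=float) - mean) @ components.T


def decode(latent, mean, components):
    """Map synergy coordinates back to joint values."""
    return mean + np.asarray(latent, dtype=float) @ components


# Synthetic postures standing in for retargeted grasps: 16 joints driven by
# 2 hidden factors plus small noise (purely illustrative data).
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 16))
postures = factors @ mixing + 0.01 * rng.normal(size=(200, 16))

mean, components, explained = fit_pca_synergies(postures, num_synergies=2)
reconstruction = decode(encode(postures, mean, components), mean, components)
max_error = np.abs(reconstruction - postures).max()
```

Varying `num_synergies` in a sketch like this is the linear analogue of the latent-dimension ablation reported above.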

This review was generated by AI and reviewed by human editors.