QUICK REVIEW

[Paper Review] Task-oriented grasping for dexterous robots using postural synergies and reinforcement learning

Dimitrios Dimou, José Santos-Victor|arXiv (Cornell University)|Feb 24, 2026

Robot Manipulation and Learning | Citations: 0

One-Line Summary

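This paper proposes a reinforcement learning approach to task-oriented grasping built on a hand postural synergy model learned from human grasps. Because the synergy space is learned with a VAE, a single policy conditioned on the post-grasp intention can grasp multiple objects, and it achieves a higher grasp success rate than the baselines.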

ABSTRACT

In this paper, we address the problem of task-oriented grasping for humanoid robots, emphasizing the need to align with human social norms and task-specific objectives. Existing methods employ a variety of open-loop and closed-loop approaches but lack an end-to-end solution that can grasp several objects while taking into account the downstream task's constraints. Our proposed approach employs reinforcement learning to enhance task-oriented grasping, prioritizing the post-grasp intention of the agent. We extract human grasp preferences from the ContactPose dataset, and train a hand synergy model based on the Variational Autoencoder (VAE) to imitate the participants' grasping actions. Based on this data, we train an agent able to grasp multiple objects while taking into account distinct post-grasp intentions that are task-specific. By combining data-driven insights from human grasping behavior with learning by exploration provided by reinforcement learning, we can develop humanoid robots capable of context-aware manipulation actions, facilitating collaboration in human-centered environments.

Motivation and Objectives

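Motivates humanoid grasping that respects human social norms and the constraints of the downstream task.
Uses human grasp data to shape robot hand postures through a synergy-based representation.
Trains a single policy with reinforcement learning that generalizes across objects and post-grasp intentions.
Improves grasp success and produces more human-like grasp placement than the baseline methods.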

Proposed Method

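Retargets human grasps from the ContactPose dataset to the robot hand using a fixed kinematic mapping.
Trains a VAE that learns a low-dimensional hand synergy space from the retargeted grasps.
Trains a single policy with PPO that, conditioned on the post-grasp intention, outputs a hand synergy latent together with the motion of the arm's end effector.
Decodes finger joint values from the synergy latent through the VAE, so that a few latent dimensions drive full multi-fingered grasps.
Guides learning with a reward that combines proximity to the target grasp position, lift success, and rotational alignment.
Evaluates the approach against baselines, including a policy that acts directly in joint space and one that uses a PCA-based synergy space.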
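The review names the three reward terms (proximity to the target grasp position, lift success, rotational alignment) but not their functional form or weights. The sketch below shows one plausible way to combine them into a shaped reward. The exponential distance kernel, the lift threshold, the weights, and every function name are hypothetical assumptions, not the authors' reward.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the review does not report the paper's actual values.
    reach: float = 1.0
    lift: float = 5.0
    align: float = 0.5


def quaternion_angle(q1, q2):
    """Smallest rotation angle in radians between two unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))  # q and -q encode the same rotation
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return 2.0 * math.acos(dot)


def grasp_reward(hand_pos, target_pos, hand_quat, target_quat, object_height,
                 lift_threshold=0.1, w=RewardWeights()):
    """Hypothetical shaped reward: reach + lift + rotational alignment terms."""
    # Reach term: 1.0 at the target grasp position, decaying with Euclidean distance.
    r_reach = math.exp(-10.0 * math.dist(hand_pos, target_pos))
    # Lift term: sparse bonus once the object rises above an assumed height threshold.
    r_lift = 1.0 if object_height > lift_threshold else 0.0
    # Alignment term: 1.0 when orientations match, 0.0 when they differ by 180 degrees.
    r_align = 1.0 - quaternion_angle(hand_quat, target_quat) / math.pi
    return w.reach * r_reach + w.lift * r_lift + w.align * r_align


identity = (1.0, 0.0, 0.0, 0.0)
print(grasp_reward((0.0, 0.0, 0.3), (0.0, 0.0, 0.3), identity, identity, object_height=0.2))  # 6.5
```

Giving the sparse lift term the largest weight is a common way to make "object actually lifted" dominate the dense shaping terms; the real balance in the paper may differ.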

Figure 3: Proposed agent structure for task-oriented grasping.
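As a rough sketch of the agent structure described in the method summary, the code below assembles an observation with the object category and post-grasp intention as one-hot inputs, splits the policy's action into an arm end-effector delta and a synergy latent, and decodes the latent to finger joint angles. The label vocabularies, dimensions, and the single-layer tanh decoder are hypothetical stand-ins; the paper's VAE decoder is a trained network whose architecture the review does not give.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical vocabularies and sizes; the review does not list them.
INTENTIONS = ["use", "handover"]
OBJECT_CATEGORIES = ["mug", "knife", "bottle"]
ARM_DIM = 6      # end-effector delta: xyz translation + roll/pitch/yaw (assumed)
LATENT_DIM = 3   # synergy latent size (assumed; the ablation reports 2 to 5 as comparable)


def one_hot(label: str, vocab: Sequence[str]) -> List[float]:
    return [1.0 if v == label else 0.0 for v in vocab]


def build_observation(proprio: Sequence[float], object_pose: Sequence[float],
                      category: str, intention: str) -> List[float]:
    """Concatenate robot and object state with one-hot object category and intention."""
    return (list(proprio) + list(object_pose)
            + one_hot(category, OBJECT_CATEGORIES) + one_hot(intention, INTENTIONS))


def split_action(action: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split the policy output into [arm end-effector delta | hand synergy latent]."""
    if len(action) != ARM_DIM + LATENT_DIM:
        raise ValueError(f"expected {ARM_DIM + LATENT_DIM} values, got {len(action)}")
    return list(action[:ARM_DIM]), list(action[ARM_DIM:])


def decode_synergy(z: Sequence[float], weights: Sequence[Sequence[float]],
                   bias: Sequence[float], lower: Sequence[float],
                   upper: Sequence[float]) -> List[float]:
    """Stand-in for the trained VAE decoder: one linear layer + tanh, rescaled to joint limits."""
    joints = []
    for row, b, lo, hi in zip(weights, bias, lower, upper):
        h = math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)  # h in (-1, 1)
        joints.append(lo + (h + 1.0) * 0.5 * (hi - lo))          # map to [lo, hi]
    return joints


obs = build_observation([0.0] * 7, [0.1, 0.2, 0.3], "mug", "handover")
arm_delta, z = split_action([0.0] * (ARM_DIM + LATENT_DIM))
q = decode_synergy(z, [[0.0] * LATENT_DIM] * 2, [0.0, 0.0], [0.0, 0.0], [1.6, 1.6])
```

The point of the split is that the policy searches over a handful of latent values instead of every finger joint, while the decoder keeps the resulting hand postures inside the space of retargeted human grasps.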

Experimental Results

Research Questions

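RQ1: Can a single policy grasp multiple objects when conditioned on different post-grasp intentions?
RQ2: Does the VAE-based synergy space produce more human-like, task-appropriate grasps than direct joint-space control or PCA-based synergies?
RQ3: How does the post-grasp intention affect which part of the object is grasped and the final hand-object placement?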

Key Findings

Method	Average grasp success rate
Joint action space	66%
PCA action space	71%
VAE action space (ours)	83%
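To put the table's averages in perspective, the short calculation below expresses the VAE policy's advantage both in percentage points and as a relative gain over each baseline. It uses only the rounded averages reported above and says nothing about statistical significance; the helper name is hypothetical.

```python
def success_gains(rates, ours):
    """Return {baseline: (percentage-point gain, relative gain in %)} versus `ours`."""
    best = rates[ours]
    return {name: (best - r, 100.0 * (best - r) / r)
            for name, r in rates.items() if name != ours}


rates = {"Joint action space": 66, "PCA action space": 71, "VAE action space (ours)": 83}
for name, (pp, rel) in success_gains(rates, "VAE action space (ours)").items():
    print(f"VAE vs {name}: +{pp} points ({rel:.1f}% relative)")
```

This prints a 17-point (about 25.8% relative) gain over joint-space control and a 12-point (about 16.9% relative) gain over the PCA synergy space.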

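The VAE-based synergy space achieves the highest grasp success rate of the tested methods (83%).
The policy with full joint-space control learns faster and reaches higher reward partway through training, but its final success rate is lower than that of the VAE-based policy.
The PCA-based synergy space reaches 71% success, below the VAE approach.
Qualitatively, grasps generated in the VAE synergy space resemble human power grasps more closely than those produced by direct joint-space control.
Using the object category as an observation does not lower the average success rate, and it is important for targeting the correct grasp consistent with the post-grasp intention.
In the ablation, reducing the latent dimension below 2 sharply lowers grasp success, while 2 to 5 dimensions perform comparably.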
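The review does not describe how the PCA synergy baseline is built. A common construction, assumed here, fits principal components to the retargeted joint-angle vectors and lets the policy output k coefficients that are decoded linearly back to joint angles; k plays the same role as the latent dimension in the ablation. The sketch below uses dependency-free power iteration with deflation (in practice one would typically call numpy.linalg.eigh on the covariance matrix). All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows: Sequence[Sequence[float]], mu: Sequence[float]) -> Matrix:
    """Sample covariance of joint-angle vectors (one row per retargeted grasp)."""
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    return cov


def principal_components(cov: Matrix, k: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Leading k eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is (numerically) zero
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: subtract the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def joints_to_synergy(q, mu, comps):
    """Project a joint configuration onto the k synergy coefficients."""
    centred = [x - m for x, m in zip(q, mu)]
    return [sum(c * p for c, p in zip(centred, pc)) for pc in comps]


def synergy_to_joints(coeffs, mu, comps):
    """PCA synergy decoding: q = mean + sum_i c_i * pc_i."""
    q = list(mu)
    for c, pc in zip(coeffs, comps):
        q = [qi + c * p for qi, p in zip(q, pc)]
    return q
```

Unlike the VAE decoder, this mapping is linear, so every reachable posture lies on a flat subspace through the mean grasp; that is one plausible reason a nonlinear VAE synergy space could capture human grasp variety better.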

Figure 4: Rewards for training policies with 1) full joint control, 2) PCA synergy space, and 3) VAE synergy space. The thick line is the average over the two seeds, and the shaded region denotes the standard deviation.
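The caption describes a mean curve over two seeds with a standard-deviation band. The sketch below shows that aggregation per training step. The caption does not say whether population or sample standard deviation was used; with two seeds the sample version (statistics.stdev) is larger by a factor of sqrt(2), so the choice of statistics.pstdev here is an assumption, and the function name is hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def seed_band(curves: Sequence[Sequence[float]]) -> Tuple[List[float], List[float], List[float]]:
    """Per-step mean and mean +/- std across seeds (curves: one reward list per seed)."""
    means, lower, upper = [], [], []
    for values in zip(*curves):  # values = every seed's reward at one training step
        m = statistics.mean(values)
        s = statistics.pstdev(values)  # population std (assumed; caption does not specify)
        means.append(m)
        lower.append(m - s)
        upper.append(m + s)
    return means, lower, upper
```

With only two seeds the band is a coarse indicator of run-to-run variation, which is worth keeping in mind when comparing the curves for the three action spaces.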


This review was generated by AI and checked by a human editor.