QUICK REVIEW

[Paper Review] SPARR: Simulation-based Policies with Asymmetric Real-world Residuals for Assembly

Yijie Guo, Iretiayo Akinola|arXiv (Cornell University)|Feb 26, 2026

Robot Manipulation and Learning | Citations: 0

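One-sentence summary

SPARR combines a simulation-trained base policy with a vision-conditioned residual learned in the real world to adapt assembly tasks automatically, shortening cycle time while achieving near-perfect real-world success.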

ABSTRACT

Robotic assembly presents a long-standing challenge due to its requirement for precise, contact-rich manipulation. While simulation-based learning has enabled the development of robust assembly policies, their performance often degrades when deployed in real-world settings due to the sim-to-real gap. Conversely, real-world reinforcement learning (RL) methods avoid the sim-to-real gap, but rely heavily on human supervision and lack generalization ability to environmental changes. In this work, we propose a hybrid approach that combines a simulation-trained base policy with a real-world residual policy to efficiently adapt to real-world variations. The base policy, trained in simulation using low-level state observations and dense rewards, provides strong priors for initial behavior. The residual policy, learned in the real world using visual observations and sparse rewards, compensates for discrepancies in dynamics and sensor noise. Extensive real-world experiments demonstrate that our method, SPARR, achieves near-perfect success rates across diverse two-part assembly tasks. Compared to the state-of-the-art zero-shot sim-to-real methods, SPARR improves success rates by 38.4% while reducing cycle time by 29.7%. Moreover, SPARR requires no human expertise, in contrast to the state-of-the-art real-world RL approaches that depend heavily on human supervision.

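Motivation and objectives

Address the sim-to-real gap in contact-rich tasks to enable robust robotic assembly.
Propose a hybrid policy that uses a simulation-trained base and a real-world residual to adapt to variations in dynamics and perception.
Enable autonomous real-world adaptation without human supervision.
Demonstrate robustness to pose-estimation noise and generalization to unseen tasks.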

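Proposed method

Pretrain a state-based base policy in simulation using PPO and a dense imitation reward.
In the real world, set the goal pose via pose estimation, introducing state-estimation noise, and deploy the base policy.
Introduce a vision-conditioned residual policy that outputs incremental pose corrections.
Train the residual policy in the real world with sparse rewards and RLPD, using demonstrations collected from base-policy rollouts.
At each timestep, sum the base and residual actions to form the final action.
Use a demonstration buffer seeded with successful trajectories and update it with higher-quality experience during training.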
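
The action composition and demonstration buffer described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `compose_action`, `SeededBuffers`, and `residual_scale` are hypothetical, the clipping and scaling of the residual are assumed (the summary only says the two actions are summed), "higher-quality experience" is assumed to mean any successful episode, and the 50/50 batch split follows the symmetric sampling commonly used with RLPD.

```python
import random
from collections import deque

import numpy as np


def compose_action(base_action, residual_action, residual_scale=0.1):
    """Return base action plus a bounded residual correction.

    Assumption: the residual is clipped to [-1, 1] and scaled by
    `residual_scale`; the source only states that the two are summed.
    """
    base = np.asarray(base_action, dtype=float)
    residual = np.clip(np.asarray(residual_action, dtype=float), -1.0, 1.0)
    return base + residual_scale * residual


class SeededBuffers:
    """Demo buffer seeded from successful base rollouts, plus an online buffer."""

    def __init__(self, capacity=10_000):
        self.demo = deque(maxlen=capacity)
        self.online = deque(maxlen=capacity)

    def seed_from_rollouts(self, rollouts):
        # rollouts: iterable of (transitions, success) from the base policy.
        # Only successful trajectories are kept as demonstrations.
        for transitions, success in rollouts:
            if success:
                self.demo.extend(transitions)

    def add_episode(self, transitions, success):
        # Every online transition is stored; successful episodes also
        # refresh the demo buffer (assumed meaning of "higher quality").
        self.online.extend(transitions)
        if success:
            self.demo.extend(transitions)

    def sample(self, batch_size):
        # Symmetric sampling: half the batch from demos, half from online data.
        # Falls back to demos only while the online buffer is still empty.
        if not self.online:
            return [random.choice(self.demo) for _ in range(batch_size)]
        half = batch_size // 2
        demo_part = [random.choice(self.demo) for _ in range(half)]
        online_part = [random.choice(self.online) for _ in range(batch_size - half)]
        return demo_part + online_part
```

In a training loop under these assumptions, the residual policy would receive the camera image (and, per RQ4, the base policy's output) and emit `residual_action`; the result of `compose_action` would be sent to the robot, and each finished episode would be passed to `add_episode` with its sparse success label.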

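Experimental results

Research questions

RQ1: Can SPARR adapt simulation-trained policies to real-world assembly tasks with near-perfect success, without human supervision?
RQ2: Is SPARR robust to pose variation and pose-estimation error during real-world deployment?
RQ3: Can SPARR generalize from AutoMate-like tasks to unseen NIST assembly tasks?
RQ4: How does including the base policy as an input to the residual policy affect adaptation performance?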

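Key findings

SPARR achieves 95–100% real-world success rates on two-part assembly tasks without human supervision.
Compared with the zero-shot AutoMate-like baseline, SPARR improves success rate by 38.4% and reduces cycle time by 29.7%.
SPARR is robust to socket pose variation and pose-estimation noise, and outperforms a state-based residual policy.
SPARR generalizes to unseen NIST tasks, with marked improvements in success rate and cycle time (74.5% and 36.5%, respectively, across tasks).
Including the base policy as an input to the residual policy provides meaningful context and improves performance over the variant without it.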

This review was generated by AI and checked by a human editor.