[Paper Review] Self-Curriculum Model-based Reinforcement Learning for Shape Control of Deformable Linear Objects
The paper presents a two-stage framework that combines model-based reinforcement learning with online visual servoing to achieve efficient and precise shape control of deformable linear objects, including opposite-curvature large deformations, with zero-shot sim-to-real transfer.
Precise shape control of Deformable Linear Objects (DLOs) is crucial in robotic applications such as industrial and medical fields. However, existing methods face challenges in handling complex large deformation tasks, especially those involving opposite curvatures, and lack efficiency and precision. To address this, we propose a two-stage framework combining Reinforcement Learning (RL) and online visual servoing. In the large-deformation stage, a model-based reinforcement learning approach using an ensemble of dynamics models is introduced to significantly improve sample efficiency. Additionally, we design a self-curriculum goal generation mechanism that dynamically selects intermediate-difficulty goals with high diversity through imagined evaluations, thereby optimizing the policy learning process. In the small-deformation stage, a Jacobian-based visual servo controller is deployed to ensure high-precision convergence. Simulation results show that the proposed method enables efficient policy learning and significantly outperforms mainstream baselines in shape control success rate and precision. Furthermore, the framework effectively transfers the policy trained in simulation to real-world tasks with zero-shot adaptation. It successfully completes all 30 cases with diverse initial and target shapes across DLOs of different sizes and materials. The project website is available at: https://anonymous.4open.science/w/sc-mbrl-dlo-EB48/
Research Motivation and Objectives
- Address the challenge of precise shape control for deformable linear objects (DLOs) under large deformations and opposite-curvature configurations.
- Improve sample efficiency in RL for DLO manipulation using model-based learning and ensemble dynamics models.
- Develop a self-curriculum goal generation mechanism to balance goal difficulty and diversity during training.
- Ensure high-precision convergence in small-deformation regimes via online Jacobian-based visual servoing.
- Demonstrate sim-to-real transfer without additional real-world training across multiple DLOs.
Proposed Method
- Two-stage framework: large-deformation stage with model-based RL and self-curriculum goals; small-deformation stage with online Jacobian-based visual servoing.
- An ensemble of Bi-LSTM dynamics models predicts DLO state transitions; elite models generate synthetic data to augment SAC-based policy training.
- Observation includes current DLO shape X, end-effector poses r, and target shape Xd; action is displacement increment Δr.
- Self-curriculum goal generation uses imagined evaluations to identify intermediate-difficulty goals, combined with Weighted Farthest Point Sampling to ensure diversity.
- The Jacobian matrix is estimated online for the visual servo controller in the small-deformation stage, which provides precise convergence.
- Control switches from the RL policy to the visual servo controller once the shape error e falls below a threshold; the RL policy brings the DLO close to the target shape, while the visual servo ensures final precision.
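The stage switching and online Jacobian estimation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the rank-one Broyden update, the damped least-squares servo law, the toy linear "plant" `A`, the gain, the threshold value, and the gradient-style stand-in for the trained SAC policy are all hypothetical choices made only to keep the example self-contained.

```python
import numpy as np

def broyden_update(J, dx, dr, eps=1e-12):
    """Rank-one Broyden correction so that the new J satisfies dx = J @ dr.
    Assumption: the review only says the Jacobian is estimated online."""
    denom = float(dr @ dr)
    if denom < eps:  # skip the update when the end-effectors barely moved
        return J
    return J + np.outer(dx - J @ dr, dr) / denom

def servo_step(J, x, x_d, gain=0.5, damping=1e-6):
    """Damped least-squares step that reduces the shape error e = x - x_d."""
    e = x - x_d
    H = J.T @ J + damping * np.eye(J.shape[1])
    return -gain * np.linalg.solve(H, J.T @ e)

def control_step(x, r, x_d, J, policy, threshold):
    """Use the RL policy for large errors, the visual servo for small ones."""
    if np.linalg.norm(x - x_d) < threshold:
        return "visual_servo", servo_step(J, x, x_d)
    return "rl_policy", policy(x, r, x_d)

def run_demo(steps=80, threshold=0.3):
    # Toy linear plant x = A @ r; A is unknown to the servo controller.
    A = np.vstack([np.eye(4), 0.5 * np.eye(4)])
    # Imperfect initial Jacobian estimate (hypothetical perturbation).
    J = A + 0.05 * np.sin(np.arange(32.0)).reshape(8, 4)
    r = np.zeros(4)
    x = A @ r
    x_d = A @ np.array([1.0, -0.5, 0.8, 0.3])  # reachable target shape
    # Hypothetical stand-in for the trained policy: a coarse gradient step.
    policy = lambda x, r, x_d: -0.2 * A.T @ (x - x_d)

    stages, errors = [], [float(np.linalg.norm(x - x_d))]
    for _ in range(steps):
        stage, dr = control_step(x, r, x_d, J, policy, threshold)
        r_next = r + dr
        x_next = A @ r_next
        J = broyden_update(J, x_next - x, dr)  # refine J from observed motion
        r, x = r_next, x_next
        stages.append(stage)
        errors.append(float(np.linalg.norm(x - x_d)))
    return stages, errors

if __name__ == "__main__":
    stages, errors = run_demo()
    print(stages[0], stages[-1], errors[0], errors[-1])
```

In this toy setting the coarse policy handles the first few steps, after which the servo takes over and drives the remaining error toward zero. A real DLO is nonlinear, so the Jacobian is only locally valid, which is one reason to restrict the servo to the small-deformation regime.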
Experimental Results
Research Questions
- RQ1: Can model-based RL with ensemble dynamics achieve sample-efficient learning for complex large-deformation DLO shaping?
- RQ2: Does a self-curriculum goal generation strategy improve policy learning when initial and target shapes vary widely?
- RQ3: Can the proposed two-stage framework generalize from simulation to real-world DLO manipulation across different sizes and materials?
- RQ4: How does the integrated Jacobian-based visual servoing stage affect final precision in small-deformation regimes?
Key Findings
- The proposed method achieves the highest success rate and the lowest average minimum shape error in simulation, under both straight and diverse initial conditions.
- The self-curriculum mechanism with difficulty filtering and diversity sampling significantly improves training stability and policy generalization.
- Model-based RL with ensemble dynamics markedly improves sample efficiency compared with model-free baselines.
- The two-stage approach demonstrates robust sim-to-real transfer, completing all real-world tasks across three DLOs without online retraining.
- Compared to MPC, Visual Servo, and RL-only baselines, the proposed method yields better accuracy and faster convergence in most scenarios.
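The difficulty filtering and diversity sampling behind the self-curriculum mechanism can be illustrated with a short sketch. The review does not give the paper's exact formulas, so everything specific here is an assumption: the success-rate band `[low, high]`, the weight `1 - |2p - 1|` that favours goals near 50% imagined success, and the score `weight * distance` used in the weighted farthest point sampling step. The imagined success estimates are taken as given inputs; in the paper they come from evaluations inside the learned dynamics models.

```python
import numpy as np

def filter_intermediate(success, low=0.2, high=0.8):
    """Indices of goals whose imagined success rate is neither too low nor too high."""
    success = np.asarray(success, dtype=float)
    return np.flatnonzero((success >= low) & (success <= high))

def weighted_fps(points, weights, k):
    """Greedy weighted farthest point sampling.
    Each pick maximizes weight * distance to the nearest already-selected point."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    k = min(k, len(points))
    if k == 0:
        return []
    selected = [int(np.argmax(weights))]  # start from the highest-weight goal
    min_dist = np.linalg.norm(points - points[selected[0]], axis=1)
    while len(selected) < k:
        score = weights * min_dist
        score[selected] = -np.inf  # never pick the same goal twice
        nxt = int(np.argmax(score))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected

def select_curriculum_goals(goals, success, k, low=0.2, high=0.8):
    """Return indices (into `goals`) of up to k intermediate-difficulty, diverse goals."""
    goals = np.asarray(goals, dtype=float)
    keep = filter_intermediate(success, low, high)
    if len(keep) == 0:
        return []
    p = np.asarray(success, dtype=float)[keep]
    # Hypothetical weighting: largest when imagined success is near 0.5.
    weights = 1.0 - np.abs(2.0 * p - 1.0) + 1e-3
    picks = weighted_fps(goals[keep], weights, k)
    return [int(keep[i]) for i in picks]

if __name__ == "__main__":
    # Toy 2-D "goal shapes"; real goals would be DLO feature vectors.
    goals = [[0, 0], [0.1, 0], [5, 0], [5, 0.1], [0, 5], [9, 9]]
    success = [0.5, 0.5, 0.5, 0.45, 0.05, 0.95]
    print(select_curriculum_goals(goals, success, k=2))
```

In the toy data, the goals with imagined success 0.05 (too hard) and 0.95 (too easy) are filtered out, and the remaining picks are spread apart rather than clustered around one region of goal space.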
This review was generated by AI and checked by a human editor.