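[Paper Summary] Sim-to-Real Reinforcement Learning for Deformable Object Manipulation
The paper trains deformable-object (cloth) manipulation policies in simulation using domain randomisation and an enhanced DDPG framework, then transfers them to the real world without further training, and demonstrates success on three tasks.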
We have seen much recent progress in rigid object manipulation, but interaction with deformable objects has notably lagged behind. Due to the large configuration space of deformable objects, solutions using traditional modelling approaches require significant engineering work. Perhaps then, bypassing the need for explicit modelling and instead learning the control in an end-to-end manner serves as a better approach? Despite the growing interest in the use of end-to-end robot learning approaches, only a small amount of work has focused on their applicability to deformable object manipulation. Moreover, due to the large amount of data needed to learn these end-to-end solutions, an emerging trend is to learn control policies in simulation and then transfer them over to the real world. To date, no work has explored whether it is possible to learn and transfer deformable object policies. We believe that if sim-to-real methods are to be employed further, then it should be possible to learn to interact with a wide variety of objects, and not only rigid objects. In this work, we use a combination of state-of-the-art deep reinforcement learning algorithms to solve the problem of manipulating deformable objects (specifically cloth). We evaluate our approach on three tasks: folding a towel up to a mark, folding a face towel diagonally, and draping a piece of cloth over a hanger. Our agents are fully trained in simulation with domain randomisation, and then successfully deployed in the real world without having seen any real deformable objects.
Research Motivation and Goals
- Motivate deformable object manipulation as a challenge beyond rigid-object manipulation.
- Develop a fully simulated RL pipeline for cloth tasks with minimal reward shaping.
- Enable sim-to-real transfer of deformable-object policies via domain randomisation.
- Evaluate on multiple cloth manipulation tasks and analyze transfer performance.
Proposed Method
- Use an improved Deep Deterministic Policy Gradients (DDPG) framework with demonstrations and multiple extensions to learn continuous control policies.
- Train in simulation with three deformable-object tasks (tape folding, hanging, diagonal folding) using a sparse reward structure.
- Incorporate demonstrations (DDPGfD) and behavioural cloning with Q-filter, N-step returns, and TD3-inspired targets to stabilise learning.
- Apply domain randomisation to textures, colors, lighting, geometry, and camera parameters to enable sim-to-real transfer.
- Utilise an asymmetric actor-critic setup where the actor uses high-dimensional RGB observations while the critic uses low-dimensional state information.
- Employ auxiliary prediction losses to help the network recognise key scene features (cloth corners, tape position, hanger position).
- Evaluate transfer to a real Kinova Mico arm with a low-cost camera without additional real-world training.
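Two of the stabilising extensions listed above, N-step returns with a TD3-inspired target and behavioural cloning with a Q-filter, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the plain-Python list inputs, and the choice to average the cloning loss over the samples that pass the filter are all hypothetical.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], gamma: float,
                  q1_next: float, q2_next: float, done: bool) -> float:
    """Discounted N-step return, bootstrapped from two target critics.

    `rewards` holds the N rewards collected after the sampled state;
    `q1_next` and `q2_next` are hypothetical target-critic values at the
    state reached after those N steps.
    """
    ret = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if not done:
        # TD3-style clipped double-Q: bootstrap from the smaller estimate
        # to reduce value overestimation.
        ret += (gamma ** len(rewards)) * min(q1_next, q2_next)
    return ret


def q_filtered_bc_loss(actor_actions: Sequence[Sequence[float]],
                       demo_actions: Sequence[Sequence[float]],
                       q_actor: Sequence[float],
                       q_demo: Sequence[float]) -> float:
    """Squared-error cloning loss, applied only where the critic rates the
    demonstrated action above the actor's own action (the Q-filter)."""
    total, count = 0.0, 0
    for a_pi, a_demo, qa, qd in zip(actor_actions, demo_actions, q_actor, q_demo):
        if qd > qa:  # keep the sample only if the demonstration looks better
            total += sum((x - y) ** 2 for x, y in zip(a_pi, a_demo))
            count += 1
    # Assumption: average over the filtered samples; 0.0 if none pass.
    return total / count if count else 0.0
```

With a sparse reward, most entries in `rewards` are zero, so the N-step sum lets a single terminal success signal reach earlier states in fewer updates than a one-step target would.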
Experimental Results
Research Questions
- RQ1: Can end-to-end RL with domain randomisation transfer deformable-object manipulation policies from simulation to the real world without real-object training?
- RQ2: What RL improvements (demonstrations, N-step returns, BC, TD3-style targets, etc.) most effectively enable learning for cloth manipulation under sparse rewards?
- RQ3: How do domain randomisation settings affect sim-to-real transfer for cloth tasks?
- RQ4: What are the main failure modes during real-world execution of learned policies for cloth manipulation?
- RQ5: Which factors limit transfer performance and how can they be mitigated?
Key Findings
| Task | Simulation success rate (Table 1) | Real-world full success (Table 2) |
|---|---|---|
| Diagonal folding | 90% | Not stated in this summary |
| Hanging | 77% | 46.6% |
| Tape folding | 86% | Not stated in this summary |
- Policies for all three tasks were transferred from simulation to the real world after training only in simulation with domain randomisation.
- In simulation, the integrated method reached 90% for diagonal folding, 77% for hanging, and 86% for tape folding (success rates).
- In real-world trials, the policies frequently met intermediate objectives (grasping the cloth, bringing it near the tape, or draping it), with intermediate metrics ranging from 40% to 90% across tasks; full success varied by task (e.g., 46.6% on hanging).
- Auxiliary predictions, behavioural cloning, and demonstration prioritisation provided positive contributions to performance; reset-to-demonstration and removing low-dimensional actor input were less beneficial.
- Heavy randomisation can hinder transfer performance; camera randomisation is essential for successful sim-to-real transfer; precise grasping remains a primary failure mode due to limited depth perception and cloth variability.
- The approach demonstrates that sim-to-real transfer for deformable objects is feasible with end-to-end RL and domain randomisation, addressing a gap in deformable-object manipulation research.
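The randomisation findings above can be made concrete with a sketch of per-episode scene sampling. Everything here is an assumption for illustration: the `SceneParams` fields, the numeric ranges, and the `randomise_camera` switch are hypothetical and do not come from the paper, and cloth colour stands in for the fuller texture randomisation the paper describes. The switch mimics the kind of ablation used to test whether camera randomisation matters.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    cloth_rgb: tuple        # cloth colour, each channel in [0, 1]
    light_intensity: float  # multiplier on nominal lighting
    cloth_scale: float      # multiplier on nominal cloth size
    camera_offset: tuple    # displacement from nominal camera pose, metres


def sample_scene(rng: random.Random, randomise_camera: bool = True) -> SceneParams:
    """Draw one set of scene parameters for a new training episode."""
    if randomise_camera:
        # Hypothetical range: up to 5 cm of jitter on each axis.
        cam = tuple(rng.uniform(-0.05, 0.05) for _ in range(3))
    else:
        # Ablation setting: camera fixed at its nominal pose.
        cam = (0.0, 0.0, 0.0)
    return SceneParams(
        cloth_rgb=tuple(rng.random() for _ in range(3)),
        light_intensity=rng.uniform(0.5, 1.5),
        cloth_scale=rng.uniform(0.9, 1.1),
        camera_offset=cam,
    )


# Usage: resample at the start of every simulated episode.
rng = random.Random(0)
params = sample_scene(rng)
```

Widening these ranges makes the simulated distribution more likely to cover the real scene, but as the findings note, overly heavy randomisation can also make the task harder to learn and hurt transfer.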
This summary was generated by AI and reviewed by a human editor.