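[Paper Explained] Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
This paper proposes a method for transferring a policy trained in simulation to the real world: it learns a deep inverse dynamics model in the target domain, uses the simulator to predict the next observation, and adjusts the action accordingly.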
Developing control policies in simulation is often more practical and safer than directly running experiments in the real world. This applies to policies obtained from planning and optimization, and even more so to policies obtained from reinforcement learning, which is often very data demanding. However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevertheless, often the overall gist of what the policy does in simulation remains valid in the real world. In this paper we investigate such settings, where the sequence of states traversed in simulation remains reasonable for the real world, even if the details of the controls are not, as could be the case when the key differences lie in detailed friction, contact, mass and geometry properties. During execution, at each time step our approach computes what the simulation-based control policy would do, but then, rather than executing these controls on the real robot, our approach computes what the simulation expects the resulting next state(s) will be, and then relies on a learned deep inverse dynamics model to decide which real-world action is most suitable to achieve those next states. Deep models are only as good as their training data, and we also propose an approach for data collection to (incrementally) learn the deep inverse dynamics model. Our experiments show that our approach compares favorably with various baselines that have been developed for dealing with simulation to real world model discrepancy, including output error control and Gaussian dynamics adaptation.
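Research Motivation and Goals
- Reuse an effective source-domain policy in the target domain (typically the real world) so that it still performs well despite discrepancies between simulation and reality.
- Exploit the idea that high-level policy behavior transfers, while low-level control details differ because of friction, contact, and other dynamics discrepancies.
- Develop an online data collection strategy for training a deep inverse dynamics model that adapts actions in the target domain.
- Demonstrate transfer effectiveness through Sim1→Sim2 and Sim→Real experiments, including contact-rich tasks.
- Compare against baselines that handle model mismatch through output error control or Gaussian dynamics adaptation.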
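Proposed Method
- At each time step, compute the source-domain action a_source = pi_source(tau_-k:), where tau_-k: denotes the most recent k steps of the trajectory.
- Predict the next source-domain observation o_next_hat = o(T_source(tau_-k:, a_source)).
- Use the learned inverse dynamics model phi(tau_-k:, o_next_hat) to select the target-domain action a_target.
- Train phi to map (oHistory, aHistory, o_next) to the preceding action that produced the transition to o_next.
- Introduce a history window H to capture temporal dependencies and latent factors in the dynamics.
- Collect training data by executing a preliminary target-domain policy with selective exploration noise, and iteratively refine phi.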
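The steps above can be illustrated with a minimal sketch on a toy 1-D system. This is not the paper's implementation: the dynamics functions, the gain mismatch, the bounded-step policy, and the names `step_source`, `step_target`, `collect_data`, `fit_inverse_dynamics`, and `rollout` are all hypothetical. A one-parameter least-squares fit stands in for the deep inverse dynamics model, the history window is reduced to the current observation, and data is collected with purely random actions rather than the paper's preliminary policy with exploration noise.

```python
import random


def pi_source(o, goal=1.0):
    """Hypothetical source policy: take a bounded step toward the goal."""
    return max(-0.2, min(0.2, goal - o))


def step_source(o, a):
    """Toy simulator dynamics (stand-in for T_source)."""
    return o + a


def step_target(o, a):
    """Toy target dynamics: the actuator is half as effective as in simulation."""
    return o + 0.5 * a


def collect_data(n, rng):
    """Collect (o, o_next, a) tuples in the target domain.

    Simplification: purely random actions are used here.
    """
    data = []
    o = 0.0
    for _ in range(n):
        a = rng.uniform(-0.4, 0.4)
        o_next = step_target(o, a)
        data.append((o, o_next, a))
        o = o_next
    return data


def fit_inverse_dynamics(data):
    """Fit a = w * (o_next - o) by least squares.

    Stand-in for the deep inverse dynamics model phi; with a history
    window, phi would also receive past observations and actions.
    """
    num = sum((o_next - o) * a for o, o_next, a in data)
    den = sum((o_next - o) ** 2 for o, o_next, a in data)
    w = num / den
    return lambda o, o_next: w * (o_next - o)


def rollout(phi, steps=10):
    """Run the adaptation loop in the target domain.

    If phi is None, the source action is executed directly (no adaptation).
    """
    o = 0.0
    trajectory = []
    for _ in range(steps):
        a_source = pi_source(o)                  # what the source policy would do
        o_next_hat = step_source(o, a_source)    # what the simulator expects next
        a_target = phi(o, o_next_hat) if phi else a_source
        o = step_target(o, a_target)             # execute in the target domain
        trajectory.append(o)
    return trajectory


if __name__ == "__main__":
    phi = fit_inverse_dynamics(collect_data(50, random.Random(0)))
    print("adapted:", rollout(phi)[-1])
    print("direct: ", rollout(None)[-1])
```

In this toy setup the adapted rollout reaches the goal within the ten steps, while executing the source actions directly falls short, because the target actuator delivers only half of the commanded displacement. The sketch only shows the structure of the loop; it says nothing about how the approach behaves on contact-rich dynamics.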
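Experimental Results
Research Questions
- RQ1: Can a deep inverse dynamics model learned in the target domain enable effective transfer of a source-domain policy to the target domain?
- RQ2: In simulation-to-real transfer, does using the predicted next observation together with an inverse model outperform direct policy transfer or forward dynamics adaptation, especially under contact-rich dynamics?
- RQ3: How does learning inverse dynamics with history affect data efficiency and adaptation performance?
- RQ4: How does the proposed method compare with the output error control and Gaussian dynamics adaptation baselines under different dynamics conditions?
- RQ5: Can action adaptation alone, without state or observation adaptation, achieve robust Sim-to-Real transfer?
Key Findings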
- The proposed method achieves compelling transfer from simulation to real world, including challenging contact-rich dynamics.
- Adaptation outperforms baseline methods (output error control and Gaussian dynamics adaptation) in both Sim1→Sim2 and Sim→Real settings.
- Using history in the inverse dynamics model reduces data requirements and improves convergence.
- Learning with targeted, task-relevant data collection yields faster convergence than random exploration.
- In Sim→Real Fetch experiments, the method significantly reduces deviation from the simulated trajectory compared with a PD baseline.
- The approach remains effective across variations in gravity and motor noise, and handles discontinuities arising from contacts.
This explainer was generated by AI and reviewed by a human editor.