QUICK REVIEW

[Paper Review] Interactive Differentiable Simulation

Eric Heiden, David Millard|arXiv (Cornell University)|May 26, 2019

Reinforcement Learning in Robotics | References: 33 | Cited by: 34

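One-Sentence Summary

IDS is a differentiable physics engine that learns physical parameters from visual input and supports task-based robot design and adaptive MPC, improving sample efficiency over model-free methods.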

ABSTRACT

Intelligent agents need a physical understanding of the world to predict the impact of their actions in the future. While learning-based models of the environment dynamics have contributed to significant improvements in sample efficiency compared to model-free reinforcement learning algorithms, they typically fail to generalize to system states beyond the training data, while often grounding their predictions on non-interpretable latent variables. We introduce Interactive Differentiable Simulation (IDS), a differentiable physics engine, that allows for efficient, accurate inference of physical properties of rigid-body systems. Integrated into deep learning architectures, our model is able to accomplish system identification using visual input, leading to an interpretable model of the world whose parameters have physical meaning. We present experiments showing automatic task-based robot design and parameter estimation for nonlinear dynamical systems by automatically calculating gradients in IDS. When integrated into an adaptive model-predictive control algorithm, our approach exhibits orders of magnitude improvements in sample efficiency over model-free reinforcement learning algorithms on challenging nonlinear control domains.

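Motivation and Objectives

Introduce Interactive Differentiable Simulation (IDS), a differentiable physics engine for rigid-body dynamics with interpretable physical parameters.
Enable end-to-end learning and control by integrating IDS into neural network architectures and optimization pipelines.
Demonstrate system identification and parameter estimation from visual input using a physics-based bottleneck.
Demonstrate applications to automatic robot design and adaptive model-predictive control (MPC).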

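Proposed Method

Model rigid-body dynamics with the Newton-Euler equations and compute forward dynamics with the Articulated Body Algorithm (ABA), which runs in O(n) time.
Integrate velocities and positions with semi-implicit Euler, and propagate forces with the recursive Newton-Euler algorithm.
Expose the physics engine as a differentiable layer, obtaining gradients with respect to inputs, forces, and parameters through reverse-mode automatic differentiation (Stan Math).
Place the IDS layer between a vision-based encoder and decoder to predict future states and learn physical parameters through end-to-end training.
In this autoencoder-bottleneck setup, learn the IDS parameters theta_phy (e.g., link lengths) jointly with the neural encoder and decoder, using a triplet loss.
Apply IDS to automatic robot design by differentiating through the DH parameters and forward kinematics, using gradient-based optimization to minimize end-effector tracking error.
Implement adaptive model-predictive control (AMPC) by fitting the IDS dynamics to observed real-system transitions and running iLQR trajectory optimization over a short horizon.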
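
The integration and gradient steps listed above can be illustrated on a toy system. The following is a minimal sketch, not the IDS implementation: it assumes a single point-mass pendulum instead of an articulated body handled by ABA, and it propagates hand-derived forward sensitivities instead of using reverse-mode automatic differentiation through Stan Math. All function names (`rollout`, `loss_and_grad`, `estimate_length`) and all constants are hypothetical choices for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def rollout(length, theta0=1.0, omega0=0.0, dt=0.01, steps=100):
    """Simulate a point-mass pendulum with semi-implicit Euler.

    Returns the angle trajectory and its derivative w.r.t. `length`.
    """
    theta, omega = theta0, omega0
    dtheta, domega = 0.0, 0.0  # sensitivities d(theta)/dl and d(omega)/dl
    thetas, dthetas = [], []
    for _ in range(steps):
        # Angular acceleration and its total derivative w.r.t. length.
        acc = -(G / length) * math.sin(theta)
        dacc = (G / length**2) * math.sin(theta) \
            - (G / length) * math.cos(theta) * dtheta
        # Semi-implicit Euler: update the velocity first,
        # then update the position with the new velocity.
        omega += dt * acc
        domega += dt * dacc
        theta += dt * omega
        dtheta += dt * domega
        thetas.append(theta)
        dthetas.append(dtheta)
    return thetas, dthetas


def loss_and_grad(length, observed):
    """Mean squared angle error and its gradient w.r.t. length."""
    thetas, dthetas = rollout(length, steps=len(observed))
    n = len(observed)
    loss = sum((a - b) ** 2 for a, b in zip(thetas, observed)) / n
    grad = sum(2.0 * (a - b) * d
               for a, b, d in zip(thetas, observed, dthetas)) / n
    return loss, grad


def estimate_length(observed, init=1.5, lr=0.5, iters=200):
    """Plain gradient descent on the link length (hypothetical settings)."""
    length = init
    for _ in range(iters):
        _, grad = loss_and_grad(length, observed)
        length -= lr * grad
    return length


# Synthetic "observations" from a pendulum with a true length of 1.0 m.
observed, _ = rollout(1.0)
estimated = estimate_length(observed)
```

The observations here come from the same simulator, so the loss is exactly zero at the true length. A short rollout is used on purpose: over long horizons the loss becomes strongly non-convex in the length, and plain gradient descent from a poor initial guess may stall.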

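Experimental Results

Research Questions

RQ1: Can IDS accurately infer physically meaningful parameters from high-dimensional visual input?
RQ2: Does integrating a differentiable physics layer improve the prediction horizon and generalization beyond the training data?
RQ3: Does IDS-based AMPC achieve better sample efficiency than model-free reinforcement learning on nonlinear control tasks?
RQ4: To what extent can IDS support the automatic design of robot arms through differentiable DH parameters and kinematics?
RQ5: Within an adaptive control loop, how well can the differentiable engine adapt its model to real-world dynamics?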

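Key Findings

IDS learns physically meaningful parameters (e.g., the pendulum link lengths converge to their true values).
An autoencoder with an IDS physics bottleneck matches intuitive-physics baselines in prediction performance and outperforms fully learned models on long-horizon prediction.
On the single- and double-pendulum tasks in the evaluated environments, adaptive MPC with IDS is more sample-efficient than SAC and DDPG.
Gradient-based optimization through IDS tunes the DH parameters so that the arm design tracks the task-space trajectory more closely.
AMPC with differentiable dynamics converges to an accurate system model within a few episodes (e.g., the convergence shown in the pendulum/cartpole experiments).
IDS provides interpretable parameters and consistency with conservation laws, which eases integration with classical control and estimation methods.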
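
The robot-design finding above can be sketched on a planar two-link arm. This is an illustrative sketch under stated assumptions, not the paper's setup: the arm has DH twist and offset set to zero, so forward kinematics reduces to a closed form; the only design parameters are the two link lengths; and the gradient is derived by hand instead of being produced by automatic differentiation. The names `forward_kinematics`, `tracking_loss_and_grad`, and `optimize_design`, as well as the joint samples and target lengths, are hypothetical.

```python
import math


def forward_kinematics(a1, a2, q1, q2):
    """End-effector position of a planar 2-link arm (DH alpha = d = 0)."""
    x = a1 * math.cos(q1) + a2 * math.cos(q1 + q2)
    y = a1 * math.sin(q1) + a2 * math.sin(q1 + q2)
    return x, y


def tracking_loss_and_grad(a1, a2, joints, targets):
    """Mean squared tracking error and its gradient w.r.t. (a1, a2)."""
    loss, g1, g2 = 0.0, 0.0, 0.0
    for (q1, q2), (tx, ty) in zip(joints, targets):
        x, y = forward_kinematics(a1, a2, q1, q2)
        ex, ey = x - tx, y - ty
        loss += ex * ex + ey * ey
        # Chain rule through the forward kinematics.
        g1 += 2.0 * (ex * math.cos(q1) + ey * math.sin(q1))
        g2 += 2.0 * (ex * math.cos(q1 + q2) + ey * math.sin(q1 + q2))
    n = len(joints)
    return loss / n, g1 / n, g2 / n


def optimize_design(joints, targets, a1=1.0, a2=1.0, lr=0.3, iters=100):
    """Gradient descent on the link lengths (hypothetical settings)."""
    for _ in range(iters):
        _, g1, g2 = tracking_loss_and_grad(a1, a2, joints, targets)
        a1 -= lr * g1
        a2 -= lr * g2
    return a1, a2


# Joint-angle samples and the task-space points they should reach,
# generated from a reference arm with link lengths 0.7 m and 0.4 m.
JOINTS = [(0.0, 0.5), (0.4, 1.0), (0.8, 1.5), (1.2, 2.0)]
TARGETS = [forward_kinematics(0.7, 0.4, q1, q2) for q1, q2 in JOINTS]
a1_opt, a2_opt = optimize_design(JOINTS, TARGETS)
```

Because the end-effector position is linear in the link lengths, this loss is a convex quadratic and gradient descent recovers the reference lengths. Optimizing twist angles or offsets as well would make the problem nonlinear, which is where gradients computed by a differentiable engine become more useful.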

This review was generated by AI and reviewed by a human editor.