QUICK REVIEW

[Paper Review] Learning Dexterous Manipulation Policies from Experience and Imitation

Vikash Kumar, Abhishek Gupta|arXiv (Cornell University)|Nov 15, 2016

Robot Manipulation and Learning | References: 30 | Cited by: 39

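One-Sentence Summary

This paper proposes a hybrid learning approach that trains dexterous manipulation policies for a five-finger robotic hand through trajectory optimization over locally linear models derived from sensor data. The approach combines imitation of human teleoperated demonstrations with generalization via nearest neighbors or deep learning, and achieves robust dexterous manipulation both in simulation and on real hardware with very little data: local controllers trained on roughly 60 trials can be interpolated into a more global policy.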

ABSTRACT

We explore learning-based approaches for feedback control of a dexterous five-finger hand performing non-prehensile manipulation. First, we learn local controllers that are able to perform the task starting at a predefined initial state. These controllers are constructed using trajectory optimization with respect to locally-linear time-varying models learned directly from sensor data. In some cases, we initialize the optimizer with human demonstrations collected via teleoperation in a virtual environment. We demonstrate that such controllers can perform the task robustly, both in simulation and on the physical platform, for a limited range of initial conditions around the trained starting state. We then consider two interpolation methods for generalizing to a wider range of initial conditions: deep learning, and nearest neighbors. We find that nearest neighbors achieve higher performance. Nevertheless, the neural network has its advantages: it uses only tactile and proprioceptive feedback but no visual feedback about the object (i.e. it performs the task blind) and learns a time-invariant policy. In contrast, the nearest neighbors method switches between time-varying local controllers based on the proximity of initial object states sensed via motion capture. While both generalization methods leave room for improvement, our work shows that (i) local trajectory-based controllers for complex non-prehensile manipulation tasks can be constructed from surprisingly small amounts of training data, and (ii) collections of such controllers can be interpolated to form more global controllers. Results are summarized in the supplementary video: https://youtu.be/E0wmO6deqjo

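Motivation and Objectives

Address the challenge of learning complex, high-dimensional dexterous manipulation policies for a robotic hand without relying on hand-engineered controllers.
Develop a scalable method that uses a small amount of experience data and human demonstrations to train local feedback controllers for non-prehensile manipulation tasks.
Generalize the local controllers to a wider range of initial states via interpolation, and evaluate deep learning and nearest neighbors in terms of robustness and performance.
Demonstrate the feasibility of blind manipulation that relies only on proprioceptive and tactile feedback, avoiding dependence on visual input.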

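Proposed Method

Train local controllers by trajectory optimization against time-varying linear-Gaussian models learned directly from sensor data (joint states, cylinder pressures, object motion).
In some cases, initialize the optimizer with human demonstrations collected via teleoperation in a virtual environment, to improve sample efficiency and convergence speed.
Generalize the local policies with two interpolation methods: (1) a deep neural network that learns a time-invariant policy from tactile and proprioceptive feedback; (2) a nearest-neighbor method that selects a local controller based on the initial object state.
Sense the initial object state with a motion capture system so that the matching time-varying local controller can be selected at the start of execution.
Work on the ADROIT hand platform with a 100-dimensional state space (24 joints, 40 pneumatic pressure values, object pose/velocity) and a 40-dimensional control space (valve commands).
Apply regularization and system identification techniques so that accurate data-driven models can be learned despite the complex pneumatic actuation and tendon-driven dynamics.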
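To make the model-fitting step concrete, here is a minimal sketch of how time-varying linear-Gaussian dynamics of the form x[t+1] ≈ A[t] x[t] + B[t] u[t] + c[t] could be fitted from a batch of trials. It uses plain ridge-regularized least squares at each time step, which is an assumption chosen for illustration and not necessarily the fitting procedure used in the paper. The function name `fit_time_varying_linear_models`, the array layout, and the `ridge` parameter are all hypothetical.

```python
import numpy as np


def fit_time_varying_linear_models(X, U, ridge=1e-6):
    """Fit x[t+1] ~ A[t] x[t] + B[t] u[t] + c[t] independently per time step.

    X: array of shape (N, T + 1, dx), states from N trials.
    U: array of shape (N, T, du), controls from the same trials.
    ridge: illustrative L2 regularization weight (an assumption).
    Returns A (T, dx, dx), B (T, dx, du), c (T, dx), Sigma (T, dx, dx).
    """
    N, T, du = U.shape
    dx = X.shape[2]
    A = np.zeros((T, dx, dx))
    B = np.zeros((T, dx, du))
    c = np.zeros((T, dx))
    Sigma = np.zeros((T, dx, dx))
    for t in range(T):
        # Regressors: current state, current control, and a constant bias term.
        Z = np.hstack([X[:, t], U[:, t], np.ones((N, 1))])
        Y = X[:, t + 1]
        # Ridge-regularized normal equations: (Z^T Z + ridge * I) W = Z^T Y.
        G = Z.T @ Z + ridge * np.eye(Z.shape[1])
        W = np.linalg.solve(G, Z.T @ Y)
        A[t] = W[:dx].T
        B[t] = W[dx:dx + du].T
        c[t] = W[-1]
        # Empirical covariance of the residuals serves as the Gaussian noise term.
        R = Y - Z @ W
        Sigma[t] = R.T @ R / N
    return A, B, c, Sigma
```

With a 100-dimensional state and a 40-dimensional control, each time step has 141 regressors, so a batch of a few dozen trials leaves the per-step regression underdetermined. That is one reason regularization (or a stronger prior) matters in this setting; the small default `ridge` here is only suitable for the low-dimensional toy case.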

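Experimental Results

Research Questions

RQ1: Can local trajectory controllers for complex non-prehensile manipulation be learned effectively from a small amount of experience and human demonstrations?
RQ2: How well do generalization techniques, specifically deep learning and nearest neighbors, extend local controllers to a wider range of initial states?
RQ3: To what extent can dexterous manipulation policies be learned using only proprioceptive and tactile sensing, without visual feedback?
RQ4: How does a time-invariant neural network policy compare with the time-varying nearest-neighbor switching policy in robustness and success rate?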

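Key Findings

Local controllers trained on roughly 60 trials on the physical ADROIT platform can successfully perform complex non-prehensile manipulation tasks (such as object rotation), but only within a limited range of initial states.
Nearest-neighbor generalization outperforms the deep neural network in success rate and robustness across varied initial states.
The deep neural network controller performs the task blind: it relies only on tactile and proprioceptive feedback, with no visual information about the object.
The neural network learns a time-invariant policy, whereas the nearest-neighbor method selects a time-varying controller based on the initial state, reflecting a trade-off between generalization and adaptability.
Both generalization methods show promise but leave room for improvement, particularly on unstable or high-dimensional tasks.
The results indicate that combining model-based trajectory optimization with data-driven generalization makes it practical to learn dexterous manipulation skills from very little data.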
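The nearest-neighbor switching described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: each local controller is assumed to be a time-varying affine feedback law u = K[t] x + k[t], and the controller whose training-time initial object state is closest in Euclidean distance to the sensed one is chosen once, at the start of execution. The class `LocalController`, the function `select_controller`, and the choice of distance metric are all assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(eq=False)
class LocalController:
    """Hypothetical container for one time-varying local controller."""
    x0_object: np.ndarray  # initial object state this controller was trained from
    K: np.ndarray          # feedback gains, shape (T, du, dx)
    k: np.ndarray          # feedforward terms, shape (T, du)

    def act(self, t, x):
        # Time-varying affine feedback: the gains depend on the time index t.
        return self.K[t] @ x + self.k[t]


def select_controller(controllers, object_state):
    """Pick the controller trained nearest to the sensed initial object state.

    Euclidean distance is an illustrative choice of metric.
    """
    dists = [np.linalg.norm(c.x0_object - object_state) for c in controllers]
    return controllers[int(np.argmin(dists))]
```

In this sketch the object state is consulted only once, to choose the controller; after that the chosen controller runs on the full state with its own time-indexed gains. The neural network policy described in the findings takes the opposite approach: it has no time index and receives no object state at all.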


This review was generated by AI and checked by a human editor.