QUICK REVIEW

[Paper Review] Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning

Abhishek Gupta, Coline Devin | arXiv (Cornell University) | Mar 8, 2017

Reinforcement Learning in Robotics | References: 21 | Cited by: 117

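One-Sentence Summary

The paper uses proxy tasks and deep embeddings to learn an invariant feature space shared by agents with different morphologies, then transfers skills through this shared feature space and a shaped reward.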

ABSTRACT

People can learn a wide range of tasks from their own experience, but can also learn from observing other creatures. This can accelerate acquisition of new skills even when the observed agent differs substantially from the learning agent in terms of morphology. In this paper, we examine how reinforcement learning algorithms can transfer knowledge between morphologically different agents (e.g., different robots). We introduce a problem formulation where two agents are tasked with learning multiple skills by sharing information. Our method uses the skills that were learned by both agents to train invariant feature spaces that can then be used to transfer other skills from one agent to another. The process of learning these invariant feature spaces can be viewed as a kind of "analogy making", or implicit learning of partial correspondences between two distinct domains. We evaluate our transfer learning algorithm in two simulated robotic manipulation skills, and illustrate that we can transfer knowledge between simulated robotic arms with different numbers of links, as well as simulated arms with different actuation mechanisms, where one robot is torque-driven while the other is tendon-driven.

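Motivation and Goals

Enable transfer learning between agents with different morphologies in order to accelerate skill acquisition.
Use skills that both agents have already learned (proxy skills) to build a common invariant feature space for the two agents.
Develop neural embedding and alignment methods that learn implicit, partial correspondences between the two domains.
Demonstrate skill transfer through the invariant space with reinforcement learning on several simulated robotic tasks.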

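Proposed Method

Define embedding functions f (source agent) and g (target agent) that map each agent's own state space into a common latent feature space.
Use proxy tasks that both agents have learned to obtain a set P of corresponding state pairs between the two domains.
Train f and g with a similarity loss on the paired states (s_Sp, s_Tp) in P: L_sim = ||f(s_Sp) - g(s_Tp)||^2.
Add a decoder for each agent (an autoencoder per domain) so that the embeddings retain information about the original states instead of collapsing to a trivial solution: L_AE_S = ||s_Sp - Dec_S(f(s_Sp))||^2 and L_AE_T = ||s_Tp - Dec_T(g(s_Tp))||^2.
Estimate the correspondences with either time-based alignment or DTW (dynamic time warping) alignment; with DTW, alternate between aligning trajectories and retraining the embedding (an EM-style iteration).
During transfer, augment the target agent's reward with a transfer term that penalizes the distance between the embedded source and target states at each time step t: r_transfer = -alpha * ||f(s_Sr) - g(s_Tr)||^2, where s_Sr and s_Tr are the source and target states at step t of the new task and alpha is a weighting coefficient.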
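The loss terms and the shaped reward listed above can be sketched in a few lines of NumPy. This is a minimal, hypothetical sketch, not the authors' implementation: it assumes linear encoders and decoders in place of neural networks, made-up state dimensions (DIM_S, DIM_T, DIM_F), random placeholder states instead of real proxy-task data, and a transfer term applied as a penalty (negative squared feature distance). The training loop that would minimize the loss is omitted.

```python
import numpy as np

# Hypothetical dimensions: source-arm state, target-arm state, shared feature space.
DIM_S, DIM_T, DIM_F = 6, 8, 4

def init_params(seed=0):
    """Random linear maps standing in for the encoder/decoder networks (assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "f": 0.1 * rng.normal(size=(DIM_F, DIM_S)),      # encoder f: source state -> feature
        "g": 0.1 * rng.normal(size=(DIM_F, DIM_T)),      # encoder g: target state -> feature
        "dec_S": 0.1 * rng.normal(size=(DIM_S, DIM_F)),  # Dec_S: feature -> source state
        "dec_T": 0.1 * rng.normal(size=(DIM_T, DIM_F)),  # Dec_T: feature -> target state
    }

def f(params, s_S):
    return s_S @ params["f"].T

def g(params, s_T):
    return s_T @ params["g"].T

def embedding_loss(params, s_Sp, s_Tp):
    """L_sim + L_AE_S + L_AE_T, averaged over paired proxy-task states (one pair per row)."""
    z_S, z_T = f(params, s_Sp), g(params, s_Tp)
    l_sim = np.mean(np.sum((z_S - z_T) ** 2, axis=1))
    l_ae_S = np.mean(np.sum((s_Sp - z_S @ params["dec_S"].T) ** 2, axis=1))
    l_ae_T = np.mean(np.sum((s_Tp - z_T @ params["dec_T"].T) ** 2, axis=1))
    return float(l_sim + l_ae_S + l_ae_T)

def transfer_reward(params, s_Sr_t, s_Tr_t, alpha=1.0):
    """r_transfer at step t: minus alpha times the squared feature distance."""
    return -alpha * float(np.sum((f(params, s_Sr_t) - g(params, s_Tr_t)) ** 2))

def shaped_reward(task_reward, params, s_Sr_t, s_Tr_t, alpha=1.0):
    """Reward the target agent optimizes: task reward plus the transfer term."""
    return task_reward + transfer_reward(params, s_Sr_t, s_Tr_t, alpha)

# Usage with random placeholder data (not real proxy-task states).
params = init_params()
rng = np.random.default_rng(1)
s_Sp, s_Tp = rng.normal(size=(32, DIM_S)), rng.normal(size=(32, DIM_T))
print("embedding loss:", embedding_loss(params, s_Sp, s_Tp))
print("shaped reward:", shaped_reward(0.0, params, s_Sp[0], s_Tp[0], alpha=0.5))
```

Without the decoder terms, mapping every state to the zero vector would drive L_sim to zero; the autoencoder losses rule out that trivial solution. The transfer term is largest (zero) when the target's embedded state matches the source's embedded state at the same step.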

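Experimental Results

Research Questions

RQ1: Can two agents with different morphologies learn a shared invariant feature space from skills they have both learned (proxy skills)?
RQ2: How can states be aligned across domains when the agents execute the tasks with time warping or at different rates?
RQ3: Does learning in the invariant space make transfer to new tasks more efficient than a direct state mapping or no transfer at all?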

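Key Findings

Embedding-based transfer enables knowledge sharing between robots with different numbers of links and with different actuation mechanisms.
Using multiple proxy tasks improves transfer performance over any single proxy task.
EM-style alignment with DTW yields better correspondences than simple time-based alignment and improves transfer.
A direct state-to-state mapping transfers worse than learning a common embedding space.
For transfer between a tendon-driven arm and a torque-driven arm, the embedding-based method learns faster and reaches a high success rate even with limited interaction.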
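A small sketch can illustrate why DTW can produce better correspondences than time-based alignment when the two agents perform a task at different speeds. It assumes the trajectories are already embedded: z_S and z_T are synthetic placeholders in which the target performs the same motion twice as slowly, and all function names are hypothetical. Only the alignment step is shown; in the EM-style procedure, the aligned pairs would then be used to retrain f and g.

```python
import numpy as np

def time_based_alignment(n_S, n_T):
    """Pair source step t with target step t (assumes equal execution rates)."""
    return [(t, t) for t in range(min(n_S, n_T))]

def dtw_alignment(z_S, z_T):
    """Dynamic time warping between two embedded trajectories; returns monotone index pairs."""
    n, m = len(z_S), len(z_T)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.sum((z_S[i - 1] - z_T[j - 1]) ** 2)  # squared feature distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end of both trajectories to the start.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1, j - 1], i - 1, j - 1),
                      (cost[i - 1, j], i - 1, j),
                      (cost[i, j - 1], i, j - 1))
    return path[::-1]

def mean_pair_distance(z_S, z_T, pairs):
    """Average squared feature distance over the aligned pairs."""
    return float(np.mean([np.sum((z_S[i] - z_T[j]) ** 2) for i, j in pairs]))

# Synthetic embedded trajectories: the target performs the same motion twice as slowly.
t_S, t_T = np.linspace(0, 1, 10), np.linspace(0, 1, 20)
z_S = np.stack([np.sin(np.pi * t_S), t_S], axis=1)
z_T = np.stack([np.sin(np.pi * t_T), t_T], axis=1)

time_pairs = time_based_alignment(len(z_S), len(z_T))
dtw_pairs = dtw_alignment(z_S, z_T)
print("time-based:", mean_pair_distance(z_S, z_T, time_pairs))
print("DTW:       ", mean_pair_distance(z_S, z_T, dtw_pairs))
```

Time-based alignment pairs states that share a step index, so it mismatches states as soon as the execution rates differ; DTW searches over monotone warpings and can pair each target state with the source state at the same point of the motion.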

This review was generated by AI and reviewed by human editors.