QUICK REVIEW

[Paper Review] Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL

Anusha Nagabandi, Chelsea Finn|arXiv (Cornell University)|Dec 18, 2018

Domain Adaptation and Few-Shot Learning|References: 33|Cited by: 48

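One-Sentence Summary

MOLe combines online SGD with a mixture of meta-trained networks, governed by a Chinese restaurant process prior, to adapt continually to non-stationary tasks in model-based reinforcement learning. Meta-learning supplies a prior that is well suited to both fast online adaptation and recall of earlier tasks.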

ABSTRACT

Humans and animals can learn complex predictive models that allow them to accurately and reliably reason about real-world phenomena, and they can adapt such models extremely quickly in the face of unexpected changes. Deep neural network models allow us to represent very complex functions, but lack this capacity for rapid online adaptation. The goal in this paper is to develop a method for continual online learning from an incoming stream of data, using deep neural network models. We formulate an online learning procedure that uses stochastic gradient descent to update model parameters, and an expectation maximization algorithm with a Chinese restaurant process prior to develop and maintain a mixture of models to handle non-stationary task distributions. This allows for all models to be adapted as necessary, with new models instantiated for task changes and old models recalled when previously seen tasks are encountered again. Furthermore, we observe that meta-learning can be used to meta-train a model such that this direct online adaptation with SGD is effective, which is otherwise not the case for large function approximators. In this work, we apply our meta-learning for online learning (MOLe) approach to model-based reinforcement learning, where adapting the predictive model is critical for control; we demonstrate that MOLe outperforms alternative prior methods, and enables effective continuous adaptation in non-stationary task distributions such as varying terrains, motor failures, and unexpected disturbances.

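Motivation and Goals

Enable fast, continual online adaptation of deep models in non-stationary environments.
Develop an online EM procedure that maintains and updates a mixture of task-specific models.
Use a meta-learned prior so that online adaptation is effective with only a few gradient steps.
Handle unknown task boundaries by instantiating new tasks through a CRP prior.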

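Proposed Method

Formulate online learning as SGD updates to the parameters theta(T) of each task T, weighted by the task posterior P(T_t | x_t, y_t).
Use EM to estimate task responsibilities and update the model parameters online.
Model the task distribution with a Chinese restaurant process (CRP) prior so that new tasks can be instantiated when needed.
Meta-train the prior theta* with MAML to enable fast adaptation from small amounts of data.
In the online M-step, update theta_{t+1}(T_i) with one gradient step scaled by P_t(T_t = T_i | x_t, y_t).
Apply MOLe to model-based RL: the model predicts the next state from the past K transitions, and a controller plans with the updated model.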
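The E-step and M-step described above can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes a linear model with a fixed-variance Gaussian likelihood in place of a deep network, a hand-set `theta_prior` in place of a MAML-trained theta*, and a simple rule that adds the candidate task whenever it has the highest posterior. The names `OnlineMixture`, `alpha`, `lr`, and `sigma` are hypothetical.

```python
import numpy as np

def nll_grad(theta, x, y, sigma):
    """Gradient of the Gaussian negative log-likelihood of y under theta @ x."""
    return np.outer(theta @ x - y, x) / sigma**2

def log_lik(theta, x, y, sigma):
    """Gaussian log-likelihood of y under theta @ x, up to an additive constant."""
    err = y - theta @ x
    return -0.5 * float(err @ err) / sigma**2

class OnlineMixture:
    """Toy online EM over a mixture of linear models with a CRP prior (assumed setup)."""

    def __init__(self, theta_prior, alpha=1.0, lr=0.5, sigma=1.0):
        self.theta_prior = np.array(theta_prior, dtype=float)  # stand-in for theta*
        self.alpha, self.lr, self.sigma = alpha, lr, sigma
        self.thetas = []   # one parameter matrix per instantiated task
        self.counts = []   # running sums of task posteriors, used by the CRP prior

    def step(self, x, y):
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        # Candidate new task: the prior adapted by one gradient step on (x, y).
        theta_new = self.theta_prior - self.lr * nll_grad(self.theta_prior, x, y, self.sigma)
        candidates = self.thetas + [theta_new]
        # CRP prior: existing tasks in proportion to their counts, a new task in proportion to alpha.
        prior = np.array(self.counts + [self.alpha])
        prior = prior / prior.sum()
        # E-step: posterior over tasks is proportional to likelihood times prior.
        log_post = np.log(prior) + np.array(
            [log_lik(th, x, y, self.sigma) for th in candidates])
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        if int(np.argmax(post)) == len(self.thetas):
            # The candidate explains the data best: instantiate it as a new task.
            self.thetas.append(theta_new)
            self.counts.append(0.0)
        else:
            # Otherwise drop the candidate and renormalize over the existing tasks.
            post = post[:-1] / post[:-1].sum()
        # M-step: one gradient step per task, scaled by that task's posterior.
        for i, p in enumerate(post):
            grad = nll_grad(self.thetas[i], x, y, self.sigma)
            self.thetas[i] = self.thetas[i] - self.lr * p * grad
            self.counts[i] += p
        return post
```

Calling `step(x, y)` once per incoming data point returns the posterior over the current tasks. Because every task's step is scaled by its posterior, a model that explains the data poorly is left almost untouched, which is what allows an old task to be recalled later instead of overwritten.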

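Experimental Results

Research Questions

RQ1: Can MOLe autonomously discover task structure in a non-stationary data stream?
RQ2: Can MOLe adapt to tasks further outside the training distribution than k-shot methods can handle?
RQ3: Can MOLe recognize previously seen tasks online and revert to them?
RQ4: Does MOLe avoid overfitting to the most recent task and retain past skills when tasks switch?
RQ5: Does MOLe outperform other online-learning and meta-learning baselines in non-stationary settings?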

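Key Findings

In model-based RL, MOLe instantiates new tasks online and adapts to out-of-distribution tasks.
MOLe recalls previously seen tasks online and reverts to them.
MOLe outperforms k-shot meta-RL as well as baselines that use continued gradient updates or no meta-learning.
A prior meta-trained with MAML provides an effective initialization for continual online adaptation in non-stationary multi-task settings.
The CRP prior allows soft assignment of data to tasks, so specialization emerges naturally without explicit task boundaries.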

This review was generated by AI and checked by a human editor.