QUICK REVIEW

[Paper Review] Online Meta-Learning

Chelsea Finn, Aravind Rajeswaran|arXiv (Cornell University)|Feb 22, 2019

Domain Adaptation and Few-Shot Learning | Cited by 101

One-Sentence Summary

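This paper proposes an online meta-learning framework and the Follow The Meta Leader (FTML) algorithm. FTML extends MAML to tasks that arrive in sequence, comes with an O(log T) regret bound, and shows substantial empirical gains on vision tasks.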

ABSTRACT

A central capability of intelligent systems is the ability to continuously build upon previous experiences to speed up and enhance learning of new tasks. Two distinct research paradigms have studied this question. Meta-learning views this problem as learning a prior over model parameters that is amenable for fast adaptation on a new task, but typically assumes the set of tasks is available together as a batch. In contrast, online (regret based) learning considers a sequential setting in which problems are revealed one after the other, but conventionally trains only a single model without any task-specific adaptation. This work introduces an online meta-learning setting, which merges ideas from both the aforementioned paradigms to better capture the spirit and practice of continual lifelong learning. We propose the follow the meta leader algorithm which extends the MAML algorithm to this setting. Theoretically, this work provides an $\mathcal{O}(\log T)$ regret guarantee with only one additional higher order smoothness assumption in comparison to the standard online setting. Our experimental evaluation on three different large-scale tasks suggests that the proposed algorithm significantly outperforms alternatives based on traditional online learning approaches.

Research Motivation and Goals

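Unify the ideas of meta-learning and online learning to better support continual lifelong learning.
Formalize the online meta-learning problem, in which tasks arrive sequentially and prior experience is used to enable fast adaptation to each new task.
Propose the Follow The Meta Leader (FTML) algorithm as an online meta-learning method.
Provide a theoretical regret guarantee along with a practical deep-learning implementation for large-scale tasks.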

Proposed Method

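Define online meta-learning: before performance on each task t is evaluated, a task-specific update U_t(w) is applied to the meta-parameters w.
Propose FTML: w_{t+1} = argmin_w sum_{k=1}^{t} f_k(U_k(w)), i.e., use the meta-parameters that would have been best, after adaptation, on all tasks seen so far.
Use the one-step gradient update U_t(w) = w - α ∇f̂_t(w), which yields a MAML-style objective.
Under standard smoothness and strong-convexity assumptions, plus one additional higher-order smoothness assumption, prove that the composite function f_t(U_t(w)) is (strongly) convex and smooth, which yields an O(log T) regret bound.
Give a practical, MAML-inspired stochastic-optimization instantiation for deep networks (inner and outer loops, with plain gradient or Adam updates).
Show that when the loss composed with the inner update satisfies suitable convexity conditions, FTML inherits regret guarantees comparable to those of Follow The Leader (FTL).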
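The FTML loop above can be illustrated with a small sketch. It assumes a hypothetical family of one-dimensional quadratic tasks f_t(w) = (a_t/2)(w - c_t)^2 and takes f̂_t = f_t, so the inner update uses the same loss it is evaluated on. Under these assumptions each f_t(U_t(w)) is itself a quadratic in w, and the follow-the-meta-leader argmin has a closed form: a weighted mean of the task optima c_k. The helper names (make_task, adapt, meta_loss, ftml_argmin, run_ftml) are illustrative only. This is not the paper's deep-network implementation, which relies on stochastic inner and outer loops.

```python
import random

# Hypothetical task family for illustration: f(w) = (a / 2) * (w - c) ** 2.
def make_task(rng):
    a = rng.uniform(0.5, 2.0)   # curvature of the task loss
    c = rng.gauss(1.0, 0.5)     # task optimum, drawn around a shared center
    return a, c

def loss(w, task):
    a, c = task
    return 0.5 * a * (w - c) ** 2

def adapt(w, task, alpha):
    # One-step update U_t(w) = w - alpha * grad f_hat_t(w), assuming f_hat_t == f_t.
    a, c = task
    return w - alpha * a * (w - c)

def meta_loss(w, task, alpha):
    # MAML-style objective f_t(U_t(w)): adapt first, then evaluate.
    return loss(adapt(w, task, alpha), task)

def ftml_argmin(tasks, alpha):
    # For these quadratics, f_k(U_k(w)) = (a_k/2) * (1 - alpha*a_k)**2 * (w - c_k)**2,
    # so argmin_w of the sum is a weighted mean of the task optima c_k.
    weights = [a * (1 - alpha * a) ** 2 for a, _ in tasks]
    return sum(wt * c for wt, (_, c) in zip(weights, tasks)) / sum(weights)

def run_ftml(T, alpha=0.1, w0=0.0, seed=0):
    """Return FTML's regret after T tasks against the best w in hindsight.

    alpha=0.1 satisfies alpha <= 1 / (2 * max a) = 0.25 for this task family.
    """
    rng = random.Random(seed)
    tasks, w, online_loss = [], w0, 0.0
    for _ in range(T):
        task = make_task(rng)                      # task t is revealed
        online_loss += meta_loss(w, task, alpha)   # adapt current w, then incur loss
        tasks.append(task)
        w = ftml_argmin(tasks, alpha)              # w_{t+1}: follow the meta leader
    w_best = ftml_argmin(tasks, alpha)             # best meta-parameters in hindsight
    best_loss = sum(meta_loss(w_best, t, alpha) for t in tasks)
    return online_loss - best_loss
```

For example, `run_ftml(500) / 500` gives the average regret per task after 500 tasks. The argmin is recomputed from scratch every round for readability; a running weighted sum would make each step O(1).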

Experimental Results

Research Questions

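RQ1: Can online meta-learning be formulated so that past tasks enable fast adaptation to new tasks that arrive sequentially?
RQ2: Does FTML achieve sublinear regret relative to the best meta-learner in hindsight?
RQ3: Can FTML be implemented effectively with deep neural networks on large-scale vision tasks?
RQ4: On practical tasks, how does online meta-learning compare with conventional online learning and joint-training baselines?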

Key Findings

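Under the stated assumptions, FTML achieves O(log T) regret relative to the best meta-learner in hindsight.
With a suitable step size, the MAML-style objective f_i(w - α ∇f̂_i(w)) is convex, which makes it efficient to optimize.
FTML delivers empirical gains over the Train On Everything (TOE) and joint-training baselines, especially in terms of data efficiency.
On Rainbow MNIST, FTML learns each new task more efficiently as more tasks are seen, outperforming the other methods.
On CIFAR-100, FTML learns tasks faster and benefits from adapting all layers rather than only the last layer.
In the sequential object pose-prediction setting, FTML learns faster and transfers better than the baseline methods.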
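The two theoretical findings can be made concrete with a worked example. The regret is measured against the best meta-parameters in hindsight; the one-dimensional quadratic task and the choice f̂_t = f_t are illustrative assumptions for this example, not the paper's general setting.

```latex
\mathrm{Regret}_T = \sum_{t=1}^{T} f_t\big(U_t(w_t)\big) - \min_{w} \sum_{t=1}^{T} f_t\big(U_t(w)\big)

% Illustrative 1-D quadratic task, assuming \hat f_t = f_t:
f_t(w) = \tfrac{a_t}{2}(w - c_t)^2, \qquad
U_t(w) = w - \alpha a_t (w - c_t)
\;\Rightarrow\; U_t(w) - c_t = (1 - \alpha a_t)(w - c_t)

f_t\big(U_t(w)\big) = \tfrac{a_t}{2}(1 - \alpha a_t)^2 (w - c_t)^2, \qquad
\frac{d^2}{dw^2} f_t\big(U_t(w)\big) = a_t (1 - \alpha a_t)^2

% If a_t \le \beta and \alpha \le 1/(2\beta), then 1 - \alpha a_t \ge 1/2, so
a_t (1 - \alpha a_t)^2 \ge \tfrac{a_t}{4} > 0
```

In general, the chain rule gives ∇_w f_t(U_t(w)) = (I - α ∇²f̂_t(w)) ∇f_t(U_t(w)), and differentiating once more produces a term involving the third derivative of f̂_t. That term vanishes for quadratics, so this example needs no extra assumption; in the general case, controlling it is the role of the additional higher-order smoothness assumption mentioned in the abstract.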


This review was generated by AI and checked by human editors.