QUICK REVIEW

[Paper Review] Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference

Matthew Riemer, Ignacio Cases|arXiv (Cornell University)|Oct 28, 2018

Domain Adaptation and Few-Shot Learning | Cited by 345

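One-Sentence Summary

This paper proposes Meta-Experience Replay (MER), a method that combines experience replay with optimization-based meta-learning to maximize forward transfer and minimize interference in continual learning, for both supervised and reinforcement learning, without requiring task labels.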

ABSTRACT

Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.

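Research Motivation and Goals

Frame continual learning as a temporally symmetric trade-off between transfer and interference, one that applies both forward and backward in time.
Develop a meta-learning algorithm that learns to shape gradient dynamics so as to promote transfer and reduce interference.
Use experience replay to approximate a stationary distribution over a non-stationary data stream.
Evaluate MER on diverse continual learning benchmarks and non-stationary RL environments to demonstrate robust performance gains.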
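
The replay goal above depends on a memory whose contents stay representative of everything seen so far, even though the stream itself drifts. The paper uses reservoir sampling for this (Appendix F). The following is a minimal sketch of the textbook reservoir algorithm, not the authors' code; the class name `ReservoirBuffer` and its methods are hypothetical names chosen for illustration.

```python
import random


class ReservoirBuffer:
    """Fixed-size memory filled by reservoir sampling (hypothetical helper).

    After n calls to add(), every example seen so far has the same
    probability capacity / n of being in the buffer (for n >= capacity).
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples offered to the buffer
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Buffer not full yet: always keep the example.
            self.items.append(example)
        else:
            # Pick a slot uniformly from [0, seen); overwrite only if it
            # falls inside the buffer, i.e. with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        """Draw up to k distinct stored examples uniformly at random."""
        return self.rng.sample(self.items, min(k, len(self.items)))


# Usage: stream 100 examples through a 5-slot memory.
buffer = ReservoirBuffer(capacity=5, seed=0)
for step in range(100):
    buffer.add(step)
replay_batch = buffer.sample(3)
```

Because each stored slot is overwritten with decreasing probability, the buffer does not need to know the stream length or any task boundaries in advance, which fits the task-label-free setting described in the summary.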

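Proposed Method

Define transfer and interference between two examples in terms of the alignment of their gradients.
Propose an objective that encourages a high gradient dot product between randomly drawn data points, promoting shared, useful representations (Equation 4).
Combine experience replay with optimization-based meta-learning to create MER (Algorithm 1), which optimizes a Reptile-style objective over samples drawn from memory.
Use reservoir sampling to maintain a memory buffer that approximates the stationary distribution of the data seen so far (Appendix F).
Adopt first-order meta-learning (Reptile) to avoid second-derivative computation and to enable online continual learning (Equations 6–7).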
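
To make the gradient-alignment idea concrete, here is a small NumPy sketch on a linear model with squared-error loss. It shows (a) the per-example gradient dot product, whose sign distinguishes transfer from interference, and (b) a single-level Reptile-style step: sequential SGD over a batch followed by interpolation toward the result. This is a simplified illustration under stated assumptions, not Algorithm 1 from the paper; the function names, the linear model, and the learning-rate values are all hypothetical.

```python
import numpy as np


def grad_squared_error(theta, x, y):
    """Gradient of 0.5 * (theta @ x - y)**2 with respect to theta."""
    return (theta @ x - y) * x


def alignment(theta, ex_i, ex_j):
    """Dot product of two per-example gradients.

    Positive: a step on one example also lowers the loss on the other
    (transfer). Negative: it raises it (interference).
    """
    g_i = grad_squared_error(theta, *ex_i)
    g_j = grad_squared_error(theta, *ex_j)
    return float(g_i @ g_j)


def reptile_style_step(theta, batch, inner_lr=0.1, meta_lr=0.5):
    """Sequential SGD over the batch, then move part of the way there.

    Assumption: one interpolation level only; inner_lr and meta_lr are
    illustrative values, not the paper's hyperparameters.
    """
    theta_fast = theta.copy()
    for x, y in batch:
        theta_fast -= inner_lr * grad_squared_error(theta_fast, x, y)
    return theta + meta_lr * (theta_fast - theta)


theta = np.zeros(2)
ex_a = (np.array([1.0, 0.0]), 1.0)
ex_b = (np.array([1.0, 0.0]), 2.0)
ex_c = (np.array([-1.0, 0.0]), 1.0)

aligned = alignment(theta, ex_a, ex_b)      # gradients point the same way
conflicting = alignment(theta, ex_a, ex_c)  # gradients point in opposite ways
new_theta = reptile_style_step(theta, [ex_a, ex_b])
```

In a replay setting, `batch` would hold the current example together with examples drawn from memory. Taking the inner steps one example at a time matters here: that ordering is what makes the interpolated update sensitive to how the gradients of different examples interact, without computing any second derivatives.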

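Experimental Results

Research Questions

RQ1: Can a temporally symmetric transfer-interference framework improve continual learning over non-stationary distributions?
RQ2: Can Meta-Experience Replay (MER) effectively maximize forward transfer while minimizing interference with both past and future data?
RQ3: How does MER compare with existing baselines (EWC, GEM, Online, and others) on supervised continual lifelong learning benchmarks?
RQ4: Is MER robust when memory is limited (small buffers) and in increasingly non-stationary reinforcement learning environments?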

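Key Findings

MER consistently outperforms strong baselines (such as GEM, EWC, and Online) on the supervised continual lifelong learning benchmarks MNIST Rotations and MNIST Permutations.
MER achieves higher retained accuracy and a better balance between transfer and interference, especially as non-stationarity increases.
MER's gains are larger with smaller memory buffers, and it outperforms GEM even when GEM uses a significantly larger buffer.
In increasingly non-stationary settings (Many Permutations, Omniglot), MER substantially improves retention and learning speed relative to the baselines.
In the non-stationary reinforcement learning experiments, which apply DQN to non-stationary versions of Catcher and Flappy Bird, MER reduces forgetting and improves cross-task performance compared with standard DQN with experience replay.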
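
The findings above refer to retained accuracy. As a rough illustration of how such a number can be computed, the sketch below takes a matrix `acc` in which `acc[i][j]` is the accuracy on task j measured right after training on task i. The function name, the matrix layout, and the two companion quantities (accuracy just after learning, and the average change since then) are assumptions added for illustration and are not taken from this summary; the accuracy values are made up.

```python
import numpy as np


def continual_metrics(acc):
    """Summarize a square task-accuracy matrix (hypothetical helper).

    acc[i][j] = accuracy on task j after finishing training on task i.
    """
    acc = np.asarray(acc, dtype=float)
    final = acc[-1]              # accuracy on every task at the end of training
    just_learned = np.diag(acc)  # accuracy on each task right after learning it
    return {
        "retained_accuracy": float(final.mean()),
        "learning_accuracy": float(just_learned.mean()),
        # Negative values indicate forgetting; positive values indicate
        # that later training helped earlier tasks.
        "backward_change": float((final - just_learned).mean()),
    }


# Made-up numbers for three sequential tasks.
example_acc = [
    [0.9, 0.1, 0.1],
    [0.7, 0.9, 0.1],
    [0.6, 0.8, 0.9],
]
metrics = continual_metrics(example_acc)
```

Reading the last row against the diagonal separates two effects that a single final accuracy hides: how well each task was learned in the first place, and how much of that was lost or gained afterwards.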


This review was generated by AI and reviewed by a human editor.