QUICK REVIEW

[Paper Review] Gradient based sample selection for online continual learning

Rahaf Aljundi, Min Lin|arXiv (Cornell University)|Mar 20, 2019

Domain Adaptation and Few-Shot Learning|References: 26|Cited by: 79

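One-sentence summary

The paper formulates the problem of populating a replay buffer as a constraint reduction problem and introduces a gradient-based surrogate that maximizes the diversity of the stored samples, which enables online continual learning without task boundaries. It gives an exact solution based on integer quadratic programming (IQP) and a cheap greedy method, and reports competitive results on several benchmark datasets.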

ABSTRACT

A continual learning agent learns online with a non-stationary and never-ending stream of data. The key to such learning process is to overcome the catastrophic forgetting of previously seen data, which is a well known problem of neural networks. To prevent forgetting, a replay buffer is usually employed to store the previous data for the purpose of rehearsal. Previous works often depend on task boundary and i.i.d. assumptions to properly select samples for the replay buffer. In this work, we formulate sample selection as a constraint reduction problem based on the constrained optimization view of continual learning. The goal is to select a fixed subset of constraints that best approximate the feasible region defined by the original constraints. We show that it is equivalent to maximizing the diversity of samples in the replay buffer with parameters gradient as the feature. We further develop a greedy alternative that is cheap and efficient. The advantage of the proposed method is demonstrated by comparing to other alternatives under the continual learning setting. Further comparisons are made against state of the art methods that rely on task boundaries which show comparable or even better results for our method.

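Motivation and objectives

Motivate continual learning on online, non-i.i.d. data streams that have no task boundaries.
Formulate replay buffer population as a constraint reduction problem.
Introduce a surrogate objective based on gradient diversity that approximates minimizing the feasible region.
Provide efficient online algorithms (IQP and greedy) for populating the buffer.
Show competitive performance on benchmark continual learning datasets without assuming task boundaries.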
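
The constraint-reduction view and the gradient-diversity surrogate named in the objectives can be written out as a short worked form. This is a sketch in assumed notation, which may differ from the paper's: g is the loss gradient of the incoming sample, g_i the loss gradient of an earlier sample i, and M the fixed buffer size. The last identity follows directly from the definition of variance for unit vectors, and it shows why minimizing the sum of normalized inner products is the same as maximizing the variance of the gradient directions.

```latex
% Gradient-space constraints: the update direction g should not conflict
% with the gradient g_i of any previously seen sample.
\langle g, g_i \rangle \ge 0, \qquad \forall i \in \{0, \dots, t-1\}

% Surrogate: choose a buffer \mathcal{M} of fixed size M whose gradients
% point in directions that are as dissimilar as possible.
\min_{\mathcal{M}} \sum_{i, j \in \mathcal{M}}
  \frac{\langle g_i, g_j \rangle}{\lVert g_i \rVert \, \lVert g_j \rVert}
\quad \text{s.t.} \quad
\mathcal{M} \subset \{0, \dots, t-1\}, \; |\mathcal{M}| = M

% Variance identity for the unit vectors u_k = g_k / \lVert g_k \rVert:
\operatorname{Var}_{\mathcal{M}}[u]
  = \frac{1}{M} \sum_{k \in \mathcal{M}} \lVert u_k \rVert^2
    - \Bigl\lVert \frac{1}{M} \sum_{k \in \mathcal{M}} u_k \Bigr\rVert^2
  = 1 - \frac{1}{M^2} \sum_{i, j \in \mathcal{M}} \langle u_i, u_j \rangle
```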

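Proposed method

Cast continual learning as a constrained optimization problem over the model parameters, with constraints derived from past data.
Express the feasible region through gradient inner products, and aim to preserve it with a fixed-size replay buffer.
Propose a surrogate objective that minimizes the sum of normalized gradient inner products in order to maximize diversity (Eq. 7).
Relate the surrogate to solid angle minimization and show that it is equivalent to maximizing the variance of the gradient directions (Eq. 8).
Provide an exact gradient-based IQP method (Algorithm 1) that selects the subset of constraints (samples) minimizing the surrogate objective.
Give a cheaper greedy alternative (Algorithm 2) that scores each sample by its maximal cosine similarity to a random subset of the buffer and replaces buffer entries probabilistically.
Discuss how rehearsal (regularization) relates to constrained optimization, and compare the approaches empirically.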
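
The greedy rule described above (score by maximal cosine similarity to a random buffer subset, then replace probabilistically) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `GreedyGradientBuffer`, the `+1` shift that keeps scores non-negative, the score given to the very first sample, and the exact acceptance probability are assumptions of this sketch and should be checked against Algorithm 2 in the paper. Gradients are passed in as flat NumPy vectors and cached, whereas a real implementation would likely recompute buffer gradients under the current model parameters.

```python
import numpy as np


class GreedyGradientBuffer:
    """Fixed-size replay buffer filled by a greedy gradient-similarity rule (sketch)."""

    def __init__(self, capacity, n_compare, seed=0):
        self.capacity = capacity      # maximum number of stored samples
        self.n_compare = n_compare    # size of the random subset used for scoring
        self.rng = np.random.default_rng(seed)
        self.items, self.grads, self.scores = [], [], []

    def score(self, grad):
        """Max cosine similarity to a random buffer subset, shifted into [0, 2]."""
        if not self.grads:
            return 0.1  # assumption: small positive score when the buffer is empty
        grad = np.asarray(grad, dtype=float)
        k = min(self.n_compare, len(self.grads))
        idx = self.rng.choice(len(self.grads), size=k, replace=False)
        G = np.stack([self.grads[i] for i in idx])
        g = grad / max(np.linalg.norm(grad), 1e-12)
        G = G / np.maximum(np.linalg.norm(G, axis=1, keepdims=True), 1e-12)
        cos = np.clip(G @ g, -1.0, 1.0)
        return float(cos.max()) + 1.0  # low score = dissimilar to the buffer

    def add(self, item, grad):
        """Offer one streamed sample and its loss gradient; return True if stored."""
        grad = np.asarray(grad, dtype=float)
        c = self.score(grad)
        if len(self.items) < self.capacity:
            # Buffer not full yet: always store the sample.
            self.items.append(item)
            self.grads.append(grad)
            self.scores.append(c)
            return True
        if c >= 1.0:
            # Max cosine similarity is non-negative: candidate adds little diversity.
            return False
        # Pick a replacement candidate, favouring entries with high (redundant) scores.
        w = np.asarray(self.scores) + 1e-12
        i = int(self.rng.choice(len(w), p=w / w.sum()))
        denom = self.scores[i] + c
        accept = self.scores[i] / denom if denom > 0 else 0.5  # assumption for 0/0
        if self.rng.random() < accept:
            self.items[i], self.grads[i], self.scores[i] = item, grad, c
            return True
        return False
```

In use, a training loop would call `buf.add(sample, grad)` once per streamed sample, where `grad` is the flattened loss gradient of that sample. A candidate pointing the same way as a stored gradient is rejected, while one pointing the opposite way receives a score near zero and is very likely to displace a redundant entry.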

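Experimental results

Research questions

RQ1: Can the replay buffer be populated effectively without knowing task boundaries or assuming i.i.d. data?
RQ2: Does the gradient-based diversity criterion reliably approximate the feasible-region minimization implied by the original constraints?
RQ3: Are the online gradient-based selection methods (IQP and greedy) computationally feasible and competitive on standard continual learning benchmarks?
RQ4: On non-stationary data streams, how do these methods compare with reservoir sampling and with replay baselines that use task boundaries?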

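Key findings

The gradient-based surrogate is monotonically related to the solid angle measure of the feasible region, which supports its use as an objective.
In settings without task boundaries, the online gradient-based methods outperform random sampling and several clustering baselines on the MNIST and CIFAR-10 tasks.
The greedy variant (GSS-Greedy) is computationally efficient and matches or outperforms the other strategies, particularly on CIFAR-10.
On imbalanced data sequences, the proposed methods achieve higher average accuracy than reservoir sampling and are more robust on under-represented tasks.
Although they use no task boundary information, the methods perform comparably to or better than some replay baselines that do, such as GEM and iCaRL.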
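
For context on the reservoir sampling baseline mentioned in the findings, the standard textbook procedure (often called Algorithm R) can be sketched as below. This is not code from the paper, and the function name `reservoir_sample` is hypothetical. The point it illustrates is that every stream item ends up in the buffer with equal probability, so the buffer tends to mirror any imbalance in the stream, which is plausibly why a diversity-driven rule helps on under-represented tasks.

```python
import random


def reservoir_sample(stream, capacity, seed=0):
    """Keep a uniform random sample of `capacity` items from a stream (Algorithm R)."""
    rng = random.Random(seed)
    buffer = []
    for t, item in enumerate(stream):
        if t < capacity:
            # Fill the buffer with the first `capacity` items.
            buffer.append(item)
        else:
            # Item t replaces a random slot with probability capacity / (t + 1).
            j = rng.randint(0, t)  # both endpoints inclusive
            if j < capacity:
                buffer[j] = item
    return buffer
```

For example, `reservoir_sample(range(1000), capacity=10)` returns ten distinct stream items, and a stream shorter than the capacity is returned unchanged.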

This review was generated by AI and reviewed by a human editor.