QUICK REVIEW

[Paper Review] Overcoming Catastrophic Forgetting by Incremental Moment Matching

Sang-Woo Lee, Jin-Hwa Kim | arXiv (Cornell University) | Mar 24, 2017

Domain Adaptation and Few-Shot Learning | Cited by 293

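One-sentence summary

IMM incrementally matches the moments of the posterior to mitigate catastrophic forgetting; mean-IMM and mode-IMM balance old and new tasks and, combined with transfer techniques such as weight-transfer, L2-transfer, and drop-transfer, achieve state-of-the-art continual learning performance on multiple datasets.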

ABSTRACT

Catastrophic forgetting is a problem of neural networks that loses the information of the first task after training the second task. Here, we propose a method, i.e. incremental moment matching (IMM), to resolve this problem. IMM incrementally matches the moment of the posterior distribution of the neural network which is trained on the first and the second task, respectively. To make the search space of posterior parameter smooth, the IMM procedure is complemented by various transfer learning techniques including weight transfer, L2-norm of the old and the new parameter, and a variant of dropout with the old parameter. We analyze our approach on a variety of datasets including the MNIST, CIFAR-10, Caltech-UCSD-Birds, and Lifelog datasets. The experimental results show that IMM achieves state-of-the-art performance by balancing the information between an old and a new network.

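Motivation and objectives

Advance continual learning and mitigate catastrophic forgetting in deep neural networks.
Introduce a Bayesian-inspired framework that approximates the posterior over sequential tasks with a mixture of Gaussians.
Propose two moment-matching variants (mean-IMM and mode-IMM) for merging task-specific posteriors.
Extend the search space of IMM with transfer techniques to produce smoother, convex-like optimization paths.
Demonstrate empirical gains on diverse datasets (MNIST, CIFAR-10, Caltech-UCSD Birds, Lifelog).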

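Proposed method

Model the posterior over the network parameters as a Gaussian, and approximate the mixture of task posteriors with a single Gaussian $q(\theta \mid \mu, \Sigma)$.
Mean-IMM: minimize the weighted sum of KL divergences $\sum_k \alpha_k \, \mathrm{KL}(q_k \,\|\, q_{1:K})$, which gives $\mu^* = \sum_k \alpha_k \mu_k$ and $\Sigma^* = \sum_k \alpha_k \left(\Sigma_k + (\mu_k - \mu^*)(\mu_k - \mu^*)^\top\right)$.
Mode-IMM: use a Laplace approximation to approximate the mode of the mixture, which gives $\mu^* = \Sigma^* \left(\sum_k \alpha_k \Sigma_k^{-1} \mu_k\right)$ and $\Sigma^* = \left(\sum_k \alpha_k \Sigma_k^{-1}\right)^{-1}$.
Apply transfer techniques (weight-transfer, L2-transfer, drop-transfer) so that the optimization path between task posteriors becomes smooth and convex-like.
Assume diagonal covariances to reduce complexity, and use the Fisher information to define $\Sigma_k$ in mode-IMM.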
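The two merging rules above can be sketched for the diagonal-covariance case, where every matrix operation reduces to element-wise arithmetic. This is a minimal illustration in plain Python, not the authors' implementation: the function names `mean_imm` and `mode_imm`, the flat-list representation of parameters, and the toy numbers are all assumptions made for the example. Each task `k` contributes a mean vector `mus[k]`, a vector of diagonal variances `variances[k]`, and a mixing weight `alphas[k]`, with the weights assumed to sum to 1.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_imm(
    mus: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    alphas: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Mean-IMM with diagonal covariances (hypothetical helper).

    mu*    = sum_k alpha_k * mu_k
    Sigma* = sum_k alpha_k * (Sigma_k + (mu_k - mu*)^2), element-wise
    """
    dim = len(mus[0])
    mu_star = [sum(a * mu[i] for a, mu in zip(alphas, mus)) for i in range(dim)]
    var_star = [
        sum(
            a * (var[i] + (mu[i] - mu_star[i]) ** 2)
            for a, mu, var in zip(alphas, mus, variances)
        )
        for i in range(dim)
    ]
    return mu_star, var_star


def mode_imm(
    mus: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    alphas: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Mode-IMM with diagonal covariances (hypothetical helper).

    Sigma* = (sum_k alpha_k * Sigma_k^{-1})^{-1}
    mu*    = Sigma* * sum_k alpha_k * Sigma_k^{-1} * mu_k
    """
    dim = len(mus[0])
    # With a diagonal Sigma_k, the inverse is just 1 / variance per parameter.
    precision = [
        sum(a / var[i] for a, var in zip(alphas, variances)) for i in range(dim)
    ]
    var_star = [1.0 / p for p in precision]
    mu_star = [
        var_star[i]
        * sum(a * mu[i] / var[i] for a, mu, var in zip(alphas, mus, variances))
        for i in range(dim)
    ]
    return mu_star, var_star


# Toy example: two tasks, two parameters, equal weights.
task_mus = [[0.0, 2.0], [2.0, 0.0]]
task_vars = [[1.0, 4.0], [1.0, 1.0]]
task_alphas = [0.5, 0.5]

merged_mean = mean_imm(task_mus, task_vars, task_alphas)
merged_mode = mode_imm(task_mus, task_vars, task_alphas)
```

In this toy case mean-IMM places both merged parameters at the midpoint 1.0, while mode-IMM moves the second parameter to about 0.4, closer to the task whose variance for that parameter is smaller. That is the behavior the precision weighting is meant to produce: parameters a task is more certain about pull the merged solution toward that task.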

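Experimental results

Research questions

RQ1: How can the posterior moments of networks trained on sequential tasks be merged to prevent forgetting?
RQ2: Can mean-IMM and mode-IMM effectively balance performance on old and new tasks across different datasets?
RQ3: Can the transfer techniques (weight-transfer, L2-transfer, drop-transfer) improve IMM by smoothing the loss surface?
RQ4: Can a Bayesian moment-matching perspective explain and guide continual learning in deep networks?
RQ5: What are the practical limitations of IMM when tasks differ in scale and data distribution?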

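Key findings

Mean-IMM and mode-IMM achieve performance comparable to state-of-the-art continual learning methods on several benchmarks.
Drop-transfer and L2-transfer significantly improve IMM and make the trade-off between old and new tasks more stable.
Mode-IMM is robust to the choice of transfer technique and generally outperforms mean-IMM when tasks differ in scale.
IMM can explicitly balance task importance by tuning $\alpha_t$ online, giving a dynamic weighting of old and new information.
In ImageNet-to-CUB transfer, the IMM variants achieve a modest improvement over the earlier LwF baseline, indicating that the approach applies to heterogeneous task pairs.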
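The L2-transfer and drop-transfer techniques named in the findings can be illustrated with a short sketch. The abstract describes them only as an L2 norm between the old and new parameters and a variant of dropout that uses the old parameters, so the details here are assumptions: the helper names `l2_transfer_penalty` and `drop_transfer`, the per-weight (rather than per-node) masking, and the rescaling of kept weights, which is chosen so that the expected perturbed weight equals the current weight.

```python
import random
from typing import List, Optional, Sequence


def l2_transfer_penalty(
    theta: Sequence[float], theta_old: Sequence[float], lam: float
) -> float:
    """Penalty lam * ||theta - theta_old||^2 added to the new-task loss.

    It keeps the new parameters near the old solution (hypothetical helper).
    """
    return lam * sum((t - o) ** 2 for t, o in zip(theta, theta_old))


def drop_transfer(
    weights: Sequence[float],
    old_weights: Sequence[float],
    p: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Dropout-like perturbation that falls back to old parameters.

    With probability p a weight is replaced by its old-task value instead of
    zero; otherwise it is rescaled so the expectation stays equal to the
    current weight: p * w_old + (1 - p) * (w - p * w_old) / (1 - p) = w.
    The per-weight masking is a simplification made for this sketch.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    perturbed = []
    for w, w_old in zip(weights, old_weights):
        if rng.random() < p:
            perturbed.append(w_old)  # "dropped": use the old-task parameter
        else:
            perturbed.append((w - p * w_old) / (1.0 - p))
    return perturbed


# Toy usage: a fixed seed makes the random mask reproducible.
new_w = [1.0, -0.5, 0.3]
old_w = [0.8, -0.2, 0.0]
penalty = l2_transfer_penalty(new_w, old_w, lam=0.1)
noisy_w = drop_transfer(new_w, old_w, p=0.5, rng=random.Random(0))
```

Ordinary dropout perturbs the network around zero, while this variant perturbs it around the old solution, which is one way to encourage a smooth path between the old and new parameters. Setting `p=0.0` leaves the weights unchanged.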

This review was generated by AI and checked by a human editor.