QUICK REVIEW

[Paper Review] Representational Continuity for Unsupervised Continual Learning

Divyam Madaan, Jaehong Yoon|arXiv (Cornell University)|Oct 13, 2021

Domain Adaptation and Few-Shot Learning | Cited by 28

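One-Sentence Summary

This paper shows that unsupervised continual learning (UCL) yields more robust representations and suffers less forgetting than supervised continual learning (SCL), and introduces LUMP, a simple mixup-based method that further mitigates forgetting in UCL.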

ABSTRACT

Continual learning (CL) aims to learn a sequence of tasks without forgetting the previously acquired knowledge. However, recent CL advances are restricted to supervised continual learning (SCL) scenarios. Consequently, they are not scalable to real-world applications where the data distribution is often biased and unannotated. In this work, we focus on unsupervised continual learning (UCL), where we learn the feature representations on an unlabelled sequence of tasks and show that reliance on annotated data is not necessary for continual learning. We conduct a systematic study analyzing the learned feature representations and show that unsupervised visual representations are surprisingly more robust to catastrophic forgetting, consistently achieve better performance, and generalize better to out-of-distribution tasks than SCL. Furthermore, we find that UCL achieves a smoother loss landscape through qualitative analysis of the learned representations and learns meaningful feature representations. Additionally, we propose Lifelong Unsupervised Mixup (LUMP), a simple yet effective technique that interpolates between the current task and previous tasks' instances to alleviate catastrophic forgetting for unsupervised representations.

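Motivation and Objectives

Motivate unsupervised continual learning as a scalable alternative to supervised continual learning for real-world streams of unlabeled data.
Systematically analyze how unsupervised representations behave in a sequential-task setting and why they may be more robust to forgetting.
Evaluate how well UCL representations generalize and transfer to out-of-distribution tasks and few-shot scenarios.
Propose a simple, effective technique (LUMP) that mitigates forgetting without adding extra hyperparameters or requiring major changes to existing methods.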

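Proposed Method

Extends the SimSiam and BarlowTwins self-supervised objectives to the UCL setting, and studies Finetune and DER-style baselines adapted for unsupervised learning.
Proposes Lifelong Unsupervised Mixup (LUMP), which interpolates between current-task samples and past samples from a replay buffer to reduce forgetting.
Compares UCL against supervised continual learning baselines (regularization-based, architecture-based, and replay-based) on Split CIFAR-10, CIFAR-100, and Tiny-ImageNet, using a fixed ResNet-18 backbone and KNN evaluation.
Analyzes feature representations with centered kernel alignment (CKA) and parameter-space distance to understand how UCL and SCL differ in robustness and loss landscape.
Provides an unsupervised adaptation of DER for UCL (UCL-DER), which regularizes the representation trajectory using replay-buffer samples.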
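
The interpolation step behind LUMP can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `lump_mixup`, the `alpha` argument, and the choice to draw one mixing coefficient per batch from a Beta(alpha, alpha) distribution follow common mixup practice and are assumptions here, since the summary does not say how the coefficient is chosen.

```python
import numpy as np


def lump_mixup(current_batch, buffer_batch, alpha=0.4, rng=None):
    """Interpolate current-task samples with replay-buffer samples.

    Hypothetical helper for illustration only. `alpha` and the Beta
    sampling are assumptions borrowed from standard mixup.
    """
    rng = np.random.default_rng() if rng is None else rng
    if current_batch.shape != buffer_batch.shape:
        raise ValueError("current and buffer batches must have the same shape")

    # One mixing coefficient in [0, 1] for the whole batch.
    lam = float(rng.beta(alpha, alpha))

    # Convex combination of current-task and past-task inputs.
    mixed = lam * current_batch + (1.0 - lam) * buffer_batch
    return mixed, lam
```

In a training loop, the mixed batch would presumably be augmented into two views and passed to the self-supervised objective (SimSiam or BarlowTwins) in place of the raw current-task batch; no labels are involved at any point.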

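Experimental Results

Research Questions

RQ1: Does unsupervised continual learning yield representations that are more robust to catastrophic forgetting than supervised continual learning on standard CL benchmarks?
RQ2: How well do UCL representations transfer to out-of-distribution tasks and few-shot scenarios compared with SCL?
RQ3: Can simple label-free replay-based strategies improve UCL, and does mixup interpolation reduce forgetting?
RQ4: What do feature similarity (CKA) and loss-landscape analyses reveal about the nature of the representations learned by UCL and SCL?
RQ5: Does LUMP effectively mitigate forgetting in UCL across multiple datasets and tasks?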

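Key Findings

On Split CIFAR-10, CIFAR-100, and Tiny-ImageNet, unsupervised representations show lower forgetting and equal or higher accuracy than supervised representations.
Fine-tuning with UCL often outperforms many SCL strategies, and LUMP brings additional gains (for example, 2.8% on CIFAR-100 and 5.9% on Tiny-ImageNet in some settings).
UCL representations based on BarlowTwins and SimSiam show significantly lower forgetting than SCL baselines across datasets.
CKA analysis shows that UCL models have high feature similarity in the lower layers; UCL and SCL representations diverge mainly in the higher layers, and UCL tends to learn features that align better with human perception.
UCL yields a flatter, smoother loss landscape, which suggests greater optimization stability and better generalization.
LUMP, a simple mixup-based interpolation between current-task samples and replay-buffer samples, effectively mitigates forgetting in UCL with little computational overhead and no extra hyperparameters, and outperforms several baselines across multiple datasets.
UCL representations generalize better to out-of-distribution datasets (MNIST, FMNIST, SVHN) and also hold an advantage in few-shot scenarios, with LUMP maintaining strong performance.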
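
The CKA comparison mentioned above measures how similar two sets of layer activations are. A minimal sketch of the linear variant is shown below, assuming activations are stored as `(n_samples, n_features)` arrays; the function name `linear_cka` is illustrative, and the summary does not state which CKA kernel the paper uses.

```python
import numpy as np


def linear_cka(x, y):
    """Linear centered kernel alignment between two activation matrices.

    x: array of shape (n_samples, d1); y: array of shape (n_samples, d2).
    Returns a similarity score in [0, 1].
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must contain the same number of samples")

    # Center every feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

A score near 1 for the same layer of two models (for example, a UCL model and an SCL model) indicates that the two layers encode similar features up to rotation and isotropic scaling, while lower scores in the higher layers would correspond to the divergence described in the findings.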


This review was generated by AI and checked by human editors.