QUICK REVIEW

[Paper Review] Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics

Vinay Ramasesh, Ethan Dyer | arXiv (Cornell University) | Jul 14, 2020

Domain Adaptation and Few-Shot Learning | References: 39 | Cited by: 23

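Summary

This paper studies catastrophic forgetting in neural networks by analyzing how sequential training distorts hidden representations, and finds that the deeper layers are the main source of forgetting. It also finds that forgetting is most severe for tasks of intermediate semantic similarity, and shows that mitigation methods such as experience replay and elastic weight consolidation work by stabilizing deeper-layer representations, which gives a unified explanation of their effectiveness in terms of representational similarity analysis.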

ABSTRACT

A central challenge in developing versatile machine learning systems is catastrophic forgetting: a model trained on tasks in sequence will suffer significant performance drops on earlier tasks. Despite the ubiquity of catastrophic forgetting, there is limited understanding of the underlying process and its causes. In this paper, we address this important knowledge gap, investigating how forgetting affects representations in neural network models. Through representational analysis techniques, we find that deeper layers are disproportionately the source of forgetting. Supporting this, a study of methods to mitigate forgetting illustrates that they act to stabilize deeper layers. These insights enable the development of an analytic argument and empirical picture relating the degree of forgetting to representational similarity between tasks. Consistent with this picture, we observe maximal forgetting occurs for task sequences with intermediate similarity. We perform empirical studies on the standard split CIFAR-10 setup and also introduce a novel CIFAR-100 based task approximating realistic input distribution shift.

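Research Motivation and Objectives

Understand the mechanisms underlying catastrophic forgetting in deep neural networks, in particular its effect on hidden representations.
Determine whether earlier tasks are forgotten uniformly across all layers, or whether some layers contribute more to forgetting.
Examine how representational similarity between sequential tasks affects the degree of forgetting.
Evaluate and compare common forgetting-mitigation techniques (such as experience replay and EWC) through their effect on hidden representations.
Develop a realistic benchmark task that captures input distribution shift, a common real-world cause of forgetting.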

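Proposed Methods

Empirically analyze representational change using representational similarity measures (such as CKA) and layer-wise ablations (freezing/resetting).
Introduce a new CIFAR-100-based task that uses the hierarchical label structure to simulate input distribution shift, as a model of real-world forgetting.
Use a simplified analytic model (frozen features with a linear readout head) to derive how forgetting depends on feature overlap and task similarity.
Explicitly rotate the second-task features with a rotation matrix to control representational similarity and test its effect on forgetting.
Apply two main mitigation techniques, experience replay buffers and elastic weight consolidation (EWC), to study their effect on the stability of deeper-layer representations.
Quantify forgetting through the drop in accuracy on the earlier task and the CKA similarity between representations before and after training.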
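
To make the CKA measurement mentioned above concrete, here is a minimal sketch of the linear variant of CKA in NumPy. The choice of the linear kernel is an assumption (this review does not say which CKA variant the paper uses), and the function name `linear_cka` and the random activation matrices are hypothetical, for illustration only.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two activation matrices.

    X has shape (n_examples, p) and Y has shape (n_examples, q); rows must
    correspond to the same examples, e.g. one layer's activations before
    and after training on a second task.
    """
    # Center each feature (column) so the score ignores constant offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)


# Hypothetical stand-ins for layer activations on 200 held-out examples.
rng = np.random.default_rng(0)
acts_before = rng.normal(size=(200, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
acts_rotated = acts_before @ rotation
acts_unrelated = rng.normal(size=(200, 16))

print("same up to rotation:", linear_cka(acts_before, acts_rotated))
print("unrelated features: ", linear_cka(acts_before, acts_unrelated))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so the first score is close to 1 while the second is much lower. Computed per layer, a score of this kind is what supports statements such as "the deeper layers change the most".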

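Experimental Results

Research Questions

RQ1: How does catastrophic forgetting affect the hidden representations in different layers of a neural network?
RQ2: Are earlier tasks forgotten uniformly across all network parameters, or are some layers more susceptible?
RQ3: How does semantic similarity between sequential tasks affect the degree of catastrophic forgetting?
RQ4: To what extent do established mitigation methods such as experience replay and EWC reduce forgetting by stabilizing deeper-layer representations?
RQ5: Can a realistic benchmark task that models input distribution shift better capture the dynamics of forgetting in real-world continual learning?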

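Key Findings

The deeper layers of a neural network are the main source of catastrophic forgetting: they show the largest representational change during sequential training.
Forgetting peaks at intermediate semantic similarity between tasks, rather than for very similar or very dissimilar tasks, which agrees with the prediction of the analytic model.
Experience replay buffers and elastic weight consolidation (EWC) both mitigate forgetting by stabilizing deeper-layer representations, which gives a unified explanation of their success.
Forgetting can be avoided even when the final-layer weights change substantially, provided the representational overlap between the initial-task and second-task data is zero; this indicates that weight stability is not a necessary condition for preventing forgetting.
The new CIFAR-100-based task with hierarchical label structure successfully simulates realistic input distribution shift and reveals forgetting dynamics similar to those on standard benchmarks.
Empirically, representational similarity measured by CKA correlates strongly with forgetting: lower similarity between the two tasks' representations goes with less forgetting, most clearly when the features are orthogonal.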
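
The finding that forgetting can vanish even when the readout weights change can be illustrated with a toy version of the frozen-feature, linear-readout setting. The sketch below is not the paper's model or experiment: the feature construction, the opposite-sign teacher weights, the rotation by a single angle `theta`, and all names and hyperparameters are assumptions chosen to keep the example small.

```python
import numpy as np


def forgetting_after_task2(theta, k=4, n=200, steps=500, lr=0.5, seed=0):
    """Train a linear readout on task 1, then on task 2, over frozen features.

    Task-2 features are rotated by `theta` away from the task-1 feature
    subspace (theta=0: same subspace, theta=pi/2: orthogonal subspace).
    Returns (increase in task-1 MSE, norm of the readout weight change).
    """
    rng = np.random.default_rng(seed)
    d = 2 * k
    z1 = rng.normal(size=(n, k))
    z2 = rng.normal(size=(n, k))
    # Task-1 features occupy the first k coordinates of the feature space.
    phi1 = np.zeros((n, d))
    phi1[:, :k] = z1
    # Task-2 features are rotated toward the last k coordinates.
    phi2 = np.zeros((n, d))
    phi2[:, :k] = np.cos(theta) * z2
    phi2[:, k:] = np.sin(theta) * z2
    # Hypothetical conflicting targets: opposite linear teachers.
    y1 = z1 @ np.ones(k)
    y2 = z2 @ (-np.ones(k))
    # Task 1: minimum-norm least-squares readout.
    w_task1 = np.linalg.lstsq(phi1, y1, rcond=None)[0]
    loss1_before = np.mean((phi1 @ w_task1 - y1) ** 2)
    # Task 2: plain gradient descent on MSE, starting from the task-1 readout.
    w = w_task1.copy()
    for _ in range(steps):
        w -= lr * phi2.T @ (phi2 @ w - y2) / n
    loss1_after = np.mean((phi1 @ w - y1) ** 2)
    return loss1_after - loss1_before, np.linalg.norm(w - w_task1)


for theta in (0.0, np.pi / 4, np.pi / 2):
    forgetting, weight_change = forgetting_after_task2(theta)
    print(f"theta={theta:.2f}  forgetting={forgetting:.4f}  "
          f"weight change={weight_change:.4f}")
```

In this toy setup the gradient updates for task 2 lie in the span of the task-2 features, so when that span is orthogonal to the task-1 features the task-1 predictions are untouched even though the readout weights move. The sketch only captures the dependence on feature overlap; it is not intended to reproduce the intermediate-similarity peak described above.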


This review was generated by AI and checked by human editors.