QUICK REVIEW

[Paper Digest] Continual Normalization: Rethinking Batch Normalization for Online Continual Learning

Quang Pham, Chenghao Liu|arXiv (Cornell University)|Mar 30, 2022

Domain Adaptation and Few-Shot Learning | Cited by 22

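One-Sentence Summary

Continual Normalization (CN) replaces Batch Normalization in online continual learning, balancing mini-batch normalization against spatial normalization to reduce cross-task forgetting while preserving knowledge transfer.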

ABSTRACT

Existing continual learning methods use Batch Normalization (BN) to facilitate training and improve generalization across tasks. However, the non-i.i.d. and non-stationary nature of continual learning data, especially in the online setting, amplifies the discrepancy between training and testing in BN and hinders the performance of older tasks. In this work, we study the cross-task normalization effect of BN in online continual learning where BN normalizes the testing data using moments biased towards the current task, resulting in higher catastrophic forgetting. This limitation motivates us to propose a simple yet effective method that we call Continual Normalization (CN) to facilitate training similar to BN while mitigating its negative effect. Extensive experiments on different continual learning algorithms and online scenarios show that CN is a direct replacement for BN and can provide substantial performance improvements. Our implementation is available at https://github.com/phquang/Continual-Normalization.

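Research Motivation and Objectives

Study the cross-task normalization effect of Batch Normalization in online continual learning
Identify the desirable properties of a normalization layer for continual learning
Propose Continual Normalization (CN) to balance facilitating training against suppressing forgetting
Show that CN is a direct replacement for BN that improves results under online protocols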
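
The cross-task normalization effect named above can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes each task can be reduced to a single scalar feature mean, and the function name, task means, step count, and momentum are all hypothetical. It tracks a BN-style running mean while tasks arrive sequentially and shows that the stored statistic ends up close to the last task's mean, so earlier tasks are normalized with mismatched moments at test time.

```python
def running_mean_after_tasks(task_means, steps_per_task=50, momentum=0.1):
    """Track a BN-style running mean while tasks arrive one after another.

    Toy assumption: every mini-batch of a task has exactly that task's mean.
    """
    running = 0.0
    for mean in task_means:
        for _ in range(steps_per_task):
            # Exponential moving average used for BN inference statistics.
            running = (1.0 - momentum) * running + momentum * mean
    return running


task_means = [0.0, 2.0, 5.0]  # hypothetical per-task feature means
final_running_mean = running_mean_after_tasks(task_means)

# Gap between the stored statistic and each task's own mean at test time.
bias_per_task = [abs(final_running_mean - m) for m in task_means]
print(final_running_mean, bias_per_task)
```

In this toy setting the gap is largest for the oldest task and smallest for the most recent one, which matches the bias towards the current task that the abstract describes.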

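Proposed Method

CN first normalizes the spatial features by applying Group Normalization (GN) without affine parameters
It then applies Batch Normalization (BN) with affine parameters to the GN output: a_CN = gamma * BN(a_GN) + beta
CN uses GN to incorporate spatial information and BN to retain transfer ability, giving adaptive normalization without requiring additional input at test time
CN introduces no learnable parameters beyond BN's gamma and beta, and stays compatible with existing backbone networks
CN is argued to strike a balance between mini-batch normalization and within-sample normalization, reducing the cross-task normalization effect
CN is compared against BN, BRN, IN, GN, and SN on online continual learning benchmarks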
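
The two steps above (GN without affine parameters, then BN with gamma and beta) can be sketched without any framework. This is a minimal illustration under stated assumptions, not the authors' implementation, which lives in the repository linked in the abstract. The names `continual_norm` and `_normalize`, the nested-list layout `x[n][c]` (a flat list of spatial values for sample `n`, channel `c`), and the `eps` value are all hypothetical. Only training-mode batch statistics are shown, and running statistics for inference are omitted.

```python
import math


def _normalize(values, eps=1e-5):
    """Shift values to zero mean and scale them to (roughly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def continual_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Apply GN (no affine) followed by BN (affine) to x[n][c] = spatial values."""
    n_samples, n_channels = len(x), len(x[0])
    n_spatial = len(x[0][0])
    if n_channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    group_size = n_channels // num_groups

    # Step 1: GN without affine parameters, per sample and per channel group.
    gn = []
    for sample in x:
        out = []
        for g in range(num_groups):
            channels = sample[g * group_size:(g + 1) * group_size]
            flat = _normalize([v for ch in channels for v in ch], eps)
            for i in range(group_size):
                out.append(flat[i * n_spatial:(i + 1) * n_spatial])
        gn.append(out)

    # Step 2: BN per channel over batch and spatial positions, then affine.
    result = [[None] * n_channels for _ in range(n_samples)]
    for c in range(n_channels):
        flat = _normalize([v for n in range(n_samples) for v in gn[n][c]], eps)
        for n in range(n_samples):
            chunk = flat[n * n_spatial:(n + 1) * n_spatial]
            result[n][c] = [gamma[c] * v + beta[c] for v in chunk]
    return result


# Two samples, four channels, two spatial positions each (made-up values).
x = [
    [[1.0, 2.0], [3.0, 4.0], [10.0, 12.0], [14.0, 16.0]],
    [[0.0, 1.0], [2.0, 5.0], [7.0, 7.5], [8.0, 9.0]],
]
y = continual_norm(x, num_groups=2, gamma=[1.0] * 4, beta=[0.0] * 4)
```

With gamma set to 1 and beta set to 0, each output channel has approximately zero mean and unit variance over the batch and spatial positions, as it would after plain BN. The difference from plain BN is that the values entering that step have already been normalized within each sample by GN.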

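Experimental Results

Research Questions

RQ1: Does BN improve forward knowledge transfer in online CL while incurring higher forgetting because of cross-task normalization?
RQ2: Can CN outperform BN and other normalization layers across a range of online continual learning protocols?
RQ3: Is CN a direct, test-time-adaptive replacement for BN with minimal overhead?
RQ4: How does CN perform in task-incremental, class-incremental, and task-free online CL settings?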

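Main Findings

In online task-incremental experiments on Split CIFAR-100 and Split Mini IMN, CN consistently achieves the best overall ACC compared with BN, BRN, IN, GN, and SN
CN balances forgetting (FM) against learning accuracy (LA), giving better stability and transfer than BN in several settings
In the online class-incremental setting with DER++, CN improves on BN on Split CIFAR-10 and Split Tiny IMN, with higher ACC and lower FM in many configurations
CN gives competitive or better results across several group configurations (G=8, G=32) and memory sizes, and is generally more stable than BN
On long-tailed online continual learning benchmarks (COCOseq, NUS-WIDEseq), CN consistently improves on BN, especially in reducing forgetting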
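
The findings above are reported as ACC, FM, and LA. This digest does not define them, so the sketch below assumes the definitions commonly used in the continual learning literature: ACC is the mean final accuracy over all tasks, FM is the mean drop from each earlier task's best past accuracy to its final accuracy, and LA is the mean accuracy on each task right after it is learned. The function name and the accuracy matrix are hypothetical, and the numbers are made up for illustration rather than taken from the paper.

```python
def continual_metrics(acc):
    """Compute (ACC, FM, LA) from acc[i][j]: accuracy on task j after training task i.

    Only entries with j <= i are used. Requires at least two tasks.
    """
    t = len(acc)
    if t < 2:
        raise ValueError("need at least two tasks to measure forgetting")
    final = acc[-1]

    # ACC: average accuracy over all tasks after the last task is learned.
    avg_acc = sum(final) / t

    # LA: average accuracy on each task immediately after learning it.
    la = sum(acc[i][i] for i in range(t)) / t

    # FM: average gap between the best earlier accuracy and the final accuracy.
    drops = [
        max(acc[i][j] for i in range(j, t - 1)) - final[j]
        for j in range(t - 1)
    ]
    fm = sum(drops) / (t - 1)
    return avg_acc, fm, la


# Hypothetical accuracies for three tasks (rows: after training task i).
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.75, 0.80],
]
avg_acc, fm, la = continual_metrics(acc)
print(avg_acc, fm, la)
```

Under these assumed definitions, a method can trade LA for FM or the reverse, so the three numbers are best read together.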


This digest was generated by AI and reviewed by human editors.