QUICK REVIEW

[Paper Digest] FOSTER: Feature Boosting and Compression for Class-Incremental Learning

Fuyun Wang, Da-Wei Zhou|arXiv (Cornell University)|Apr 10, 2022

Domain Adaptation and Few-Shot Learning | Cited by 20

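One-Sentence Summary

FOSTER introduces a two-stage learning paradigm: it first boosts the learning of new classes with a residual-fitting module, then uses distillation to compress the expanded model back into a single backbone, achieving state-of-the-art performance in class-incremental learning on CIFAR-100 and ImageNet-100/1000.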

ABSTRACT

The ability to learn new concepts continually is necessary in this ever-changing world. However, deep neural networks suffer from catastrophic forgetting when learning new categories. Many works have been proposed to alleviate this phenomenon, whereas most of them either fall into the stability-plasticity dilemma or take too much computation or storage overhead. Inspired by the gradient boosting algorithm to gradually fit the residuals between the target model and the previous ensemble model, we propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively. Specifically, we first dynamically expand new modules to fit the residuals between the target and the output of the original model. Next, we remove redundant parameters and feature dimensions through an effective distillation strategy to maintain the single backbone model. We validate our method FOSTER on CIFAR-100 and ImageNet-100/1000 under different settings. Experimental results show that our method achieves state-of-the-art performance. Code is available at: https://github.com/G-U-N/ECCV22-FOSTER.
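
The residual-fitting idea that the abstract borrows from gradient boosting can be illustrated on a toy regression problem. The sketch below is not FOSTER itself: it assumes squared-error loss, decision stumps as weak learners, and hypothetical helper names (`fit_stump`, `predict_stump`, `boost`). Each stage fits only what the previous ensemble still gets wrong, which is the same pattern FOSTER applies when a new module fits the residual left by the frozen old model.

```python
def fit_stump(xs, residuals):
    """Find the one-split stump (threshold, left_mean, right_mean)
    that minimizes squared error on the residuals."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not right:          # a split must leave points on both sides
            continue
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    return best[1:]


def predict_stump(stump, x):
    thr, lm, rm = stump
    return lm if x <= thr else rm


def boost(xs, ys, n_stages):
    """Add one stump per stage, each fitted to the current residuals."""
    ensemble = []
    preds = [0.0] * len(xs)
    errors = []
    for _ in range(n_stages):
        # Residual between the target and the previous ensemble's output.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        ensemble.append(stump)
        preds = [p + predict_stump(stump, x) for p, x in zip(preds, xs)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(xs))
    return ensemble, errors


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x * x for x in xs]
ensemble, errors = boost(xs, ys, n_stages=4)
print(errors)  # mean squared error after each stage
```

The printed errors shrink from stage to stage, because every new learner only has to explain the part of the target that the earlier ones missed.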

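Motivation and Objectives

Motivate continual/incremental learning and address catastrophic forgetting in deep networks.
Propose the two-stage FOSTER framework, which combines feature boosting with subsequent compression to retain a single backbone.
Borrow the idea of gradient boosting to fit the residual between the old model and the target model, while controlling complexity through distillation.
Demonstrate state-of-the-art performance on CIFAR-100 and ImageNet-100/1000 under multiple incremental settings.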

Proposed Method

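Two-stage learning: (1) boosting, by expanding a new residual-fitting module attached to the frozen old model; (2) compression, by distillation that removes redundant parameters and feature dimensions.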
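Decompose the new model F_t into the frozen base F_{t-1} plus a trainable boosting module made up of a feature extractor phi_t and its own linear classifier; together they form an expanded classifier W_t over the concatenated features Phi_t.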
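Use a KL-divergence-based objective together with logits alignment to balance old and new classes and to encourage learning of old concepts (L_KD, L_FE, L_LA).
Logits Alignment calibrates the relative scale of old-class and new-class logits, Feature Enhancement trains the new features to classify all classes seen so far, and Balanced Distillation adapts distillation to class-imbalanced data.
Compress the features into a single backbone through knowledge distillation, including Balanced Distillation (BKD) and distillation on unlabeled data, to prune redundant features while preserving performance as far as possible.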
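
The boosting stage described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the toy weights, the zero-padded old-classifier row for the new class, and the two scale factors in `align_logits` are all hypothetical, since this digest does not give their exact form. The sketch shows that adding the new module's logits to the frozen model's logits is the same as applying one expanded classifier to concatenated features, and how old-class and new-class logits could be rescaled before the classification loss.

```python
import math


def linear(weight_rows, features):
    """Logits of a linear classifier with one weight row per class."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight_rows]


def boosted_logits(old_feat, new_feat, w_old, w_new):
    """Frozen old model's logits plus the new module's residual logits."""
    old_part = linear(w_old, old_feat)   # frozen base F_{t-1}
    new_part = linear(w_new, new_feat)   # trainable boosting module
    return [a + b for a, b in zip(old_part, new_part)]


def align_logits(logits, n_old, old_scale, new_scale):
    """Rescale old-class and new-class logits by separate factors."""
    return [z * (old_scale if i < n_old else new_scale)
            for i, z in enumerate(logits)]


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


# Toy setup (all values hypothetical): 2 old classes, 1 new class.
old_feat = [1.0, 2.0]                          # features from frozen extractor
new_feat = [0.5]                               # features from new extractor
w_old = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # zero row for the new class
w_new = [[2.0], [-2.0], [4.0]]                 # new module scores all classes

logits = boosted_logits(old_feat, new_feat, w_old, w_new)

# Same result via concatenated features and one expanded classifier.
concat_feat = old_feat + new_feat
w_expanded = [row_old + row_new for row_old, row_new in zip(w_old, w_new)]
concat_logits = linear(w_expanded, concat_feat)

# Assumed scale factors: boost old classes, damp the new class.
aligned = align_logits(logits, n_old=2, old_scale=1.5, new_scale=0.5)
loss = cross_entropy(aligned, target=2)
print(logits, concat_logits, aligned, loss)
```

Writing the model as a sum keeps the old model untouched while the new module learns only the correction, and the rescaling step is one simple way to counter the bias toward new classes that imbalanced training data tends to cause.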

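Experimental Results

Research Questions

RQ1: Can a residual learner inspired by gradient boosting improve plasticity on new classes without excessive parameter growth?
RQ2: Can compressing the boosted expansion into a single backbone via distillation preserve performance and enable long-term incremental learning?
RQ3: How can calibration techniques such as logits alignment, feature enhancement, and balanced distillation mitigate the bias between old and new classes during incremental updates?
RQ4: Do experiments on CIFAR-100 and ImageNet-100/1000 show state-of-the-art results across incremental settings?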

Key Findings

Average incremental accuracy (%)
Method	CIFAR-100 B0 10 steps	CIFAR-100 B0 20 steps	CIFAR-100 B50 10 steps	CIFAR-100 B50 25 steps	ImageNet-1000?
Bound	80.40	80.41	81.49	81.74	-
iCaRL	64.42	63.50	53.78	50.60	-
BiC	65.08	62.37	53.21	48.96	-
WA	67.08	64.64	57.57	54.10	-
COIL	65.48	62.98	59.96	-	-
PODNet	55.22	47.87	63.19	60.72	-
DER	69.74	67.98	66.36	-	-
Ours (FOSTER)	72.90	70.65	67.95	63.83	-
Improvement	(+3.06)	(+2.67)	(+1.59)	(+3.11)	-
ImageNet-100 (B0)	-	-	-	-	-
Ours (FOSTER) - ImageNet-100/1000	-	-	-	-	68.34

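FOSTER achieves state-of-the-art average incremental accuracy on CIFAR-100 across multiple settings (for example, base 0 or base 50 classes with different numbers of steps) compared with prior methods.
On CIFAR-100, FOSTER improves over existing methods by up to 3.11 points in long-term incremental settings with many steps.
On ImageNet-100/1000, FOSTER consistently outperforms competing methods in most settings, with notable gains in several step configurations.
Thanks to the distillation-based compression strategy, the compression stage effectively reduces the model to a single backbone with negligible performance loss.
Ablation studies show that logits alignment, feature enhancement, and balanced distillation each contribute substantially to performance, with logits alignment yielding a clear gain over alternative approaches.
Grad-CAM visualizations show that, compared with the frozen base, the boosting module attends to broader and more comprehensive feature regions, supporting the residual-fitting intuition.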
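
The compression stage summarized above relies on knowledge distillation from the expanded model (teacher) into a single backbone (student). The sketch below shows a generic temperature-softened KL distillation loss with optional per-class weights. The weighting scheme, the temperature value, and the names `softmax` and `weighted_kd_loss` are assumptions made for illustration; this digest does not specify the exact form of Balanced Distillation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_kd_loss(teacher_logits, student_logits,
                     class_weights=None, temperature=2.0):
    """Per-class weighted KL(teacher || student) on softened outputs.

    With class_weights=None this is plain KL-based distillation.
    Non-uniform weights (hypothetical here) let rare old classes
    count more than abundant new classes.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    if class_weights is None:
        class_weights = [1.0] * len(p)
    return sum(w * pi * math.log(pi / qi)
               for w, pi, qi in zip(class_weights, p, q))


# Toy logits (hypothetical): teacher is the expanded two-branch model,
# student is the single compressed backbone.
teacher = [2.0, 1.0, 2.0]
student = [1.5, 1.2, 2.5]

plain_loss = weighted_kd_loss(teacher, student)
matched_loss = weighted_kd_loss(teacher, teacher)
print(plain_loss, matched_loss)
```

The loss is zero when the student reproduces the teacher's softened outputs and positive otherwise, so minimizing it pushes the single backbone toward the behavior of the larger expanded model.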

This digest was generated by AI and reviewed by a human editor.