QUICK REVIEW

[Paper Review] Compacting, Picking and Growing for Unforgetting Continual Learning

Steven C. Y. Hung, Cheng-Hao Tu|arXiv (Cornell University)|Oct 15, 2019

Domain Adaptation and Few-Shot Learning|References: 45|Cited by: 134

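One-Sentence Summary

This paper proposes CPG, a continual learning framework that compacts the model through pruning, picks critical old-task weights through a differentiable mask, and grows the network only when needed, achieving unforgetting learning with compact growth across multiple tasks. It outperforms several baselines and maintains a compact knowledge base for future tasks.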

ABSTRACT

Continual lifelong learning is essential to many applications. In this paper, we propose a simple but effective approach to continual deep learning. Our approach leverages the principles of deep model compression, critical weights selection, and progressive networks expansion. By enforcing their integration in an iterative manner, we introduce an incremental learning method that is scalable to the number of sequential tasks in a continual learning process. Our approach is easy to implement and owns several favorable characteristics. First, it can avoid forgetting (i.e., learn new tasks while remembering all previous tasks). Second, it allows model expansion but can maintain the model compactness when handling sequential tasks. Besides, through our compaction and selection/expansion mechanism, we show that the knowledge accumulated through learning previous tasks is helpful to build a better model for the new tasks compared to training the models independently with tasks. Experimental results show that our approach can incrementally learn a deep model tackling multiple tasks without forgetting, while the model compactness is maintained with the performance more satisfiable than individual task training.

Motivation and Objectives

Motivate continual lifelong learning that avoids catastrophic forgetting while remaining scalable across many sequential tasks.
Propose a simple, effective framework that combines model compression, critical weight selection, and progressive network expansion.
Show that reusing knowledge from past tasks improves learning of new tasks compared to training independently.
Demonstrate that the approach can keep model size compact while supporting unlimited sequential tasks.

Proposed Method

Apply gradual pruning to compress the current task model while preserving performance.
Introduce a learnable binary mask to pick a subset of old-task weights to reuse for the new task.
Reuse released (extra) weights for the new task, and optionally expand the architecture if accuracy goals are not met.
Keep old-task weights fixed to avoid forgetting; train only the picking mask and the new-task weights (the released weights and any newly added ones).
After training for a new task, further prune the newly added weights to obtain a compact representation for that task.
Iteratively repeat compaction, picking, and possible growing for subsequent tasks.
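
The bookkeeping behind the steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the `owner` array, the `FREE` marker, and the function names are all hypothetical; `compact` uses one-shot magnitude pruning as a stand-in for gradual pruning with retraining; and the picking mask is modeled as real-valued scores binarized by a threshold. The training loop itself is omitted.

```python
import numpy as np

FREE = 0  # hypothetical owner id for released weights that no task has claimed


def effective_weights(w, owner, task, pick_scores, threshold=0.0):
    """Weights seen by `task`: picked old-task weights plus its own/free ones."""
    old = (owner > FREE) & (owner < task)
    picked = old & (pick_scores > threshold)  # binarized picking mask (assumption)
    trainable = (owner == FREE) | (owner == task)
    return w * (picked | trainable)


def masked_update(w, grad, owner, task, lr=0.1):
    """Gradient step that leaves every old-task weight untouched."""
    trainable = (owner == FREE) | (owner == task)
    return w - lr * grad * trainable


def compact(w, owner, task, keep_ratio):
    """Claim free weights for `task`, then release its smallest-magnitude ones."""
    w, owner = w.copy(), owner.copy()
    owner[owner == FREE] = task
    mine = np.flatnonzero(owner == task)
    n_release = len(mine) - int(round(keep_ratio * len(mine)))
    order = mine[np.argsort(np.abs(w.flat[mine]))]
    release = order[:n_release]
    w.flat[release] = 0.0       # released weights restart from zero
    owner.flat[release] = FREE  # and become available to later tasks
    return w, owner


def grow(w, owner, extra_cols, rng):
    """Append freshly initialized, unowned columns to the layer."""
    new_w = rng.normal(scale=0.1, size=(w.shape[0], extra_cols))
    new_owner = np.full(new_w.shape, FREE)
    return np.hstack([w, new_w]), np.hstack([owner, new_owner])


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 6))
owner = np.zeros(w.shape, dtype=int)

# Task 1: training omitted; compact so task 1 keeps half of the weights.
w, owner = compact(w, owner, task=1, keep_ratio=0.5)
frozen = w[owner == 1].copy()

# Task 2: a gradient step moves only the released weights.
w = masked_update(w, grad=np.ones_like(w), owner=owner, task=2)
assert np.array_equal(w[owner == 1], frozen)  # task-1 weights are unchanged
```

Tracking ownership per weight is one simple way to make the "fix old weights, train the rest" rule explicit: forgetting is ruled out by construction because the update is multiplied by a mask that is zero on every old-task weight.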

Experimental Results

Research Questions

RQ1: Can a compacting-picking-growing cycle prevent forgetting while enabling scalable growth across an unlimited sequence of tasks?
RQ2: Does reusing a compact set of old-task weights via a learnable mask improve new-task performance compared to training from scratch or full sharing?
RQ3: How does the proposed method compare to related continual learning approaches (e.g., ProgressiveNet, PackNet, DEN) in accuracy and model size?
RQ4: What level of architectural expansion is necessary to achieve target accuracy without excessive growth?
RQ5: Is the learned knowledge base beneficial for future task performance when compared to independent task training?

Key Findings

Method	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	Avg.	Expansion	Reduction
PackNet	66.4	80.0	76.2	78.4	80.0	79.8	67.8	61.4	68.8	77.2	79.0	59.4	66.4	57.2	36.0	54.2	51.6	58.8	67.8	83.2	67.5	1	0
PAE	67.2	77.0	78.6	76.0	84.4	81.2	77.6	80.0	80.4	87.8	85.4	77.8	79.4	79.6	51.2	68.4	68.6	68.6	83.2	88.8	77.1	2	0
CPG	65.2	76.6	79.8	81.4	86.6	84.8	83.4	85.0	87.2	89.2	90.8	82.4	85.6	85.2	53.2	74.4	70.0	73.4	88.8	94.8	80.9	1.5	0.41
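
As a quick consistency check on the table, the per-task accuracies can be averaged and compared directly. The snippet below is only a sketch over the numbers transcribed from the rows above; the variable names are illustrative and nothing here comes from the authors' code.

```python
# Per-task accuracies (tasks 1-20) copied from the table rows above.
packnet = [66.4, 80.0, 76.2, 78.4, 80.0, 79.8, 67.8, 61.4, 68.8, 77.2,
           79.0, 59.4, 66.4, 57.2, 36.0, 54.2, 51.6, 58.8, 67.8, 83.2]
pae = [67.2, 77.0, 78.6, 76.0, 84.4, 81.2, 77.6, 80.0, 80.4, 87.8,
       85.4, 77.8, 79.4, 79.6, 51.2, 68.4, 68.6, 68.6, 83.2, 88.8]
cpg = [65.2, 76.6, 79.8, 81.4, 86.6, 84.8, 83.4, 85.0, 87.2, 89.2,
       90.8, 82.4, 85.6, 85.2, 53.2, 74.4, 70.0, 73.4, 88.8, 94.8]


def mean(xs):
    """Arithmetic mean of a list of accuracies."""
    return sum(xs) / len(xs)


for name, row in [("PackNet", packnet), ("PAE", pae), ("CPG", cpg)]:
    print(f"{name}: {mean(row):.1f}")  # matches the Avg. column: 67.5, 77.1, 80.9

# Number of tasks on which CPG is strictly better than each baseline.
wins_vs_packnet = sum(c > p for c, p in zip(cpg, packnet))
wins_vs_pae = sum(c > p for c, p in zip(cpg, pae))
print(wins_vs_packnet, wins_vs_pae)  # 18 18
```

The averages reproduce the Avg. column, and CPG is ahead on 18 of the 20 tasks against either baseline; the two exceptions in both cases are tasks 1 and 2.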

CPG maintains exact old-task performance while incrementally learning new tasks.
In the table above, CPG reaches the highest average accuracy (80.9, versus 77.1 for PAE and 67.5 for PackNet) with a modest expansion of 1.5, less than PAE's 2.
Using a critical-weights mask reduces unnecessary old-task weights and yields improved performance on subsequent tasks.
CPG expands less than some baselines (e.g., DEN, ProgressiveNet) yet maintains or improves accuracy across multiple tasks.
The approach builds a reusable knowledge base that enhances learning of new tasks relative to training tasks independently.


This review was generated by AI and reviewed by human editors.