QUICK REVIEW

[Paper Review] Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting

Xilai Li, Yingbo Zhou|arXiv (Cornell University)|Mar 31, 2019
Domain Adaptation and Few-Shot Learning|References: 30|Cited by: 80
One-Sentence Summary

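This paper proposes a learn-to-grow framework that separates neural structure optimization from parameter learning to address catastrophic forgetting in continual learning, using NAS to grow task-specific structures on top of a shared base network.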

ABSTRACT

Addressing catastrophic forgetting is one of the key challenges in continual learning where machine learning systems are trained with sequential or streaming tasks. Despite recent remarkable progress in state-of-the-art deep learning, deep neural networks (DNNs) are still plagued with the catastrophic forgetting problem. This paper presents a conceptually simple yet general and effective framework for handling catastrophic forgetting in continual learning with DNNs. The proposed method consists of two components: a neural structure optimization component and a parameter learning and/or fine-tuning component. By separating the explicit neural structure learning and the parameter estimation, not only is the proposed method capable of evolving neural structures in an intuitively meaningful way, but also shows strong capabilities of alleviating catastrophic forgetting in experiments. Furthermore, the proposed method outperforms all other baselines on the permuted MNIST dataset, the split CIFAR100 dataset and the Visual Domain Decathlon dataset in continual learning setting.

Motivation and Objectives

  • Motivate continual learning and the catastrophic forgetting problem in deep networks.
  • Propose a framework that explicitly learns task-specific structures while sharing components.
  • Decouple structure learning from parameter learning to improve performance and manage model size.

Proposed Method

  • Introduce a two-component framework: neural structure optimization via NAS and parameter learning/fine-tuning on top of the current structure.
  • Use a super network S to manage shareable layers and task-specific additions, with options to reuse, adapt, or create new components.
  • Formulate a penalized loss that combines task loss with structure regularization and parameter regularization to bound model size (Eq. 4).
  • Relax discrete architectural choices into a differentiable Softmax to enable continuous NAS (in the style of DARTS).
  • Optimize architecture weights alpha on a validation set while updating network parameters on a training set, via alternating updates.
  • Describe how to implement structure optimization with reuse/adaptation/new operations and how to update the super model after each task.
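
The softmax relaxation and the final reuse/adapt/new decision described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three toy layer functions, the names `CANDIDATES`, `mixed_output`, and `select_choice`, and the hand-set `alpha` values are all assumptions made for illustration. In the framework as summarized, alpha would be learned on a validation set rather than fixed by hand.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical stand-ins for the three choices available at one layer.
def reuse(x):
    # Reuse the shared layer unchanged (toy weight 2.0).
    return [2.0 * v for v in x]

def adapt(x):
    # Shared layer plus a small task-specific additive adapter (toy weight 0.1).
    return [2.0 * v + 0.1 * v for v in x]

def new(x):
    # A separate task-specific layer (toy weight 0.5).
    return [0.5 * v for v in x]

CANDIDATES = {"reuse": reuse, "adapt": adapt, "new": new}

def mixed_output(x, alpha):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs."""
    weights = softmax(alpha)
    outputs = [op(x) for op in CANDIDATES.values()]
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))]

def select_choice(alpha):
    """After the search, keep only the candidate with the largest alpha."""
    names = list(CANDIDATES)
    best = max(range(len(alpha)), key=lambda i: alpha[i])
    return names[best]

if __name__ == "__main__":
    alpha = [0.1, 2.0, -1.0]           # hand-set for illustration only
    print(mixed_output([1.0], alpha))  # blended output during the search
    print(select_choice(alpha))        # discrete decision after the search
```

During the search, every candidate contributes to the layer output in proportion to its softmax weight, which is what makes the architecture choice differentiable; the discrete structure is recovered afterwards by taking the highest-weighted candidate.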

Experimental Results

Research Questions

  • RQ1: Can explicit continual structure learning yield sensible task-specific architectures while sharing components across tasks?
  • RQ2: Does separating structure learning from parameter learning reduce catastrophic forgetting compared to baselines?
  • RQ3: How does the learned structure adapt when tasks are similar vs. dissimilar?

Key Findings

  • The structure optimization tends to share layers for similar tasks and spawn new parameters when tasks are very different (e.g., ImageNet vs Omniglot).
  • For permuted MNIST, learned structures perform better than or comparably to baselines, with strong forgetting control when reusing layers.
  • On Visual Domain Decathlon, the method achieves the best average performance across tasks while keeping its total parameter count comparable to that of adapters.
  • Fine-tuning reused layers with regularization or small learning rates significantly mitigates forgetting and preserves previous task performance.
  • The method outperforms several state-of-the-art continual learning approaches on permuted MNIST and split CIFAR-100 in reported experiments.
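
The finding about fine-tuning reused layers can be made concrete with a small sketch. The summary does not state which regularizer is used, so the L2 penalty that pulls parameters toward their previous-task values is an assumption, as are the names `l2_anchor_penalty` and `regularized_sgd_step` and all numeric values.

```python
def l2_anchor_penalty(params, old_params, strength):
    """Assumed regularizer: strength * sum((theta - theta_old)^2)."""
    return strength * sum((p - q) ** 2 for p, q in zip(params, old_params))

def regularized_sgd_step(params, task_grads, old_params, lr, strength):
    """One SGD step on task loss plus the anchor penalty.

    The penalty's gradient is 2 * strength * (theta - theta_old), so it
    pulls each reused parameter back toward its previous-task value.
    """
    return [p - lr * (g + 2.0 * strength * (p - q))
            for p, g, q in zip(params, task_grads, old_params)]

if __name__ == "__main__":
    old = [1.0, 1.0]        # values learned on an earlier task (toy numbers)
    current = [2.0, 1.0]    # values after drifting on the new task
    grads = [0.0, 0.0]      # zero task gradient, to isolate the penalty
    print(l2_anchor_penalty(current, old, strength=1.0))
    print(regularized_sgd_step(current, grads, old, lr=0.1, strength=1.0))
```

A small learning rate `lr` limits how far reused parameters move per step, and a larger `strength` pulls them back harder toward the earlier task's solution; both are ways of trading plasticity on the new task for stability on the old ones.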

This review was generated by AI and reviewed by human editors.