[Paper Review] Continual Classification Learning Using Generative Models
This paper proposes a continual classification learning method using generative models that prevents catastrophic forgetting by employing a student-teacher architecture with a variational autoencoder (VAE) framework. The model generates synthetic data from past tasks via a teacher network to augment current task training, enabling joint generative and discriminative learning without storing past data or models, and achieves stable performance across sequential tasks on permuted MNIST and FashionMNIST benchmarks.
Continual learning is the ability to sequentially learn over time by accommodating knowledge while retaining previously learned experiences. Neural networks can learn multiple tasks when trained on them jointly, but cannot maintain performance on previously learned tasks when tasks are presented one at a time. This problem is called catastrophic forgetting. In this work, we propose a classification model that learns continuously from sequentially observed tasks, while preventing catastrophic forgetting. We build on the lifelong generative capabilities of [10] and extend it to the classification setting by deriving a new variational bound on the joint log likelihood, $\log p(x, y)$.
Motivation & Objective
- To address catastrophic forgetting in continual learning for classification tasks.
- To enable continual learning without storing previous data or task-specific models.
- To jointly optimize for generative reconstruction and discriminative classification in a sequential learning setup.
- To develop a method that leverages generative modeling to preserve knowledge of past tasks.
- To achieve stable performance across sequential tasks using only a distilled summary of past distributions.
Proposed method
- The method uses a variational autoencoder (VAE) with a joint latent variable model for input $x$ and label $y$, factorizing $p(x,y,z) = p(x|z)p(y|z)p(z)$.
- A new variational bound on $\log p(x,y)$ is derived, decomposing into an ELBO for reconstruction and a classification loss over the latent space.
- A student-teacher distillation framework is employed: the student learns on current data and generated data from the teacher, which summarizes past task distributions.
- The loss function includes a KL divergence term to preserve posterior representations of previous tasks and a negative information gain regularizer to align latent representations with generated data.
- The model avoids storing past data or models by generating past-task samples via the teacher, enabling continual learning under strict memory constraints.
- The objective function is optimized end-to-end using mini-batch stochastic gradient descent with early stopping.
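The factorization $p(x,y,z) = p(x|z)p(y|z)p(z)$ suggests a per-example loss with three parts: a reconstruction term, a classification term, and a KL term against the prior. The sketch below is a minimal illustration of that decomposition using the standard Jensen bound $\log p(x,y) \geq \mathbb{E}_{q(z|x)}[\log p(x|z) + \log p(y|z)] - \mathrm{KL}(q(z|x) \,\|\, p(z))$. It is not the paper's exact objective: the Bernoulli decoder, the diagonal Gaussian posterior, the standard normal prior, and all function names are assumptions made for illustration, and the student-teacher consistency and information gain terms are omitted.

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))

def bernoulli_nll(x, x_probs, eps=1e-7):
    """-log p(x|z) for pixels x in [0, 1] under decoder probabilities x_probs."""
    total = 0.0
    for xi, pi in zip(x, x_probs):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total

def categorical_nll(label, logits):
    """-log p(y|z) via a numerically stable log-softmax over class logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[label]

def joint_negative_bound(x, label, x_probs, logits, mu, logvar):
    """Single-sample estimate of the negative bound on log p(x, y).

    x_probs and logits are assumed to come from decoding one sample z ~ q(z|x);
    mu and logvar are the encoder's posterior parameters for the same x.
    """
    return (bernoulli_nll(x, x_probs)
            + categorical_nll(label, logits)
            + gaussian_kl(mu, logvar))

# Toy usage with hand-written stand-ins for network outputs (hypothetical values).
loss = joint_negative_bound(
    x=[1.0, 0.0], label=1,
    x_probs=[0.9, 0.2], logits=[0.1, 2.0, -1.0],
    mu=[0.3, -0.1], logvar=[0.0, -0.5],
)
```

In a continual setting, the same loss would be evaluated on a batch that mixes current-task examples with samples drawn from the teacher, which is how the review describes past-task knowledge being carried forward.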
Experimental results
Research questions
- RQ1: Can a generative model be effectively adapted to continual classification learning while preventing catastrophic forgetting?
- RQ2: To what extent can a distilled teacher model replace the need to store past data or models in continual learning?
- RQ3: How does the joint optimization of reconstruction and classification loss affect performance on sequential tasks?
- RQ4: Does the proposed method maintain high accuracy and low reconstruction error across multiple sequential tasks without forgetting?
- RQ5: How does the method compare to baselines like VAE with classifier and EWC in terms of forgetting and accuracy retention?
Key findings
- The proposed CCL-GM model maintains high average classification accuracy across all tasks in the permuted MNIST experiment, even after multiple sequential tasks.
- The model achieves low average negative reconstruction ELBO, indicating strong generative performance across all learned tasks.
- In contrast, the vanilla VAE with classifier (vae-cl) suffers a dramatic performance drop when transitioning to the first permuted task, showing severe forgetting.
- The EWC baseline shows less degradation than vae-cl but still exhibits significant forgetting, highlighting the limitations of regularization without data augmentation.
- On a three-task sequence including MNIST, FashionMNIST, and a permuted MNIST task, CCL-GM outperforms both baselines in classification accuracy and reconstruction quality.
- The method successfully mitigates catastrophic forgetting by generating past-task data via the teacher, enabling continual learning without access to or storage of past data or models.
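For readers unfamiliar with the permuted MNIST benchmark mentioned above, each task is usually built by applying one fixed random pixel permutation to every image, so the labels stay the same while the input distribution changes. The sketch below shows that construction on flattened images. The function names, the seeding scheme, and the choice to leave the first task unpermuted are illustrative assumptions, not details taken from the paper.

```python
import random

def make_permutation(num_pixels, seed):
    """Return a fixed, reproducible permutation of pixel indices for one task."""
    rng = random.Random(seed)
    perm = list(range(num_pixels))
    rng.shuffle(perm)
    return perm

def apply_permutation(image, perm):
    """Reorder a flattened image so that output pixel j takes input pixel perm[j]."""
    return [image[i] for i in perm]

def build_tasks(images, num_tasks, base_seed=0):
    """Build a task sequence from one set of flattened images.

    Task 0 keeps the original pixel order (an assumption); each later task
    applies its own fixed permutation to every image.
    """
    num_pixels = len(images[0])
    tasks = []
    for t in range(num_tasks):
        if t == 0:
            perm = list(range(num_pixels))
        else:
            perm = make_permutation(num_pixels, base_seed + t)
        tasks.append([apply_permutation(img, perm) for img in images])
    return tasks

# Toy usage: two 4-pixel "images" stand in for flattened 28x28 digits.
toy_images = [[0, 1, 2, 3], [4, 5, 6, 7]]
toy_tasks = build_tasks(toy_images, num_tasks=3)
```

Because every image in a task shares the same permutation, a model trained only on the newest task sees inputs that look unrelated to earlier ones, which is what makes forgetting easy to measure on this benchmark.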
This review was created by AI and reviewed by human editors.