[Paper Review] A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning
CN-DPM introduces a Dirichlet process mixture of neural experts to enable task-free continual learning. It expands experts online under a Bayesian nonparametric framework and handles both discriminative and generative tasks without task boundaries.
Despite the growing interest in continual learning, most of its contemporary works have been studied in a rather restricted setting where tasks are clearly distinguishable, and task boundaries are known during training. However, if our goal is to develop an algorithm that learns as humans do, this setting is far from realistic, and it is essential to develop a methodology that works in a task-free manner. Meanwhile, among several branches of continual learning, expansion-based methods have the advantage of eliminating catastrophic forgetting by allocating new resources to learn new data. In this work, we propose an expansion-based approach for task-free continual learning. Our model, named Continual Neural Dirichlet Process Mixture (CN-DPM), consists of a set of neural network experts that are in charge of a subset of the data. CN-DPM expands the number of experts in a principled way under the Bayesian nonparametric framework. With extensive experiments, we show that our model successfully performs task-free continual learning for both discriminative and generative tasks such as image classification and image generation.
Motivation & Objective
- Motivate and develop a continual learning method that does not rely on explicit task boundaries.
- Propose an expansion-based approach that grows model capacity adaptively to new data via Bayesian nonparametrics.
- Enable both discriminative and generative tasks within a unified CN-DPM framework.
- Prevent catastrophic forgetting by allocating new experts to new data instead of overwriting existing ones, with each expert holding its own generative and discriminative components.
- Demonstrate competitive performance on standard task-free CL benchmarks (MNIST, SVHN, CIFAR) compared to baselines.
Proposed method
- Formulate task-free continual learning as online variational inference in a Dirichlet process mixture (DPM) of neural experts.
- Each expert contains both a discriminative component p(y|x; φ^D) and a generative component p(x; φ^G) to model p(x,y|z) jointly.
- Use Sequential Variational Approximation (SVA) to update responsibilities and expert parameters online as data arrive.
- Expand the model by creating new experts when incoming data have low responsibility under existing experts, guided by a short-term memory (STM) buffer to collect enough data before expansion.
- Share parameters across experts via lateral connections to curb unbounded parameter growth and enable positive transfer, while blocking gradients from a new expert to the existing experts it connects to, so that prior knowledge is preserved.
- Incorporate a gating mechanism derived from p(x; φ^G) and p(z) to infer the expert responsible for a given input, enabling a mixture-of-experts prediction for p(y|x).
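The gating and expansion steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the count-based prior p(z=k) proportional to the number of points each expert has absorbed, the fixed `log_new_lik` score for a not-yet-created expert, and the threshold `eps` are all hypothetical choices made for the example. In practice the per-expert log-densities would come from each expert's generative model (for example a VAE bound) and the class probabilities from its classifier.

```python
import numpy as np

def _softmax(logits):
    """Numerically stable softmax over a 1-D array of log-scores."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()
    probs = np.exp(shifted)
    return probs / probs.sum()

def gate(log_px, counts):
    """p(z=k | x), proportional to p(x; phi^G_k) * p(z=k).

    log_px[k] : log-density of x under expert k's generative component
    counts[k] : points assigned to expert k so far (assumed > 0);
                used here as an unnormalized prior p(z=k)
    """
    return _softmax(np.asarray(log_px, dtype=float) + np.log(counts))

def predict(class_probs, log_px, counts):
    """Mixture-of-experts prediction: sum_k p(z=k | x) * p(y | x; phi^D_k).

    class_probs has shape (K, num_classes), one row per expert.
    """
    return gate(log_px, counts) @ np.asarray(class_probs, dtype=float)

def responsibilities(log_joint, counts, alpha, log_new_lik):
    """Posterior over K existing experts plus one potential new expert.

    log_joint[k] : log p(x, y | expert k)
    alpha        : DP concentration parameter (weight of a new expert)
    log_new_lik  : ASSUMED log-likelihood of the point under a fresh expert
    The last entry of the result is the responsibility of the new expert.
    """
    existing = np.log(counts) + np.asarray(log_joint, dtype=float)
    return _softmax(np.append(existing, np.log(alpha) + log_new_lik))

def goes_to_stm(rho, eps=0.5):
    """Hypothetical rule: buffer the point in short-term memory when the
    new-expert responsibility exceeds eps; a new expert would be trained
    once the buffer has collected enough data."""
    return bool(rho[-1] > eps)
```

With two experts that have each seen 100 points, a point that both experts model poorly (log-joint around -80) receives almost all of its responsibility from the potential new expert and would be routed to the STM, while a point that one expert explains well stays with that expert. The normalizing constant of the prior cancels inside the softmax, which is why raw counts and `alpha` can be used directly.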
Experimental results
Research questions
- RQ1: Can task-free continual learning be achieved with an expansion-based approach that automatically determines when to add new experts?
- RQ2: How can a gating mechanism infer the appropriate expert without task labels and avoid catastrophic forgetting?
- RQ3: Is it possible to support both discriminative and generative tasks within a single CN-DPM framework?
- RQ4: Does a Bayesian nonparametric expansion (DPM) scale to multiple benchmarks with non-iid data streams?
- RQ5: What practical strategies (STM, parameter sharing, temperature scaling) improve CN-DPM performance and stability?
Key findings
- CN-DPM consistently outperforms competitive baselines in task-free CL across Split-MNIST, MNIST-SVHN, Split-CIFAR10, and Split-CIFAR100 scenarios.
- The model maintains low forgetting, with task-wise classifier performance remaining high after learning all tasks.
- CN-DPM’s expansion-driven growth adapts to data complexity, creating multiple experts as needed to capture new distributions.
- Gating accuracy, which depends on VAE density estimates, leaves room for improvement; better density estimation for expert selection could yield further gains.
- Parameter-sharing via lateral connections and controlled training of new experts mitigates model bloat and enables positive transfer across tasks.
- CN-DPM performs strongly in settings with many tasks (e.g., Split-CIFAR100), where replay-only methods struggle, and avoids the overfitting to stored samples that replay can cause.
- The approach applies to both discriminative (classification) and generative (generation) tasks, illustrating the versatility of the CN-DPM framework.
This review was created by AI and reviewed by human editors.