[Paper Review] Toward Understanding Catastrophic Forgetting in Continual Learning
The paper introduces a general procedure to study how task-sequence properties relate to catastrophic forgetting, applying it to total complexity and sequential heterogeneity using Task2Vec embeddings, and reports correlations with final error rates for SI, VCL, and coreset VCL on MNIST and CIFAR-10.
We study the relationship between catastrophic forgetting and properties of task sequences. In particular, given a sequence of tasks, we would like to understand which properties of this sequence influence the error rates of continual learning algorithms trained on the sequence. To this end, we propose a new procedure that makes use of recent developments in task space modeling as well as correlation analysis to specify and analyze the properties we are interested in. As an application, we apply our procedure to study two properties of a task sequence: (1) total complexity and (2) sequential heterogeneity. We show that error rates are strongly and positively correlated to a task sequence's total complexity for some state-of-the-art algorithms. We also show that, surprisingly, the error rates have no or even negative correlations in some cases to sequential heterogeneity. Our findings suggest directions for improving continual learning benchmarks and methods.
Motivation & Objective
- Understand which properties of a sequence of tasks influence continual learning error rates.
- Propose a general procedure to quantify task-sequence properties via task-space embeddings.
- Apply the procedure to two properties: total complexity and sequential heterogeneity.
- Analyze correlations between these properties and final error rates of state-of-the-art continual learning methods.
Proposed method
- Use Task2Vec to map tasks to embedding vectors from a pre-trained probe network.
- Define task-level complexity C(t) as the distance to a trivial task, C(t)=d(e_t,e_0).
- Define total complexity C(T)=sum_t C(t) for a task sequence T.
- Define sequential heterogeneity as the sum of dissimilarities between consecutive tasks, F(T)=sum_i F(t_i,t_{i+1}), where F(t_i,t_{i+1})=d(e_{t_i},e_{t_{i+1}}).
- Measure actual hardness H_A(T) as the final error rate of a continual learning algorithm A trained on the sequence.
- Compute Pearson correlations between (C(T),F(T)) and H_A(T) across multiple sequences; control for sequence length and complexity as appropriate.
- Experiment with SI, VCL, and coreset VCL on MNIST and CIFAR-10 using multiple task sequences and multi-head settings.
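The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the distance `d` is taken to be Euclidean as a placeholder (the review does not state which distance is used), the two-dimensional embeddings and the error rates are made-up toy values, and all function names are hypothetical.

```python
import math

def distance(a, b):
    """Placeholder d(.,.): Euclidean distance (an assumption, not from the source)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_complexity(embeddings, trivial):
    """C(T) = sum_t d(e_t, e_0), where e_0 embeds the trivial task."""
    return sum(distance(e, trivial) for e in embeddings)

def sequential_heterogeneity(embeddings):
    """F(T) = sum_i d(e_{t_i}, e_{t_{i+1}}) over consecutive task pairs."""
    return sum(distance(a, b) for a, b in zip(embeddings, embeddings[1:]))

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy task sequences: each task is a hypothetical 2-D embedding.
trivial = (0.0, 0.0)
sequences = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(3.0, 4.0), (3.0, 4.0)],
    [(0.0, 2.0), (2.0, 0.0), (0.0, 2.0)],
]
# Made-up final error rates H_A(T), one per sequence.
final_error = [0.10, 0.30, 0.20]

complexity = [total_complexity(s, trivial) for s in sequences]
heterogeneity = [sequential_heterogeneity(s) for s in sequences]

print("r(C, H):", pearson_r(complexity, final_error))
print("r(F, H):", pearson_r(heterogeneity, final_error))
```

In a real analysis the embeddings would come from Task2Vec and the error rates from training SI, VCL, or coreset VCL on each sequence; the correlation step stays the same.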
Experimental results
Research questions
- RQ1: Which properties of a task sequence (e.g., total complexity, sequential heterogeneity) correlate with the hardness of continual learning?
- RQ2: Does task sequence complexity primarily drive forgetting, or does dissimilarity between consecutive tasks play a significant role?
- RQ3: How do modern continual learning algorithms (SI, VCL, coreset VCL) respond to variations in these sequence properties?
Key findings
| Variable | Algorithm | MNIST-256^2 | MNIST-50 | MNIST-20 | CIFAR-10 |
|---|---|---|---|---|---|
| (a) Total complexity | SI | 0.24 (p<0.01) | 0.22 (p<0.05) | 0.36 (p<0.01) | 0.86 (p<0.01) |
| (a) Total complexity | VCL | 0.05 (p=0.59) | 0.17 (p=0.07) | 0.21 (p<0.05) | 0.69 (p<0.01) |
| (a) Total complexity | Coreset VCL | 0.28 (p<0.01) | 0.41 (p<0.01) | 0.37 (p<0.01) | - |
| (b) Sequential heterogeneity | SI | -0.01 (p=0.86) | 0.05 (p=0.55) | 0.07 (p=0.48) | 0.30 (p<0.01) |
| (b) Sequential heterogeneity | VCL | 0.04 (p=0.69) | 0.01 (p=0.88) | 0.05 (p=0.58) | 0.21 (p<0.05) |
| (b) Sequential heterogeneity | Coreset VCL | 0.09 (p=0.31) | 0.12 (p=0.18) | 0.18 (p=0.05) | - |
| (c) Normalized sequential heterogeneity | SI | -0.07 (p=0.43) | -0.04 (p=0.65) | 0.05 (p=0.58) | -0.25 (p<0.01) |
| (c) Normalized sequential heterogeneity | VCL | 0.03 (p=0.76) | -0.20 (p<0.05) | -0.21 (p<0.05) | -0.17 (p=0.06) |
| (c) Normalized sequential heterogeneity | Coreset VCL | -0.08 (p=0.37) | -0.26 (p<0.01) | -0.16 (p=0.07) | - |
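- Total complexity shows strong positive correlations with final error rates on CIFAR-10 for SI (r=0.86, p<0.01) and VCL (r=0.69, p<0.01); the table reports no CIFAR-10 value for coreset VCL.
- On MNIST, correlations between total complexity and error rate are weaker but positive, and not all are significant (VCL: 0.05, p=0.59 on MNIST-256^2 and 0.17, p=0.07 on MNIST-50). They tend to be larger in the lower-capacity MNIST-50 and MNIST-20 settings, though the trend is not monotonic for every algorithm.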
- Sequential heterogeneity has weaker or mixed correlations with error rates, sometimes showing negative correlations when using normalized sequential heterogeneity.
- Negative correlations for normalized sequential heterogeneity suggest that greater dissimilarity between consecutive tasks can in some cases improve continual learning performance.
- Coreset VCL shows significant positive correlations between error rate and total complexity on all three MNIST configurations (0.28, 0.41, and 0.37, all p<0.01); no CIFAR-10 result is reported for it.
- The results imply that task complexity should be considered when designing benchmarks and algorithms, and that transfers between tasks may benefit from customization to task pairs.
This review was created by AI and reviewed by human editors.