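[Paper Explained] RanPAC: Random Projections and Pre-trained Models for Continual Learning
RanPAC inserts a training-free random projection layer between the features of a frozen pre-trained model and a class-prototype output head. By decorrelating the prototypes through Gram-matrix-based calibration, it improves continual learning without any rehearsal memory.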
Continual learning (CL) aims to incrementally learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. Most CL works focus on tackling catastrophic forgetting under a learning-from-scratch paradigm. However, with the increasing prominence of foundation models, pre-trained models equipped with informative representations have become available for various downstream requirements. Several CL methods based on pre-trained models have been explored, either utilizing pre-extracted features directly (which makes bridging distribution gaps challenging) or incorporating adaptors (which may be subject to forgetting). In this paper, we propose a concise and effective approach for CL with pre-trained models. Given that forgetting occurs during parameter updating, we contemplate an alternative approach that exploits training-free random projectors and class-prototype accumulation, which thus bypasses the issue. Specifically, we inject a frozen Random Projection layer with nonlinear activation between the pre-trained model's feature representations and output head, which captures interactions between features with expanded dimensionality, providing enhanced linear separability for class-prototype-based CL. We also demonstrate the importance of decorrelating the class-prototypes to reduce the distribution disparity when using pre-trained representations. These techniques prove to be effective and circumvent the problem of forgetting for both class- and domain-incremental continual learning. Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 20% and 62% on seven class-incremental benchmarks, despite not using any rehearsal memory. We conclude that the full potential of pre-trained models for simple, effective, and fast CL has not hitherto been fully tapped. Code is at github.com/RanPAC/RanPAC.
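Motivation and Objectives
- Investigate whether a training-free random projection can improve class-prototype-based continual learning with pre-trained models.
- Show how decorrelating class prototypes reduces the distribution shift between tasks.
- Demonstrate compatibility with parameter-efficient transfer learning (PETL) and first-session adaptation.
- Evaluate performance on class-incremental and domain-incremental benchmarks without rehearsal memory.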
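Proposed Method
- Insert a frozen random projection (RP) layer with a nonlinear activation between the pre-trained feature representations and the class-prototype (CP) based output head.
- Compute class scores using Gram-matrix-based calibration and ridge regression, updating G (the Gram matrix of the projected features) and C (the class-prototype matrix) incrementally across tasks.
- Decorrelate the class prototypes to lower inter-class correlation and improve separability.
- Combine RP with a PETL method that is trained on the first session and then frozen, in order to bridge the domain gap.
- Keep training efficient by freezing the RP weights and computing the final scores with the closed-form ridge-regression solution (G + λI)^{-1} C.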
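
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the class name `RanPACHead`, the Gaussian initialisation, the ReLU activation, and the fixed ridge value are all assumptions made for the example, and the features are synthetic stand-ins for the outputs of a frozen pre-trained backbone.

```python
import numpy as np


class RanPACHead:
    """Hypothetical sketch: frozen random projection + ridge-regularised prototype head."""

    def __init__(self, feat_dim, proj_dim, num_classes, ridge=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection weights: drawn once and never trained (assumed Gaussian).
        self.W = rng.standard_normal((feat_dim, proj_dim))
        self.G = np.zeros((proj_dim, proj_dim))     # Gram matrix of projected features
        self.C = np.zeros((proj_dim, num_classes))  # accumulated class prototypes
        self.ridge = ridge                          # lambda, fixed here for simplicity
        self.num_classes = num_classes
        self.Wo = None                              # head weights, (G + lambda I)^{-1} C

    def project(self, feats):
        # Nonlinear random projection to dimension M (ReLU assumed as the activation).
        return np.maximum(feats @ self.W, 0.0)

    def update(self, feats, labels):
        """Accumulate G and C for one task, then re-solve the closed-form head."""
        H = self.project(feats)               # shape (N, M)
        Y = np.eye(self.num_classes)[labels]  # one-hot targets, shape (N, K)
        self.G += H.T @ H
        self.C += H.T @ Y
        reg = self.ridge * np.eye(self.G.shape[0])
        self.Wo = np.linalg.solve(self.G + reg, self.C)

    def predict(self, feats):
        return np.argmax(self.project(feats) @ self.Wo, axis=1)


# Toy usage: two tasks with disjoint classes, presented one after the other.
rng = np.random.default_rng(42)
centers = 5.0 * rng.standard_normal((4, 16))  # synthetic class centres
labels = rng.integers(0, 4, size=400)
feats = centers[labels] + rng.standard_normal((400, 16))

head = RanPACHead(feat_dim=16, proj_dim=128, num_classes=4)
for task_classes in ([0, 1], [2, 3]):
    mask = np.isin(labels, task_classes)
    head.update(feats[mask], labels[mask])  # no data from earlier tasks is kept

accuracy = np.mean(head.predict(feats) == labels)
print(f"accuracy over all classes after both tasks: {accuracy:.3f}")
```

Because G and C are plain sums over samples, updating them task by task gives the same head as a single fit on all data at once. This is why no parameters are overwritten and no rehearsal buffer is needed in this sketch.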
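Experimental Results
Research Questions
- RQ1: In continual learning with pre-trained models, does a frozen random projection layer improve the linear separability of class prototypes?
- RQ2: Under domain- and class-incremental settings, how does decorrelating the class prototypes affect calibration and accuracy in CP-based continual learning?
- RQ3: Can RP combined with PETL reach rehearsal-free performance that approaches or exceeds joint training across different continual learning benchmarks?
- RQ4: Is RanPAC compatible with different backbones (ViT, ResNet, CLIP) and with PETL methods across a range of CL scenarios?
Key Findings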
| Method | CIFAR100 | IN-R | IN-A | CUB | OB | VTAB | Cars |
|---|---|---|---|---|---|---|---|
| Joint linear probe | 87.9% | 72.0% | 56.6% | 88.7% | 78.5% | 86.7% | 51.7% |
| L2P | 84.6% | 72.4% | 42.5% | 65.2% | 64.7% | 77.1% | 38.2%* |
| DualPrompt | 84.1% | 71.0% | 45.4% | 68.5% | 65.5% | 81.2% | 40.1%* |
| CODA-Prompt | 86.3% | 75.5% | 44.5% | 79.5% | 68.7% | 87.4% | 43.2% |
| ADaM | 87.6% | 72.3% | 52.6% | 87.1% | 74.3% | 84.3% | 41.4% |
| Ours (Algorithm 1) | 92.2% | 77.9% | 62.4% | 90.3% | 79.9% | 92.2% | 77.5% |
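
To relate the table to the relative error reductions quoted in the abstract, the short sketch below converts the accuracies into error rates and computes the relative reduction of the "Ours" row over the ADaM row. Choosing ADaM as the baseline is an assumption made for this illustration; the summary itself does not state which prior method the quoted range refers to. All numbers are copied from the table above.

```python
# Final accuracies (%) copied from the table above.
DATASETS = ["CIFAR100", "IN-R", "IN-A", "CUB", "OB", "VTAB", "Cars"]
ADAM = [87.6, 72.3, 52.6, 87.1, 74.3, 84.3, 41.4]
OURS = [92.2, 77.9, 62.4, 90.3, 79.9, 92.2, 77.5]


def relative_error_reduction(baseline_acc, new_acc):
    """Relative drop in error rate (%), given two accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


reductions = {
    name: relative_error_reduction(base, ours)
    for name, base, ours in zip(DATASETS, ADAM, OURS)
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative error reduction")
```

With ADaM as the baseline, the smallest reduction is on IN-R (about 20%) and the largest on Cars (about 62%), which matches the range stated in the abstract.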
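- Combined with PETL, RanPAC substantially reduces error rates relative to CP baselines on class-incremental benchmarks (by 11%–28%).
- For ViT-B/16 models, RanPAC achieves the highest rehearsal-free accuracy on several class-incremental and domain-incremental datasets compared with existing CP methods.
- An RP layer with sufficiently large dimensionality (M) and a nonlinear activation markedly improves class separability and final accuracy.
- Decorrelating the class prototypes with the Gram-based approach lowers inter-class correlation and aligns the CP classifier with a jointly trained linear probe, which improves calibration.
- RanPAC is effective across CL scenarios (class-incremental, domain-incremental, task-agnostic) and works with feature vectors from any backbone (ViT, ResNet, CLIP).
- Compared with CPU-intensive or memory-heavier rehearsal methods, RanPAC achieves strong performance without retaining any past data.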
This explainer was generated by AI and reviewed by a human editor.