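[Paper Summary] Revisiting Weight Regularization for Low-Rank Continual Learning
This paper proposes EWC-LoRA, a low-rank continual learning method based on weight regularization. For large-scale pre-trained models, it achieves a good stability-plasticity trade-off without memory growing as the number of tasks increases.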
Continual Learning (CL) with large-scale pre-trained models (PTMs) has recently gained wide attention, shifting the focus from training from scratch to continually adapting PTMs. This has given rise to a promising paradigm: parameter-efficient continual learning (PECL), where task interference is typically mitigated by assigning a task-specific module during training, such as low-rank adapters. However, weight regularization techniques, such as Elastic Weight Consolidation (EWC), a key strategy in CL, remain underexplored in this new paradigm. In this paper, we revisit weight regularization in low-rank CL as a new perspective for mitigating task interference in PECL. Unlike existing low-rank CL methods, we mitigate task interference by regularizing a shared low-rank update through EWC, thereby keeping the storage requirement and inference costs constant regardless of the number of tasks. Our proposed method EWC-LoRA leverages a low-rank representation to estimate parameter importance over the full-dimensional space. This design offers a practical, computation- and memory-efficient solution for CL with PTMs, and provides insights that may inform the broader application of regularization techniques within PECL. Extensive experiments on various benchmarks demonstrate the effectiveness of EWC-LoRA, achieving a stability-plasticity trade-off superior to existing low-rank CL approaches. These results indicate that, even under low-rank parameterizations, weight regularization remains an effective mechanism for mitigating task interference. Code is available at: https://github.com/yaoyz96/low-rank-cl.
Research Motivation and Goals
- Motivate the use of weight regularization in parameter-efficient continual learning (PECL) with large pre-trained models.
- Propose a principled way to apply Elastic Weight Consolidation (EWC) within a low-rank adaptation framework.
- Develop EWC-LoRA to regulate a shared low-rank update using full-dimensional Fisher information.
- Demonstrate improved stability-plasticity trade-offs and practical efficiency over existing low-rank CL methods.
Proposed Method
- Represent the weight update as a low-rank product Delta W = AB to limit trainable parameters.
- Regularize the low-rank update in the full-dimensional space using a diagonal Fisher information matrix calculated on the full W-space, not per-task subspaces.
- Estimate the Fisher information F_t over the full-dimensional space for W_t* and accumulate it across tasks to form F_t^{cum}.
- Merge the learned low-rank update into the backbone after each task so memory stays constant with the number of tasks.
- Evaluate EWC-LoRA on both vision (CIFAR-100, DomainNet, ImageNet-R, ImageNet-A) and language (T5-large, LLaMA-3.2) benchmarks, comparing against other LoRA-based and PECL methods.
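The steps listed above can be illustrated with a minimal NumPy sketch for a single weight matrix. It is not the authors' implementation. It assumes the standard EWC quadratic penalty applied to the product AB (because the previous update has already been merged into the backbone, the deviation from the old weights is exactly Delta W), an empirical diagonal Fisher built from squared per-sample gradients, and accumulation by simple summation. The function names, shapes, and random "gradients" are hypothetical.

```python
import numpy as np

def ewc_lora_penalty(A, B, fisher_cum, lam):
    """EWC-style penalty on the full-space update Delta W = A @ B.

    Assumption: the penalty is (lam / 2) * sum(F_cum * Delta W ** 2).
    """
    delta_w = A @ B  # same shape as the backbone weight W
    return 0.5 * lam * float(np.sum(fisher_cum * delta_w ** 2))

def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean of squared per-sample gradients w.r.t. W."""
    return np.mean(per_sample_grads ** 2, axis=0)

def finish_task(W, A, B, fisher_cum, per_sample_grads):
    """Merge the low-rank update into W and accumulate the Fisher estimate.

    Assumption: Fisher estimates are accumulated by summation across tasks.
    """
    W_merged = W + A @ B
    fisher_next = fisher_cum + diagonal_fisher(per_sample_grads)
    return W_merged, fisher_next

# Toy walk-through with hypothetical sizes.
rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 4, 2
W = rng.normal(size=(d_out, d_in))
fisher_cum = np.zeros((d_out, d_in))  # nothing to protect before the first task

# Task 1: pretend A and B were trained, then merge and update the Fisher.
A = rng.normal(size=(d_out, rank))
B = rng.normal(size=(rank, d_in))
grads = rng.normal(size=(16, d_out, d_in))  # stand-in for per-sample gradients
W, fisher_cum = finish_task(W, A, B, fisher_cum, grads)

# Task 2: a fresh shared adapter is penalized in the full W-space.
A = rng.normal(size=(d_out, rank))
B = rng.normal(size=(rank, d_in))
penalty = ewc_lora_penalty(A, B, fisher_cum, lam=1.0)
```

In this sketch only W and the accumulated Fisher persist between tasks, both with the shape of W, so the stored state does not grow with the number of tasks.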

Experimental Results
Research Questions
- RQ1: Can weight regularization (EWC) be effectively integrated with low-rank adaptations in PTM-based continual learning?
- RQ2: Does estimating Fisher information in the full parameter space, while training only a low-rank update, yield better stability-plasticity trade-offs than naive regularization in the low-rank space?
- RQ3: How does EWC-LoRA perform compared to state-of-the-art PECL methods in terms of accuracy, stability, plasticity, and efficiency on diverse datasets?
Key Findings
| Methods | CIFAR-100: A10 (↑) | CIFAR-100: Avg (↑) | DomainNet: A5 (↑) | DomainNet: Avg (↑) | ImageNet-R: A10 (↑) | ImageNet-R: Avg (↑) | ImageNet-A: A10 (↑) | ImageNet-A: Avg (↑) |
|---|---|---|---|---|---|---|---|---|
| EWC-LoRA | 87.91 | 92.27 | 73.46 | 79.58 | 72.86 | 78.95 | 59.89 | 68.33 |
- EWC-LoRA achieves higher final accuracy than vanilla LoRA on multiple datasets, with an average improvement of 8.92%.
- Across four datasets, EWC-LoRA often achieves the best final accuracy and competitive stability and plasticity, outperforming task-specific low-rank methods.
- EWC-LoRA demonstrates a favorable stability–plasticity trade-off and remains memory-efficient, since it uses a single shared LoRA module and only stores a diagonal Fisher for regularization.
- Using a unified regularization strength (lambda = 1e7) yields robust performance without dataset-specific tuning.
- On language tasks, EWC-LoRA provides comparable or superior results to LoRA-based baselines when tested with T5-large and LLaMA-3.2-1B-Instruct.
- Ablation studies show that regularizing the full-dimensional W-space via the AB product outperforms per-component regularization of A and B or using a precomputed F_W.
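One intuition for the last finding, offered here as an illustration and not as the paper's own argument, is that the factorization AB is not unique: rescaling A up and B down by the same factor leaves Delta W unchanged. A penalty defined on the product is unaffected by that rescaling, whereas a penalty defined on A and B separately is not. The sketch below checks this numerically. The penalty forms, the all-ones per-component Fisher values, and the zero reference point for A and B are simplifying assumptions.

```python
import numpy as np

def product_penalty(A, B, fisher_w):
    """Quadratic penalty on Delta W = A @ B, weighted by a full-space diagonal Fisher."""
    return 0.5 * float(np.sum(fisher_w * (A @ B) ** 2))

def per_component_penalty(A, B, fisher_a, fisher_b):
    """Quadratic penalty applied to A and B separately (reference point assumed zero)."""
    return 0.5 * float(np.sum(fisher_a * A ** 2) + np.sum(fisher_b * B ** 2))

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))
B = rng.normal(size=(2, 4))
fisher_w = rng.random(size=(6, 4))  # hypothetical Fisher values over W
fisher_a = np.ones_like(A)          # hypothetical Fisher values over A
fisher_b = np.ones_like(B)          # hypothetical Fisher values over B

# Rescale the factors without changing the update they represent.
scale = 10.0
A_scaled, B_scaled = A * scale, B / scale

same_update = np.allclose(A @ B, A_scaled @ B_scaled)
product_before = product_penalty(A, B, fisher_w)
product_after = product_penalty(A_scaled, B_scaled, fisher_w)
component_before = per_component_penalty(A, B, fisher_a, fisher_b)
component_after = per_component_penalty(A_scaled, B_scaled, fisher_a, fisher_b)
```

Here `product_before` and `product_after` agree up to floating-point error, while `component_before` and `component_after` differ even though both factor pairs describe the same weight update.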

This summary was generated by AI and reviewed by a human editor.