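[Paper Review] Pruned Adaptation Modules: A Simple yet Strong Baseline for Continual Foundation Models
PAM performs continual learning by freezing most of a pre-trained ResNet and adding sparse, pruned, task-specific final layers, achieving strong accuracy with far fewer trainable and total parameters than FM-based baselines. It consistently outperforms state-of-the-art FM-based CIL methods across multiple benchmarks.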
The continual learning literature has rapidly shifted from traditional class incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substantial methodological gap, making it difficult to assess whether recent FM-based CIL progress reflects genuine advances or merely the absence of rigorous baselines. To address this gap, we introduce Pruned Adaptation Modules (PAM), a simple yet effective method that freezes the vast majority of the pre-trained ResNet while enabling scalable continual adaptation through sparse task-specific layers. PAM yields up to a ~5x reduction in trainable parameters and a ~6x reduction in total parameters, significantly reducing the cost of continual updates. Across diverse benchmarks, PAM consistently mitigates catastrophic forgetting and outperforms state-of-the-art FM-based CIL approaches. Our findings position PAM as a strong and transparent baseline that helps bridge the gap between traditional and FM-based CIL, guiding future research for a more accurate assessment of true progress in continual adaptation. The code can be found at: https://github.com/ElifCerenGokYildirim/PAM.
Research Motivation and Goals
- Bridge the gap between traditional ConvNet-based continual learning and foundation-model based methods by providing a lightweight, strong baseline.
- Demonstrate parameter efficiency through freezing most of the backbone and pruning task-specific adaptation modules.
- Show that PAM achieves competitive or superior accuracy across diverse CIL benchmarks while reducing trainable and total parameter counts.
Proposed Method
- Freeze the first three layers of a pre-trained ResNet as a shared extractor Φ.
- Attach a task-specific adaptation module γ_b per task and a unified classifier Wᵀ to map to the current task's classes.
- Apply structured pruning to each γ_b after the first training epoch, removing the least informative channels according to the L1-norm saliency s_c = Σ_i |W_c^i|.
- Replace γ_b with a pruned adaptation module 𝒮_b while keeping Φ and Wᵀ fixed during training on task b.
- Train only 𝒮_b and Wᵀ with cross-entropy loss, preserving prior knowledge in Φ.
- At inference, evaluate p_b(x_test) = σ(Wᵀ 𝒮_b(Φ(x_test))) for every task b and select the pruned module 𝒮_b with the highest average batch confidence.
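
The pruning step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each output channel's weights are assumed to be flattened into a plain list, `prune_ratio` is assumed to mean the fraction of channels removed, and the names `l1_saliency` and `prune_channels` are hypothetical.

```python
def l1_saliency(channel_weights):
    """Saliency s_c of one channel: the sum of absolute weight values."""
    return sum(abs(w) for w in channel_weights)


def prune_channels(layer, prune_ratio):
    """Drop the lowest-saliency channels of a layer.

    layer: list of channels, each a flat list of weights (assumed layout).
    prune_ratio: fraction of channels to remove (assumed meaning).
    Returns (kept_indices, pruned_layer), keeping at least one channel.
    """
    scores = [l1_saliency(ch) for ch in layer]
    n_keep = max(1, round(len(layer) * (1.0 - prune_ratio)))
    # Rank channel indices from most to least salient, keep the top n_keep.
    ranked = sorted(range(len(layer)), key=lambda c: scores[c], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [layer[c] for c in kept]


# Toy layer with four channels of three weights each.
layer = [
    [0.9, -0.8, 0.7],    # large weights, high saliency
    [0.01, -0.02, 0.0],  # near-zero weights, low saliency
    [0.5, 0.5, -0.5],    # moderate saliency
    [0.0, 0.1, -0.1],    # low saliency
]
kept, pruned = prune_channels(layer, prune_ratio=0.5)
print(kept)
```

With a ratio of 0.5 the two channels with the largest L1 norm survive. A real structured pruning pass would also have to shrink the input dimension of whatever layer consumes these channels, which this sketch leaves out.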

Experimental Results
Research Questions
- RQ1: Can a prune-and-freeze strategy with small task-specific modules outperform modern FM-based continual learning methods?
- RQ2: What are the effects of pruning schedule and pruning magnitude on performance and parameter efficiency in PAM?
- RQ3: How well does PAM scale across datasets and backbone sizes, and how close can it get to a task-incremental upper bound with implicit task identification?
Key Results
| Method | Trainable Params Per Task | Total Params After All Tasks | Final Accuracy [%] |
|---|---|---|---|
| L2P | 300 K | 92 M | 80.06 ± 1.1 |
| DualPrompt | 600 K | 98 M | 79.92 ± 0.4 |
| CODA-Prompt | 3 M | 146 M | 81.46 ± 0.3 |
| APER-Adapter | 100 K | 86 M | 84.91 ± 0.2 |
| EASE | 1.2 M | 110 M | 85.97 ± 0.6 |
| PAM (RN18) | 600 K | 15 M | 88.51 ± 3.4 |
| PAM (RN50) | 600 K | 21 M | 92.50 ± 2.1 |
| PAM (RN101) | 600 K | 40 M | 93.05 ± 1.7 |
| PAM (RN152) | 600 K | 56 M | 93.79 ± 1.7 |
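
To make the parameter comparison concrete, the ratios can be computed directly from the table rows. The snippet below is only arithmetic on the values listed above. Which baseline and backbone pairings the paper uses for its headline reduction figures is not stated in this review, so the pairing with PAM (RN18) is an assumption, and `reduction_vs` is a hypothetical helper name.

```python
# (trainable params per task, total params after all tasks), copied from the table.
TABLE = {
    "L2P": (300e3, 92e6),
    "DualPrompt": (600e3, 98e6),
    "CODA-Prompt": (3e6, 146e6),
    "APER-Adapter": (100e3, 86e6),
    "EASE": (1.2e6, 110e6),
    "PAM (RN18)": (600e3, 15e6),
    "PAM (RN152)": (600e3, 56e6),
}


def reduction_vs(pam_key, table=TABLE):
    """Ratio baseline / PAM for each non-PAM row; values above 1 mean PAM is smaller."""
    pam_train, pam_total = table[pam_key]
    return {
        name: (train / pam_train, total / pam_total)
        for name, (train, total) in table.items()
        if not name.startswith("PAM")
    }


ratios = reduction_vs("PAM (RN18)")
for name, (r_train, r_total) in ratios.items():
    print(f"{name:13s} trainable x{r_train:.2f}  total x{r_total:.2f}")
```

On these numbers CODA-Prompt trains 5x and EASE 2x as many parameters per task as PAM, while L2P and APER-Adapter come out below 1x, meaning they train fewer parameters per task than PAM in this table.
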
- PAM achieves up to a 5x reduction in trainable parameters and up to a 6x reduction in total parameters versus state-of-the-art FM-based CIL methods.
- PAM consistently outperforms adapter- and prompt-based methods across CIFAR-100, CUB-200, ImageNet-R, and Cars-196 benchmarks.
- Using ResNet152 (RN152) as backbone, PAM reaches final accuracies of 93.79% on Cars, 93.05% on ImageNet-R, and 93.03%+ on other settings, with strong stability over long sequences of tasks.
- PAM’s single-module inference (most confident 𝒮_b) often surpasses ensemble strategies, and the approach remains robust as task numbers grow on challenging datasets like ImageNet-R.
- Across parameter sizes, PAM with RN backbones requires far fewer trainable parameters per task (600K) and total parameters (up to 56M for RN152) while delivering competitive or superior final accuracy compared to ViT-based baselines.
- Ablations show that pruning early (at epoch 1) with a pruning magnitude of around 0.96 yields the best results, and that confidence-based module selection outperforms distance-based strategies at inference.
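
Confidence-based module selection, as described in the method, can be sketched as follows. This is a hedged illustration rather than the paper's code: it assumes σ is a softmax, that "confidence" means the maximum softmax probability per sample, and that the logits from each pruned module are already available as nested lists. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_confidence(batch_logits):
    """Average, over the batch, of the top softmax probability (assumed confidence)."""
    return sum(max(softmax(row)) for row in batch_logits) / len(batch_logits)


def select_module(logits_per_module):
    """logits_per_module[b] holds the batch logits produced through module b.

    Returns the index of the module with the highest average batch
    confidence, together with all per-module scores.
    """
    scores = [mean_confidence(batch) for batch in logits_per_module]
    best = max(range(len(scores)), key=lambda b: scores[b])
    return best, scores


# Toy batch of two samples scored by two task modules.
logits_per_module = [
    [[0.2, 0.1, 0.0], [0.0, 0.3, 0.1]],  # module 0: nearly flat predictions
    [[4.0, 0.0, 0.0], [0.0, 5.0, 0.5]],  # module 1: sharply peaked predictions
]
best, scores = select_module(logits_per_module)
print(best, scores)
```

Averaging over the batch presumes that all samples in a test batch come from the same task. That is an assumption of this sketch and should be checked against the paper's evaluation protocol.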

This review was generated by AI and reviewed by a human editor.