QUICK REVIEW

[Paper Review] Pruned Adaptation Modules: A Simple yet Strong Baseline for Continual Foundation Models

Elif Ceren Gok Yildirim, Murat Onur Yildirim|arXiv (Cornell University)|Mar 22, 2026

Domain Adaptation and Few-Shot Learning | Cited by 0

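One-Sentence Summary

PAM freezes most of a pre-trained ResNet and adds sparse, pruned, task-specific last layers for continual learning, keeping accuracy strong while using significantly fewer trainable and total parameters than FM-based baselines. It consistently outperforms state-of-the-art FM-based CIL methods on several benchmarks.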

ABSTRACT

The continual learning literature has rapidly shifted from traditional class incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substantial methodological gap, making it difficult to assess whether recent FM-based CIL progress reflects genuine advances or merely the absence of rigorous baselines. To address this gap, we introduce Pruned Adaptation Modules (PAM), a simple yet effective method that freezes the vast majority of the pre-trained ResNet while enabling scalable continual adaptation through sparse task-specific layers. PAM yields up to a ~5x reduction in trainable parameters and a ~6x reduction in total parameters, significantly reducing the cost of continual updates. Across diverse benchmarks, PAM consistently mitigates catastrophic forgetting and outperforms state-of-the-art FM-based CIL approaches. Our findings position PAM as a strong and transparent baseline that helps bridge the gap between traditional and FM-based CIL, guiding future research for a more accurate assessment of true progress in continual adaptation. The code can be found at: https://github.com/ElifCerenGokYildirim/PAM.

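Motivation and Goals

Bridge the gap between traditional convolutional continual learning and foundation model (FM)-based methods by providing a lightweight yet strong baseline.
Demonstrate parameter efficiency by freezing most of the backbone and pruning the task-specific adaptation modules.
Show that PAM reaches competitive or better accuracy on diverse CIL benchmarks while reducing both trainable and total parameter counts.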

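Proposed Method

Freeze the first 3 layers of a pre-trained ResNet to serve as a shared feature extractor Φ.
For each task, attach a task-specific adaptation module γ_b and use a unified classifier Wᵀ that maps the output to the classes of the current task.
After the first training epoch, structurally prune each γ_b, removing the least informative channels according to the L1-norm saliency s_c = Σ_i |W_c^i|.
During training on task b, replace γ_b with the pruned adaptation module 𝒮_b, keeping Φ and Wᵀ in place.
Train only 𝒮_b and Wᵀ with a cross-entropy loss, preserving the prior knowledge in Φ.
At inference, evaluate p_b(x_test) = σ(Wᵀ 𝒮_b(Φ(x_test))) for every task b and select the most confident pruned module 𝒮_b.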
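
The pruning step above can be illustrated with a minimal sketch. It is not the authors' implementation: the nested-list weight layout, the function names `l1_saliency` and `prune_channels`, and the tie-breaking rule are all assumptions made for illustration. It only shows how ranking output channels by s_c = Σ_i |W_c^i| and dropping the lowest-scoring fraction produces a slimmer module.

```python
from typing import List

# Hypothetical toy layout: weight[c] holds every scalar weight of output
# channel c, flattened into one list. A real layer would be a tensor.
Channel = List[float]


def l1_saliency(weight: List[Channel]) -> List[float]:
    """Saliency per output channel: s_c = sum_i |W_c^i|."""
    return [sum(abs(w) for w in channel) for channel in weight]


def prune_channels(weight: List[Channel], amount: float) -> List[int]:
    """Return sorted indices of the channels that survive after removing
    the `amount` fraction with the lowest L1 saliency (at least one stays)."""
    if not 0.0 <= amount < 1.0:
        raise ValueError("amount must be in [0, 1)")
    scores = l1_saliency(weight)
    n_keep = max(1, round(len(weight) * (1.0 - amount)))
    # Rank by saliency, highest first; ties go to the lower channel index
    # (an arbitrary choice made for this sketch).
    ranked = sorted(range(len(weight)), key=lambda c: (-scores[c], c))
    return sorted(ranked[:n_keep])


toy_weight = [
    [0.5, -0.5, 0.1],     # s = 1.1
    [0.01, 0.02, -0.01],  # s = 0.04
    [1.0, -2.0, 0.5],     # s = 3.5
    [0.0, 0.1, 0.0],      # s = 0.1
]
kept = prune_channels(toy_weight, amount=0.5)
slim_weight = [toy_weight[c] for c in kept]  # the "slim" module's channels
print(kept)  # [0, 2]
```

As a rough sense of scale, assuming a layer with 512 output channels (an assumed width, not a figure from the paper), a pruning amount of 0.96 would leave 20 channels under this rounding rule.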

Figure 1: PAM is a simple yet powerful bridge that challenges the progress in FM-based CIL. It achieves better accuracy with ResNets, which significantly reduces runtime and parameters.

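Experimental Results

Research Questions

RQ1: Can a prune-and-freeze strategy combined with small task-specific modules outperform modern FM-based continual learning methods?
RQ2: How do the pruning schedule and the pruning amount affect PAM's accuracy and parameter efficiency?
RQ3: How well does PAM scale across datasets and backbone sizes, and how close can it get to the task-incremental upper bound when task identity is inferred implicitly?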

Key Findings

Method	Trainable Params Per Task	Total Params After All Tasks	Final Accuracy [%]
L2P	300 K	92 M	80.06 ± 1.1
DualPrompt	600 K	98 M	79.92 ± 0.4
CODA-Prompt	3 M	146 M	81.46 ± 0.3
APER-Adapter	100 K	86 M	84.91 ± 0.2
EASE	1.2 M	110 M	85.97 ± 0.6
PAM (RN18)	600 K	15 M	88.51 ± 3.4
PAM (RN50)	600 K	21 M	92.50 ± 2.1
PAM (RN101)	600 K	40 M	93.05 ± 1.7
PAM (RN152)	600 K	56 M	93.79 ± 1.7

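Compared with state-of-the-art FM-based CIL methods, PAM uses 2x to 5x fewer trainable parameters and 2x to 6x fewer total parameters.
PAM consistently outperforms adapter-based and prompt-based methods on benchmarks including CIFAR-100, CUB-200, ImageNet-R, and Cars-196.
With a ResNet152 (RN152) backbone, PAM reaches a final accuracy of 93.79% on Cars, 93.05% on ImageNet-R, and 93.03% or higher in the other settings, and it stays stable on longer task sequences.
PAM's single-module inference (using the most confident 𝒮_b) usually beats ensemble strategies and remains robust on challenging datasets such as ImageNet-R as the number of tasks grows.
In terms of parameter scale, PAM with ResNet backbones trains no more than roughly 600K parameters per task (the table lists 600 K for every backbone) and reaches at most 56M total parameters (RN152), while its final accuracy is competitive with or better than ViT-based baselines.
Ablations show that early pruning (at epoch 1) and a pruning amount of about 0.96 give the best results, and that confidence-based module selection outperforms distance-based strategies at inference.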
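
The confidence-based module selection referred to in these findings can be sketched as follows. This is an illustrative reading of p_b(x) = σ(Wᵀ 𝒮_b(Φ(x))), not the paper's code: `select_module`, `make_linear`, and the toy linear "modules" are hypothetical stand-ins, σ is taken to be a softmax, and each module is assumed to return logits over its own task's classes.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax, used here as the sigma in p_b."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_module(
    features: Vector,
    modules: Sequence[Callable[[Vector], Vector]],
) -> Tuple[int, int, float]:
    """Run every task module on the shared features Phi(x) and keep the
    most confident one. Returns (task index, class index, confidence)."""
    best: Tuple[int, int, float] = (-1, -1, -1.0)
    for b, module in enumerate(modules):
        probs = softmax(module(features))
        conf = max(probs)
        if conf > best[2]:
            best = (b, probs.index(conf), conf)
    return best


def make_linear(rows: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for W^T S_b: one dot product per class logit."""
    return lambda x: [sum(w * v for w, v in zip(row, x)) for row in rows]


phi_x = [1.0, 0.0]  # pretend output of the frozen extractor Phi
modules = [
    make_linear([[0.1, 0.0], [0.2, 0.0], [0.0, 1.0]]),  # flat, unsure logits
    make_linear([[4.0, 0.0], [0.0, 1.0], [0.0, 0.0]]),  # one dominant logit
]
task, cls, conf = select_module(phi_x, modules)
print(task, cls)  # 1 0
```

Here the class index is local to the selected task; mapping it to a global label would need a per-task class offset, which this sketch leaves out.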

Figure 2: PAM freezes the first three layers of a pre-trained ResNet to preserve general knowledge while dynamically adding a task-specific last layer for each new task. To improve parameter efficiency, each last layer is structurally pruned to become ‘slim’ before training on its corresponding task.


This review was generated by AI and reviewed by a human editor.