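[Paper Explained] Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality
This paper proposes HiDe-Prompt, a hierarchical decomposition method for prompt-based continual learning that explicitly optimizes within-task prediction, task-identity inference, and task-adaptive prediction, achieving state-of-the-art results under self-supervised pre-training.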
Prompt-based continual learning is an emerging direction in leveraging pre-trained knowledge for downstream continual learning, and has almost reached the performance pinnacle under supervised pre-training. However, our empirical research reveals that the current strategies fall short of their full potential under the more realistic self-supervised pre-training, which is essential for handling vast quantities of unlabeled data in practice. This is largely due to the difficulty of task-specific knowledge being incorporated into instructed representations via prompt parameters and predicted by uninstructed representations at test time. To overcome the exposed sub-optimality, we conduct a theoretical analysis of the continual learning objective in the context of pre-training, and decompose it into hierarchical components: within-task prediction, task-identity inference, and task-adaptive prediction. Following these empirical and theoretical insights, we propose Hierarchical Decomposition (HiDe-)Prompt, an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics of both uninstructed and instructed representations, further with the coordination of a contrastive regularization strategy. Our extensive experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning (e.g., up to 15.01% and 9.61% lead on Split CIFAR-100 and Split ImageNet-R, respectively). Our code is available at https://github.com/thu-ml/HiDe-Prompt.
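Research Motivation and Goals
- Motivate the study of prompt-based continual learning under the more realistic setting of self-supervised pre-training.
- Theoretically decompose the continual learning objective into hierarchical components: within-task prediction (WTP), task-identity inference (TII), and task-adaptive prediction (TAP).
- Propose HiDe-Prompt, which explicitly optimizes the hierarchical components using task-specific prompts and representation statistics.
- Introduce a contrastive regularization strategy to coordinate the hierarchical components.
- Demonstrate empirical gains on multiple benchmarks and robustness to the pre-training paradigm.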
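The hierarchical decomposition listed above can be written compactly as a chain-rule factorization. The notation below is an assumption made for illustration and may differ from the paper's own symbols: $\mathcal{X}_{i}$ denotes the input domain of task $i$, $\mathcal{X}_{i,j}$ the domain of class $j$ within task $i$, $\mathcal{X}^{y}$ the domain of label $y$ across all tasks seen so far, $\mathcal{D}$ the training data, and $\theta$ the model parameters.

```latex
% Within-task prediction (WTP) times task-identity inference (TII).
% The factorization holds because \mathcal{X}_{i,j} \subseteq \mathcal{X}_{i}.
P\bigl(x \in \mathcal{X}_{i,j} \mid \mathcal{D}, \theta\bigr)
  = \underbrace{P\bigl(x \in \mathcal{X}_{i,j} \mid x \in \mathcal{X}_{i}, \mathcal{D}, \theta\bigr)}_{\text{WTP}}
    \cdot
    \underbrace{P\bigl(x \in \mathcal{X}_{i} \mid \mathcal{D}, \theta\bigr)}_{\text{TII}}

% Task-adaptive prediction (TAP): the probability assigned to label y
% over all classes seen so far, regardless of which task introduced it.
\text{TAP:} \quad P\bigl(x \in \mathcal{X}^{y} \mid \mathcal{D}, \theta\bigr)
```

Read this way, a method that only tunes prompts improves the WTP factor, while the TII factor and the TAP term still depend on how well task identity and the final labels can be recovered at test time.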
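Proposed Method
- The problem is framed as rehearsal-free continual learning with a frozen pre-trained backbone and task-specific prompts.
- Prompt-based approaches (ProT and PreT) are reviewed and compared, with emphasis on how task identity is inferred from uninstructed representations.
- HiDe-Prompt grows a pool of task-specific prompts and uses a prompt ensemble to transfer knowledge to new tasks while mitigating forgetting.
- Within-task prediction (WTP), task-identity inference (TII), and task-adaptive prediction (TAP) are optimized through dedicated branches: WTP uses cross-entropy with a contrastive regularization term that draws on old-task statistics; TII uses an auxiliary, continually adapted output layer to predict task identity from uninstructed representations; TAP uses an output head adapted to all classes seen so far.
- Per-class representation statistics are modeled, chiefly as Gaussians, to enable distribution-based prediction; the cross-entropy losses H_WTP, H_TII, and H_TAP guide the hierarchical optimization (Eqs. 6–12).
- At test time, the model infers the task identity via the auxiliary TII path, then predicts the label using the corresponding task-specific prompt.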
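The test-time procedure in the last bullet can be sketched as a two-stage lookup. This is a minimal illustration, not the authors' implementation: `encode`, `tii_head`, `tap_head`, and `prompts` are hypothetical names, the heads are plain linear maps, and the frozen backbone is replaced by a toy function in which a prompt simply shifts the features.

```python
import numpy as np

def two_stage_predict(x, encode, tii_head, tap_head, prompts):
    """Infer the task from uninstructed features, then classify with that task's prompt."""
    # Stage 1 (TII): no prompt is attached, so the features are "uninstructed".
    uninstructed = encode(x, prompt=None)
    task_id = int(np.argmax(tii_head @ uninstructed))
    # Stage 2 (TAP): re-encode with the selected task-specific prompt,
    # then classify over all classes seen so far.
    instructed = encode(x, prompt=prompts[task_id])
    label = int(np.argmax(tap_head @ instructed))
    return task_id, label

# Hypothetical stand-in for a frozen backbone (assumption for illustration).
def toy_encode(x, prompt=None):
    return x if prompt is None else x + prompt

# Two tasks with two classes each: task 0 owns classes 0-1, task 1 owns classes 2-3.
tii_head = np.array([[1.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 1.0]])   # task logits
tap_head = np.eye(4)                           # class logits
prompts = {0: np.zeros(4), 1: np.array([0.0, 0.0, 0.0, 0.5])}

x = np.array([0.0, 0.0, 1.0, 0.9])
task_id, label = two_stage_predict(x, toy_encode, tii_head, tap_head, prompts)
print(task_id, label)  # prints: 1 3
```

In this toy case the uninstructed features alone would favor class 2, while the prompt selected for task 1 shifts the instructed features toward class 3, which illustrates why the quality of the TII step matters for the final label.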
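Experimental Results
Research Questions
- RQ1: How does the pre-training paradigm (self-supervised vs. supervised) affect the effectiveness of prompt-based continual learning?
- RQ2: Under self-supervised pre-training, does hierarchically decomposing the continual learning objective into WTP, TII, and TAP lead to better performance?
- RQ3: How should task-specific prompts be organized and regularized to enable knowledge transfer while avoiding catastrophic forgetting?
- RQ4: Can statistical modeling of uninstructed and instructed representations (e.g., as Gaussian distributions) support effective task-identity and class prediction across tasks?
Key Findings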
| PTM | Method | Split CIFAR-100 FAA (%) | Split CIFAR-100 CAA (%) | Split CIFAR-100 FFM (%) | Split ImageNet-R FAA (%) | Split ImageNet-R CAA (%) | Split ImageNet-R FFM (%) |
|---|---|---|---|---|---|---|---|
| Sup-21K | HiDe-Prompt (Ours) | 92.61 ± 0.28 | 94.03 ± 0.01 | 3.16 ± 0.10 | 75.06 ± 0.12 | 76.60 ± 0.01 | 2.17 ± 0.19 |
| Sup-21K | L2P [41] | 83.06 ± 0.17 | 88.25 ± 0.01 | 6.58 ± 0.40 | 63.65 ± 0.12 | 67.25 ± 0.02 | 7.51 ± 0.17 |
| Sup-21K | DualPrompt [40] | 86.60 ± 0.19 | 90.64 ± 0.01 | 4.45 ± 0.16 | 68.79 ± 0.31 | 71.96 ± 0.04 | 4.49 ± 0.14 |
| Sup-21K | S-Prompt++ [39] | 88.81 ± 0.18 | 92.25 ± 0.03 | 3.87 ± 0.05 | 69.68 ± 0.12 | 72.50 ± 0.04 | 3.29 ± 0.05 |
| Sup-21K | CODA-Prompt [30] ∗ | 86.94 ± 0.63 | 91.57 ± 0.75 | 4.04 ± 0.18 | 70.03 ± 0.47 | 74.26 ± 0.24 | 5.17 ± 0.22 |
| iBOT-21K | HiDe-Prompt (Ours) | 93.02 ± 0.15 | 94.56 ± 0.05 | 1.33 ± 0.24 | 70.83 ± 0.17 | 73.23 ± 0.08 | 2.46 ± 0.21 |
| iBOT-21K | L2P [41] | 79.00 ± 0.28 | 85.13 ± 0.05 | 5.55 ± 0.36 | 55.35 ± 0.28 | 58.62 ± 0.05 | 3.73 ± 0.53 |
| iBOT-21K | DualPrompt [40] | 78.76 ± 0.23 | 86.16 ± 0.02 | 9.84 ± 0.24 | 54.55 ± 0.53 | 58.69 ± 0.01 | 5.38 ± 0.70 |
| iBOT-21K | S-Prompt++ [39] | 79.14 ± 0.65 | 85.85 ± 0.17 | 9.17 ± 1.33 | 55.16 ± 0.83 | 58.48 ± 0.18 | 4.07 ± 0.16 |
| iBOT-21K | CODA-Prompt [30] | 80.83 ± 0.27 | 87.02 ± 0.20 | 7.50 ± 0.25 | 61.22 ± 0.35 | 66.76 ± 0.37 | 9.66 ± 0.20 |
| iBOT-1K | HiDe-Prompt (Ours) | 93.48 ± 0.11 | 95.02 ± 0.01 | 1.00 ± 0.24 | 71.33 ± 0.21 | 73.62 ± 0.13 | 2.79 ± 0.26 |
| iBOT-1K | L2P [41] | 75.57 ± 0.41 | 82.69 ± 0.06 | 7.23 ± 0.93 | 60.97 ± 0.26 | 65.95 ± 0.02 | 4.07 ± 0.66 |
| iBOT-1K | DualPrompt [40] | 76.63 ± 0.05 | 85.08 ± 0.12 | 8.41 ± 0.40 | 61.51 ± 1.05 | 67.11 ± 0.08 | 5.02 ± 0.52 |
| iBOT-1K | S-Prompt++ [39] | 77.53 ± 0.56 | 85.66 ± 0.16 | 8.07 ± 0.97 | 60.82 ± 0.68 | 66.03 ± 0.91 | 4.16 ± 0.14 |
| iBOT-1K | CODA-Prompt [30] | 79.11 ± 1.02 | 86.21 ± 0.49 | 7.69 ± 1.57 | 66.56 ± 0.68 | 73.14 ± 0.57 | 7.22 ± 0.38 |
| DINO-1K | HiDe-Prompt (Ours) | 92.51 ± 0.11 | 94.25 ± 0.01 | 0.99 ± 0.21 | 68.11 ± 0.18 | 71.70 ± 0.01 | 3.11 ± 0.17 |
| MoCo-1K | HiDe-Prompt (Ours) | 91.57 ± 0.20 | 93.70 ± 0.01 | 1.19 ± 0.18 | 63.77 ± 0.49 | 68.26 ± 0.01 | 3.57 ± 0.96 |
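The table reports FAA, CAA, and FFM without defining them. Assuming the usual continual-learning readings (final average accuracy, cumulative average accuracy, and final forgetting measure), the sketch below computes all three from a matrix whose entry `acc[t, i]` is the accuracy on task `i` after training on task `t`. The function name and the toy numbers are illustrative, and the paper's exact evaluation protocol may differ, for example in how tasks of unequal size are weighted.

```python
import numpy as np

def continual_metrics(acc):
    """Return (FAA, CAA, FFM) from a lower-triangular accuracy matrix.

    acc[t, i] is the accuracy on task i measured after learning task t (i <= t).
    """
    acc = np.asarray(acc, dtype=float)
    num_tasks = acc.shape[0]
    # Average accuracy over the tasks seen so far, after each training stage.
    avg_after = np.array([acc[t, : t + 1].mean() for t in range(num_tasks)])
    faa = float(avg_after[-1])     # average accuracy after the last task
    caa = float(avg_after.mean())  # mean of the per-stage averages
    if num_tasks > 1:
        # Forgetting of task i: best earlier accuracy minus final accuracy.
        forgetting = [acc[i : num_tasks - 1, i].max() - acc[num_tasks - 1, i]
                      for i in range(num_tasks - 1)]
        ffm = float(np.mean(forgetting))
    else:
        ffm = 0.0
    return faa, caa, ffm

# Toy three-task run (made-up numbers, not taken from the paper).
toy_acc = [[90.0,  0.0,  0.0],
           [80.0, 85.0,  0.0],
           [70.0, 80.0, 95.0]]
faa, caa, ffm = continual_metrics(toy_acc)
print(round(faa, 2), round(caa, 2), round(ffm, 2))  # prints: 81.67 84.72 12.5
```

Under these definitions, higher FAA and CAA are better and lower FFM is better, which matches how the rows above are compared.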
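- When prompts are optimized alone, without hierarchical coordination, prompt-based continual learning degrades under self-supervised pre-training.
- HiDe-Prompt achieves state-of-the-art results on multiple benchmarks, such as Split CIFAR-100 and Split ImageNet-R, across pre-training paradigms.
- Compared with strong baselines, HiDe-Prompt improves markedly, e.g., FAA gains of up to 15.01% on Split CIFAR-100 and 9.61% on Split ImageNet-R.
- Task-specific prompts, the prompt ensemble, old-task statistics, and contrastive regularization together improve WTP and keep TAP aligned with old tasks.
- The auxiliary TII head and the adapted TAP head consistently improve task-identity inference and cross-task class prediction, raising class-incremental learning (CIL) performance.
- Across pre-trained models (PTMs) such as Sup-21K, iBOT-21K, iBOT-1K, DINO-1K, and MoCo-1K, HiDe-Prompt consistently outperforms L2P, DualPrompt, S-Prompt++, and CODA-Prompt (see Table 1).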
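The old-task statistics mentioned in the findings can be pictured as one Gaussian per class, fitted on frozen-backbone features and later sampled in place of stored images. The sketch below is a simplified illustration under that assumption; the helper names, the diagonal jitter of `1e-6`, and the synthetic features are all hypothetical and are not taken from the HiDe-Prompt code base.

```python
import numpy as np

def class_gaussians(features, labels):
    """Estimate a (mean, covariance) pair for every class from feature rows."""
    stats = {}
    for c in np.unique(labels):
        x = features[labels == c]
        # rowvar=False: rows are samples, columns are feature dimensions.
        # The small diagonal term keeps the covariance positive definite.
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
        stats[int(c)] = (x.mean(axis=0), cov)
    return stats

def sample_pseudo_features(stats, n_per_class, rng):
    """Draw pseudo-features for old classes, usable to keep output heads calibrated."""
    xs, ys = [], []
    for c, (mean, cov) in stats.items():
        xs.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        ys.append(np.full(n_per_class, c))
    return np.concatenate(xs), np.concatenate(ys)

rng = np.random.default_rng(0)
# Synthetic stand-ins for features of two old classes (assumption for illustration).
feats = np.concatenate([rng.normal(0.0, 0.1, (50, 4)),
                        rng.normal(3.0, 0.1, (50, 4))])
labels = np.repeat([0, 1], 50)

stats = class_gaussians(feats, labels)
replay_x, replay_y = sample_pseudo_features(stats, 20, rng)
print(replay_x.shape, replay_y.shape)  # prints: (40, 4) (40,)
```

Only the means and covariances need to be kept between tasks, which is consistent with the rehearsal-free setting described in the method section.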
This summary was generated by AI and reviewed by human editors.