QUICK REVIEW

[Paper Review] Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality

Liyuan Wang, Jingyi Xie|arXiv (Cornell University)|Oct 11, 2023

Domain Adaptation and Few-Shot Learning | Cited by 18

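One-Sentence Summary

This paper proposes HiDe-Prompt, a hierarchical decomposition method for prompt-based continual learning that explicitly optimizes within-task prediction, task-identity inference, and task-adaptive prediction, and achieves state-of-the-art results under self-supervised pre-training.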

ABSTRACT

Prompt-based continual learning is an emerging direction in leveraging pre-trained knowledge for downstream continual learning, and has almost reached the performance pinnacle under supervised pre-training. However, our empirical research reveals that the current strategies fall short of their full potential under the more realistic self-supervised pre-training, which is essential for handling vast quantities of unlabeled data in practice. This is largely due to the difficulty of task-specific knowledge being incorporated into instructed representations via prompt parameters and predicted by uninstructed representations at test time. To overcome the exposed sub-optimality, we conduct a theoretical analysis of the continual learning objective in the context of pre-training, and decompose it into hierarchical components: within-task prediction, task-identity inference, and task-adaptive prediction. Following these empirical and theoretical insights, we propose Hierarchical Decomposition (HiDe-)Prompt, an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics of both uninstructed and instructed representations, further with the coordination of a contrastive regularization strategy. Our extensive experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning (e.g., up to 15.01% and 9.61% lead on Split CIFAR-100 and Split ImageNet-R, respectively). Our code is available at https://github.com/thu-ml/HiDe-Prompt.

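Motivation and Objectives

Motivate the study of prompt-based continual learning under the more realistic setting of self-supervised pre-training.
Theoretically decompose the continual learning objective into hierarchical components: within-task prediction (WTP), task-identity inference (TII), and task-adaptive prediction (TAP).
Propose HiDe-Prompt, which explicitly optimizes the hierarchical components using task-specific prompts and representation statistics.
Introduce a contrastive regularization strategy to coordinate the hierarchical components.
Demonstrate empirical gains on multiple benchmarks and robustness to the pre-training paradigm.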
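
The decomposition named above can be written as a short factorization. The following is only a sketch in notation assumed for this review (it may differ from the paper's own symbols): the class-incremental prediction splits into a within-task factor and a task-identity factor, while task-adaptive prediction is the probability of a class taken over all tasks seen so far.

```latex
% Assumed notation: \mathcal{D} = training data seen so far, \theta = model parameters,
% \mathcal{X}_{i} = domain of task i, \mathcal{X}_{i,j} = domain of class j within task i,
% \mathcal{X}^{y} = domain of class y across all seen tasks.
\begin{align}
P(x \in \mathcal{X}_{i,j} \mid \mathcal{D}, \theta)
  &= \underbrace{P(x \in \mathcal{X}_{i,j} \mid x \in \mathcal{X}_{i}, \mathcal{D}, \theta)}_{\text{WTP}}
     \cdot
     \underbrace{P(x \in \mathcal{X}_{i} \mid \mathcal{D}, \theta)}_{\text{TII}} \\
\text{TAP:}\quad & P(x \in \mathcal{X}^{y} \mid \mathcal{D}, \theta)
\end{align}
```

Read this way, a method that only improves the WTP factor can still fail if the TII factor is poor, which is consistent with the motivation for optimizing each component explicitly.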

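Proposed Method

The problem is framed as rehearsal-free continual learning with a frozen pre-trained backbone and task-specific prompts.
Prompt-based methods are reviewed and compared, covering both prompt tuning (ProT) and prefix tuning (PreT), with emphasis on how task identity is inferred from uninstructed representations.
HiDe-Prompt maintains an expandable set of task-specific prompts and uses a prompt ensemble to transfer knowledge to new tasks while mitigating forgetting.
WTP, TII, and TAP are optimized through dedicated components: WTP uses cross-entropy with a contrastive regularization term that draws on old-task statistics; TII uses an auxiliary, continually adapted output layer to predict task identity from uninstructed representations; TAP uses an output head adapted to all classes seen so far.
Per-class representation statistics are modeled, primarily as Gaussians, to enable distribution-based prediction; the cross-entropy losses H_WTP, H_TII, and H_TAP guide the hierarchical optimization (Eqs. 6–12).
At test time, the model infers the task identity through the auxiliary TII path, then predicts the label using the corresponding task-specific prompt.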
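
The statistics modeling and the two-stage test-time procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`class_statistics`, `sample_pseudo_features`, `two_stage_predict`) and the callables passed to them are hypothetical, each class is assumed to have at least two samples, and sampling pseudo-features from the stored Gaussians is shown as one plausible way to use the statistics when adapting the TII and TAP heads.

```python
import numpy as np

def class_statistics(features, labels):
    """Fit one Gaussian (mean, covariance) per class to the given representations."""
    stats = {}
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        # A small ridge keeps the covariance usable for sampling (assumed value).
        cov = np.cov(x, rowvar=False) + 1e-4 * np.eye(x.shape[1])
        stats[int(c)] = (mean, cov)
    return stats

def sample_pseudo_features(stats, n_per_class, rng):
    """Draw pseudo-representations of old classes instead of replaying raw data."""
    xs, ys = [], []
    for c, (mean, cov) in stats.items():
        xs.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        ys.append(np.full(n_per_class, c))
    return np.concatenate(xs), np.concatenate(ys)

def two_stage_predict(x, uninstructed_fn, tii_head, instructed_fn, tap_head):
    """Infer the task from the uninstructed representation, then classify
    with the representation instructed by that task's prompt."""
    # Stage 1 (TII): the backbone runs without prompts.
    task_id = int(np.argmax(tii_head(uninstructed_fn(x))))
    # Stage 2 (TAP): the backbone runs with the selected task-specific prompt.
    class_id = int(np.argmax(tap_head(instructed_fn(x, task_id))))
    return task_id, class_id
```

In a real system, `uninstructed_fn` and `instructed_fn` would wrap the frozen backbone without and with prompts, and the two heads would be trained linear layers; here they are left as arbitrary callables so the control flow stays visible.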

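Experimental Results

Research Questions

RQ1: How does the pre-training paradigm (self-supervised vs. supervised) affect the effectiveness of prompt-based continual learning?
RQ2: Under self-supervised pre-training, does hierarchically decomposing the continual learning objective into WTP, TII, and TAP yield better performance?
RQ3: How should task-specific prompts be organized and regularized to enable knowledge transfer while avoiding catastrophic forgetting?
RQ4: Can statistical modeling (e.g., Gaussian distributions) of uninstructed and instructed representations enable effective task-identity and class prediction across tasks?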

Key Findings

PTM	Method	Split CIFAR-100 FAA	Split CIFAR-100 CAA	Split CIFAR-100 FFM	Split ImageNet-R FAA	Split ImageNet-R CAA	Split ImageNet-R FFM
Sup-21K	HiDe-Prompt (Ours)	92.61 ± 0.28	94.03 ± 0.01	3.16 ± 0.10	75.06 ± 0.12	76.60 ± 0.01	2.17 ± 0.19
Sup-21K	L2P [41]	83.06 ± 0.17	88.25 ± 0.01	6.58 ± 0.40	63.65 ± 0.12	67.25 ± 0.02	7.51 ± 0.17
Sup-21K	DualPrompt [40]	86.60 ± 0.19	90.64 ± 0.01	4.45 ± 0.16	68.79 ± 0.31	71.96 ± 0.04	4.49 ± 0.14
Sup-21K	S-Prompt++ [39]	88.81 ± 0.18	92.25 ± 0.03	3.87 ± 0.05	69.68 ± 0.12	72.50 ± 0.04	3.29 ± 0.05
Sup-21K	CODA-Prompt [30] ∗	86.94 ± 0.63	91.57 ± 0.75	4.04 ± 0.18	70.03 ± 0.47	74.26 ± 0.24	5.17 ± 0.22
iBOT-21K	HiDe-Prompt (Ours)	93.02 ± 0.15	94.56 ± 0.05	1.33 ± 0.24	70.83 ± 0.17	73.23 ± 0.08	2.46 ± 0.21
iBOT-21K	L2P [41]	79.00 ± 0.28	85.13 ± 0.05	5.55 ± 0.36	55.35 ± 0.28	58.62 ± 0.05	3.73 ± 0.53
iBOT-21K	DualPrompt [40]	78.76 ± 0.23	86.16 ± 0.02	9.84 ± 0.24	54.55 ± 0.53	58.69 ± 0.01	5.38 ± 0.70
iBOT-21K	S-Prompt++ [39]	79.14 ± 0.65	85.85 ± 0.17	9.17 ± 1.33	55.16 ± 0.83	58.48 ± 0.18	4.07 ± 0.16
iBOT-21K	CODA-Prompt [30]	80.83 ± 0.27	87.02 ± 0.20	7.50 ± 0.25	61.22 ± 0.35	66.76 ± 0.37	9.66 ± 0.20
iBOT-21K	HiDe-Prompt (Ours)	93.68 ± 0.15	94.56 ± 0.05	1.21 ± 0.24	71.33 ± 0.21	73.62 ± 0.13	2.79 ± 0.26
iBOT-1K	HiDe-Prompt (Ours)	93.48 ± 0.11	95.02 ± 0.01	1.00 ± 0.24	71.33 ± 0.21	73.62 ± 0.13	2.79 ± 0.26
iBOT-1K	L2P [41]	75.57 ± 0.41	82.69 ± 0.06	7.23 ± 0.93	60.97 ± 0.26	65.95 ± 0.02	4.07 ± 0.66
iBOT-1K	DualPrompt [40]	76.63 ± 0.05	85.08 ± 0.12	8.41 ± 0.40	61.51 ± 1.05	67.11 ± 0.08	5.02 ± 0.52
iBOT-1K	S-Prompt++ [39]	77.53 ± 0.56	85.66 ± 0.16	8.07 ± 0.97	60.82 ± 0.68	66.03 ± 0.91	4.16 ± 0.14
iBOT-1K	CODA-Prompt [30]	79.11 ± 1.02	86.21 ± 0.49	7.69 ± 1.57	66.56 ± 0.68	73.14 ± 0.57	7.22 ± 0.38
iBOT-1K	HiDe-Prompt (Ours)	93.56 ± 0.12	94.95 ± 0.04	1.12 ± 0.21	71.21 ± 0.20	73.50 ± 0.12	2.65 ± 0.25
DINO-1K	HiDe-Prompt (Ours)	92.51 ± 0.11	94.25 ± 0.01	0.99 ± 0.21	68.11 ± 0.18	71.70 ± 0.01	3.11 ± 0.17
MoCo-1K	HiDe-Prompt (Ours)	91.57 ± 0.20	93.70 ± 0.01	1.19 ± 0.18	63.77 ± 0.49	68.26 ± 0.01	3.57 ± 0.96
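
The table reports FAA, CAA, and FFM without defining them. As a hedged sketch, the snippet below computes them from an accuracy matrix using commonly used definitions, which are assumed here: final average accuracy, cumulative average accuracy, and final forgetting measure. The function name `continual_metrics` and the layout of `acc` are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def continual_metrics(acc):
    """acc[t, i] = accuracy (%) on task i after training on task t, for i <= t.
    Entries with i > t are ignored."""
    acc = np.asarray(acc, dtype=float)
    T = acc.shape[0]
    # Average accuracy over the tasks seen so far, after each training stage.
    aa = np.array([acc[t, : t + 1].mean() for t in range(T)])
    faa = float(aa[-1])    # average accuracy after the last task
    caa = float(aa.mean()) # mean of the per-stage average accuracies
    if T > 1:
        # Forgetting per old task: best accuracy before the final stage
        # minus accuracy after the final stage.
        best_before = np.array([acc[i : T - 1, i].max() for i in range(T - 1)])
        ffm = float((best_before - acc[T - 1, : T - 1]).mean())
    else:
        ffm = 0.0
    return {"FAA": faa, "CAA": caa, "FFM": ffm}
```

For a toy three-task matrix `[[90, 0, 0], [80, 70, 0], [70, 60, 50]]`, these definitions give FAA = 60, CAA = 75, and FFM = 15. Higher is better for FAA and CAA, lower is better for FFM.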

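Prompt-based continual learning degrades under self-supervised pre-training when prompts are optimized without hierarchical coordination.
HiDe-Prompt achieves state-of-the-art results on benchmarks such as Split CIFAR-100 and Split ImageNet-R across different pre-training paradigms.
Compared with strong baselines, HiDe-Prompt improves substantially, with an FAA lead of up to 15.01% on Split CIFAR-100 and up to 9.61% on Split ImageNet-R; the 9.61% figure corresponds to the iBOT-21K gap over CODA-Prompt in the table (70.83 vs. 61.22).
The ensemble of task-specific prompts, old-task statistics, and contrastive regularization help improve WTP and keep TAP aligned with old tasks.
The auxiliary TII head and the adapted TAP head consistently improve task-identity inference and cross-task class prediction, raising class-incremental learning (CIL) performance.
Under Sup-21K, iBOT-21K, and iBOT-1K, HiDe-Prompt reaches higher FAA and CAA and lower FFM than L2P, DualPrompt, S-Prompt++, and CODA-Prompt on both benchmarks; for DINO-1K and MoCo-1K, the table above lists only the HiDe-Prompt rows (see Table 1).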

This review was generated by AI and reviewed by a human editor.