QUICK REVIEW

[Paper Review] Divide and not forget: Ensemble of selectively trained experts in Continual Learning

Grzegorz Rypeść, Sebastian Cygert|arXiv (Cornell University)|Jan 18, 2024

Domain Adaptation and Few-Shot Learning | Cited by 11

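One-Sentence Summary

SEED is an exemplar-free continual learning method that maintains an ensemble of experts and fine-tunes only one expert on each new task, using Gaussian class representations both to select that expert and to ensemble predictions in task-agnostic and task-aware settings.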

ABSTRACT

Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know. A trend in this area is to use a mixture-of-expert technique, where different models work together to solve the task. However, the experts are usually trained all at once using whole task data, which makes them all prone to forgetting and increasing computational burden. To address this limitation, we introduce a novel approach named SEED. SEED selects only one, the most optimal expert for a considered task, and uses data from this task to fine-tune only this expert. For this purpose, each expert represents each class with a Gaussian distribution, and the optimal expert is selected based on the similarity of those distributions. Consequently, SEED increases diversity and heterogeneity within the experts while maintaining the high stability of this ensemble method. The extensive experiments demonstrate that SEED achieves state-of-the-art performance in exemplar-free settings across various scenarios, showing the potential of expert diversification through data in continual learning.

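Research Motivation and Goals

Advance exemplar-free class-incremental learning (CIL) by reducing forgetting while preserving plasticity.
Propose SEED, a fixed-size ensemble of experts in which only one expert is fine-tuned per task to minimize forgetting.
Represent each class within each expert as a multivariate Gaussian distribution in the latent space, enabling both expert selection and inference.
Promote diversification among experts to improve performance under distribution shift and across tasks.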

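Proposed Method

SEED uses K deep network experts g_k ∘ f that share the initial layers f; f is frozen after the first task.
Each expert holds one Gaussian distribution G_k^c = (μ_k^c, Σ_k^c) per class c in its own latent space.
At inference, each expert computes the log-likelihood of the input's latent representation under each of its class Gaussians; the log-likelihoods are passed through a softmax within each expert, and the results are averaged across experts to produce the prediction.
During training, for each new task t, SEED selects the expert whose latent class distributions overlap the least, measured by symmetrized KL divergence, and fine-tunes only that expert with a cross-entropy loss plus a feature distillation term (L_KD).
The KL-based selection criterion maximizes the distance between class distributions within the task's class set.
The full SEED pipeline: (i) compute a Gaussian distribution per class in each expert's latent space, (ii) select the best expert to fine-tune on the new task, (iii) update the selected expert's Gaussian distributions, and (iv) keep f fixed after the first task to prevent drift across tasks.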
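
The selection and inference steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (select_expert, predict, and the helpers), the temperature parameter, the averaging of pairwise divergences, and the 0.5 factor in the symmetrized KL are all assumptions made for illustration, and the latent vectors and per-class Gaussians are taken as already computed.

```python
import numpy as np

def gaussian_log_likelihood(z, mean, cov):
    """Log density of N(mean, cov) evaluated at latent vector z."""
    d = z.shape[0]
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def kl_gaussians(mean_p, cov_p, mean_q, cov_q):
    """Closed-form KL(N_p || N_q) between two multivariate Gaussians."""
    d = mean_p.shape[0]
    diff = mean_q - mean_p
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    maha = diff @ np.linalg.solve(cov_q, diff)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term + maha - d + logdet_q - logdet_p)

def symmetric_kl(g_a, g_b):
    """Symmetrized KL between two (mean, cov) pairs (0.5 factor is an assumption)."""
    return 0.5 * (kl_gaussians(*g_a, *g_b) + kl_gaussians(*g_b, *g_a))

def select_expert(task_gaussians_per_expert):
    """Pick the expert whose new-task class Gaussians overlap the least.

    task_gaussians_per_expert[k] is a list of (mean, cov) pairs, one per class
    of the new task, estimated in expert k's latent space. The score is the
    mean pairwise symmetrized KL (assumed aggregation); the largest score wins.
    """
    scores = []
    for gaussians in task_gaussians_per_expert:
        n = len(gaussians)
        pairwise = [symmetric_kl(gaussians[i], gaussians[j])
                    for i in range(n) for j in range(i + 1, n)]
        scores.append(np.mean(pairwise))
    return int(np.argmax(scores))

def predict(latents_per_expert, gaussians_per_expert, temperature=1.0):
    """Average per-expert softmax over class log-likelihoods, return the best class.

    latents_per_expert[k] is the input's latent vector from expert k;
    gaussians_per_expert[k] maps class id -> (mean, cov). Every expert is
    assumed to hold a Gaussian for every class.
    """
    classes = sorted(gaussians_per_expert[0])
    probs = []
    for z, gaussians in zip(latents_per_expert, gaussians_per_expert):
        ll = np.array([gaussian_log_likelihood(z, *gaussians[c]) for c in classes])
        ll = (ll - ll.max()) / temperature  # shift for numerical stability
        p = np.exp(ll)
        probs.append(p / p.sum())
    avg = np.mean(probs, axis=0)
    return classes[int(np.argmax(avg))]
```

In this sketch, an expert in which the new classes are already well separated scores highest and would be the one fine-tuned; the other experts stay frozen and contribute only through their stored Gaussians at inference.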

Figure 1: Exemplar-free Class Incremental Learning methods evaluated on CIFAR100 divided into eleven tasks for two different data distributions.

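Experimental Results

Research Questions

RQ1: Can an exemplar-free CIL method reach state-of-the-art accuracy by selectively training only one expert per task?
RQ2: Does enforcing diversity within a fixed-size set of experts improve the stability-plasticity trade-off under different task splits and domain shift?
RQ3: How do Gaussian class representations within each expert support expert selection and robust inference across tasks?
RQ4: How do the number of shared feature layers and the number of experts affect performance and parameter efficiency?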

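Key Findings

SEED achieves state-of-the-art accuracy among exemplar-free CIL methods across multiple benchmarks and task splits.
The method clearly outperforms competitors in equal-split task scenarios and under significant domain shift (DomainNet).
The five-expert SEED configuration, with shared layers and selective fine-tuning, performs well while using relatively few parameters per task.
Ablation studies show that the multivariate Gaussian representation and the KL-based expert selection are both essential to SEED's performance, with the full design giving the best results.
Diversity emerges naturally: each expert specializes in different tasks, and the ensemble consistently outperforms the strongest single expert.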

Figure 2: SEED comprises $K$ deep network experts $g_{k}\circ f$ (here $K=2$ ), sharing the initial layers $f$ for higher computational performance. $f$ are frozen after the first task. Each expert contains one Gaussian distribution per class $c\in C$ in his unique latent space. In this example, we

This review was generated by AI and reviewed by a human editor.