QUICK REVIEW

[Paper Review] Divide and not forget: Ensemble of selectively trained experts in Continual Learning

Grzegorz Rypeść, Sebastian Cygert|arXiv (Cornell University)|Jan 18, 2024

Domain Adaptation and Few-Shot Learning | Cited by 11

One-Line Summary

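SEED is an exemplar-free continual learning method that ensembles multiple experts and fine-tunes only one expert for each new task. It uses Gaussian class representations to select the best expert and makes ensemble predictions in both task-agnostic and task-aware settings.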

ABSTRACT

Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know. A trend in this area is to use a mixture-of-expert technique, where different models work together to solve the task. However, the experts are usually trained all at once using whole task data, which makes them all prone to forgetting and increasing computational burden. To address this limitation, we introduce a novel approach named SEED. SEED selects only one, the most optimal expert for a considered task, and uses data from this task to fine-tune only this expert. For this purpose, each expert represents each class with a Gaussian distribution, and the optimal expert is selected based on the similarity of those distributions. Consequently, SEED increases diversity and heterogeneity within the experts while maintaining the high stability of this ensemble method. The extensive experiments demonstrate that SEED achieves state-of-the-art performance in exemplar-free settings across various scenarios, showing the potential of expert diversification through data in continual learning.

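Motivation and Objectives

Motivated by exemplar-free class-incremental learning (CIL) that curbs forgetting while preserving plasticity.
Proposes SEED, an ensemble of a fixed set of experts in which only one expert is fine-tuned per task, minimizing forgetting.
Represents each class within each expert's latent space as a multivariate Gaussian, which enables both expert selection and inference.
Encourages diversity among the experts to improve performance under distribution shift and across tasks.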

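Method

SEED uses K deep network experts g_k ∘ f, where f denotes the shared initial layers; f is frozen after the first task.
Each expert holds one Gaussian G_k^c = (μ_k^c, Σ_k^c) per class c in its latent space.
At inference, SEED computes the log-likelihood of the latent representation under each expert's class Gaussians, normalizes these log-likelihoods so they are comparable across experts, and predicts from their average.
During training, for each new task t, SEED selects the expert whose latent class distributions overlap the least (measured by symmetric KL divergence) and fine-tunes only that expert with a cross-entropy loss plus feature distillation (L_KD).
The selection criterion is KL-based: it picks the expert that maximizes the distance between the class distributions of the task's classes.
The full SEED pipeline consists of: (i) computing per-class Gaussians in each expert's latent space, (ii) selecting the best expert to fine-tune on the new task, (iii) updating the selected expert's Gaussians, and (iv) keeping f frozen after the first task to prevent drift across tasks.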
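The Gaussian bookkeeping behind steps (i) to (iii) and the ensemble inference can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the networks g_k ∘ f are not modeled (each expert's latent vectors are taken as given), the fine-tuning step with cross-entropy plus L_KD is omitted, and the function names, the ridge term `eps`, scoring each expert by the mean pairwise symmetric KL among the new task's classes, and softmax as the per-expert normalization at inference are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_class_gaussians(features, labels, eps=1e-3):
    """Fit one multivariate Gaussian (mean, full covariance) per class.

    features: (n, d) latent vectors from ONE expert; labels: (n,) ints.
    eps * I is a hypothetical ridge term that keeps covariances invertible;
    each class needs at least two samples for np.cov to be defined.
    """
    d = features.shape[1]
    gaussians = {}
    for c in np.unique(labels):
        x = features[labels == c]
        mu = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + eps * np.eye(d)
        gaussians[int(c)] = (mu, cov)
    return gaussians


def kl_gaussian(g0, g1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    (mu0, s0), (mu1, s1) = g0, g1
    d = mu0.shape[0]
    s1_inv = np.linalg.inv(s1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(s0)
    _, logdet1 = np.linalg.slogdet(s1)
    return 0.5 * (np.trace(s1_inv @ s0) + diff @ s1_inv @ diff - d + logdet1 - logdet0)


def symmetric_kl(g0, g1):
    return kl_gaussian(g0, g1) + kl_gaussian(g1, g0)


def select_expert(task_features_per_expert, labels):
    """Return the index of the expert whose new-task class Gaussians overlap least.

    task_features_per_expert[k] holds the new task's data mapped into expert k's
    latent space. Scoring by the MEAN pairwise symmetric KL is an assumption.
    """
    scores = []
    for feats in task_features_per_expert:
        g = fit_class_gaussians(feats, labels)
        cls = sorted(g)
        pairs = [symmetric_kl(g[a], g[b]) for i, a in enumerate(cls) for b in cls[i + 1:]]
        scores.append(np.mean(pairs))
    return int(np.argmax(scores))


def log_likelihood(z, g):
    """log N(z; mu, cov) for a single latent vector z."""
    mu, cov = g
    diff = z - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (mu.shape[0] * np.log(2 * np.pi) + logdet + maha)


def ensemble_predict(latents_per_expert, gaussians_per_expert, classes):
    """Average per-expert class posteriors and return (predicted class, probs).

    Softmax over classes is ASSUMED as the per-expert normalization of the
    log-likelihoods, so that experts with different latent scales are comparable.
    """
    probs = np.zeros(len(classes))
    for z, gaussians in zip(latents_per_expert, gaussians_per_expert):
        ll = np.array([log_likelihood(z, gaussians[c]) for c in classes])
        p = np.exp(ll - ll.max())  # numerically stable softmax over classes
        probs += p / p.sum()
    probs /= len(latents_per_expert)
    return classes[int(np.argmax(probs))], probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1, 2], 50)
    centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
    feats_sep = centers[labels] + rng.normal(size=(150, 2))        # expert 0: well-separated classes
    feats_ovl = 0.1 * centers[labels] + rng.normal(size=(150, 2))  # expert 1: overlapping classes
    print(select_expert([feats_sep, feats_ovl], labels))  # → 0
    gs = [fit_class_gaussians(feats_sep, labels), fit_class_gaussians(feats_ovl, labels)]
    z = np.array([5.0, 0.0])
    print(ensemble_predict([z, 0.1 * z], gs, [0, 1, 2])[0])  # → 1
```

In the demo, expert 0 is selected because its class means lie far apart relative to the noise, which is the "least overlap" criterion in action. The point z = (5, 0) is assigned to class 1 because the well-separated expert is confident and dominates the averaged posteriors, while the overlapping expert contributes a flatter distribution.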

Figure 1: Exemplar-free Class Incremental Learning methods evaluated on CIFAR100 divided into eleven tasks for two different data distributions.

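Experimental Results

Research Questions

RQ1: Can an exemplar-free CIL method achieve state-of-the-art accuracy by selectively training only one expert per task?
RQ2: Does enforcing diversity within a fixed ensemble improve the stability-plasticity trade-off across different task splits and domain shifts?
RQ3: How do the Gaussian class representations within each expert contribute to expert selection and robust inference?
RQ4: How do the shared feature layers and the number of experts affect performance and parameter efficiency?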

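Key Findings

SEED achieves state-of-the-art accuracy among exemplar-free CIL methods across multiple benchmarks and task splits.
It outperforms competing methods by a large margin in equal-split task scenarios and under substantial domain shift (DomainNet).
A five-expert SEED configuration with shared layers and selective fine-tuning delivers high performance with relatively few parameters per task.
Ablation studies show that the multivariate Gaussian representation and KL-based expert selection are crucial to SEED's performance, and that the full design gives the best results.
Diversity emerges naturally: each expert specializes in different tasks, and the ensemble consistently outperforms the best single expert.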

Figure 2: SEED comprises $K$ deep network experts $g_{k}\circ f$ (here $K=2$ ), sharing the initial layers $f$ for higher computational performance. $f$ are frozen after the first task. Each expert contains one Gaussian distribution per class $c\in C$ in its unique latent space. In this example, we


This review was generated by AI and checked by a human editor.