QUICK REVIEW

[Paper Review] Continual Learning in Low-rank Orthogonal Subspaces

Arslan Chaudhry, Naeemullah Khan|arXiv (Cornell University)|Oct 22, 2020

Domain Adaptation and Few-Shot Learning | References: 54 | Citations: 30

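One-line summary

To minimize interference in continual learning, this paper proposes learning tasks in orthogonal subspaces within a neural network. It orthogonalizes the weight matrices through optimization on the Stiefel manifold to suppress interference between tasks, and it outperforms memory-based baselines.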

ABSTRACT

In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the continual learning experience is finished. The prior art in CL uses episodic memory, parameter regularization or extensible network structures to reduce interference among tasks, but in the end, all the approaches learn different tasks in a joint vector space. We believe this invariably leads to interference among different tasks. We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference. Further, to keep the gradients of different tasks coming from these subspaces orthogonal to each other, we learn isometric mappings by posing network training as an optimization problem over the Stiefel manifold. To the best of our understanding, we report, for the first time, strong results over experience-replay baseline with and without memory on standard classification benchmarks in continual learning. The code is made publicly available.

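Motivation and objectives

To reduce interference in continual learning, learn each task in an orthogonal subspace rather than in a shared vector space.
Assign each task to its own low-rank subspace and introduce a projection scheme that guarantees orthogonality between subspaces.
Optimize the weights on the Stiefel manifold so that orthogonality is preserved across layers and gradients are transformed isometrically.
Demonstrate empirical gains over strong experience-replay baselines on tasks derived from MNIST, CIFAR, and ImageNet.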

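Proposed method

Assign each task t to a low-rank subspace via a rank-r projection P_t in R^{m×m}, with P_t^T P_t = I and P_t^T P_k = 0 for k ≠ t.
Encourage gradients from different tasks to be orthogonal at the projection layer, and update the weights on the Stiefel manifold so that isometric transformations (W_l^T W_l = I) preserve this property.
To keep the weight matrices orthonormal during training, optimize on the Stiefel manifold using tangent-space projection and a Cayley-transform-based retraction.
Maintain a small replay buffer for past tasks, compute gradients for the current task t and a past task k, and form the effective gradient g_l = g_l^t + g_l^k.
Initialize the network with random orthogonal weights, and construct the projection bases O_t offline to form P_t = O_t O_t^T.
The full procedure is given in Alg. 1, which trains orthog-subspace with adaptive learning via the Cayley transform.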
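
The steps above can be illustrated with a minimal NumPy sketch. It is not the paper's Alg. 1: it assumes a single square weight matrix, a random stand-in gradient, a closed-form Cayley transform solved directly, and hypothetical sizes `m`, `r`, and `num_tasks`. The names `cayley_step`, `O`, and `P` are illustrative only. One detail worth noting: for a rank-r projector built as O_t O_t^T, the product P_t^T P_t equals P_t, so it acts as the identity only on vectors inside task t's subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, num_tasks = 12, 3, 4  # hypothetical sizes, chosen so r * num_tasks <= m

# Offline construction: split a random orthogonal basis of R^m into
# disjoint column blocks O_t, then form the projectors P_t = O_t O_t^T.
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
O = [Q[:, t * r:(t + 1) * r] for t in range(num_tasks)]
P = [O_t @ O_t.T for O_t in O]


def cayley_step(W, G, lr):
    """One descent step that keeps W^T W = I (closed-form Cayley transform)."""
    A = G @ W.T - W @ G.T  # skew-symmetric by construction: A^T = -A
    I = np.eye(W.shape[0])
    # Solve (I + lr/2 * A) W_new = (I - lr/2 * A) W for W_new.
    return np.linalg.solve(I + 0.5 * lr * A, (I - 0.5 * lr * A) @ W)


# Random orthogonal initialization and an arbitrary stand-in gradient.
W, _ = np.linalg.qr(rng.standard_normal((m, m)))
G = rng.standard_normal((m, m))
W_new = cayley_step(W, G, lr=0.1)

# Gradients that live in two different task subspaces stay orthogonal
# after being mapped backwards through the orthogonal weight matrix.
g0 = P[0] @ rng.standard_normal(m)
g1 = P[1] @ rng.standard_normal(m)
back0, back1 = W_new.T @ g0, W_new.T @ g1

print("P_0 is idempotent:      ", np.allclose(P[0] @ P[0], P[0]))
print("P_0^T P_1 is zero:      ", np.allclose(P[0].T @ P[1], 0.0))
print("W stays orthogonal:     ", np.allclose(W_new.T @ W_new, np.eye(m)))
print("gradients stay orthogonal:", abs(back0 @ back1) < 1e-10)
```

The direct linear solve is only practical for small matrices; it is used here because it makes the orthogonality-preserving property easy to check numerically.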

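Experimental results

Research questions

RQ1: Can learning each task in an independent low-rank subspace reduce interference and forgetting in continual learning?
RQ2: Does enforcing orthogonality of the weight matrices through Stiefel manifold optimization preserve gradient orthogonality across layers and improve performance with limited episodic memory?
RQ3: How does orthog-subspace compare with standard continual learning baselines?
RQ4: How much do the gains of this approach differ between shallow and deep networks?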

Key findings

Method	Permuted MNIST Accuracy (%)	Forgetting	Rotated MNIST Accuracy (%)	Forgetting	Split CIFAR Accuracy (%)	Forgetting	Split miniImageNet Accuracy (%)	Forgetting
Finetune	50.6 (±2.57)	0.44 (±0.02)	43.1 (±1.20)	0.55 (±0.01)	-	-	-	-
EWC (Kirkpatrick et al., 2016)	68.4 (±0.76)	0.25 (±0.01)	43.6 (±0.81)	0.53 (±0.01)	-	-	-	-
VCL (Nguyen et al., 2018)	51.8 (±1.54)	0.44 (±0.01)	48.2 (±0.99)	0.50 (±0.01)	-	-	-	-
VCL-Random (Nguyen et al., 2018)	52.3 (±0.66)	0.43 (±0.01)	54.4 (±1.44)	0.44 (±0.01)	-	-	-	-
AGEM (Chaudhry et al., 2019a)	78.3 (±0.42)	0.15 (±0.01)	60.5 (±1.77)	0.36 (±0.01)	-	-	-	-
MER (Riemer et al., 2019)	78.6 (±0.84)	0.15 (±0.01)	68.7 (±0.38)	0.28 (±0.01)	-	-	-	-
ER-Ring (Chaudhry et al., 2019b)	79.5 (±0.31)	0.12 (±0.01)	70.9 (±0.38)	0.24 (±0.01)	-	-	-	-
orthog-subspace (ours)	86.6 (±0.91)	0.04 (±0.01)	80.1 (±0.95)	0.14 (±0.01)	-	-	-	-
Multitask	91.3	0.0	94.3	0.0	-	-	-	-
Split CIFAR (Memory=	64.3 (±0.59)	0.07 (±0.01)	-	-	-	-	-	-
Split miniImageNet (Memory=	51.4 (±1.46)	0.10 (±0.01)	-	-	-	-	-	-
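
To help read the Accuracy and Forgetting columns, the sketch below computes both metrics from a matrix of per-task accuracies. The review does not define the metrics, so this assumes the definitions commonly used in the continual learning literature: average accuracy over all tasks after the last task is learned, and forgetting as the mean drop from each earlier task's best accuracy to its final accuracy. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def average_accuracy(acc):
    """Mean accuracy over all tasks after training on the last task."""
    return float(acc[-1].mean())


def average_forgetting(acc):
    """Mean drop from the best earlier accuracy to the final accuracy.

    acc[i, j] is the accuracy on task j after training on task i.
    The last task is excluded because it has had no chance to be forgotten.
    """
    final = acc[-1, :-1]
    best_earlier = acc[:-1, :-1].max(axis=0)
    return float((best_earlier - final).mean())


# Hypothetical 3-task run, accuracies given as fractions in [0, 1].
acc = np.array([
    [0.9, 0.1, 0.1],
    [0.8, 0.9, 0.1],
    [0.7, 0.8, 0.9],
])

print("average accuracy:", round(average_accuracy(acc), 3))
print("average forgetting:", round(average_forgetting(acc), 3))
```

In this toy run the first task falls from 0.9 to 0.7 and the second from 0.9 to 0.8, so lower forgetting values mean that earlier tasks are retained better.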

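On deep networks, orthog-subspace improves accuracy over strong experience-replay baselines by up to 4.7 points and reduces forgetting on CIFAR-100 and miniImageNet by up to 50% when memory is used.
On shallow networks (Permuted/Rotated MNIST), without memory, it improves average accuracy by 7.1 to 9.2 points and reduces forgetting by 42 to 66%.
A systematic ablation shows that projection alone is weak, that adding memory improves results, and that imposing orthogonality on the Stiefel manifold yields the highest accuracy and the lowest forgetting.
Gradient analysis under the Stiefel constraint confirms that inner products between task gradients concentrate near zero, indicating reduced interference.
With a tiny episodic memory, orthog-subspace outperforms ER-Ring; the gain shrinks as the memory size grows.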
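
The shallow-network figures quoted above can be cross-checked against the results table. The short sketch below assumes that ER-Ring, the strongest baseline in the table, is the reference point; the review does not state this explicitly. The helper name `gains` is illustrative.

```python
# (accuracy in %, forgetting) copied from the ER-Ring and
# orthog-subspace rows of the results table.
er_ring = {"Permuted MNIST": (79.5, 0.12), "Rotated MNIST": (70.9, 0.24)}
orthog = {"Permuted MNIST": (86.6, 0.04), "Rotated MNIST": (80.1, 0.14)}


def gains(baseline, ours):
    """Accuracy gain in points and relative forgetting reduction in %."""
    out = {}
    for name, (base_acc, base_fgt) in baseline.items():
        acc, fgt = ours[name]
        acc_gain = round(acc - base_acc, 1)
        fgt_drop = round(100 * (base_fgt - fgt) / base_fgt, 1)
        out[name] = (acc_gain, fgt_drop)
    return out


summary = gains(er_ring, orthog)
for name, (acc_gain, fgt_drop) in summary.items():
    print(f"{name}: +{acc_gain} accuracy points, {fgt_drop}% less forgetting")
```

Under this assumption the accuracy gains come out to 7.1 and 9.2 points, matching the quoted range. The relative forgetting reductions come out to about 66.7% and 41.7%, so the quoted 42 to 66% appears to be a rounded or truncated version of the same numbers.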


This review was generated by AI and checked by a human editor.