QUICK REVIEW

[Paper Review] Orthogonal Gradient Descent for Continual Learning

Mehrdad Farajtabar, Navid Azizan|arXiv (Cornell University)|Oct 15, 2019

Domain Adaptation and Few-Shot Learning | References: 79 | Citations: 39

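One-Line Summary

Summary: This paper proposes Orthogonal Gradient Descent (OGD), which mitigates catastrophic forgetting by projecting the gradients of a new task onto the subspace orthogonal to the gradient directions of previous tasks, allowing new tasks to be learned without retaining past data.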

ABSTRACT

Neural networks are achieving state of the art and sometimes super-human performance on learning tasks across a variety of domains. Whenever these problems require learning in a continual or sequential manner, however, neural networks suffer from the problem of catastrophic forgetting; they forget how to solve previous tasks after being trained on a new task, despite having the essential capacity to solve both tasks if they were trained on both simultaneously. In this paper, we propose to address this issue from a parameter space perspective and study an approach to restrict the direction of the gradient updates to avoid forgetting previously-learned data. We present the Orthogonal Gradient Descent (OGD) method, which accomplishes this goal by projecting the gradients from new tasks onto a subspace in which the neural network output on previous task does not change and the projected gradient is still in a useful direction for learning the new task. Our approach utilizes the high capacity of a neural network more efficiently and does not require storing the previously learned data that might raise privacy concerns. Experiments on common benchmarks reveal the effectiveness of the proposed OGD method.

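Motivation and Objectives

Address catastrophic forgetting in sequential task learning.
Constrain updates using the gradient directions of previous tasks rather than storing raw data.
Exploit the high capacity of neural networks to learn new tasks while retaining prior knowledge.
Propose practical OGD variants that reduce memory requirements while remaining effective.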

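Proposed Method

Compute the gradient directions of the previous tasks' predictions, $\{\nabla f_j(x; w)\}$.
Use Gram-Schmidt to build an orthogonal basis $S$ from the previous-task gradients (per data point and per class).
Modify the current task's gradient $g$ as $\tilde{g} = g - \sum_{v \in S} \mathrm{proj}_v(g)$ so that the update is orthogonal to the previous directions.
Update the parameters with the orthogonalized gradient $\tilde{g}$ in place of the raw backpropagated gradient $g$.
Store only a subset of the gradients from each previous task to limit memory usage, as in the OGD-GTL variant.
Describe two memory-reducing variants: OGD-AVE (averaged logits) and OGD-GTL (ground-truth logit).
Provide Algorithm 1 for Orthogonal Gradient Descent, covering gradient computation, projection, parameter update, and construction of the gradient subspace.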
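The basis construction and projection steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the function names (`extend_basis`, `project_orthogonal`, `ogd_step`), the tolerance `eps`, the learning rate, and the toy vectors standing in for stored gradients are all hypothetical, and gradients are treated as flat vectors rather than being taken from a real network.

```python
import numpy as np


def extend_basis(basis, new_vectors, eps=1e-10):
    """Gram-Schmidt: add the orthonormalized part of each new vector to `basis`."""
    basis = list(basis)
    for v in new_vectors:
        u = np.asarray(v, dtype=float).copy()
        for b in basis:
            u -= np.dot(u, b) * b  # remove the component along a stored direction
        norm = np.linalg.norm(u)
        if norm > eps:  # skip vectors that already lie in the span
            basis.append(u / norm)
    return basis


def project_orthogonal(g, basis):
    """Return g with its components along every orthonormal basis vector removed."""
    g_tilde = np.asarray(g, dtype=float).copy()
    for b in basis:
        g_tilde -= np.dot(g_tilde, b) * b
    return g_tilde


def ogd_step(w, g, basis, lr=0.1):
    """One projected gradient step: w <- w - lr * g_tilde."""
    return w - lr * project_orthogonal(g, basis)


# Toy stand-ins for stored previous-task gradients (not real network gradients).
prev_grads = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])]
S = extend_basis([], prev_grads)

g = np.array([0.5, -2.0, 3.0])       # hypothetical gradient of the new task
g_tilde = project_orthogonal(g, S)   # [0., 0., 3.]
w_new = ogd_step(np.zeros(3), g, S)  # moves only along the third coordinate
```

Because the basis is kept orthonormal, removing the components one vector at a time gives the same result as subtracting the sum of projections in the formula for $\tilde{g}$. In this toy case the stored directions span the first two coordinates, so only the third coordinate of `g` survives.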

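Experimental Results

Research Questions

RQ1: Can the gradient updates for a new task be constrained to minimize interference with existing tasks without storing past data?
RQ2: How does projecting onto the space orthogonal to previous-task gradients affect learning efficiency and final accuracy on several MNIST-based continual learning benchmarks?
RQ3: Can the memory-reducing variants (OGD-GTL, OGD-AVE) maintain performance while lowering storage requirements?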

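Key Findings

OGD is competitive with, or outperforms, state-of-the-art continual learning baselines on Permuted MNIST, Rotated MNIST, and Split MNIST.
OGD retains task-specific performance across multiple sequential tasks, approaches multi-task learning performance where possible, and outperforms the SGD and EWC baselines in some settings.
Memory-reducing variants such as OGD-GTL generally perform well, matching or outperforming A-GEM in many experiments.
The method relies on gradients of the model outputs (logits) rather than of the loss, which provides a direct means of preserving previously learned predictions.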
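To make the logit-gradient point concrete, the sketch below shows which directions would be stored for a single sample under the OGD-AVE and OGD-GTL variants. It assumes a toy linear model with logits `f_k(x; W) = W[k] @ x`, chosen only because its logit gradients have a closed form; the function names and the `"ALL"` case (one gradient per logit, included purely as a comparison baseline) are illustrative and not taken from this review.

```python
import numpy as np


def logit_gradients(x, num_classes):
    """Gradient of each logit f_k(x; W) = W[k] @ x with respect to W, flattened.

    For this linear model, d f_k / d W is zero except for row k, which equals x.
    """
    d = x.shape[0]
    grads = np.zeros((num_classes, num_classes, d))
    for k in range(num_classes):
        grads[k, k, :] = x
    return grads.reshape(num_classes, num_classes * d)


def stored_directions(x, y, num_classes, variant):
    """Directions kept for one sample (x, y) under each variant (toy linear model)."""
    grads = logit_gradients(x, num_classes)
    if variant == "ALL":  # one gradient per logit
        return grads
    if variant == "AVE":  # gradient of the mean logit
        return grads.mean(axis=0, keepdims=True)
    if variant == "GTL":  # gradient of the ground-truth logit only
        return grads[y:y + 1]
    raise ValueError(f"unknown variant: {variant}")


x = np.array([1.0, 2.0])  # hypothetical input
y = 1                     # hypothetical ground-truth class index
gtl = stored_directions(x, y, num_classes=3, variant="GTL")  # [[0., 0., 1., 2., 0., 0.]]
ave = stored_directions(x, y, num_classes=3, variant="AVE")  # one row: mean over logits
```

Neither variant involves the loss or the label's prediction error: both store a single direction per sample, which is where the memory saving comes from, and GTL uses the label only to pick which logit to differentiate.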


This review was generated by AI and checked by a human editor.