QUICK REVIEW

[Paper Review] Measuring Catastrophic Forgetting in Neural Networks

Ronald Kemker, Marc McClure|arXiv (Cornell University)|Aug 7, 2017

Multimodal Machine Learning Applications | Cited by 190

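ONE-LINE SUMMARY

This paper outlines five mechanisms for mitigating catastrophic forgetting in incremental learning, introduces new benchmarks and metrics, and compares the mechanisms on real-world image and audio datasets. It shows that none of them fully solves forgetting across the incremental learning paradigms tested.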

ABSTRACT

Deep neural networks are used in many state-of-the-art systems for machine perception. Once a network is trained to do a specific task, e.g., bird classification, it cannot easily be trained to do new tasks, e.g., incrementally learning to recognize additional bird species or learning an entirely different task such as flower recognition. When new tasks are added, typical deep neural networks are prone to catastrophically forgetting previous tasks. Networks that are capable of assimilating new information incrementally, much like how humans form new memories over time, will be more efficient than re-training the model from scratch each time a new task needs to be learned. There have been multiple attempts to develop schemes that mitigate catastrophic forgetting, but these methods have not been directly compared, the tests used to evaluate them vary considerably, and these methods have only been evaluated on small-scale problems (e.g., MNIST). In this paper, we introduce new metrics and benchmarks for directly comparing five different mechanisms designed to mitigate catastrophic forgetting in neural networks: regularization, ensembling, rehearsal, dual-memory, and sparse-coding. Our experiments on real-world images and sounds show that the mechanism(s) that are critical for optimal performance vary based on the incremental training paradigm and type of data being used, but they all demonstrate that the catastrophic forgetting problem has yet to be solved.

MOTIVATION AND GOALS

Motivates the need for incremental learning in DNNs without catastrophic forgetting.
Proposes new benchmarks and metrics that scale beyond MNIST to real-world datasets with 100–200 classes.
Compares five mechanisms (regularization, ensembling, rehearsal, dual-memory, and sparse coding) under varied incremental learning paradigms.
Evaluates how different data modalities and task settings affect forgetting and method performance.

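PROPOSED METHOD

Defines an incremental learning setting organized into study sessions, with optional use of an external memory of past data.
Develops three new benchmarks: data permutation, incremental class learning, and multi-modal learning.
Evaluates models implementing the five mechanisms (EWC, PathNet, GeppNet, GeppNet+STM, and FEL, the Fixed Expansion Layer) against a baseline with a fixed number of parameters.
Introduces three forgetting metrics, Omega_base, Omega_new, and Omega_all, to quantify the retention of old knowledge and the acquisition of new knowledge.
Analyzes training time and memory footprint across models.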
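
As we read the paper, all three metrics average test accuracies over study sessions 2 through T, excluding the base session: Omega_base tracks accuracy on the base set after each new session, Omega_new tracks accuracy on each new session right after it is learned, and Omega_all tracks accuracy on all test data seen so far. Omega_base and Omega_all are divided by alpha_ideal, the accuracy of an offline-trained MLP on the base set, while Omega_new is not normalized. The Python sketch below assumes these definitions; the function `omega_metrics` and its argument names are hypothetical, and it expects per-session accuracies from whatever evaluation loop you use.

```python
from typing import Dict, Sequence


def omega_metrics(
    acc_base: Sequence[float],
    acc_new: Sequence[float],
    acc_all: Sequence[float],
    acc_ideal: float,
) -> Dict[str, float]:
    """Average per-session accuracies into Omega_base, Omega_new, Omega_all.

    Assumed inputs (hypothetical names), one entry per session i = 2..T:
      acc_base[k]: accuracy on the base-session test set after session k + 2
      acc_new[k]:  accuracy on session k + 2's test set right after learning it
      acc_all[k]:  accuracy on all test data seen so far after session k + 2
      acc_ideal:   offline model's accuracy on the base test set (normalizer)
    """
    n = len(acc_base)
    if n == 0 or len(acc_new) != n or len(acc_all) != n:
        raise ValueError("expected equal-length, non-empty accuracy lists")
    if acc_ideal <= 0:
        raise ValueError("acc_ideal must be positive")
    return {
        # Retention of the base set, relative to the offline ideal.
        "omega_base": sum(a / acc_ideal for a in acc_base) / n,
        # Acquisition of each new session; not normalized.
        "omega_new": sum(acc_new) / n,
        # Accuracy on everything seen so far, relative to the offline ideal.
        "omega_all": sum(a / acc_ideal for a in acc_all) / n,
    }


# Toy numbers for two post-base sessions (T = 3); not results from the paper.
print(omega_metrics([0.8, 0.4], [0.9, 0.7], [0.8, 0.6], acc_ideal=0.8))
```

Because Omega_base and Omega_all are relative to an offline model, a value near 1 means the incremental learner roughly matches offline training on that measure, and a value above 1 means it does better.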

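EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: How do the five forgetting-mitigation mechanisms compare on real-world datasets under different incremental learning paradigms?
RQ2: When scaled to 100–200-class tasks and cross-modal data, do existing solutions fully solve catastrophic forgetting?
RQ3: Which factors, such as memory usage, model capacity, and sparsity, most affect forgetting and retention across tasks?
RQ4: How do the Omega_base, Omega_new, and Omega_all metrics capture the trade-off between retention and learning across datasets?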

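KEY FINDINGS

None of the tested methods fully solves catastrophic forgetting across all tasks and datasets.
Omega_all is generally higher on MNIST than on CUB-200 or AudioSet, indicating that performance depends on the dataset.
GeppNet and GeppNet+STM perform strongly in incremental class learning, with GeppNet often the best. GeppNet+STM retains base knowledge but struggles to learn new classes on some datasets.
EWC excels at multi-modal learning, retaining the first modality while acquiring the second.
PathNet performs best on the data permutation task, but it requires a separate output for each session and may saturate when features are shared.
FEL learns new classes well but forgets the base set, and relying on sparsity as its only mechanism substantially increases its memory footprint.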
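
EWC is the regularization-based method in this comparison. In its standard formulation, EWC adds a quadratic penalty that pulls each parameter toward its value after the previous session, weighted by a diagonal estimate of the Fisher information, so weights that mattered for the old task move little while the new one is learned. The plain-Python sketch below illustrates that penalty on flat parameter lists and is not the paper's implementation: the function names are hypothetical, the Fisher estimate uses the common mean-squared-gradient approximation, and a real model would obtain per-sample gradients from an autodiff framework.

```python
from typing import List, Sequence


def diagonal_fisher(per_sample_grads: Sequence[Sequence[float]]) -> List[float]:
    """Approximate the diagonal Fisher information as the mean squared
    gradient of the log-likelihood over old-task samples."""
    n = len(per_sample_grads)
    if n == 0:
        raise ValueError("need at least one gradient vector")
    dim = len(per_sample_grads[0])
    return [sum(g[j] ** 2 for g in per_sample_grads) / n for j in range(dim)]


def ewc_penalty(theta, theta_old, fisher, lam: float) -> float:
    """(lam / 2) * sum_j F_j * (theta_j - theta_old_j) ** 2"""
    return 0.5 * lam * sum(
        f * (t - t0) ** 2 for t, t0, f in zip(theta, theta_old, fisher)
    )


def ewc_penalty_grad(theta, theta_old, fisher, lam: float) -> List[float]:
    """Gradient of the penalty w.r.t. theta: lam * F_j * (theta_j - theta_old_j)."""
    return [lam * f * (t - t0) for t, t0, f in zip(theta, theta_old, fisher)]


def ewc_loss(new_task_loss: float, theta, theta_old, fisher, lam: float) -> float:
    """Total objective minimized while training on the new session."""
    return new_task_loss + ewc_penalty(theta, theta_old, fisher, lam)


# Toy example: parameter 0 mattered for the old task (F = 4), parameter 1 did not (F = 0).
theta_old = [1.0, 2.0]
fisher = [4.0, 0.0]
theta = [1.5, 5.0]
print(ewc_penalty(theta, theta_old, fisher, lam=2.0))  # 1.0
print(ewc_penalty_grad(theta, theta_old, fisher, lam=2.0))  # [4.0, 0.0]
```

In this toy case the second parameter drifts by 3.0 at no cost because its Fisher weight is zero, which is how EWC leaves unimportant weights free for the new task while anchoring the important ones.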

This review was generated by AI and checked by a human editor.