QUICK REVIEW

[Paper Review] Statistical Mechanical Analysis of Catastrophic Forgetting in Continual Learning with Teacher and Student Networks

Haruka Asanuma, Shiro Takagi | arXiv (Cornell University) | May 16, 2021

Domain Adaptation and Few-Shot Learning | References: 17 | Citations: 8

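One-line summary

This paper builds a statistical-mechanical framework for analyzing catastrophic forgetting in continual learning theoretically, using a teacher-student neural network setup. It shows that forgetting is minimized when the overlap between the tasks' input distributions is small and their input-output relationships are highly similar, and it identifies a new phenomenon called "overshoot": during continual learning, performance on the old task is temporarily lost and then recovers.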

ABSTRACT

When a computational system continuously learns from an ever-changing environment, it rapidly forgets its past experiences. This phenomenon is called catastrophic forgetting. While a line of studies has been proposed with respect to avoiding catastrophic forgetting, most of the methods are based on intuitive insights into the phenomenon, and their performances have been evaluated by numerical experiments using benchmark datasets. Therefore, in this study, we provide the theoretical framework for analyzing catastrophic forgetting by using teacher-student learning. Teacher-student learning is a framework in which we introduce two neural networks: one neural network is a target function in supervised learning, and the other is a learning neural network. To analyze continual learning in the teacher-student framework, we introduce the similarity of the input distribution and the input-output relationship of the target functions as the similarity of tasks. In this theoretical framework, we also provide a qualitative understanding of how a single-layer linear learning neural network forgets tasks. Based on the analysis, we find that the network can avoid catastrophic forgetting when the similarity among input distributions is small and that of the input-output relationship of the target functions is large. The analysis also suggests that a system often exhibits a characteristic phenomenon called overshoot, which means that even if the learning network has once undergone catastrophic forgetting, it is possible that the network may perform reasonably well after further learning of the current task.

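Motivation and objectives

Establish a framework for theoretical analysis of catastrophic forgetting in continual learning, a field currently dominated by experimental and heuristic methods.
Model task similarity in the teacher-student learning setup through two quantities: the overlap of the input distributions and the similarity of the input-output relationships (weight-space similarity).
Derive analytically the conditions under which the student network avoids catastrophic forgetting when tasks are learned sequentially.
Investigate non-monotonic forgetting dynamics in the trajectory of the generalization error, such as the appearance of overshoot.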

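Proposed method

The study uses a single-layer linear student network trained with stochastic gradient descent (SGD) on two sequential tasks.
A teacher network defines the true input-output relationship of each task, and the inner product of the teachers' weights is used to compute the weight-space similarity.
Input-space similarity is quantified by the overlap of the input data distributions within the subspace of the input space that the tasks share.
The generalization error is derived analytically with statistical-mechanical methods, assuming Gaussian-distributed inputs and isotropically initialized weights.
The framework is extended to non-Gaussian inputs with the hidden manifold model, in which the intrinsic dimension, rather than the input dimension, is the quantity that governs generalization.
The theoretical analysis is validated with numerical simulations, which examine in particular how overparameterization and the learning rate affect the forgetting dynamics.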
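
The setup described above can be illustrated with a small numerical experiment. The following is a minimal sketch under stated assumptions, not the authors' implementation: both tasks draw inputs from the same isotropic Gaussian (so only weight-space similarity is varied), the teachers are noise-free linear maps scaled to norm sqrt(n), and the function names (`make_teachers`, `gen_error`, `sgd`, `run_two_tasks`) and all default parameter values are hypothetical choices made for illustration. Under these assumptions the generalization error of a linear student has the closed form ||s - t||^2 / (2n), which the code uses instead of a Monte Carlo estimate.

```python
import numpy as np

def make_teachers(n, similarity, rng):
    """Two teacher vectors of norm sqrt(n) whose cosine similarity is `similarity`."""
    a = rng.standard_normal(n)
    a *= np.sqrt(n) / np.linalg.norm(a)
    r = rng.standard_normal(n)
    r -= (r @ a) / (a @ a) * a              # keep only the part orthogonal to a
    r *= np.sqrt(n) / np.linalg.norm(r)
    b = similarity * a + np.sqrt(1.0 - similarity ** 2) * r
    return a, b

def gen_error(student, teacher):
    """Exact generalization error for x ~ N(0, I): 0.5 * E[((s - t) . x)^2] / n."""
    d = student - teacher
    return 0.5 * (d @ d) / d.size

def sgd(student, teacher, monitor, steps, lr, rng):
    """Online SGD on the squared error, drawing a fresh Gaussian input each step.

    Returns the trained student and its error on `monitor` after every step.
    """
    n = student.size
    curve = np.empty(steps)
    for t in range(steps):
        x = rng.standard_normal(n)
        err = (teacher - student) @ x / np.sqrt(n)      # teacher output minus student output
        student = student + lr * err * x / np.sqrt(n)   # gradient step on 0.5 * err**2
        curve[t] = gen_error(student, monitor)
    return student, curve

def run_two_tasks(similarity, n=200, steps=4000, lr=0.5, seed=0):
    """Train on task A, then on task B; report the task-A error before and after."""
    rng = np.random.default_rng(seed)
    teacher_a, teacher_b = make_teachers(n, similarity, rng)
    student, _ = sgd(np.zeros(n), teacher_a, teacher_a, steps, lr, rng)
    before = gen_error(student, teacher_a)
    student, curve = sgd(student, teacher_b, teacher_a, steps, lr, rng)
    return before, curve[-1], curve

if __name__ == "__main__":
    for sim in (0.0, 0.5, 0.9):
        before, after, _ = run_two_tasks(sim)
        print(f"similarity={sim:.1f}  task-A error: {before:.2e} -> {after:.3f}")
```

In this simplified setting the student ends up at the task-B teacher, so the task-A error after the second phase approaches 1 - similarity: the more similar the teachers, the less is forgotten. Because the two tasks share one input distribution here, the task-A error rises monotonically during the second phase, and the sketch does not attempt to reproduce the overshoot effect.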

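Experimental results

Research questions

RQ1: Under what conditions does the student network avoid catastrophic forgetting when learning two sequential tasks?
RQ2: How do the similarity of the input distributions and the similarity of the input-output relationships (weight-space similarity) jointly affect forgetting behavior?
RQ3: What causes the "overshoot" phenomenon, in which the generalization error rises temporarily and then recovers?
RQ4: How do the theoretical predictions for forgetting and recovery change when the non-Gaussian nature of real-world data is modeled with a hidden manifold?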

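Key findings

Catastrophic forgetting is minimized when the similarity of the input distributions is low and the similarity of the tasks' input-output relationships (weight-space similarity) is high.
A large learning rate produces the overshoot phenomenon: performance on the old task drops temporarily and then recovers as continual learning proceeds.
Overparameterization increases the student network's generalization error; as a result, when the error on the first task was never fully minimized, the apparent forgetting can be smaller.
In the hidden manifold model, the intrinsic dimension affects generalization more strongly than the input dimension, which suggests that the manifold structure modulates the forgetting dynamics.
Under non-Gaussian inputs, the theoretical results on the convergence of the generalization error remain qualitatively robust, but because learning is faster, the overshoot dynamics may be amplified.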
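
The first finding, that low input-distribution overlap protects the old task, can be checked in a toy setting. The sketch below is an illustration under loud assumptions rather than the paper's model: each task's inputs are Gaussian on a block of coordinates and exactly zero elsewhere, the hypothetical `overlap` parameter counts how many coordinates the two blocks share, and the two teachers are drawn independently (low weight-space similarity) so that any shared coordinate is overwritten. All names and default values are invented for this example.

```python
import numpy as np

def masked_gen_error(student, teacher, mask):
    """Generalization error when inputs are Gaussian on the masked coordinates only."""
    d = (student - teacher) * mask
    return 0.5 * (d @ d) / mask.sum()

def sgd(student, teacher, mask, steps, lr, rng):
    """Online SGD on the squared error with inputs restricted to `mask`."""
    scale = np.sqrt(mask.sum())
    for _ in range(steps):
        x = rng.standard_normal(student.size) * mask   # zero outside the task's block
        err = (teacher - student) @ x / scale
        student = student + lr * err * x / scale
    return student

def forgetting(overlap, n=300, active=100, steps=3000, lr=0.5, seed=0):
    """Increase in task-A error caused by subsequently training on task B.

    `overlap` is the number of input coordinates shared by the two tasks
    (hypothetical parameterisation, 0 <= overlap <= active).
    """
    rng = np.random.default_rng(seed)
    mask_a = np.zeros(n)
    mask_a[:active] = 1.0
    start = active - overlap
    mask_b = np.zeros(n)
    mask_b[start:start + active] = 1.0
    teacher_a = rng.standard_normal(n)
    teacher_b = rng.standard_normal(n)     # independent teachers: dissimilar tasks
    student = sgd(np.zeros(n), teacher_a, mask_a, steps, lr, rng)
    before = masked_gen_error(student, teacher_a, mask_a)
    student = sgd(student, teacher_b, mask_b, steps, lr, rng)
    after = masked_gen_error(student, teacher_a, mask_a)
    return after - before

if __name__ == "__main__":
    for k in (0, 50, 100):
        print(f"shared coordinates={k:3d}  forgetting={forgetting(k):.3f}")
```

With no shared coordinates, the task-B gradient is zero on every coordinate task A uses, so the task-A weights are never touched and nothing is forgotten. As the shared block grows, more of the task-A weights are overwritten by the unrelated task-B teacher and the forgetting increases.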

This review was generated by AI and checked by a human editor.