QUICK REVIEW

[Paper Review] Critical Learning Periods in Deep Neural Networks

Alessandro Achille, Matteo Rovere|arXiv (Cornell University)|Nov 24, 2017

Neural Networks and Applications | References: 29 | Citations: 66

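One-Line Summary

This paper shows that deep networks exhibit critical learning periods in which a temporary deficit degrades final performance, uses Fisher Information to reveal two learning phases and a loss of Information Plasticity, and discusses the implications for transfer learning and the robustness of representations.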

ABSTRACT

Similar to humans and animals, deep artificial neural networks exhibit critical periods during which a temporary stimulus deficit can impair the development of a skill. The extent of the impairment depends on the onset and length of the deficit window, as in animal models, and on the size of the neural network. Deficits that do not affect low-level statistics, such as vertical flipping of the images, have no lasting effect on performance and can be overcome with further training. To better understand this phenomenon, we use the Fisher Information of the weights to measure the effective connectivity between layers of a network during training. Counterintuitively, information rises rapidly in the early phases of training, and then decreases, preventing redistribution of information resources in a phenomenon we refer to as a loss of "Information Plasticity". Our analysis suggests that the first few epochs are critical for the creation of strong connections that are optimal relative to the input data distribution. Once such strong connections are created, they do not appear to change during additional training. These findings suggest that the initial learning transient, under-scrutinized compared to asymptotic behavior, plays a key role in determining the outcome of the training process. Our findings, combined with recent theoretical results in the literature, also suggest that forgetting (decrease of information in the weights) is critical to achieving invariance and disentanglement in representation learning. Finally, critical periods are not restricted to biological systems, but can emerge naturally in learning systems, whether biological or artificial, due to fundamental constraints arising from learning dynamics and information processing.
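For readers unfamiliar with the quantity the abstract relies on, the standard definition of the Fisher Information of the weights can be written as below. The notation is an assumption made for this note, not taken from the review: w denotes the weights, p_w(y | x) the network's predictive distribution, and \hat{Q}(x) the empirical input distribution. The second line gives the usual second-order reading: F measures how much a small weight perturbation \delta w changes the network's output distribution.

```latex
F = \mathbb{E}_{x \sim \hat{Q}(x)} \, \mathbb{E}_{y \sim p_w(y \mid x)}
    \left[ \nabla_w \log p_w(y \mid x) \, \nabla_w \log p_w(y \mid x)^{\top} \right]

\mathbb{E}_{x \sim \hat{Q}(x)} \,
\mathrm{KL}\!\left( p_{w}(y \mid x) \,\middle\|\, p_{w + \delta w}(y \mid x) \right)
\approx \tfrac{1}{2} \, \delta w^{\top} F \, \delta w
```

Under this reading, weights with large Fisher Information are those the output depends on strongly, which is why it can serve as a proxy for effective connectivity.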

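Research Motivation and Objectives

Motivate the study of early learning dynamics in DNNs by analogy with biological critical periods.
Investigate how a temporary sensory deficit in a DNN affects its final performance.
Use Fisher Information to quantify how the connectivity between network layers evolves during training.
Connect early memorization and later generalization with the potential benefit of forgetting for invariance.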

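Proposed Method

Induce critical periods in CNNs trained on CIFAR-10 and MNIST by applying image-degradation deficits (e.g., blurring) during the early epochs.
Estimate the Fisher Information Matrix (FIM) of the weights with a tractable trace-based estimator to measure connectivity between layers.
Define and measure Information Plasticity as the redistribution of information across layers during training.
Compare architectures, optimizers, and data distributions to assess how robust the critical-period phenomenon is.
Analyze how the timing and duration of the deficit relate to sensitivity using a sliding-window approach.
Relate FIM dynamics to bottlenecks in the loss landscape and to the memorization and forgetting phases.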
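The trace-based estimate mentioned above can be illustrated on a toy model. The following is a minimal sketch, not the authors' implementation: it assumes a linear softmax classifier (so the per-example gradient has a closed form) and estimates the trace of the FIM as the average squared norm of the log-likelihood gradient, with labels sampled from the model's own predictions. The function names `softmax` and `fisher_trace` are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fisher_trace(W, X, rng, n_samples=1):
    """Monte Carlo estimate of tr(F) for a linear softmax classifier.

    W: (d, k) weight matrix, X: (n, d) inputs.
    Labels are sampled from the model's predictive distribution,
    not read from the dataset (assumption: "true" Fisher, not empirical).
    """
    probs = softmax(X @ W)                      # (n, k)
    n, k = probs.shape
    x_sq = (X ** 2).sum(axis=1)                 # ||x||^2 per example
    total = 0.0
    for _ in range(n_samples):
        # Inverse-CDF sampling of one label per example.
        u = rng.random((n, 1))
        y = (probs.cumsum(axis=1) < u).sum(axis=1)
        y = np.minimum(y, k - 1)                # guard against round-off
        onehot = np.eye(k)[y]
        # grad_W log p(y|x) = outer(x, onehot - probs), so its squared
        # Frobenius norm factorizes as ||x||^2 * ||onehot - probs||^2.
        err_sq = ((onehot - probs) ** 2).sum(axis=1)
        total += (x_sq * err_sq).mean()
    return total / n_samples
```

Calling `fisher_trace` at successive checkpoints of a training run would give a curve comparable in spirit to the rise-then-fall pattern the review describes. For a deep network the per-example gradients would come from automatic differentiation rather than a closed form, and restricting the sum to one layer's weights would give a layer-wise value.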

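Experimental Results

Research Questions

RQ1: Do deep neural networks exhibit critical learning periods when a temporary deficit is applied during training?
RQ2: How do the timing and duration of the deficit affect final performance across architectures and datasets?
RQ3: What is the relationship between the dynamics of Fisher Information and the network's sensitivity to deficits (Information Plasticity)?
RQ4: Does layer-wise reorganization of information explain the observed critical periods and help interpret transfer-learning effects?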
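RQ2 depends on two knobs: when the deficit starts and how long it lasts. A minimal sketch of such an experimental schedule follows. Everything here is an illustrative assumption: the blur is approximated by average pooling followed by pixel repetition, and the helper names (`blur_by_downsampling`, `deficit_active`, `batch_for_epoch`, `sliding_windows`) are hypothetical, not from the paper.

```python
import numpy as np

def blur_by_downsampling(images, factor=4):
    """Crude blur: average-pool by `factor`, then repeat pixels back up.

    images: (n, h, w) or (n, h, w, c) array; h and w must be divisible
    by `factor`. The output has the same shape as the input.
    """
    n, h, w = images.shape[:3]
    rest = images.shape[3:]
    pooled = images.reshape(
        n, h // factor, factor, w // factor, factor, *rest
    ).mean(axis=(2, 4))
    return pooled.repeat(factor, axis=1).repeat(factor, axis=2)

def deficit_active(epoch, onset, length):
    # The deficit covers the half-open epoch range [onset, onset + length).
    return onset <= epoch < onset + length

def batch_for_epoch(images, epoch, onset, length, factor=4):
    # Serve degraded inputs inside the deficit window, clean ones outside.
    if deficit_active(epoch, onset, length):
        return blur_by_downsampling(images, factor)
    return images

def sliding_windows(total_epochs, length, stride):
    # Fixed-length deficit windows slid across the training run.
    return [(onset, length)
            for onset in range(0, total_epochs - length + 1, stride)]
```

One training run per `(onset, length)` pair, each followed by training on clean data and a final test-accuracy measurement, would trace out sensitivity as a function of deficit timing.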

Key Findings

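DNNs exhibit critical periods: if the deficit is not removed within an early window (roughly the first 40–60 epochs), final performance is permanently impaired.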
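Introducing a blur deficit earlier degrades final accuracy more, with sensitivity peaking during the early phase of rapid learning.
Fisher Information rises early in training and declines during consolidation, reflecting a forgetting and reorganization phase that follows memorization.
Sensitivity to deficits tracks both the global and the layer-wise Fisher Information, suggesting a loss of Information Plasticity under the deficit.
Layer-wise analysis shows that a deficit shifts reliance toward the higher layers, while early removal allows partial reorganization toward the middle layers.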
Critical periods persist across architectures (All-CNN, ResNet), datasets (MNIST, CIFAR-10), and optimizers (SGD, Adam), but their shape and duration vary with depth and hyperparameters.

This review was generated by AI and checked by a human editor.