QUICK REVIEW

[Paper Review] Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning

Charles H. Martin, Michael W. Mahoney|arXiv (Cornell University)|Oct 2, 2018

Statistical Mechanics and Entropy | References: 67 | Citations: 74

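One-sentence summary

The paper uses Random Matrix Theory to analyze DNN weight matrices, shows that training induces Implicit Self-Regularization, and identifies a taxonomy of 5+1 phases, including a heavy-tailed regime, that explains the generalization gap and the effect of batch size.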

ABSTRACT

Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of Self-Regularization. The empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. These phases can be observed during the training process as well as in the final learned DNNs. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems. This results from correlations arising at all size scales, which arises implicitly due to the training process itself. This implicit Self-Regularization can depend strongly on the many knobs of the training process. By exploiting the generalization gap phenomena, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. This demonstrates that---all else being equal---DNN optimization with larger batch sizes leads to less-well implicitly-regularized models, and it provides an explanation for the generalization gap phenomena.

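Motivation and objectives

Motivate a practical theory of regularization for deep learning that goes beyond explicit techniques such as dropout and weight-norm constraints.
Characterize the energy landscape of DNNs by analyzing layer weight matrices with RMT-derived metrics.
Introduce operationally defined phases of training and map them to increasing amounts of self-regularization.
Show how training knobs (e.g., batch size) affect phase transitions and generalization.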

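Proposed method

Model each DNN layer weight matrix W as W = W_rand + Δ_sig, separating a random component from a signal component.
Analyze the empirical spectral density (ESD) of X = (1/N) W^T W and fit it with Marchenko-Pastur (MP) theory and with heavy-tailed universality classes.
Define and compute capacity metrics from the spectrum: Hard Rank, Matrix Entropy, Stable Rank, and MP Soft Rank.
Develop and validate the taxonomy of 5+1 phases and map each phase to a level of implicit regularization.
Vary training knobs on small models to demonstrate phase transitions, and compare the results with large pre-trained models.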
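
The ESD and the capacity metrics listed above can be sketched in NumPy. This is a minimal illustration, not the authors' code. It assumes W is N x M with N >= M, uses the usual definitions of these metrics (Stable Rank as ||W||_F^2 / ||W||_2^2, Matrix Entropy as the Shannon entropy of the eigenvalue fractions normalized by log of the Hard Rank, MP Soft Rank as the MP bulk edge divided by the largest eigenvalue), and estimates the MP variance sigma^2 from the element-wise variance of W instead of fitting it to the ESD. All function names are hypothetical.

```python
import numpy as np

def esd_eigenvalues(W):
    """Eigenvalues of X = (1/N) W^T W for an N x M matrix W."""
    N = W.shape[0]
    sv = np.linalg.svd(W, compute_uv=False)  # singular values of W
    return sv**2 / N                          # eigenvalues of X

def mp_bulk_edges(Q, sigma2):
    """Marchenko-Pastur bulk edges for aspect ratio Q = N/M >= 1."""
    lam_minus = sigma2 * (1.0 - 1.0 / np.sqrt(Q)) ** 2
    lam_plus = sigma2 * (1.0 + 1.0 / np.sqrt(Q)) ** 2
    return lam_minus, lam_plus

def capacity_metrics(W):
    """Hard Rank, Matrix Entropy, Stable Rank and MP Soft Rank of W."""
    if W.shape[0] < W.shape[1]:
        W = W.T                               # enforce N >= M (assumption)
    N, M = W.shape
    lam = esd_eigenvalues(W)

    hard_rank = int(np.linalg.matrix_rank(W))

    # Matrix Entropy: normalized Shannon entropy of eigenvalue fractions.
    p = lam / lam.sum()
    p = p[p > 0]
    entropy = float(-(p * np.log(p)).sum() / np.log(hard_rank)) if hard_rank > 1 else 0.0

    # Stable Rank: ||W||_F^2 / ||W||_2^2 = sum(lam) / max(lam).
    stable_rank = float(lam.sum() / lam.max())

    # MP Soft Rank: bulk edge lambda_plus over the largest eigenvalue.
    # Assumption: sigma^2 is taken as the element-wise variance of W.
    sigma2 = float(W.var())
    _, lam_plus = mp_bulk_edges(N / M, sigma2)
    mp_soft_rank = float(lam_plus / lam.max())

    return {
        "hard_rank": hard_rank,
        "matrix_entropy": entropy,
        "stable_rank": stable_rank,
        "mp_soft_rank": mp_soft_rank,
    }

# Usage: a purely random Gaussian layer, i.e. no signal component.
rng_demo = np.random.default_rng(0)
W_demo = rng_demo.normal(size=(1000, 200))
print(capacity_metrics(W_demo))
```

For a purely random matrix the largest eigenvalue sits near the MP bulk edge, so the MP Soft Rank is close to 1. Eigenvalues that escape the bulk push it below 1, which is how the metric tracks increasing implicit regularization.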

Experimental results

Research questions

RQ1: Can Random Matrix Theory explain how DNN training induces regularization without explicit penalties?
RQ2: What are the spectral signatures (ESD) of weight matrices that reflect different levels of implicit self-regularization?
RQ3: How do training parameters, especially batch size, drive transitions between the identified phases and affect generalization?

Key findings

Older/smaller models show weak, Tikhonov-like implicit regularization with a signal-noise separation in MP terms.
Modern large models exhibit heavy-tailed self-regularization with no clean signal-noise separation and finite spectral support.
Phases can be observed during training and in final models, with MP Soft Rank decreasing and Stable Rank also declining as implicit regularization increases.
Batch size reduction can drive a small model through all 5+1 phases, linking generalization gaps to implicit regularization.
Explicit regularization can induce the Rank-collapse phase, illustrating how regularization strength shapes spectrum and capacity.
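
One way to tell a heavy-tailed ESD from an MP-like one is to estimate a power-law exponent for the tail of the eigenvalue distribution. The sketch below uses the standard maximum-likelihood estimator for a continuous power law, alpha = 1 + n / sum(ln(x_i / x_min)). This review does not describe the fitting procedure used in the paper, so the estimator, the choice of x_min as the median eigenvalue, and the function name `tail_exponent` are all assumptions made for illustration.

```python
import numpy as np

def tail_exponent(values, xmin):
    """MLE of alpha for a continuous power law p(x) ~ x^(-alpha), x >= xmin."""
    if xmin <= 0:
        raise ValueError("xmin must be positive")
    tail = np.asarray(values, dtype=float)
    tail = tail[tail >= xmin]
    if tail.size < 2:
        raise ValueError("need at least two values at or above xmin")
    return 1.0 + tail.size / np.log(tail / xmin).sum()

def esd_eigenvalues(W):
    """Eigenvalues of X = (1/N) W^T W for an N x M matrix W."""
    sv = np.linalg.svd(W, compute_uv=False)
    return sv**2 / W.shape[0]

# Usage: compare a Gaussian matrix with one that has heavy-tailed entries.
rng_demo = np.random.default_rng(0)
W_gauss = rng_demo.normal(size=(1000, 200))
W_heavy = rng_demo.standard_t(df=3, size=(1000, 200))  # heavy-tailed entries

for name, W_demo in [("gaussian", W_gauss), ("heavy-tailed", W_heavy)]:
    lam = esd_eigenvalues(W_demo)
    xmin = float(np.median(lam))  # hand-picked tail cutoff (assumption)
    print(name, tail_exponent(lam, xmin))
```

The estimate depends strongly on x_min, so a single number should be read together with a plot of the ESD on log-log axes before assigning a layer to a phase.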

This review was generated by AI and checked by a human editor.