QUICK REVIEW

[Paper Review] Traditional and Heavy-Tailed Self Regularization in Neural Network Models

Charles H. Martin, Michael W. Mahoney|arXiv (Cornell University)|Jan 24, 2019

Statistical Mechanics and Entropy|References: 47|Cited by: 40

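One-Sentence Summary

The paper uses Random Matrix Theory to show that DNN weight matrices exhibit implicit self-regularization, revealing a taxonomy of 5+1 phases of training (including a Heavy-Tailed phase) that is influenced by batch size and other training parameters.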

ABSTRACT

Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of regularization, such as Dropout or Weight Norm constraints. Building on recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a 'size scale' separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems. This implicit Self-Regularization can depend strongly on the many knobs of the training process. By exploiting the generalization gap phenomena, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size.

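Research Motivation and Goals

Investigate why regularization in deep learning behaves differently from regularization in traditional ML.
Propose a theory of self-regularization based on the spectral properties of weight matrices.
Characterize how training parameters, especially batch size, affect the regime of implicit regularization.
Provide a practical framework for monitoring and controlling the energy landscape of deep networks.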

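Proposed Method

Model each weight matrix as W ≈ W_rand + Δsig to separate noise from signal.
Apply Marchenko-Pastur (MP) theory and its heavy-tailed extensions to analyze empirical spectral densities (ESDs).
Identify universality classes (Gaussian, Spiked-Covariance, and Heavy-Tailed) for classifying ESDs.
Define the MP Soft Rank as λ+ / λmax, where λ+ is the upper edge of the MP bulk and λmax is the largest eigenvalue, to quantify the strength of the signal relative to the noise.
Develop a visual taxonomy of 5+1 Phases of Training (Random-like, Bleeding-out, Bulk+Spikes, Bulk-decay, Heavy-Tailed, Rank-collapse).
Demonstrate phase transitions on MiniAlexNet by varying training knobs, especially batch size.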
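
The decomposition W ≈ W_rand + Δsig and the MP Soft Rank can be illustrated with a small NumPy sketch. It is not the authors' code. It assumes one common convention: for an N x M matrix with N >= M, the correlation matrix is X = W^T W / N, the aspect ratio is Q = N / M, and the MP bulk edges are λ± = σ²(1 ± 1/√Q)². The function names (`esd_eigenvalues`, `mp_bulk_edges`, `mp_soft_rank`) are hypothetical. Falling back to the element variance of W for σ² is a simplification; fitting the MP distribution to the bulk of the ESD would be more faithful.

```python
import numpy as np

def esd_eigenvalues(W):
    """Return the eigenvalues of X = W^T W / N and the aspect ratio Q = N / M."""
    W = np.asarray(W, dtype=float)
    if W.shape[0] < W.shape[1]:
        W = W.T  # enforce N >= M so that Q >= 1
    N, M = W.shape
    # Squared singular values of W are the eigenvalues of W^T W.
    singular_values = np.linalg.svd(W, compute_uv=False)
    return singular_values**2 / N, N / M

def mp_bulk_edges(Q, sigma2=1.0):
    """Lower and upper edges of the Marchenko-Pastur bulk (assumed convention)."""
    root = 1.0 / np.sqrt(Q)
    return sigma2 * (1.0 - root)**2, sigma2 * (1.0 + root)**2

def mp_soft_rank(W, sigma2=None):
    """MP Soft Rank = lambda_plus / lambda_max."""
    eigs, Q = esd_eigenvalues(W)
    if sigma2 is None:
        # Assumption: crude noise-variance estimate from the matrix entries.
        sigma2 = float(np.var(W))
    _, lam_plus = mp_bulk_edges(Q, sigma2)
    return lam_plus / eigs.max()

# Synthetic illustration: pure noise versus noise plus a rank-one "signal".
rng = np.random.default_rng(0)
N, M = 1000, 250
W_rand = rng.normal(size=(N, M))
u = rng.normal(size=N)
u /= np.linalg.norm(u)
v = rng.normal(size=M)
v /= np.linalg.norm(v)
delta_sig = 5.0 * np.sqrt(N) * np.outer(u, v)  # hypothetical spike strength

print("random-like soft rank:", mp_soft_rank(W_rand, sigma2=1.0))
print("bulk+spike soft rank: ", mp_soft_rank(W_rand + delta_sig, sigma2=1.0))
```

For the purely random matrix the largest eigenvalue sits near the bulk edge, so the soft rank should come out close to 1. Adding the spike pushes λmax far above λ+, so the soft rank drops well below 1.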

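Experimental Results

Research Questions

RQ1: Which spectral features of DNN weight matrices indicate implicit self-regularization?
RQ2: How do MP theory and the Heavy-Tailed universality classes describe the transition from random-like to heavily regularized weight matrices?
RQ3: Can a small model exhibit all 5+1 phases of training when training knobs such as batch size are adjusted?
RQ4: What is the relationship between explicit regularization and the observed spectral phases?
RQ5: Does Heavy-Tailed Self-Regularization generalize across architectures, from LeNet5 to Inception/AlexNet?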

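Key Findings

Older/smaller models show MP-like spectra with low-rank spikes, consistent with implicit Tikhonov-like regularization.
Modern large DNNs show Heavy-Tailed spectral densities, indicating strong correlations and Heavy-Tailed Self-Regularization.
A visual taxonomy of 5+1 phases of training describes how the weight spectrum evolves from random-like to rank-collapse.
As self-regularization strengthens, the MP Soft Rank decreases, indicating less random-like behavior.
Varying the batch size makes a single model exhibit all 5+1 phases, reflecting the Generalization Gap phenomenon.
Explicit regularization further shifts the spikes and reduces spectral complexity, consistent with the theory.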
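
To make the Heavy-Tailed finding concrete, the following sketch estimates the tail exponent α of a density that decays like λ^(-α). This summary does not say how the exponent is fitted, so the code swaps in the standard continuous power-law maximum-likelihood (Hill-type) estimator, α = 1 + n / Σ ln(λ_i / xmin). The function names and the choice of `xmin` are assumptions for illustration, and the data are synthetic rather than eigenvalues from a trained network.

```python
import numpy as np

def sample_power_law(alpha, xmin, size, rng):
    """Draw samples with density proportional to x^(-alpha) for x >= xmin."""
    u = rng.random(size)
    # Inverse-CDF sampling for a continuous power law.
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def powerlaw_alpha_mle(values, xmin):
    """Continuous power-law MLE of alpha using only values >= xmin."""
    values = np.asarray(values, dtype=float)
    tail = values[values >= xmin]
    log_sum = np.sum(np.log(tail / xmin))
    if tail.size < 2 or log_sum <= 0.0:
        raise ValueError("need at least two distinct tail values above xmin")
    return 1.0 + tail.size / log_sum

# Synthetic stand-in for the tail of an ESD (hypothetical alpha and xmin).
rng = np.random.default_rng(0)
synthetic_eigs = sample_power_law(alpha=3.0, xmin=1.0, size=20000, rng=rng)
print("estimated alpha:", powerlaw_alpha_mle(synthetic_eigs, xmin=1.0))
```

In practice the estimate is sensitive to `xmin`, so it is worth checking several cutoffs before reading an exponent off a real layer's spectrum.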

This review was generated by AI and reviewed by a human editor.