QUICK REVIEW

[Paper Review] Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes

Mengye Ren, Renjie Liao|arXiv (Cornell University)|Nov 14, 2016

Domain Adaptation and Few-Shot Learning | Cited by 47

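One-Sentence Summary

This paper proposes a unified divisive normalization framework that generalizes batch normalization and layer normalization by normalizing activations over different tensor dimensions. By adding a smoothing term ($\sigma^2$) and L1 regularization on the activations, the method improves training stability and performance in both CNNs and RNNs, with gains over baselines reported on image classification, language modeling, and super-resolution, without relying on batch statistics.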

ABSTRACT

Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However its success has been very limited when dealing with recurrent neural networks. On the other hand, layer normalization normalizes the activations across all activities within a layer. This was shown to work well in the recurrent setting. In this paper we propose a unified view of normalization techniques, as forms of divisive normalization, which includes layer and batch normalization as special cases. Our second contribution is the finding that a small modification to these normalization schemes, in conjunction with a sparse regularizer on the activations, leads to significant benefits over standard normalization techniques. We demonstrate the effectiveness of our unified divisive normalization framework in the context of convolutional neural nets and recurrent neural networks, showing improvements over baselines in image classification, language modeling as well as super-resolution.

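Research Motivation and Objectives

Unify batch normalization, layer normalization, and divisive normalization under a single framework based on divisive normalization over tensor dimensions.
Study how adding a smoothing term ($\sigma^2$) and L1 regularization to normalization affects the performance of normalized deep networks.
Evaluate the proposed framework on multiple tasks with convolutional and recurrent neural networks.
Show that the method improves training stability and generalization in small-batch settings and in RNNs.
Provide empirical evidence that divisive normalization with added regularization outperforms standard normalization techniques.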
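
As a rough sketch of the unified view named in the first objective, the normalizers can be written in one form. The notation is assumed here rather than taken from this review: $z_{n,j}$ is the pre-normalization activation of unit $j$ on example $n$, and $\mathcal{A}_{n,j}$ and $\mathcal{B}_{n,j}$ are the sets of units averaged over when centering and when scaling.

$$
v_{n,j} = z_{n,j} - \mathbb{E}_{\mathcal{A}_{n,j}}[z], \qquad
\tilde{z}_{n,j} = \frac{v_{n,j}}{\sqrt{\sigma^2 + \mathbb{E}_{\mathcal{B}_{n,j}}[v^2]}}
$$

Under this reading, batch normalization would average over the examples in a mini-batch, layer normalization over the units of a layer for a single example, and divisive normalization over a local neighborhood of units. $\sigma^2$ is the smoothing term studied in the second objective.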

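Proposed Method

Formulates normalization as a divisive operation over different tensor dimensions (e.g., batch, channel, filter, instance), unifying batch normalization and layer normalization as special cases.
Introduces a smoothing parameter $\sigma^2$ in the normalization denominator to improve numerical stability and invertibility.
Applies L1 regularization to the pre-normalization activations to encourage sparsity and reduce correlation between filter responses.
Reformulates layer normalization as a form of divisive normalization with a smoothing parameter to improve performance.
Trains models with the proposed divisive normalization framework on image classification (CIFAR-10/100), language modeling (PTB), and super-resolution.
Uses ablation studies to separate the effects of $\sigma^2$ and L1 regularization on model performance.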
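
The steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `divisive_normalize`, the `(N, H, W, C)` tensor layout, the default values, and the choice to apply the L1 penalty to the centered activations are all assumptions made for illustration. Only the batch-like and layer-like special cases are shown; local-neighborhood normalization is omitted.

```python
import numpy as np

def divisive_normalize(z, axes, sigma2=1.0, l1_weight=1e-4):
    """Center and scale `z` over `axes`, with smoothing term `sigma2`.

    Returns the normalized activations and an L1 penalty on the
    centered pre-normalization activations (an illustrative choice).
    """
    mean = z.mean(axis=axes, keepdims=True)
    v = z - mean                                   # centered activations
    var = (v ** 2).mean(axis=axes, keepdims=True)  # second moment over the same axes
    z_norm = v / np.sqrt(sigma2 + var)             # sigma2 smooths the denominator
    l1_penalty = l1_weight * np.abs(v).mean()      # term to add to the training loss
    return z_norm, l1_penalty

# Hypothetical activations with layout (batch, height, width, channels).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4, 4, 16))

# Batch-normalization-like: statistics per channel, over batch and space.
bn_like, _ = divisive_normalize(z, axes=(0, 1, 2), sigma2=1e-5)

# Layer-normalization-like: statistics per example, over all units in the layer.
ln_like, penalty = divisive_normalize(z, axes=(1, 2, 3), sigma2=1.0)
```

With a tiny `sigma2` the output has close to unit variance over the normalized axes, while a larger `sigma2` shrinks the output, which is one way to see why the smoothing term can act as a regularizer.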

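Experimental Results

Research Questions

RQ1: How does the unified divisive normalization framework compare with batch normalization and layer normalization in performance and stability?
RQ2: What is the effect of adding a smoothing term ($\sigma^2$) to the normalization denominator in deep networks?
RQ3: How does L1 regularization on the pre-normalization activations affect representation learning and model generalization?
RQ4: In small-batch settings, can divisive normalization outperform batch normalization in recurrent neural networks?
RQ5: Does combining $\sigma^2$ with L1 regularization yield more independent and robust feature representations?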

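Key Findings

The proposed divisive normalization combining $\sigma^2$ and L1 regularization reaches a test accuracy of 0.8122 on CIFAR-100, compared with 0.5156 for standard batch normalization and 0.4957 for layer normalization.
On PTB language modeling, the method lowers the cross-entropy loss to 117.868 (with a ReLU RNN), outperforming the baseline model and standard normalization techniques.
Adding $\sigma^2$ alone substantially improves RNN performance: the loss falls from 149.357 for the baseline to 138.947 for BN* and 116.609 for LN*, suggesting a stronger regularization effect.
Ablation studies show that $\sigma^2$ and L1 regularization consistently improve performance across all architectures and tasks, with $\sigma^2$ having a more pronounced effect in RNNs.
Joint histogram analysis confirms that $\sigma^2$ and L1 regularization reduce pairwise correlation (Corr) and increase mutual information (MI), promoting more independent representations.
The method allows RNNs to train stably at higher learning rates, showing better training dynamics and robustness.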
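
For readers unfamiliar with the Corr and MI statistics mentioned in the joint histogram finding, the following sketch shows one common way to estimate them for a pair of unit responses. It is an illustration under assumptions, not the paper's evaluation code: the function name `pairwise_stats`, the bin count, and the synthetic data are all hypothetical.

```python
import numpy as np

def pairwise_stats(a, b, bins=32):
    """Pearson correlation and a histogram-based mutual information
    estimate (in nats) for two 1-D arrays of unit responses."""
    corr = np.corrcoef(a, b)[0, 1]

    joint, _, _ = np.histogram2d(a, b, bins=bins)  # joint histogram of the pair
    p_ab = joint / joint.sum()                     # joint probabilities
    p_a = p_ab.sum(axis=1, keepdims=True)          # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)          # marginal of b, shape (1, bins)

    nz = p_ab > 0                                  # skip empty cells to avoid log(0)
    mi = np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz]))
    return corr, mi

# Synthetic responses: one independent pair and one strongly dependent pair.
rng = np.random.default_rng(0)
a = rng.normal(size=20000)
b_independent = rng.normal(size=20000)
b_dependent = a + 0.1 * rng.normal(size=20000)

corr_ind, mi_ind = pairwise_stats(a, b_independent)
corr_dep, mi_dep = pairwise_stats(a, b_dependent)
```

The histogram estimate is biased upward for finite samples, so comparisons between models are only meaningful when the bin count and sample size are held fixed.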


This review was generated by AI and checked by a human editor.