QUICK REVIEW

[Paper Review] Norm matters: efficient and accurate normalization schemes in deep networks

Elad Hoffer, Ron Banner|arXiv (Cornell University)|Mar 5, 2018

Model Reduction and Neural Networks | References: 43 | Cited by: 52

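One-Sentence Summary

This paper revisits normalization in deep networks and shows that weight decay, learning rate, and normalization interact through the weight norm. It proposes L1- and L∞-based BN variants and bounded weight normalization to improve stability and enable half-precision training while keeping accuracy competitive.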

ABSTRACT

Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several shortcomings that hindered its use for certain tasks. In this work, we present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective. This property highlights the connection between practices such as normalization, weight decay and learning-rate adjustments. We suggest several alternatives to the widely used $L^2$ batch-norm, using normalization in $L^1$ and $L^\infty$ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative to work for half-precision implementations. Finally, we suggest a modification to weight-normalization, which improves its performance on large-scale tasks.

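Research Motivation and Goals

Understand how the weight norm interacts with normalization and learning dynamics.
Propose normalization alternatives that decouple the weight norm from the optimization objective.
Improve numerical stability and efficiency, especially in low-precision settings.
Evaluate L1- and L∞-based normalization as replacements for, or complements to, Batch Normalization.
Introduce bounded weight normalization to improve performance in large-scale training.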

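Proposed Methods

Treat BN's invariance to the weight norm as a mechanism that decouples scale from optimization.
Derive and test a learning-rate correction that mimics the effect of weight decay on training dynamics.
Replace or augment L2 BN with L1- and L∞-based BN variants, deriving suitable scaling constants (such as C_L1) for stability and performance.
Show that L1 BN supports half-precision training where L2 BN can fail.
Introduce bounded weight normalization (BWN), which fixes the per-channel weight norm to a scalar ρ, to improve stability on ImageNet and performance on seq2seq tasks.
Explore Lp weight normalization (including L1 and L∞ variants) as an alternative to standard weight normalization across multiple architectures.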
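
To make the L1 BN variant in the list above concrete, here is a minimal NumPy sketch. It assumes roughly Gaussian activations, for which the standard deviation equals sqrt(π/2) times the mean absolute deviation; that factor plays the role of the scaling constant C_L1. The function name `l1_batch_norm`, the `eps` value, and the omission of learnable scale/shift and running statistics are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# For Gaussian inputs: sigma = sqrt(pi / 2) * E|x - mu|.
C_L1 = np.sqrt(np.pi / 2.0)

def l1_batch_norm(x, eps=1e-5):
    """Normalize a (batch, features) array per feature with L1 statistics.

    Hypothetical helper: no learnable scale/shift, no running statistics.
    """
    mu = x.mean(axis=0, keepdims=True)
    # Mean absolute deviation: avoids squaring the activations and the
    # square root that the usual L2 (variance-based) statistic needs.
    mad = np.abs(x - mu).mean(axis=0, keepdims=True)
    return (x - mu) / (C_L1 * mad + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4096, 8))
y = l1_batch_norm(x)
print(y.mean(axis=0))  # per-feature means
print(y.std(axis=0))   # per-feature standard deviations
```

Because no squares are accumulated, the intermediate values stay in a narrower numeric range than with variance-based statistics, which is the intuition behind the low-precision claim.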

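Experimental Results

Research Questions

RQ1: How does the weight norm interact with Batch Normalization to affect learning dynamics and the effective step size?
RQ2: Can alternative norm-based normalization (L1, L∞) match BN's accuracy while offering computational and low-precision advantages?
RQ3: Does bounding the weight norm (bounded weight normalization) improve performance on large-scale tasks and sequence models?
RQ4: Can batch normalization be carried out in half precision using L1 normalization?
RQ5: What are the trade-offs of Lp weight normalization relative to conventional weight normalization?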
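
RQ1 rests on a scale-invariance property: a layer followed by batch normalization produces the same output when its weights are multiplied by a positive constant α, so the gradient at αw is the gradient at w divided by α. The sketch below checks both facts numerically on a toy problem. The squared-error loss, the array shapes, and the finite-difference helper `numerical_grad` are assumptions made for illustration only.

```python
import numpy as np

def batch_norm(z):
    # Plain per-feature standardization; eps is omitted so that the
    # scale invariance holds exactly (the demo has no constant features).
    return (z - z.mean(axis=0, keepdims=True)) / z.std(axis=0, keepdims=True)

def loss(w, x, t):
    # Toy squared-error loss on the normalized pre-activations.
    return np.mean((batch_norm(x @ w) - t) ** 2)

def numerical_grad(f, w, h=1e-6):
    # Central finite differences, one weight entry at a time.
    g = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        step = np.zeros_like(w)
        step[idx] = h
        g[idx] = (f(w + step) - f(w - step)) / (2 * h)
    return g

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 5))
t = rng.normal(size=(64, 3))
w = rng.normal(size=(5, 3))
alpha = 4.0

f = lambda w_: loss(w_, x, t)
same_output = np.allclose(batch_norm(x @ w), batch_norm(x @ (alpha * w)))
g = numerical_grad(f, w)
g_scaled = numerical_grad(f, alpha * w)
grad_scales_inversely = np.allclose(g_scaled, g / alpha, rtol=1e-3, atol=1e-6)
print(same_output, grad_scales_inversely)
```

With a fixed learning rate, a larger weight norm therefore means a smaller relative change in the weight direction per step, which is how weight decay and the learning rate end up coupled through the norm.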

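Key Findings

Weight decay improves optimization by constraining the weight norm, which effectively stabilizes the learning rate; a similar effect can be obtained by adjusting the learning rate or through normalization.
L1- and L∞-based batch normalization match or come close to the accuracy of L2 BN on CIFAR and ImageNet, and L1 BN makes half-precision training more stable.
L1 BN offers computational and memory advantages and stays robust under quantization noise, making half-precision BN possible in cases where L2 BN fails.
Bounded weight normalization (BWN) substantially improves performance over standard weight normalization on large-scale tasks such as ImageNet, approaching the performance of BN.
L1 and Lp normalization can serve as low-precision-friendly alternatives to BN across architectures (ResNet, Transformer) with only a small loss in accuracy.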
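
The bounded weight normalization finding can be illustrated with a short NumPy sketch. Standard weight normalization reparameterizes each output channel as a learnable scale times a unit-norm direction; the bounded variant described here replaces that learnable scale with a fixed scalar ρ. The function name `bounded_weight_norm`, the (out_channels, in_features) layout, and the value chosen for `rho` are assumptions for illustration; how ρ is set in practice is not specified in this summary.

```python
import numpy as np

def bounded_weight_norm(v, rho, eps=1e-12):
    """Rescale each output channel (row) of v to have L2 norm rho.

    Hypothetical helper: v holds the free parameters, and the returned
    array holds the weights that the layer would actually use.
    """
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return rho * v / (norms + eps)

rng = np.random.default_rng(0)
v = rng.normal(size=(16, 32))  # (out_channels, in_features), assumed layout
rho = 1.5                      # assumed constant, fixed during training
w = bounded_weight_norm(v, rho)
print(np.linalg.norm(w, axis=1))  # per-channel norms of the effective weights
```

Since every channel norm is pinned to the same constant, only the direction of each weight vector is learned and the norm cannot drift during training.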

This review was generated by AI and reviewed by a human editor.