QUICK REVIEW

[Paper Review] Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks

Lei Huang, Xianglong Liu|arXiv (Cornell University)|Sep 16, 2017

Advanced Neural Network Applications | Cited by 90

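One-Sentence Summary

This paper formulates the learning of orthogonal rectangular weight matrices in deep networks as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM) and proposes orthogonal weight normalization via proxy parameters. The resulting Orthogonal Linear Module (OLM) improves CNN performance without any change to the experimental protocol.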

ABSTRACT

Orthogonal matrix has shown advantages in training Recurrent Neural Networks (RNNs), but such matrix is limited to be square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrix to orthogonal rectangular matrix and formulating this problem in feed-forward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM). We show that the rectangular orthogonal matrix can stabilize the distribution of network activations and regularize FNNs. We also propose a novel orthogonal weight normalization method to solve OMDSM. Particularly, it constructs orthogonal transformation over proxy parameters to ensure the weight matrix is orthogonal and back-propagates gradient information through the transformation during training. To guarantee stability, we minimize the distortions between proxy parameters and canonical weights over all tractable orthogonal transformations. In addition, we design an orthogonal linear module (OLM) to learn orthogonal filter banks in practice, which can be used as an alternative to standard linear module. Extensive experiments demonstrate that by simply substituting OLM for standard linear module without revising any experimental protocols, our method largely improves the performance of the state-of-the-art networks, including Inception and residual networks on CIFAR and ImageNet datasets. In particular, we have reduced the test error of wide residual network on CIFAR-100 from 20.04% to 18.61% with such simple substitution. Our code is available online for result reproduction.

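Motivation and Goals

Use orthogonal weight matrices to regularize deep networks and stabilize their optimization.
Formulate the problem of learning orthogonal filters in deep neural networks as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM).
Develop a stable solution, Orthogonal Weight Normalization, that back-propagates gradients through the orthogonalizing transformation.
Introduce the Orthogonal Linear Module (OLM) as a practical replacement for the standard linear layer.
Demonstrate performance gains for MLPs and CNNs on the CIFAR and ImageNet datasets.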

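Proposed Method

Constrain the weight matrix W^l of each layer to be orthogonal, W^l ∈ O^{n_l × d_l} (that is, W^l (W^l)^T = I), which yields OMDSM.
Re-parameterize W^l as W^l = φ(V^l), where φ maps the proxy parameters V^l to an orthogonal W^l.
Center V^l and compute φ from the eigendecomposition Σ = D Λ D^T of the covariance Σ = (V^l - c 1_d^T)(V^l - c 1_d^T)^T, giving W^l = D Λ^{-1/2} D^T (V^l - c 1_d^T), where c = (1/d) V^l 1_d holds the row means.
Back-propagate gradients through φ using matrix calculus and the derivatives of the eigendecomposition.
Among the transformations that satisfy W = φ(V) and W W^T = I, choose the one that minimizes the distortion tr((W - V_c)(W - V_c)^T), where V_c = V - c 1_d^T is the centered proxy, to stabilize learning.
Optionally, for n > d, use group-based orthogonalization: split the weights into groups of size N_G and orthogonalize within each group.
Handle convolutional layers by unrolling W^C into a two-dimensional matrix and applying the same orthogonalization; the group-based strategy reduces the computational cost.
Propose the Orthogonal Linear Module (OLM), which implements the forward and backward passes with the φ transformation and stores W for inference.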
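
The forward transformation described in the steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: the function names `orthogonalize`, `orthogonalize_groups`, and `orthogonalize_conv`, the `eps` stabilizer, and the `(n, c_in, k_h, k_w)` filter layout are all hypothetical choices made for this example. The sketch also assumes the centered covariance is full rank, which requires n < d because centering removes one degree of freedom from each row. The backward pass through φ is not shown.

```python
import numpy as np

def orthogonalize(V, eps=1e-8):
    """Map proxy parameters V (n x d, n < d) to W with W @ W.T close to I."""
    c = V.mean(axis=1, keepdims=True)      # row means, c = (1/d) V 1_d
    Vc = V - c                             # centered proxy, V - c 1_d^T
    Sigma = Vc @ Vc.T                      # n x n covariance of the rows
    lam, D = np.linalg.eigh(Sigma)         # Sigma = D diag(lam) D^T
    # eps is an assumed stabilizer that guards against tiny eigenvalues.
    P = D @ np.diag(1.0 / np.sqrt(lam + eps)) @ D.T
    return P @ Vc                          # W = D Lambda^{-1/2} D^T Vc

def orthogonalize_groups(V, group_size):
    """Orthogonalize rows within consecutive groups of size group_size."""
    n = V.shape[0]
    assert n % group_size == 0, "sketch assumes the groups divide n evenly"
    blocks = [orthogonalize(V[i:i + group_size])
              for i in range(0, n, group_size)]
    return np.vstack(blocks)

def orthogonalize_conv(V_conv, group_size=None):
    """Unroll filters of shape (n, c_in, k_h, k_w) to 2-D, orthogonalize, reshape."""
    n = V_conv.shape[0]
    V2d = V_conv.reshape(n, -1)            # n x (c_in * k_h * k_w)
    if group_size is None:
        W2d = orthogonalize(V2d)
    else:
        W2d = orthogonalize_groups(V2d, group_size)
    return W2d.reshape(V_conv.shape)

# Usage: check how far W @ W.T is from the identity for a random proxy.
rng = np.random.default_rng(0)
V = rng.standard_normal((4, 10))
W = orthogonalize(V)
max_deviation = np.abs(W @ W.T - np.eye(4)).max()
print(max_deviation)
```

With group-based orthogonalization, rows are orthonormal only within a group; rows from different groups are generally not orthogonal to each other.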

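Experimental Results

Research Questions

RQ1: Can deep feed-forward networks effectively learn orthogonal rectangular weight matrices under OMDSM?
RQ2: Does solving OMDSM by orthogonalizing proxy parameters give stable and scalable training compared with Riemannian optimization methods?
RQ3: How does replacing standard linear modules with OLM affect the optimization speed and generalization of CNN architectures?
RQ4: What practical strategies support deploying OMDSM in large-scale CNNs (for example, group-based orthogonalization and compatibility with BN/Adam)?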

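Key Findings

Riemannian optimization methods for OMDSM are unstable or converge slowly, whereas OLM optimizes stably and quickly.
OLM stabilizes the distribution of activations and preserves gradient norms, which helps the training of deep networks and improves conditioning.
Replacing standard linear modules with OLM yields consistent gains across CNN architectures and datasets.
On CIFAR-100, the test error of a Wide ResNet falls from 20.04% to 18.61% (with related gains on CIFAR-10).
Networks with OLM, including VGG-style networks and their variants, reach state-of-the-art or competitive results on CIFAR-10/100; for example, WRN-28-10-OLM reaches a test error of 3.73% on CIFAR-10 and 18.76% on CIFAR-100.
BN-Inception with OLM outperforms the baseline on CIFAR-10/100; for example, the CIFAR-100 test error drops from 24.87% to 22.02%.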
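
The norm-preservation finding can be checked numerically. The sketch below is a hedged illustration, not the paper's experiment: it builds a matrix W with orthonormal rows from a QR factorization (a stand-in chosen for brevity, not the eigendecomposition-based φ), then verifies two properties of a layer y = W x. First, the gradient passed back to the input, W^T g, has the same norm as the output gradient g. Second, if the input has identity covariance, so does the output. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 10

# Assumed stand-in for an orthogonal weight: orthonormal rows via QR.
Q, _ = np.linalg.qr(rng.standard_normal((d, n)))  # Q is d x n, orthonormal columns
W = Q.T                                           # n x d, so W @ W.T = I_n

# Back-propagation through y = W x maps the output gradient g to W^T g.
g = rng.standard_normal(n)
grad_x = W.T @ g
norm_gap = abs(np.linalg.norm(grad_x) - np.linalg.norm(g))

# For inputs with identity covariance, the output covariance is W I W^T = I_n.
cov_y = W @ np.eye(d) @ W.T

print(norm_gap)
print(np.abs(cov_y - np.eye(n)).max())
```

Both printed values should sit at floating-point round-off level. The same identity does not hold in the forward direction for arbitrary inputs when n < d, since W then projects onto an n-dimensional subspace.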

This review was generated by AI and reviewed by a human editor.