QUICK REVIEW

[Paper Review] DiracNets: Training Very Deep Neural Networks Without Skip-Connections

Sergey Zagoruyko, Nikos Komodakis | arXiv (Cornell University) | Jun 1, 2017

Advanced Neural Network Applications | References: 16 | Cited by: 75

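One-Sentence Summary

DiracNets use a Dirac weight parameterization to train very deep plain networks without explicit skip connections, reaching performance close to ResNet/WRN while folding into a simple chain of convolution-ReLU pairs at inference.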

ABSTRACT

Deep neural networks with skip-connections, such as ResNet, show excellent performance in various image classification benchmarks. It is though observed that the initial motivation behind them - training deeper networks - does not actually hold true, and the benefits come from increased capacity, rather than from depth. Motivated by this, and inspired from ResNet, we propose a simple Dirac weight parameterization, which allows us to train very deep plain networks without explicit skip-connections, and achieve nearly the same performance. This parameterization has a minor computational cost at training time and no cost at all at inference, as both Dirac parameterization and batch normalization can be folded into convolutional filters, so that network becomes a simple chain of convolution-ReLU pairs. We are able to match ResNet-1001 accuracy on CIFAR-10 with 28-layer wider plain DiracNet, and closely match ResNets on ImageNet. Our parameterization also mostly eliminates the need of careful initialization in residual and non-residual networks. The code and models for our experiments are available at https://github.com/szagoruyko/diracnets

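Motivation and Objectives

Understand the limitations of skip connections and of increasing depth in image classification.
Propose a Dirac weight parameterization that allows very deep plain networks to be trained end to end.
Demonstrate DiracNet performance relative to ResNet and WRN on CIFAR and ImageNet.
Show how the Dirac parameterization interacts with initialization and how it folds at inference.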

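Proposed Method

Introduce the Dirac parameterization of the convolution weights, W_hat = diag(a)I + W, where I is the Dirac delta kernel (the identity under convolution); optionally with weight normalization, W_hat = diag(a)I + diag(b)W_norm.
Initialize a to 1 and b to 0.1; initialize W from N(0,1); do not apply L2 regularization to a and b.
Train very deep plain networks using weight normalization and folding; compare against ResNet/WRN on CIFAR and ImageNet.
Relate the Dirac parameterization to ResNet, showing the implicit skip connection and discussing where the nonlinearity sits relative to the skip path.
Evaluate plain and DiracNet variants on CIFAR, and compare DiracNet-18/34 against ResNet-18/34 on ImageNet.
Show that DiracNets can be trained end to end without layer-wise pretraining, and can be folded at inference into a VGG-style chain of convolution-ReLU pairs.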
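
The parameterization above can be illustrated with a small sketch. The following is a minimal pure-Python example, not the authors' implementation: it uses a 1-D multi-channel convolution for brevity, and the helper names (`dirac_kernel`, `normalize_filters`, `dirac_parameterize`, `conv1d_same`) are hypothetical. It assumes W_norm means each output filter scaled to unit Euclidean norm. The check at the end shows that one convolution with W_hat gives the same output as the "skip plus branch" form a*x + conv(diag(b)W_norm, x), which is the implicit skip connection.

```python
import math
import random

def dirac_kernel(channels, ksize):
    """Identity under convolution: 1 at the centre tap where out == in channel."""
    mid = ksize // 2
    return [[[1.0 if (o == i and k == mid) else 0.0 for k in range(ksize)]
             for i in range(channels)] for o in range(channels)]

def normalize_filters(W):
    """Scale each output filter W[o] to unit Euclidean norm (assumed W_norm)."""
    out = []
    for filt in W:
        norm = math.sqrt(sum(v * v for row in filt for v in row))
        out.append([[v / norm for v in row] for row in filt])
    return out

def dirac_parameterize(W, a, b):
    """W_hat = diag(a) I + diag(b) W_norm, with per-output-channel a and b."""
    C, K = len(W), len(W[0][0])
    I, Wn = dirac_kernel(C, K), normalize_filters(W)
    return [[[a[o] * I[o][i][k] + b[o] * Wn[o][i][k] for k in range(K)]
             for i in range(C)] for o in range(C)]

def conv1d_same(W, x):
    """Zero-padded cross-correlation; W is [out][in][tap], x is [channel][time]."""
    C_out, C_in, K, L = len(W), len(W[0]), len(W[0][0]), len(x[0])
    pad = K // 2
    y = [[0.0] * L for _ in range(C_out)]
    for o in range(C_out):
        for i in range(C_in):
            for t in range(L):
                for k in range(K):
                    s = t + k - pad
                    if 0 <= s < L:
                        y[o][t] += W[o][i][k] * x[i][s]
    return y

random.seed(0)
C, K, L = 3, 3, 8
W = [[[random.gauss(0, 1) for _ in range(K)] for _ in range(C)] for _ in range(C)]
a, b = [1.0] * C, [0.1] * C  # initial values listed in the method summary
x = [[random.gauss(0, 1) for _ in range(L)] for _ in range(C)]

# Folded view: a single plain convolution with W_hat.
W_hat = dirac_parameterize(W, a, b)
folded = conv1d_same(W_hat, x)

# Residual-like view: scaled input plus a convolutional branch.
Wn = normalize_filters(W)
bWn = [[[b[o] * v for v in row] for row in Wn[o]] for o in range(C)]
branch = conv1d_same(bWn, x)
residual_view = [[a[o] * x[o][t] + branch[o][t] for t in range(L)]
                 for o in range(C)]

max_diff = max(abs(folded[o][t] - residual_view[o][t])
               for o in range(C) for t in range(L))
print("max difference between the two views:", max_diff)
```

Because convolution is linear in its weights, the two views agree up to floating-point rounding, so at inference only W_hat needs to be stored.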

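Experimental Results

Research Questions

RQ1: Can the Dirac parameterization train networks with hundreds of layers without explicit skip connections?
RQ2: How does DiracNet performance differ from ResNet and Wide ResNet on CIFAR-10/100 and ImageNet?
RQ3: Does the Dirac parameterization reduce sensitivity to initialization and allow folding into a simple convolution-ReLU chain at test time?
RQ4: How do network width and depth affect DiracNets compared with conventional residual networks?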

Key Findings

DiracNets make it possible to train extremely deep plain networks (hundreds of layers) with competitive performance.
DiracNet-28-10 reaches 4.75% error on CIFAR-10 and 21.54% error on CIFAR-100 with 36.5M parameters, close to WRN-28-10.
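On CIFAR, plain DiracNets outperform other plain networks and approach ResNet/WRN performance; deeper DiracNets improve accuracy at depths where ordinary plain networks fail to train.
On ImageNet, DiracNet-18/34 closely match ResNet-18/34 at similar parameter counts.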
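Within this framework, the Dirac parameterization mostly eliminates the need for careful initialization, in both residual and non-residual networks.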
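Dirac-parameterized filters can be folded into a single filter per layer, which at inference yields a simple VGG-style sequence of convolution-ReLU blocks.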
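
The abstract notes that batch normalization can also be folded into the convolutional filters. As a minimal sketch of that standard folding step (assuming a convolution followed directly by inference-mode batch normalization, and using a 1x1 convolution at a single spatial position to keep it short), the example below checks that the folded weights and bias reproduce the unfolded output. The function names and the numeric values are hypothetical, chosen only for illustration.

```python
import math

def conv1x1(W, x, bias=None):
    """1x1 convolution at one position: y[o] = sum_i W[o][i] * x[i] (+ bias[o])."""
    y = [sum(w * v for w, v in zip(row, x)) for row in W]
    if bias is not None:
        y = [yo + bo for yo, bo in zip(y, bias)]
    return y

def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using fixed running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + be
            for v, g, be, m, s in zip(y, gamma, beta, mean, var)]

def fold_batchnorm(W, gamma, beta, mean, var, eps=1e-5):
    """Absorb the per-channel affine map of batch norm into weights and bias."""
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    W_folded = [[sc * w for w in row] for sc, row in zip(scale, W)]
    bias_folded = [be - sc * m for be, sc, m in zip(beta, scale, mean)]
    return W_folded, bias_folded

# Illustrative values only (not taken from the paper).
W = [[0.5, -1.0], [2.0, 0.25]]
gamma, beta = [1.5, 0.8], [0.1, -0.2]
mean, var = [0.3, -0.4], [4.0, 0.25]
x = [1.0, 2.0]

reference = batchnorm_eval(conv1x1(W, x), gamma, beta, mean, var)
W_f, b_f = fold_batchnorm(W, gamma, beta, mean, var)
folded = conv1x1(W_f, x, b_f)

max_diff = max(abs(r - f) for r, f in zip(reference, folded))
print("max difference after folding:", max_diff)
```

Since inference-mode batch normalization is a fixed per-channel scale and shift, it composes with the linear convolution into one set of weights plus a bias, leaving only convolution and ReLU at test time.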

This review was generated by AI and reviewed by human editors.