QUICK REVIEW

[Paper Review] Wide Residual Networks

Sergey Zagoruyko, Nikos Komodakis|arXiv (Cornell University)|May 23, 2016

Advanced Neural Network Applications|References: 21|Cited by: 1,927

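One-Sentence Summary

This paper shows that widening residual blocks to form wide residual networks (WRNs) outperforms very deep, thin ResNets: with far fewer layers and faster training, WRNs achieve state-of-the-art results on CIFAR, SVHN, and COCO, and significant improvements on ImageNet.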

ABSTRACT

Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior over their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. Our code and models are available at https://github.com/szagoruyko/wide-residual-networks

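Research Motivation and Objectives

Study how the structure of residual blocks, beyond depth alone, affects performance.
Evaluate whether widening blocks beats deepening the network in both accuracy and training efficiency.
Explore dropout as a regularizer inside wide residual blocks.
Demonstrate state-of-the-art results with WRNs on CIFAR, SVHN, and COCO, and significant improvements on ImageNet.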

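Proposed Method

Parameterize residual networks by a widening factor k (a multiplier on the number of feature channels in each convolution) and a deepening factor l (the number of convolutions per block).
Compare block types and configurations (B(3,3), B(3,1,3), etc., where the list gives the kernel sizes of the convolutions in a block) to identify the best structure.
Study the trade-off between depth and width by varying l and k while keeping the parameter count roughly fixed.
Add dropout inside residual blocks to regularize the wider networks.
Evaluate on CIFAR-10/100, SVHN, ImageNet, and COCO under a standardized training protocol.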
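The block structure described above can be written out compactly. The following is a sketch rather than the authors' exact formulation: the block index i is chosen here to avoid clashing with the deepening factor l, N (blocks per group) is a symbol introduced for illustration, and the relation between depth and N assumes three groups of basic B(3,3) blocks as in the CIFAR configurations.

```latex
% Residual block with an identity shortcut; i indexes blocks.
x_{i+1} = x_i + \mathcal{F}(x_i, \mathcal{W}_i)

% Residual branch of a B(3,3) block in BN-ReLU-conv order,
% with optional dropout between the two convolutions.
\mathcal{F}(x) =
  \mathrm{conv}_{3\times 3}\Big(
    \mathrm{Dropout}\big(
      \mathrm{ReLU}\big(\mathrm{BN}\big(
        \mathrm{conv}_{3\times 3}(\mathrm{ReLU}(\mathrm{BN}(x)))
      \big)\big)
    \big)
  \Big)

% Channel widths of the three groups and total depth d
% for N blocks per group (assumed CIFAR-style layout).
(c_1, c_2, c_3) = (16k,\; 32k,\; 64k), \qquad d = 6N + 4
```

Under these assumptions k = 1 recovers a thin ResNet of the same depth, and a name such as WRN-28-10 reads as d = 28, k = 10, which gives N = 4 blocks per group.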

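Experimental Results

Research Questions

RQ1: Does widening residual blocks improve performance more effectively than increasing the depth of a ResNet?
RQ2: For a fixed parameter count, which combination of deepening factor l and widening factor k gives the best performance?
RQ3: Does dropout inside residual blocks provide a regularization benefit for wide networks across datasets?
RQ4: How do WRNs compare with conventional thin ResNets on CIFAR, SVHN, ImageNet, and COCO?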

Key Findings

Depth	k	# Params	CIFAR-10 test error (%)	CIFAR-100 test error (%)
40	1	0.6M	6.85	30.89
40	2	2.2M	5.33	26.04
40	4	8.9M	4.97	22.89
40	8	35.7M	4.66	-
28	10	36.5M	4.17	20.50
28	12	52.5M	4.33	20.43
22	8	17.2M	4.38	21.22
22	10	26.8M	4.44	20.75
16	8	11.0M	4.81	22.07
16	10	17.1M	4.56	21.59
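
The parameter counts in the table can be roughly reproduced from depth and k alone. The sketch below is an illustration, not the authors' code: the function name `wrn_param_count` is hypothetical, and it assumes depth = 6n + 4, three groups of B(3,3) blocks with widths 16k, 32k, 64k, bias-free convolutions, a 1x1 projection shortcut wherever a block changes the channel count, and a final linear classifier. Batch-normalization parameters are ignored because they are comparatively few.

```python
def wrn_param_count(depth, k, num_classes=10):
    """Approximate weight count of WRN-depth-k under the assumptions above."""
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must have the form 6n + 4")
    n = (depth - 4) // 6               # residual blocks per group (assumed layout)
    total = 3 * 16 * 3 * 3             # stem: 3x3 conv from 3 RGB channels to 16
    in_ch = 16
    for out_ch in (16 * k, 32 * k, 64 * k):
        for block in range(n):
            # Only the first block of a group sees the previous group's width.
            block_in = in_ch if block == 0 else out_ch
            total += block_in * out_ch * 9    # first 3x3 convolution
            total += out_ch * out_ch * 9      # second 3x3 convolution
            if block_in != out_ch:
                total += block_in * out_ch    # assumed 1x1 projection shortcut
        in_ch = out_ch
    total += in_ch * num_classes + num_classes  # linear classifier with bias
    return total


for depth, k in [(40, 1), (40, 4), (28, 10), (16, 8)]:
    print(f"WRN-{depth}-{k}: {wrn_param_count(depth, k) / 1e6:.1f}M")
```

For these four configurations the printed values are 0.6M, 8.9M, 36.5M, and 11.0M, matching the rounded counts in the table. Because the two 3x3 convolutions dominate, the count grows roughly quadratically in k but only linearly in depth, which is why a shallow WRN-16-8 already carries more weights than the deeper WRN-40-4.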

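Wider networks generally perform better: at depth 40, CIFAR-10 test error falls from 6.85% at k = 1 to 4.66% at k = 8, and at a comparable parameter count the wider, shallower WRN-28-10 (36.5M) reaches 4.17% versus 4.66% for WRN-40-8 (35.7M).
On CIFAR-10/100, WRN-40-4 and WRN-28-10 outperform thinner, much deeper models while using far fewer layers and training faster.
On ImageNet, widening ResNet-50 into WRN-50-2-bottleneck yields higher accuracy than ResNet-152 with significantly fewer layers.
Dropout inside residual blocks gives a noticeable reduction in test error on CIFAR and SVHN, complementing the gains from width.
WRN architectures achieve state-of-the-art results on CIFAR-10, CIFAR-100, SVHN, and COCO, and are competitive on ImageNet, while training faster.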

This review was generated by AI and reviewed by a human editor.