QUICK REVIEW

[Paper Review] Multi-Residual Networks: Improving the Speed and Accuracy of Residual Networks

Masoud Abdi, Saeid Nahavandi|arXiv (Cornell University)|Sep 19, 2016

Advanced Neural Network Applications | References: 34 | Cited by: 40

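One-Sentence Summary

This paper proposes multi-residual networks (Multi-ResNet), a wider architecture that increases the number of residual functions in each residual block instead of adding depth, which improves model diversity and accuracy. Building explicitly on the ensemble-like behavior of residual networks, Multi-ResNet reaches a top-1 error of 3.73% on CIFAR-10 and 19.45% on CIFAR-100, outperforming almost all existing models at the time. Its model-parallel implementation also yields up to a 15% improvement in computational complexity over deeper residual networks.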

ABSTRACT

In this article, we take one step toward understanding the learning behavior of deep residual networks, and supporting the observation that deep residual networks behave like ensembles. We propose a new convolutional neural network architecture which builds upon the success of residual networks by explicitly exploiting the interpretation of very deep networks as an ensemble. The proposed multi-residual network increases the number of residual functions in the residual blocks. Our architecture generates models that are wider, rather than deeper, which significantly improves accuracy. We show that our model achieves an error rate of 3.73% and 19.45% on CIFAR-10 and CIFAR-100 respectively, that outperforms almost all of the existing models. We also demonstrate that our model outperforms very deep residual networks by 0.22% (top-1 error) on the full ImageNet 2012 classification dataset. Additionally, inspired by the parallel structure of multi-residual networks, a model parallelism technique has been investigated. The model parallelism method distributes the computation of residual blocks among the processors, yielding up to 15% computational complexity improvement.

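Motivation and Objectives

Investigate whether deep residual networks behave like ensembles of shallow networks because of the exponential number of residual paths and the way gradients flow through them.
Improve classification accuracy and computational efficiency beyond standard deep residual networks without increasing depth.
Explore model parallelism as an alternative to data parallelism for speeding up the training of wider, shallower architectures.
Show that, for the same number of parameters, increasing the diversity of residual functions is more effective than increasing network depth.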

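Proposed Method

Introduce multi-residual blocks, each containing several parallel residual functions, which increases the number of paths from input to output.
Build an architecture with fixed depth and greater width (more diverse residual functions per block) rather than deepening the network with more layers.
Implement model parallelism by splitting the computation of each multi-residual block across two GPUs, with each GPU handling half of the residual functions.
Use a hybrid parallel strategy: data parallelism across four K80 GPUs, combined with model parallelism across the two GPU units inside each K80.
Train with standard SGD and moderate data augmentation (flips and translations), and compare against deep residual networks and the best-performing models at the time.
Analyze gradient updates and the contribution of individual paths to verify the ensemble-like behavior of residual networks.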
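
The block structure and the two-way split described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a block computes the identity shortcut plus the sum of its residual functions, uses toy scalar-scaling functions in place of convolutional branches, and uses two threads as a stand-in for two GPUs. All names (`multi_residual_block`, `make_scale_fn`, and so on) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]
ResidualFn = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def partial_sum(fns: Sequence[ResidualFn], x: Vector) -> Vector:
    """Sum the outputs of a group of residual functions applied to the same input."""
    total = [0.0] * len(x)
    for fn in fns:
        total = add(total, fn(x))
    return total


def multi_residual_block(fns: Sequence[ResidualFn], x: Vector) -> Vector:
    """Identity shortcut plus the sum of all k residual functions."""
    return add(x, partial_sum(fns, x))


def multi_residual_block_split(fns: Sequence[ResidualFn], x: Vector) -> Vector:
    """Same block, with the residual functions split into two halves.

    Each half stands in for the work assigned to one GPU; threads are only
    a placeholder for devices in this sketch.
    """
    half = len(fns) // 2
    groups = [fns[:half], fns[half:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        partials = list(pool.map(lambda group: partial_sum(group, x), groups))
    # The only cross-device step is adding the two partial sums to the shortcut.
    return add(x, add(partials[0], partials[1]))


def make_scale_fn(scale: float) -> ResidualFn:
    """Toy residual function (hypothetical): ReLU of a scaled input."""
    return lambda x: [max(0.0, scale * v) for v in x]


fns = [make_scale_fn(0.5), make_scale_fn(-0.25), make_scale_fn(1.0), make_scale_fn(2.0)]
x = [1.0, -2.0, 4.0]
y_single = multi_residual_block(fns, x)
y_split = multi_residual_block_split(fns, x)
```

Because the residual functions in a block all read the same input and are combined only by addition, the split version produces the same output as the single-device version; the cost of the split is the communication needed to combine the two partial sums.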

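Experimental Results

Research Questions

RQ1: Do deep residual networks behave like ensembles of shallow networks because of the exponential number of residual paths?
RQ2: Is increasing the number of residual functions per block more effective at improving accuracy than increasing network depth?
RQ3: In wider, shallower architectures, does applying model parallelism to multi-residual blocks reduce computational complexity compared with deep networks that use data parallelism?
RQ4: How do the effective range of paths and path diversity affect gradient flow and optimization stability?
RQ5: With the same number of convolutional layers, can a shallower, wider network with several residual functions per block outperform its deeper, narrower counterpart?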

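Key Findings

Multi-ResNet reaches a top-1 error of 3.73% on CIFAR-10 and 19.45% on CIFAR-100, outperforming most existing models, including deeper residual networks.
A 101-layer Multi-ResNet with two residual functions per block outperforms a 200-layer ResNet by 0.22% top-1 error on ImageNet 2012.
With moderate data augmentation, Multi-ResNet lowers the error by 6% on CIFAR-10 and by 10% on CIFAR-100 compared with residual networks that use identity mappings.
Despite inter-GPU communication overhead, model parallelism yields up to a 15% improvement in computational complexity compared with deep residual networks trained with data parallelism.
Once depth exceeds a threshold n₀, adding residual functions yields a larger gain than adding depth, which suggests that diversity matters more for accuracy than depth does.
The results support the ensemble view of residual networks: removing a single layer has little effect on performance, and most gradient updates come from shallow paths.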
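
To make the notion of "paths" in the findings above concrete, the following counting sketch enumerates the paths through a small stack of blocks. It is an illustration under the unraveled view of residual networks, not the authors' analysis: it assumes each block contributes either its identity shortcut or exactly one of its k residual functions to a given path, and it says nothing about how much each path contributes to the gradient. The function names are hypothetical.

```python
from itertools import product
from math import comb


def count_paths_by_length(num_blocks: int, fns_per_block: int) -> list:
    """Enumerate every path and tally it by how many residual functions it crosses.

    In each block a path takes either the identity shortcut (choice 0) or
    one of the k residual functions (choices 1..k).
    """
    counts = [0] * (num_blocks + 1)
    for choices in product(range(fns_per_block + 1), repeat=num_blocks):
        length = sum(1 for c in choices if c != 0)
        counts[length] += 1
    return counts


def closed_form(num_blocks: int, fns_per_block: int) -> list:
    """Closed form for the same tally: C(n, j) * k**j paths of length j."""
    return [comb(num_blocks, j) * fns_per_block ** j for j in range(num_blocks + 1)]


# Four blocks with one residual function each (a plain residual network).
plain = count_paths_by_length(num_blocks=4, fns_per_block=1)
# Four blocks with two residual functions each (a multi-residual network).
multi = count_paths_by_length(num_blocks=4, fns_per_block=2)
```

Under these assumptions a network of n blocks with k functions per block has (k + 1)^n paths in total, so at a fixed depth of four blocks the count grows from 16 to 81 when k goes from 1 to 2.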


This review was generated by AI and reviewed by human editors.