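[Paper Review] Wider or Deeper: Revisiting the ResNet Model for Visual Recognition
This paper reinterprets ResNets as a linearly growing ensemble of shallow sub-networks and proposes a shallower, wider residual architecture that outperforms most deeper models on ImageNet and transfers strongly to semantic segmentation.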
The trend towards increasingly deep neural networks has been driven by a general observation that increasing depth increases the performance of a network. Recently, however, evidence has been amassing that simply increasing depth may not be the best way to increase performance, particularly given other limitations. Investigations into deep residual networks have also suggested that they may not in fact be operating as a single deep network, but rather as an ensemble of many relatively shallow networks. We examine these issues, and in doing so arrive at a new interpretation of the unravelled view of deep residual networks which explains some of the behaviours that have been observed experimentally. As a result, we are able to derive a new, shallower, architecture of residual networks which significantly outperforms much deeper models such as ResNet-200 on the ImageNet classification dataset. We also show that this performance is transferable to other problem domains by developing a semantic segmentation approach which outperforms the state-of-the-art by a remarkable margin on datasets including PASCAL VOC, PASCAL Context, and Cityscapes. The architecture that we propose thus outperforms its comparators, including very deep ResNets, and yet is more efficient in memory use and sometimes also in training time. The code and models are available at https://github.com/itijyou/ademxapp
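Motivation and Objectives
- Explain the unravelled view of deep residual networks and their effective depth.
- Propose and evaluate a shallower, wider residual architecture that can outperform deeper counterparts.
- Demonstrate that the proposed architecture transfers to semantic segmentation benchmarks.
- Evaluate the memory and training efficiency of the proposed networks relative to very deep ResNets.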
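The unravelled view mentioned above can be made concrete with a short worked equation. The notation is an assumption for illustration ($y_i$ for the output of the $i$-th residual unit, $f_i$ for its residual branch); it follows the usual way residual units are written rather than quoting the paper:

```latex
y_i = y_{i-1} + f_i(y_{i-1})
\quad\Longrightarrow\quad
y_2 = y_1 + f_2(y_1) = y_0 + f_1(y_0) + f_2\bigl(y_0 + f_1(y_0)\bigr)
```

Expanding just two units already exposes routes of different lengths from $y_0$ to $y_2$: the identity route, the route through $f_1$ alone, and the term in which $f_2$ is applied to the sum $y_0 + f_1(y_0)$. Because $f_2$ is non-linear, that last term cannot in general be split into $f_2(y_0) + f_2(f_1(y_0))$, which is one reason the reading of a ResNet as an ensemble of independent paths needs care.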
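Proposed Method
- Analyse residual units with skip connections through the unravelled view and their effective depth.
- Design a family of relatively shallow networks in which each residual unit contains two 3x3 convolutions, with an optional bottleneck.
- Evaluate on ImageNet, comparing top-1 accuracy and throughput against deep ResNets and Inception variants.
- Adapt the classification network to semantic segmentation by modifying downsampling, dilation, and the classifier structure, without extensive multi-scale supervision.
- Train and fine-tune with MXNet in a multi-GPU setup, and report memory usage and training speed.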
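To give a feel for what "two 3x3 convolutions, with an optional bottleneck" means in terms of capacity, the following is a minimal sketch that counts convolution weights per residual branch. The channel widths, the reduction factor of 4, and all function names are hypothetical choices for illustration, not values taken from the paper; biases and batch-norm parameters are ignored.

```python
def conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weight count of a square 2D convolution, ignoring bias and batch norm."""
    return in_ch * out_ch * kernel * kernel


def plain_unit_params(channels: int) -> int:
    """Residual branch made of two 3x3 convolutions at constant width."""
    return 2 * conv_params(channels, channels, 3)


def bottleneck_unit_params(channels: int, reduction: int = 4) -> int:
    """Residual branch 1x1 -> 3x3 -> 1x1 with a hypothetical reduction factor."""
    mid = channels // reduction
    return (
        conv_params(channels, mid, 1)
        + conv_params(mid, mid, 3)
        + conv_params(mid, channels, 1)
    )


if __name__ == "__main__":
    for width in (256, 512, 1024):  # hypothetical widths, not the paper's
        print(width, plain_unit_params(width), bottleneck_unit_params(width))
```

At equal width, the two-3x3 branch holds many more weights than the bottleneck branch, which is one way a network with few residual units can still carry substantial capacity.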
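Experimental Results
Research Questions
- RQ1: Do residual networks behave as an exponential ensemble of shallow sub-networks, or as a linearly growing one?
- RQ2: Can a shallower, wider residual architecture outperform deeper ResNets on ImageNet while remaining memory-efficient?
- RQ3: How well does the proposed architecture transfer to semantic segmentation benchmarks (PASCAL VOC, Cityscapes, ADE20K) without extensive post-processing?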
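RQ1 contrasts two ways of counting the sub-networks inside a residual network. As a rough illustration of the exponential reading only, the sketch below counts paths under the assumption that each of n residual branches is either skipped or traversed independently. The function names and the `max_length` cut-off (a stand-in for an effective depth) are hypothetical, and the code does not reproduce the paper's linear-growth argument.

```python
from math import comb
from typing import List


def paths_by_length(num_units: int) -> List[int]:
    """Number of paths traversing exactly k residual branches, for k = 0..num_units."""
    return [comb(num_units, k) for k in range(num_units + 1)]


def paths_up_to(num_units: int, max_length: int) -> int:
    """Number of paths that traverse at most max_length residual branches."""
    limit = min(max_length, num_units)
    return sum(comb(num_units, k) for k in range(limit + 1))


if __name__ == "__main__":
    n = 17  # hypothetical unit count
    print(sum(paths_by_length(n)))  # total number of paths, equal to 2**n
    print(paths_up_to(n, 4))        # paths no longer than a hypothetical cut-off
```

Under this counting, the total number of paths doubles with every added unit, while the number of paths below a fixed length grows far more slowly, so how sub-networks are counted depends heavily on which path lengths are considered.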
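Key Findings
- Shallow, wide residual architectures can surpass very deep ResNets (such as ResNet-152 and ResNet-200) in top-1 and top-5 accuracy on ImageNet.
- A network of about seventeen residual units can outperform deeper models while using memory more efficiently.
- Semantic segmentation built on features from the proposed networks achieves results comparable to state-of-the-art methods on PASCAL VOC, Cityscapes, and ADE20K, without multi-scale or CRF post-processing.
- Shallower architectures can improve memory usage and training speed, depending on input size and downsampling strategy.
- Performance is tied to choosing an appropriate depth and avoiding excessive depth, which supports a width-versus-depth trade-off that favours end-to-end trainability.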
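The dependence of memory on input size and downsampling strategy can be sketched with the standard convolution output-size formula. The stage layouts below are hypothetical (five stride-2 stages for classification, versus three stride-2 stages followed by two dilated stride-1 stages for segmentation) and are not the paper's exact configuration; the function and constant names are likewise made up for this example.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, dilation: int = 1) -> int:
    """Output length of a convolution along one axis, with 'same'-style padding."""
    padding = dilation * (kernel - 1) // 2
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def final_size(size: int, stages) -> int:
    """Apply a sequence of (stride, dilation) stages to a spatial size."""
    for stride, dilation in stages:
        size = conv_out(size, stride=stride, dilation=dilation)
    return size


# Hypothetical stage layouts, not the paper's exact configuration.
CLASSIFICATION = [(2, 1)] * 5                    # overall stride 32
SEGMENTATION = [(2, 1)] * 3 + [(1, 2), (1, 4)]   # overall stride 8

if __name__ == "__main__":
    side = 512
    coarse = final_size(side, CLASSIFICATION)
    dense = final_size(side, SEGMENTATION)
    # Ratio of spatial positions in the final feature map.
    print(coarse, dense, (dense * dense) // (coarse * coarse))
```

In this sketch, replacing the last two stride-2 stages with dilated convolutions keeps the final feature map four times larger along each side, so the activations at that stage occupy sixteen times as many spatial positions at the same channel width.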
This review was generated by AI and reviewed by a human editor.