QUICK REVIEW

[Paper Review] High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks

Zifeng Wu, Chunhua Shen|arXiv (Cornell University)|Apr 15, 2016

Advanced Neural Network Applications | References: 22 | Cited by: 84

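One-Sentence Summary

This paper proposes a high-performance semantic segmentation method based on very deep fully convolutional residual networks, introducing a technique that simulates a high-resolution network with a low-resolution one to work around GPU memory limits, together with an online bootstrapping strategy to improve training. The method reports a new state-of-the-art mean intersection-over-union (mIoU) of 78.3% on PASCAL VOC 2012 and 77.3% on Cityscapes.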

ABSTRACT

We propose a method for high-performance semantic image segmentation (or semantic pixel labelling) based on very deep residual networks, which achieves the state-of-the-art performance. A few design factors are carefully considered to this end. We make the following contributions. (i) First, we evaluate different variations of a fully convolutional residual network so as to find the best configuration, including the number of layers, the resolution of feature maps, and the size of field-of-view. Our experiments show that further enlarging the field-of-view and increasing the resolution of feature maps are typically beneficial, which however inevitably leads to a higher demand for GPU memories. To walk around the limitation, we propose a new method to simulate a high resolution network with a low resolution network, which can be applied during training and/or testing. (ii) Second, we propose an online bootstrapping method for training. We demonstrate that online bootstrapping is critically important for achieving good accuracy. (iii) Third we apply the traditional dropout to some of the residual blocks, which further improves the performance. (iv) Finally, our method achieves the currently best mean intersection-over-union 78.3% on the PASCAL VOC 2012 dataset, as well as on the recent dataset Cityscapes.

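Motivation and Objectives

Achieve state-of-the-art semantic image segmentation using very deep fully convolutional residual networks.
Address the GPU memory limits that high-resolution feature maps and large fields-of-view impose on deep networks.
Improve training accuracy by effectively mining hard positive and negative pixels during training.
Evaluate how architectural components such as residual blocks, dropout, and field-of-view size affect segmentation performance.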

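Proposed Method

Proposes a method that simulates a high-resolution network with a low-resolution network, applicable during training and/or testing, to reduce GPU memory usage.
Introduces an online bootstrapping technique that dynamically selects high-loss samples (hard training pixels) to improve generalization.
Applies dropout regularization to selected residual blocks to reduce overfitting and improve generalization.
Uses dilated convolutions and skip connections to enlarge the field-of-view while keeping feature maps at high resolution.
Trains end-to-end with stochastic gradient descent and data augmentation to optimize the fully convolutional residual network.
Initializes the network with ImageNet-pretrained weights for effective transfer learning to semantic segmentation.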
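
The online bootstrapping idea listed above can be illustrated with a minimal sketch: compute a per-pixel cross-entropy loss, then average only over the hardest pixels. The selection rule used here (keep the top `k` losses) and the names `pixel_losses`, `bootstrapped_loss`, and `k` are assumptions made for illustration; the paper's exact selection criterion and hyperparameters are not given in this summary.

```python
import math

def pixel_losses(probs, labels):
    """Per-pixel cross-entropy: -log of the probability given to the true class."""
    return [-math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)]

def bootstrapped_loss(probs, labels, k):
    """Average the loss over only the k hardest (highest-loss) pixels.

    k is a hypothetical hyperparameter for this sketch, not a value from the paper.
    """
    losses = pixel_losses(probs, labels)
    k = max(1, min(k, len(losses)))  # clamp k to a valid range
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k

# Toy example: four pixels, three classes, softmax outputs per pixel.
probs = [
    [0.90, 0.05, 0.05],  # easy pixel, true class 0
    [0.20, 0.70, 0.10],  # fairly easy, true class 1
    [0.40, 0.30, 0.30],  # hard pixel, true class 1
    [0.10, 0.10, 0.80],  # easy pixel, true class 2
]
labels = [0, 1, 1, 2]

full = bootstrapped_loss(probs, labels, k=len(labels))  # ordinary mean loss
hard = bootstrapped_loss(probs, labels, k=2)            # mean over the 2 hardest pixels
```

With `k` equal to the number of pixels this reduces to the ordinary mean loss; a smaller `k` concentrates the gradient signal on pixels the model currently gets wrong or is unsure about.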

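Experimental Results

Research Questions

RQ1: How do network depth, feature-map resolution, and field-of-view size affect semantic segmentation performance in fully convolutional residual networks?
RQ2: Can a low-resolution network effectively simulate the behavior of a high-resolution network, reducing GPU memory use without sacrificing accuracy?
RQ3: How does online bootstrapping affect training accuracy and convergence in semantic segmentation?
RQ4: How does applying dropout in residual blocks affect generalization and performance on benchmark datasets?
RQ5: Which architectural configuration gives the best trade-off between performance and computational cost for semantic segmentation?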
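
RQ1 and RQ2 both turn on how field-of-view and feature-map resolution trade off against memory. The sketch below uses the standard receptive-field arithmetic for stacked convolutions to show how dilation enlarges the field-of-view without striding, and how the activation count grows when the output stride is reduced. The layer stacks and strides are made-up examples, not configurations from the paper.

```python
def effective_kernel(kernel, dilation):
    """Spatial extent covered by a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel - 1) + 1

def field_of_view(layers):
    """Receptive field of a stack of conv layers.

    layers: list of (kernel, stride, dilation) tuples, ordered input to output.
    """
    rf, jump = 1, 1  # jump: spacing in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf

def relative_activations(coarse_stride, fine_stride):
    """Factor by which per-layer activations grow at the finer output stride."""
    return (coarse_stride / fine_stride) ** 2

# Three 3x3 convolutions, stride 1, without and with dilation (illustrative only).
plain = field_of_view([(3, 1, 1), (3, 1, 1), (3, 1, 1)])    # 7
dilated = field_of_view([(3, 1, 1), (3, 1, 2), (3, 1, 4)])  # 15

# Moving from output stride 32 to output stride 8 stores 16x more activations.
cost = relative_activations(32, 8)
```

The dilated stack roughly doubles the field-of-view at the same parameter count, while the quadratic growth in activations at finer strides is the kind of memory pressure that motivates simulating a high-resolution network with a low-resolution one.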

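Key Findings

The proposed method achieves a new state-of-the-art mIoU of 78.3% on PASCAL VOC 2012, surpassing prior methods.
Online bootstrapping markedly improves accuracy and is critical to achieving the best performance.
Enlarging the field-of-view and increasing feature-map resolution improve segmentation performance, at the cost of higher GPU memory consumption.
The proposed simulation method maintains high accuracy while lowering memory consumption, easing the GPU memory constraint.
Applying dropout to residual blocks further improves performance, suggesting it helps reduce overfitting in deep segmentation networks.
The method reaches 77.3% mIoU on Cityscapes, indicating that it generalizes across diverse benchmark datasets.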
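
For readers unfamiliar with the metric behind these numbers, here is a minimal sketch of mean intersection-over-union computed from a confusion matrix. The toy labels are invented for illustration, and skipping classes that appear in neither prediction nor ground truth is one common convention, assumed here rather than taken from the paper or the benchmark tooling.

```python
def confusion_matrix(pred, truth, num_classes):
    """cm[t][p] counts pixels with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        cm[t][p] += 1
    return cm

def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN)."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                 # true class c, predicted otherwise
        fp = sum(row[c] for row in cm) - tp  # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:                        # skip classes absent from both
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example: six pixels, three classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(pred, truth, num_classes=3)
miou = mean_iou(cm)  # per-class IoU: 1/3, 2/3, 1/2
```

Because every class contributes equally to the mean, a model that ignores small or rare classes is penalized even if its overall pixel accuracy is high.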

This review was generated by AI and reviewed by human editors.