QUICK REVIEW

[Paper Review] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Liang-Chieh Chen, George Papandreou|arXiv (Cornell University)|Dec 22, 2014

Advanced Neural Network Applications | References: 64 | Citations: 3,623

One-Sentence Summary

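This paper combines deep convolutional neural networks (deep CNNs) with fully connected conditional random fields (CRFs) for pixel-level semantic segmentation, using a dense CRF to address poor localization and the atrous ("hole") algorithm to improve efficiency, and achieves state-of-the-art results on PASCAL VOC 2012.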

ABSTRACT

Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our "DeepLab" system is able to localize segment boundaries at a level of accuracy which is beyond previous methods. Quantitatively, our method sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy in the test set. We show how these results can be obtained efficiently: Careful network re-purposing and a novel application of the 'hole' algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.

Motivation and Goals

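Address pixel-level semantic segmentation, a task in which the invariance properties of DCNNs degrade localization.
Repurpose a pretrained CNN as a dense feature extractor while minimizing the loss in performance.
Refine segment boundaries by combining DCNN unary terms with a fully connected CRF, which addresses the localization problem.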
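
To make "DCNN unary terms combined with a fully connected CRF" concrete, the energy of the dense CRF formulation of Krähenbühl & Koltun (2011) can be sketched as below. This summary does not define the symbols, so the notation is an assumption: x_i is the label of pixel i, P(x_i) is the DCNN's label probability at that pixel, p_i and I_i are pixel position and color, mu is a Potts compatibility (1 if the labels differ, 0 otherwise), and w_1, w_2 and the sigma values are hyperparameters.

```latex
\begin{align}
E(\mathbf{x}) &= \sum_i \theta_i(x_i) + \sum_{i<j} \theta_{ij}(x_i, x_j),
  \qquad \theta_i(x_i) = -\log P(x_i) \\
\theta_{ij}(x_i, x_j) &= \mu(x_i, x_j) \left[
  w_1 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\alpha^2}
                   -\frac{\lVert I_i - I_j \rVert^2}{2\sigma_\beta^2} \right)
  + w_2 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\gamma^2} \right)
\right]
\end{align}
```

The first kernel is the bilateral term (position and color) and the second is the purely spatial term. The pairwise sum runs over every pair of pixels, which is what "fully connected" refers to.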

Proposed Method

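Convert a 16-layer CNN (VGG-16) into a fully convolutional network for dense, per-pixel scoring.
Use the hole (atrous) algorithm to compute dense feature maps at a stride of 8 pixels, yielding higher-resolution score maps.
Fine-tune the CNN on PASCAL VOC 2012 for 21-class pixel labeling, using a per-pixel cross-entropy loss on the dense output.
Append a fully connected CRF (Krähenbühl & Koltun 2011) with bilateral and spatial kernels to refine segmentation boundaries.
Optionally incorporate multi-scale features from intermediate CNN layers to improve localization.
Explore variants with different field-of-view settings (atrous kernel size and input stride) to balance accuracy and speed.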
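
The hole (atrous) idea from the list above can be illustrated in one dimension. The following is a minimal NumPy sketch, not the paper's implementation (which operates on 2-D feature maps inside a CNN framework). The function names `atrous_conv1d` and `insert_holes` are hypothetical. It shows that spacing the kernel taps `rate` samples apart gives the same result as a dense filter with zeros ("holes") inserted between the taps, without multiplying by those zeros.

```python
import numpy as np


def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1-D correlation with kernel taps spaced `rate` samples apart."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * rate + 1   # effective field of view of the kernel
    out_len = len(signal) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    out = np.zeros(out_len)
    for tap, weight in enumerate(kernel):
        # Each tap reads the input at offset tap * rate; zeros are never multiplied.
        start = tap * rate
        out += weight * signal[start:start + out_len]
    return out


def insert_holes(kernel, rate):
    """Equivalent dense kernel: place rate - 1 zeros between consecutive taps."""
    kernel = np.asarray(kernel, dtype=float)
    dense = np.zeros((len(kernel) - 1) * rate + 1)
    dense[::rate] = kernel
    return dense


signal = np.arange(10, dtype=float)
kernel = np.array([1.0, 2.0, 3.0])

sparse_result = atrous_conv1d(signal, kernel, rate=2)
dense_result = np.correlate(signal, insert_holes(kernel, 2), mode="valid")
print(np.allclose(sparse_result, dense_result))
```

With `rate=2` the three-tap kernel covers five input samples, so the field of view grows while the number of weights and multiplications stays the same.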

Experimental Results

Research Questions

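RQ1: Does a fully connected CRF improve localization in semantic segmentation when combined with DCNN-based unary terms?
RQ2: How do multi-scale features and a large field of view affect segmentation performance?
RQ3: How does the atrous convolution approach affect dense score computation and overall efficiency?
RQ4: What trade-offs exist among accuracy, speed, and model size when the receptive field and field of view are varied?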

Key Findings

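Achieves state-of-the-art mean IOU on the PASCAL VOC 2012 test set (71.6%, with DeepLab-MSc-CRF-LargeFOV).
Adding the fully connected CRF to the DeepLab system yields a substantial gain (for example, DeepLab-CRF improves on DeepLab by about 4% on the validation set).
Multi-scale features provide an additional localization benefit and further raise IOU when combined with the CRF.
The atrous (hole) algorithm enables dense CNN scoring at an 8-pixel stride at 8 fps on a modern GPU, while CRF inference on VOC images completes in under 0.5 seconds on average.
The large field-of-view variants run faster while achieving competitive or even higher accuracy, an effective trade-off relative to some alternatives for real-time or near-real-time applications.
The best single model on the VOC 2012 test set (DeepLab-MSc-CRF-LargeFOV) reaches 71.6% mean IOU, surpassing FCN-8s and other contemporaneous methods.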
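
For readers unfamiliar with the metric quoted above, mean IOU averages the per-class intersection-over-union between predicted and ground-truth label maps. The sketch below is a simplified illustration under stated assumptions: the function name `mean_iou` is hypothetical, labels are integers in `[0, num_classes)`, no pixels are ignored, and classes absent from both maps are skipped. The official PASCAL VOC evaluation differs in details such as handling of void pixels and accumulation over the whole dataset.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # confusion[t, p] counts pixels with ground-truth class t predicted as class p.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    # Union = predicted pixels + ground-truth pixels - pixels counted in both.
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0
    return float((intersection[present] / union[present]).mean())


pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(mean_iou(pred, target, num_classes=2))
```

In this toy case class 0 has IOU 1/2 and class 1 has IOU 2/3, so the mean is 7/12.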

This review was generated by AI and checked by a human editor.