QUICK REVIEW

[Paper Review] Fully Convolutional Networks for Semantic Segmentation

J. D. Long, Evan Shelhamer|arXiv (Cornell University)|Nov 14, 2014

Advanced Neural Network Applications|Cited by 2,812

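One-Sentence Summary

The paper converts classification networks into fully convolutional networks (FCNs) for end-to-end, pixel-level semantic segmentation, reaching state-of-the-art results on several datasets with efficient training and a skip architecture that fuses multi-scale information.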

ABSTRACT

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.

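Research Motivation and Objectives

Demonstrate that fully convolutional networks trained end to end can perform pixel-level semantic segmentation without extra post-processing or region proposals.
Convert existing classification networks (AlexNet, VGG, GoogLeNet) into FCNs suited to dense prediction, adapting them by end-to-end fine-tuning.
Develop a skip architecture (FCN-32s, FCN-16s, FCN-8s) that combines coarse semantic information with fine appearance details to improve spatial precision.
Evaluate the method on standard datasets (PASCAL VOC 2011/2012, NYUDv2, SIFT Flow) and compare it with the previous state of the art.
Analyze how end-to-end fine-tuning and multi-scale fusion affect results on standard segmentation benchmarks.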

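Proposed Method

Convert a conventional classification network into a fully convolutional network by recasting its fully connected layers as convolutions and appending a 1x1 convolution that produces class scores at every location.
Add upsampling (deconvolution) layers inside the network to recover dense per-pixel predictions from the coarse output, learning the upsampling filters jointly by backpropagation.
Introduce a skip architecture that fuses predictions from several layers (pool4 and pool3 with conv7), adding spatial detail while preserving high-level semantics (FCN-32s, FCN-16s, FCN-8s).
Fine-tune the converted networks on segmentation data with a per-pixel multinomial logistic loss, and evaluate with mean intersection over union (mean IU).
Compare the single-stream FCN with the skip FCNs, reporting inference-time and accuracy improvements across several datasets.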
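
The conversion of fully connected layers into convolutions can be illustrated with a minimal sketch, assuming a single-channel input and a single output class. A fully connected classifier over a fixed-size patch performs the same arithmetic as one convolution window, so sliding its weights over a larger input yields a map of scores instead of a single score. The names `fc_score` and `conv_scores` and the toy weights are illustrative assumptions, not code from the paper.

```python
from typing import List

Matrix = List[List[float]]


def fc_score(patch: Matrix, weights: Matrix, bias: float) -> float:
    """Fully connected unit: weighted sum over a patch the size of `weights`."""
    return bias + sum(
        patch[i][j] * weights[i][j]
        for i in range(len(weights))
        for j in range(len(weights[0]))
    )


def conv_scores(image: Matrix, weights: Matrix, bias: float) -> Matrix:
    """Apply the same weights at every position (stride 1, no padding).

    As in most deep learning libraries, the "convolution" here is a
    cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(weights), len(weights[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            fc_score([row[x:x + kw] for row in image[y:y + kh]], weights, bias)
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Hypothetical 2x2 classifier weights, applied to a 3x4 input that is
# larger than the patch the classifier was "trained" on.
weights = [[1.0, 0.0], [0.0, -1.0]]
bias = 0.5
image = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 3.0, 1.0],
    [2.0, 0.0, 1.0, 4.0],
]

score_map = conv_scores(image, weights, bias)  # 2x3 grid of scores
```

Each entry of `score_map` equals the classifier applied to the corresponding crop, but the work is expressed as a single pass over the input, which is what lets the converted network accept inputs of arbitrary size.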

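Experimental Results

Research Questions

RQ1: Can an FCN trained end to end surpass state-of-the-art semantic segmentation methods without external post-processing or region proposals?
RQ2: Does converting classification networks into FCNs with in-network upsampling yield accurate dense predictions for segmentation?
RQ3: Does combining coarse, deep features with fine, shallow features through skip connections improve segmentation detail and accuracy?
RQ4: How do end-to-end fine-tuning and multi-scale fusion affect standard segmentation benchmarks (PASCAL VOC, NYUDv2, SIFT Flow)?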
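
RQ2 and RQ3 concern in-network upsampling and skip fusion. One fusion step in the style of FCN-16s might be sketched as follows: coarse class scores are upsampled 2x and added to scores predicted from a finer layer, and the per-pixel argmax gives the labels. This is a simplification under stated assumptions: the paper learns its upsampling filters, whereas fixed bilinear interpolation stands in for them here, and all function names and score values are made up for illustration.

```python
from typing import List

Map = List[List[float]]  # one class's score map, indexed [y][x]


def upsample2x_bilinear(scores: Map) -> Map:
    """Double height and width with bilinear interpolation (borders clamped)."""
    h, w = len(scores), len(scores[0])
    out = []
    for y in range(2 * h):
        # Map the output pixel centre back to input coordinates.
        sy = min(max((y + 0.5) / 2 - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(2 * w):
            sx = min(max((x + 0.5) / 2 - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = scores[y0][x0] * (1 - fx) + scores[y0][x1] * fx
            bottom = scores[y1][x0] * (1 - fx) + scores[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def fuse(coarse: List[Map], fine: List[Map]) -> List[Map]:
    """Per class: upsample the coarse scores 2x and add the fine scores."""
    fused = []
    for coarse_c, fine_c in zip(coarse, fine):
        up = upsample2x_bilinear(coarse_c)
        fused.append(
            [[u + f for u, f in zip(up_row, fine_row)]
             for up_row, fine_row in zip(up, fine_c)]
        )
    return fused


def argmax_labels(class_maps: List[Map]) -> List[List[int]]:
    """Pick the highest-scoring class at every pixel."""
    h, w = len(class_maps[0]), len(class_maps[0][0])
    return [
        [max(range(len(class_maps)), key=lambda c: class_maps[c][y][x])
         for x in range(w)]
        for y in range(h)
    ]


# Hypothetical scores for two classes: a 1x2 coarse map (deep layer)
# and a 2x4 fine map (shallower layer).
coarse = [[[0.0, 4.0]], [[2.0, 2.0]]]
fine = [
    [[0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],  # class 0
    [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]],  # class 1
]

coarse_only = argmax_labels([upsample2x_bilinear(c) for c in coarse])
fused_labels = argmax_labels(fuse(coarse, fine))
```

In this toy case the coarse-only labels are identical in both rows, while the fused labels differ between rows, showing how the finer layer can adjust individual pixels that the coarse prediction cannot resolve.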

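Key Findings

FCN-8s reaches a mean IU of 62.7 on the PASCAL VOC 2011 test set and 62.2 on the VOC 2012 test set, a relative improvement of about 20% over the previous state of the art, SDS.
FCN-16s and FCN-8s improve on FCN-32s: mean IU on the PASCAL VOC validation set rises from 59.4 to 62.4 and 62.7, respectively, showing the benefit of skip connections.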
On NYUDv2, the RGB-HHA fusion models reach a mean IU of 32.8 with FCN-32s and 34.0 with FCN-16s, surpassing previous methods.
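On SIFT Flow, FCN-16s reaches a mean IU of 39.5, and FCN-8s is also competitive on mean IU, showing strong performance on both semantic and geometric labeling.
Training end to end with in-network upsampling gives fast inference (about 175 ms for a 500x500 input) and needs no post-processing steps such as superpixels or CRFs.
Combining coarse semantic information with fine appearance information through skip connections yields refined segmentations with sharper boundaries and higher spatial precision.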
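
The mean IU figures reported above average, over classes, the ratio of correctly labelled pixels to the union of ground-truth and predicted pixels for that class. A minimal sketch of that computation follows, assuming small integer label maps. The helper names and toy labels are illustrative, and skipping classes that appear in neither map is a choice made for this sketch rather than something stated in the source.

```python
from typing import List

Labels = List[List[int]]  # label map, indexed [y][x]


def confusion_matrix(truth: Labels, pred: Labels, n_classes: int) -> List[List[int]]:
    """conf[i][j] = number of pixels of true class i predicted as class j."""
    conf = [[0] * n_classes for _ in range(n_classes)]
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            conf[t][p] += 1
    return conf


def mean_iu(conf: List[List[int]]) -> float:
    """Average intersection over union across classes present in truth or prediction."""
    n = len(conf)
    ious = []
    for i in range(n):
        intersection = conf[i][i]
        true_pixels = sum(conf[i])                       # pixels labelled i in truth
        pred_pixels = sum(conf[j][i] for j in range(n))  # pixels predicted as i
        union = true_pixels + pred_pixels - intersection
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Hypothetical 2x3 ground truth and prediction with three classes.
truth = [[0, 0, 1], [1, 1, 2]]
pred = [[0, 1, 1], [1, 1, 2]]

conf = confusion_matrix(truth, pred, n_classes=3)
score = mean_iu(conf)  # per-class IU: 1/2, 3/4, 1/1
```

One mislabelled pixel out of six lowers the IU of both classes it touches, which is why mean IU is stricter than plain pixel accuracy on small classes.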

This review was generated by AI and checked by human editors.