QUICK REVIEW

[Paper Review] The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

Simon Jégou, Michal Drozdzal|arXiv (Cornell University)|Nov 28, 2016

Advanced Neural Network Applications · Cited by 10

One-Sentence Summary

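This paper proposes FC-DenseNet, a fully convolutional, densely connected U-Net-like architecture that extends DenseNets to semantic segmentation. It uses dense blocks in both the downsampling and upsampling paths, links them with skip connections, and at each resolution upsamples only the feature maps created by the preceding dense block. With this design the model achieves state-of-the-art results on CamVid and Gatech with far fewer parameters (under 10 million) and without post-processing or pretraining.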

ABSTRACT

State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions. Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train. In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module or pretraining. Moreover, due to smart construction of the model, our approach has far fewer parameters than currently published best entries for these datasets. Code to reproduce the experiments is available here: https://github.com/SimJeg/FC-DenseNet/blob/master/train.py

Research Motivation and Goals

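Extend DenseNets into a fully convolutional network for semantic segmentation that relies on neither post-processing nor pretraining.
Avoid the computational cost that makes naively upsampling every feature map in a DenseNet infeasible, by upsampling at each resolution only the feature maps created by the preceding dense block.
Achieve high performance with few parameters by using dense and skip connections for feature reuse and multi-scale supervision.
Show that a fully convolutional DenseNet can outperform existing state-of-the-art models on urban scene segmentation benchmarks.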
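A quick channel-count trace makes the cost argument concrete. This is a minimal sketch, not the authors' code: it assumes the FC-DenseNet103 settings reported in the original paper (a 48-map first convolution, growth rate k = 16, dense blocks of 4, 5, 7, 10 and 12 layers on the downsampling path, a 15-layer bottleneck, and mirrored blocks on the upsampling path). `trace_channels` is a hypothetical helper, and `naive=True` models a hypothetical variant that upsamples each block's full concatenated output.

```python
# Channel-count trace for FC-DenseNet103.
# ASSUMPTION: configuration taken from the original paper, not from this summary.
GROWTH_RATE = 16
FIRST_CONV = 48
DOWN_BLOCKS = [4, 5, 7, 10, 12]
BOTTLENECK = 15
UP_BLOCKS = [12, 10, 7, 5, 4]


def trace_channels(naive: bool) -> int:
    """Return the number of feature maps reaching the final 1x1 classifier.

    naive=False: transition up receives only the maps created by the
                 preceding dense block (the FC-DenseNet rule).
    naive=True:  hypothetical variant that upsamples the whole concatenated
                 output of the preceding dense block.
    """
    channels = FIRST_CONV
    skips = []
    # Downsampling path: each dense block appends n_layers * k maps to its input.
    # Transition down halves H and W but keeps the channel count.
    for n_layers in DOWN_BLOCKS:
        channels += n_layers * GROWTH_RATE
        skips.append(channels)

    # Bottleneck dense block.
    new_maps = BOTTLENECK * GROWTH_RATE
    full_output = channels + new_maps

    # Upsampling path: upsample, concatenate with the matching skip, run a dense block.
    for n_layers, skip in zip(UP_BLOCKS, reversed(skips)):
        upsampled = full_output if naive else new_maps
        block_input = upsampled + skip
        new_maps = n_layers * GROWTH_RATE
        full_output = block_input + new_maps
    return full_output


print(trace_channels(naive=False), trace_channels(naive=True))  # 256 3232
```

Under these assumptions the FC-DenseNet rule ends with 256 feature maps at full resolution, while the naive variant carries 3232, and every transposed convolution along the way would also operate on the larger stacks.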

Proposed Method

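The downsampling path uses dense blocks that iteratively concatenate feature maps, enabling feature reuse and providing implicit deep supervision.
A custom upsampling path upsamples only the feature maps created by the preceding dense block rather than its full concatenated output, which keeps the number of feature maps from accumulating rapidly across the path.
Skip connections between corresponding resolutions of the downsampling and upsampling paths preserve fine-grained spatial detail.
The network is trained end to end with a standard (pixel-wise) cross-entropy loss on a softmax output.
Transition layers control feature-map resolution: transition down halves the spatial resolution and transition up doubles it, keeping computation manageable.
The resulting models are deep fully convolutional networks with 56 to 103 layers depending on the configuration (FC-DenseNet56, FC-DenseNet67 and FC-DenseNet103).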
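The layer counts in the model names can be reproduced by counting convolutions. This is a sketch under assumptions: it uses the block configurations from the original paper (FC-DenseNet56 with 4 layers per dense block, FC-DenseNet67 with 5, FC-DenseNet103 with blocks of 4, 5, 7, 10 and 12 layers and a 15-layer bottleneck) and counts the first convolution, every dense layer, every transition layer and the final 1×1 classifier once. `count_layers` and `CONFIGS` are hypothetical names.

```python
# Reproduce the "56 / 67 / 103" in the model names by counting conv layers.
# ASSUMPTION: block sizes are taken from the original paper.
def count_layers(down_blocks, bottleneck):
    """Count layers the way the model names do.

    1 first 3x3 conv
    + dense-block layers (downsampling path, bottleneck, upsampling path)
    + one 1x1 conv per transition down + one transposed conv per transition up
    + 1 final 1x1 classifier conv.
    """
    up_blocks = list(reversed(down_blocks))  # upsampling path mirrors downsampling path
    n_transitions = len(down_blocks)         # one transition down/up per resolution
    return (1 + sum(down_blocks) + n_transitions + bottleneck
            + n_transitions + sum(up_blocks) + 1)


CONFIGS = {
    "FC-DenseNet56": ([4] * 5, 4),
    "FC-DenseNet67": ([5] * 5, 5),
    "FC-DenseNet103": ([4, 5, 7, 10, 12], 15),
}

for name, (down, bottleneck) in CONFIGS.items():
    print(name, count_layers(down, bottleneck))
```

This prints 56, 67 and 103 for the three variants, matching their names.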

Experimental Results

Research Questions

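RQ1: Can DenseNet's dense connectivity and feature reuse be extended effectively to a fully convolutional semantic segmentation network?
RQ2: Can a fully convolutional DenseNet reach state-of-the-art performance on urban scene datasets without post-processing or pretraining?
RQ3: Can the computational cost of upsampling all feature maps in a DenseNet be reduced while preserving performance?
RQ4: How parameter-efficient is FC-DenseNet compared with existing fully convolutional models such as FCN, U-Net or DeepLab?
RQ5: Does the model generalize well to video segmentation while using only 2D convolutions?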

Key Findings

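On CamVid, FC-DenseNet103 reaches a mean intersection over union (mIoU) of 66.9%, outperforming the previous state of the art while using neither post-processing nor pretraining.
On CamVid, the model reaches 91.5% global accuracy, clearly above previous models (e.g. 88.3% for Dilation8 (+FSO)).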
On Gatech, FC-DenseNet103 reaches 79.4% global accuracy, 23.7 percentage points above the previous state of the art using 2D convolutions and 3.4 points above 3D spatio-temporal models.
The model has only 9.4 million parameters, roughly 15 times fewer than state-of-the-art models such as Dilation8 (+FSO), which uses 140.8 million (140.8 / 9.4 ≈ 15).
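The architecture generalizes well to video segmentation: although it processes frames with 2D convolutions only, it outperforms 3D models trained with temporal information on Gatech.
Ablation studies confirm that performance stays stable without additional supervision heads, indicating that the dense connectivity pattern provides implicit deep supervision.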
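The headline numbers can be cross-checked with a few lines of arithmetic. This sketch uses only figures quoted in the findings; reading the Gatech gains as absolute percentage points is an assumption, and the baseline accuracies it prints are derived by subtraction rather than quoted from the paper.

```python
# Cross-check of the figures quoted in the findings (values from this summary).
FC_DENSENET103_PARAMS_M = 9.4   # millions of parameters
DILATION8_FSO_PARAMS_M = 140.8  # millions of parameters
GATECH_ACC = 79.4               # FC-DenseNet103 global accuracy on Gatech (%)
GAIN_OVER_2D = 23.7             # ASSUMPTION: absolute percentage points
GAIN_OVER_3D = 3.4              # ASSUMPTION: absolute percentage points

ratio = DILATION8_FSO_PARAMS_M / FC_DENSENET103_PARAMS_M
implied_2d_baseline = GATECH_ACC - GAIN_OVER_2D  # derived, not quoted
implied_3d_baseline = GATECH_ACC - GAIN_OVER_3D  # derived, not quoted

print(f"parameter ratio: {ratio:.1f}x")                    # parameter ratio: 15.0x
print(f"implied 2D baseline: {implied_2d_baseline:.1f}%")  # implied 2D baseline: 55.7%
print(f"implied 3D baseline: {implied_3d_baseline:.1f}%")  # implied 3D baseline: 76.0%
```

The ratio shows that "about 10 times fewer parameters" understates the gap for this particular comparison; it is closer to 15×.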

This summary was generated by AI and reviewed by human editors.