QUICK REVIEW

[Paper Review] Fast-SCNN: Fast Semantic Segmentation Network

Rudra P. K. Poudel, Stephan Liwicki | arXiv (Cornell University) | Feb 12, 2019

Advanced Neural Network Applications | References: 26 | Cited by: 364

ONE-SENTENCE SUMMARY

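Fast-SCNN delivers above-real-time semantic segmentation on high-resolution images by using a shared early feature extractor. On Cityscapes it reaches 68.0% mIoU at 123.5 fps with only 1.11M parameters, and ImageNet pre-training brings minimal benefit.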

ABSTRACT

The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above-real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our 'learning to downsample' module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications.
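The following Python sketch traces feature-map shapes to make the two-branch idea concrete. It is a minimal sketch that assumes the following layer list, taken from our reading of the paper's architecture table rather than from this review:

- Learning to downsample uses three stride-2 layers.
- The global feature extractor uses bottleneck stages with strides 2, 2 and 1.
- Both branches are projected to 128 channels before fusion.

The names `run_stage`, `trace`, `fuse` and `FUSED_CHANNELS` are hypothetical.

```python
# Hypothetical shape trace for Fast-SCNN on a 1024x2048 input.
# ASSUMPTION: the (name, stride, out_channels) lists below come from our
# reading of the paper's architecture table, not from the authors' code.
# Repeated blocks within a stage are omitted; they keep the spatial size.

LEARNING_TO_DOWNSAMPLE = [("conv2d", 2, 32), ("dsconv", 2, 48), ("dsconv", 2, 64)]
GLOBAL_FEATURE_EXTRACTOR = [("bottleneck", 2, 64), ("bottleneck", 2, 96), ("bottleneck", 1, 128)]
FUSED_CHANNELS = 128  # ASSUMPTION: common width both branches are projected to


def out_size(h, w, stride):
    """Spatial size after a 3x3 conv with padding 1 and the given stride."""
    return (h + stride - 1) // stride, (w + stride - 1) // stride


def run_stage(h, w, layers):
    """Apply each layer's stride in turn; return (h, w, channels of last layer)."""
    c = None
    for _name, stride, out_channels in layers:
        h, w = out_size(h, w, stride)
        c = out_channels
    return h, w, c


def trace(h, w):
    """Shapes of the shared high-res features and the low-res global features."""
    high_res = run_stage(h, w, LEARNING_TO_DOWNSAMPLE)  # also feeds the skip connection
    low_res = run_stage(high_res[0], high_res[1], GLOBAL_FEATURE_EXTRACTOR)
    return high_res, low_res


def fuse(high_res, low_res):
    """Element-wise addition needs equal shapes, so the low-res map is upsampled."""
    factor = high_res[0] // low_res[0]
    assert low_res[1] * factor == high_res[1], "branches must align after upsampling"
    return (high_res[0], high_res[1], FUSED_CHANNELS), factor


if __name__ == "__main__":
    hi, lo = trace(1024, 2048)
    print("learning to downsample:", hi)    # (128, 256, 64)
    print("global feature extractor:", lo)  # (32, 64, 128)
    print("fused, upsample factor:", fuse(hi, lo))  # ((128, 256, 128), 4)
```

Under these assumptions, the shared 1/8-resolution features do double duty. They are the input to the deep branch and also the skip connection. The 1/32-resolution global features must be upsampled 4x before the element-wise addition.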

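MOTIVATION AND GOALS

Enable real-time semantic segmentation of high-resolution images on embedded devices.
Introduce a shared early feature extractor ("learning to downsample") that efficiently combines spatial detail with global context.
Design a low-capacity network (1.11M parameters) built from depthwise separable convolutions and inverted residual blocks.
Show that ImageNet pre-training yields only limited benefit for such a low-capacity model.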

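PROPOSED METHOD

Propose a fast segmentation network (Fast-SCNN) whose learning-to-downsample module shares its early convolutions between the two resolution branches.
Use a coarse global feature extractor built from residual bottleneck blocks to capture context at reduced resolution.
Add a feature fusion module that merges high-resolution spatial detail with low-resolution global context by simple addition.
Use depthwise separable convolutions and inverted residual blocks to reduce parameters and FLOPs.
Finish with a classifier head made of a small stack of depthwise separable convolutions, with either softmax or argmax as the inference-time output.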
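The parameter and FLOP savings from depthwise separable convolutions follow from simple counting. The sketch below compares a standard k x k convolution with its depthwise separable counterpart. The 128-channel, 128x256 feature map is an illustrative assumption. Biases and batch norm are ignored, and inverted residual blocks are not modeled.

```python
# Parameter and multiply-accumulate (MAC) counts: standard vs depthwise
# separable convolution. Biases and batch norm are ignored.
# ASSUMPTION: the 128-channel, 128x256 setting is illustrative only.


def standard_conv_params(k, c_in, c_out):
    """k x k kernel connecting every input channel to every output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """One k x k depthwise filter per input channel, then a 1x1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def macs(params, h_out, w_out):
    """For these bias-free convs, each weight is used once per output pixel."""
    return params * h_out * w_out


if __name__ == "__main__":
    k, c, (h, w) = 3, 128, (128, 256)
    std = standard_conv_params(k, c, c)
    ds = depthwise_separable_params(k, c, c)
    print(std, ds, f"{std / ds:.1f}x fewer")  # 147456 17536 8.4x fewer
    print(macs(std, h, w), macs(ds, h, w))     # 4831838208 574619648
```

In this setting the separable version needs about 8.4x fewer weights and MACs. Most of its cost sits in the 1x1 pointwise step, which is why channel width matters so much for such models.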

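EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: How can real-time semantic segmentation of high-resolution images be achieved on embedded devices without high memory requirements?
RQ2: Does sharing early-layer computation between resolution branches (learning to downsample) increase speed while preserving accuracy?
RQ3: How do network capacity and pre-training affect Cityscapes performance for a lightweight model?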

KEY FINDINGS

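On Cityscapes with 1024x2048 input, Fast-SCNN reaches 68.0% mIoU at 123.5 fps on a Titan Xp (Pascal).
The model uses about 1.11 million parameters, substantially fewer than many real-time and offline methods.
A learning-to-downsample module and a single skip connection enable efficient multi-resolution feature sharing while preserving object boundaries.
For this low-capacity network, ImageNet pre-training and adding the coarsely labeled Cityscapes data each bring only small gains (about 0.5% mIoU).
Reducing the input resolution raises throughput at the cost of accuracy:
- 1024x2048: 123.5 fps, 68.0% mIoU
- 512x1024: 285.8 fps, 62.8% mIoU
- 256x512: 485.4 fps, 51.9% mIoU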
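The resolution and speed numbers above can be turned into per-frame latency and throughput with a little arithmetic. This Python sketch uses only the reported fps and mIoU values. The derived columns are our own calculation, not figures from the paper, and the helper name `summarize` is hypothetical.

```python
# Reported Fast-SCNN results on Cityscapes (Titan Xp), as listed in the text.
RESULTS = [
    # (height, width, fps, mIoU %)
    (1024, 2048, 123.5, 68.0),
    (512, 1024, 285.8, 62.8),
    (256, 512, 485.4, 51.9),
]


def summarize(results):
    """Derive latency, pixel throughput and speedup relative to the first row."""
    base_h, base_w, base_fps, _ = results[0]
    base_pixels = base_h * base_w
    rows = []
    for h, w, fps, miou in results:
        pixels = h * w
        rows.append({
            "input": f"{h}x{w}",
            "latency_ms": 1000.0 / fps,              # time budget per frame
            "mpix_per_s": pixels * fps / 1e6,        # megapixels processed per second
            "pixel_reduction": base_pixels / pixels,
            "speedup": fps / base_fps,
            "miou": miou,
        })
    return rows


if __name__ == "__main__":
    for r in summarize(RESULTS):
        print(f"{r['input']:>9}: {r['latency_ms']:.2f} ms/frame, "
              f"{r['mpix_per_s']:.1f} Mpx/s, {r['pixel_reduction']:.0f}x fewer pixels, "
              f"{r['speedup']:.2f}x faster, {r['miou']}% mIoU")
```

By this arithmetic, 4x fewer pixels yields about 2.31x higher fps, and 16x fewer pixels yields about 3.93x. Pixel throughput therefore falls at lower resolutions. The review does not explain why. Per-frame overheads that do not shrink with input size are one plausible cause, but not a verified one.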

This review was generated by AI and checked by human editors.