QUICK REVIEW

[Paper Review] DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

Hanchao Li, Pengfei Xiong | arXiv (Cornell University) | Apr 3, 2019

Advanced Neural Network Applications | References: 35 | Cited by: 57

ONE-SENTENCE SUMMARY

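DFANet introduces deep feature aggregation through sub-network and sub-stage cascades. This makes real-time semantic segmentation of high-resolution images possible with far fewer FLOPs and competitive accuracy.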

ABSTRACT

This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade respectively. Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters, but still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed and segmentation performance. Experiments on Cityscapes and CamVid datasets demonstrate the superior performance of DFANet with 8× less FLOPs and 2× faster than the existing state-of-the-art real-time semantic segmentation methods while providing comparable accuracy. Specifically, it achieves 70.3% Mean IOU on the Cityscapes test dataset with only 1.7 GFLOPs and a speed of 160 FPS on one NVIDIA Titan X card, and 71.3% Mean IOU with 3.4 GFLOPs while inferring on a higher resolution image.

MOTIVATION AND GOALS

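Address the challenge of real-time semantic segmentation of high-resolution images under limited computation.
Develop a lightweight yet discriminative feature-aggregation mechanism that fuses multi-scale context with spatial detail.
Balance inference speed and accuracy by reusing high-level features and aggregating features across stages and across networks.
Show that a cascaded multi-backbone design can raise speed while still keeping mIoU competitive.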

PROPOSED METHOD

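Use a lightweight Xception-based backbone built from depthwise separable convolutions.
Introduce sub-network aggregation: stack backbones so that the output of one backbone is passed to the next, refining the high-level features.
Introduce sub-stage aggregation: fuse features between corresponding stages of the backbones to preserve both spatial detail and context.
Attach an FC attention module to the end of the backbone to enlarge the receptive field at minimal computational cost.
Use a lightweight decoder that fuses high-level and low-level features through upsampling and simple convolutions.
Train with a standard cross-entropy loss and data augmentation, using SGD with a polynomial ("poly") learning-rate policy.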
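Counting multiply-accumulate operations (MACs) shows why the backbone relies on depthwise separable convolutions. The sketch below compares a standard k×k convolution with its depthwise (per-channel k×k) plus pointwise (1×1) factorization. The layer size (a 64×128 output map, 128 input and output channels, 3×3 kernel) is a hypothetical example, not a layer taken from DFANet. The function names are illustrative.

```python
"""Compare the cost of a standard convolution with a depthwise separable one.

Costs are counted in multiply-accumulate operations (MACs). Some papers
report FLOPs as MACs, others as 2 x MACs, so only the ratio is convention-free.
"""


def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a k x k convolution mapping c_in -> c_out channels on an h x w output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a k x k depthwise conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 64 x 128 output map, 128 -> 128 channels, 3 x 3 kernel.
    h, w, c_in, c_out, k = 64, 128, 128, 128, 3
    std = standard_conv_macs(h, w, c_in, c_out, k)
    sep = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"standard:  {std / 1e9:.3f} GMACs")
    print(f"separable: {sep / 1e9:.3f} GMACs")
    # The ratio equals 1 / (1/c_out + 1/k**2), about 8.4x for these sizes.
    print(f"ratio:     {std / sep:.2f}x")
```

For a 3×3 kernel the ratio approaches 9× as the channel count grows. This is why the factorization is a common building block of lightweight, Xception-style backbones.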
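One way to picture the FC attention module at the end of the backbone is as channel attention. Global context is pooled into one vector, passed through a fully connected layer, and used to rescale the feature channels. The pure-Python sketch below assumes a squeeze-and-excitation-style design: a sigmoid gate and a single FC layer whose output size equals the channel count. DFANet's actual module may differ in layer widths, activation, or an extra 1×1 convolution. All names here are hypothetical.

```python
import math
from typing import List

# A feature map stored as [channels][height][width].
FeatureMap = List[List[List[float]]]


def global_avg_pool(x: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions -> one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def fully_connected(v: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer; weights has shape [out_features][in_features]."""
    return [sum(wij * vj for wij, vj in zip(w_row, v)) + b for w_row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fc_attention(x: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """Rescale every channel of x by a gate computed from globally pooled context.

    Assumption: SE-style sigmoid gating; out_features must equal the channel count.
    """
    gates = [sigmoid(z) for z in fully_connected(global_avg_pool(x), weights, bias)]
    return [[[g * val for val in row] for row in ch] for g, ch in zip(gates, x)]


if __name__ == "__main__":
    # Two channels on a 2 x 2 map: channel 0 is all 1.0, channel 1 is all 2.0.
    x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    identity_w = [[1.0, 0.0], [0.0, 1.0]]
    zero_b = [0.0, 0.0]
    y = fc_attention(x, identity_w, zero_b)
    print([round(ch[0][0], 4) for ch in y])
```

The gate is computed once per channel from a pooled vector. Its cost is therefore independent of spatial resolution, while each output value depends on context from the whole image.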
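The polynomial ("poly") learning-rate policy decays the rate from its base value to zero over training:

lr = base_lr × (1 − iter / max_iter)^power

The sketch below implements that formula. The base rate of 0.01, the 0.9 power (a common choice in segmentation work) and the 40k iteration count are illustrative assumptions, not DFANet's reported hyperparameters.

```python
def poly_lr(base_lr: float, iteration: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial decay: base_lr * (1 - iteration / max_iter) ** power.

    Returns base_lr at iteration 0 and 0.0 at iteration == max_iter.
    """
    if max_iter <= 0 or not 0 <= iteration <= max_iter:
        raise ValueError("need max_iter > 0 and 0 <= iteration <= max_iter")
    return base_lr * (1.0 - iteration / max_iter) ** power


if __name__ == "__main__":
    # Hypothetical settings: base rate 0.01, 0.9 power, 40k iterations.
    base_lr, max_iter = 0.01, 40_000
    for it in (0, 10_000, 20_000, 30_000, 40_000):
        print(f"iter {it:>6}: lr = {poly_lr(base_lr, it, max_iter):.6f}")
```

In a training loop, the returned value would be written into the SGD optimizer's learning rate before each update step.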

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

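RQ1: Under real-time constraints, does deep feature aggregation across network levels and stages improve segmentation accuracy?
RQ2: How does stacking several lightweight backbones and fusing them at the stage level affect accuracy and FLOPs?
RQ3: How does DFANet compare with existing real-time segmentation methods in speed and accuracy on Cityscapes and CamVid?
RQ4: What role does FC attention play when a lightweight backbone is used for semantic segmentation?
RQ5: What is the trade-off among input resolution, backbone complexity, and overall performance?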

KEY FINDINGS

Model	Input size	FLOPs	Params	Time (ms)	Frame rate (fps)	mIoU (%)
SegNet	640 × 360	286G	29.5M	217	46	46.4
DPN	?	830G	1?M	-	-	60.1
DeepLab	512 × 1024	457.8G	262.1M	4000	0.25	63.1
ENet	640 × 360	3.8G	0.4M	-	-	51.3
ICNet	1024 × 2048	28.3G	26.5M	33	30.3	69.5
TwoColumn	512 × 1024	57.2G	-	68	14.7	72.9
BiSeNet1	768 × 1536	14.8G	5.8M	13	~	68.4
BiSeNet2	768 × 1536	55.3G	49M	21	~	74.7
DFANet A	1024 × 1024	3.4G	7.8M	10	100	71.3
DFANet B	1024 × 1024	2.1G	4.8M	8	120	67.1
DFANet A’	512 × 1024	1.7G	7.8M	6	160	70.3
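The speed and cost claims in the abstract can be checked against the table above. The sketch below copies a few rows (input size, GFLOPs, latency in ms) and computes two things:

- the frame rate implied by each latency;
- the FLOPs and latency ratios between each baseline and a DFANet variant.

The rows use different input resolutions, so these ratios mix architectural efficiency with input size. The dictionary and function names are illustrative.

```python
# Rows copied from the comparison table: (input H, input W, GFLOPs, latency in ms).
TABLE = {
    "ICNet": (1024, 2048, 28.3, 33),
    "BiSeNet1": (768, 1536, 14.8, 13),
    "DFANet A": (1024, 1024, 3.4, 10),
    "DFANet A'": (512, 1024, 1.7, 6),
}


def fps_from_latency(ms: float) -> float:
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / ms


def flops_ratio(baseline: str, model: str) -> float:
    """How many times more GFLOPs the baseline needs than the model."""
    return TABLE[baseline][2] / TABLE[model][2]


def speedup(baseline: str, model: str) -> float:
    """Latency of the baseline divided by latency of the model."""
    return TABLE[baseline][3] / TABLE[model][3]


if __name__ == "__main__":
    for name in ("ICNet", "DFANet A", "DFANet A'"):
        ms = TABLE[name][3]
        print(f"{name:10s} {ms:>3} ms -> {fps_from_latency(ms):6.1f} fps")
    for base, model in (("ICNet", "DFANet A"), ("BiSeNet1", "DFANet A'")):
        print(f"{base} vs {model}: {flops_ratio(base, model):.1f}x FLOPs, "
              f"{speedup(base, model):.1f}x speed")
```

The latency column is rounded to whole milliseconds, so implied frame rates can differ slightly from the listed ones. For example, 6 ms implies about 167 fps, while the table lists 160 fps for DFANet A′.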

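On the Cityscapes validation set, Backbone A ×3 + HL + LL reaches 71.9% mIoU at 3.4 GFLOPs, and Backbone B ×3 + HL + LL reaches 68.4% mIoU at 2.1 GFLOPs.
On the Cityscapes test set, DFANet A reaches 71.3% mIoU at 3.4 GFLOPs and 100 FPS, and DFANet A′ reaches 70.3% mIoU at 1.7 GFLOPs and 160 FPS.
Compared with existing state-of-the-art real-time methods, DFANet needs about 8× fewer FLOPs and runs about 2× faster while keeping comparable accuracy.
On Cityscapes, DFANet beats several real-time baselines with far fewer FLOPs. For example, DFANet A′ (70.3% mIoU, 1.7 GFLOPs) is more accurate than ICNet (69.5%, 28.3 GFLOPs) and BiSeNet1 (68.4%, 14.8 GFLOPs).
On CamVid, DFANet A runs at 120 FPS and DFANet B at 160 FPS, with competitive mIoU on the high-resolution video frames.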
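The A and A′ rows bear on the resolution trade-off in RQ5. For the same backbone, cost appears to scale with the number of input pixels. Going from 1024×1024 to 512×1024 halves the pixel count, and the GFLOPs fall from 3.4 to 1.7.

The sketch below expresses this as GFLOPs per megapixel and as a linear rescaling rule. The linear rule is an assumption, not a formula stated in the paper. It holds approximately for fully convolutional networks whose per-pixel work does not depend on input size.

```python
def gflops_per_megapixel(gflops: float, h: int, w: int) -> float:
    """Normalize a model's cost by its input size (1 megapixel = 1e6 pixels)."""
    return gflops / (h * w / 1e6)


def rescale_gflops(gflops: float, src_hw: tuple, dst_hw: tuple) -> float:
    """Estimate cost at a new resolution, assuming cost grows linearly with pixel count."""
    (src_h, src_w), (dst_h, dst_w) = src_hw, dst_hw
    return gflops * (dst_h * dst_w) / (src_h * src_w)


if __name__ == "__main__":
    # Values from the comparison table above.
    a = gflops_per_megapixel(3.4, 1024, 1024)       # DFANet A
    a_prime = gflops_per_megapixel(1.7, 512, 1024)  # DFANet A'
    b = gflops_per_megapixel(2.1, 1024, 1024)       # DFANet B
    print(f"A: {a:.2f}, A': {a_prime:.2f}, B: {b:.2f} GFLOPs per megapixel")
    # Predicting A' from A under the linear assumption recovers the table's 1.7G.
    print(f"{rescale_gflops(3.4, (1024, 1024), (512, 1024)):.2f} GFLOPs")
```

Under the same assumption, backbone B costs about 2.0 GFLOPs per megapixel versus about 3.2 for backbone A, so B is the cheaper choice at any input size.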

This review was generated by AI and checked by human editors.