QUICK REVIEW

[Paper Review] SPGNet: Semantic Prediction Guidance for Scene Parsing

Bowen Cheng, Liang-Chieh Chen|arXiv (Cornell University)|Aug 26, 2019

Human Pose and Action Recognition | References: 78 | Cited by: 35

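One-sentence summary

SPGNet adds a Semantic Prediction Guidance (SPG) module to a two-stage encoder-decoder network. The module re-weights local features under pixel-wise semantic supervision, and the network achieves strong results on Cityscapes at low computational cost.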

ABSTRACT

Multi-scale context module and single-stage encoder-decoder structure are commonly employed for semantic segmentation. The multi-scale context module refers to the operations to aggregate feature responses from a large spatial extent, while the single-stage encoder-decoder structure encodes the high-level semantic information in the encoder path and recovers the boundary information in the decoder path. In contrast, multi-stage encoder-decoder networks have been widely used in human pose estimation and show superior performance than their single-stage counterpart. However, few efforts have been attempted to bring this effective design to semantic segmentation. In this work, we propose a Semantic Prediction Guidance (SPG) module which learns to re-weight the local features through the guidance from pixel-wise semantic prediction. We find that by carefully re-weighting features across stages, a two-stage encoder-decoder network coupled with our proposed SPG module can significantly outperform its one-stage counterpart with similar parameters and computations. Finally, we report experimental results on the semantic segmentation benchmark Cityscapes, in which our SPGNet attains 81.1% on the test set using only 'fine' annotations.

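Research motivation and goals

Advance semantic segmentation with an efficient multi-stage architecture.
Propose the SPG module, which re-weights features using pixel-wise semantic predictions.
Explore multi-stage encoder-decoder networks to improve boundary recovery and context fusion.
Evaluate on Cityscapes to demonstrate gains in accuracy and efficiency.
Provide ablation studies and visualizations that explain how SPG works.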

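Proposed method

Introduce the SPG module, built on a supervise-and-excite framework, to generate Guided Attention from the first-stage prediction.
Use a two-stage encoder-decoder with Cross Stage Feature Aggregation to strengthen the later stage.
Design a lightweight upsampling module with residual blocks for efficient feature fusion.
Compute Guided Attention by passing the semantic prediction through a 1x1 convolution to produce a per-pixel, per-channel mask, then re-weight the decoder features with it.
Train with supervision from losses on both the final-stage and intermediate-stage logits.
Compare against state-of-the-art methods on Cityscapes, with extensive ablation studies and visualizations.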
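
The Guided Attention step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the nested-list tensor layout, and the exact form of the identity path (`f + f * mask`) are assumptions made here for clarity. The review only states that a 1x1 convolution produces a per-pixel, per-channel mask and that a sigmoid excitation is combined with an identity path.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1x1(pixel, weight, bias):
    """Apply a 1x1 convolution at one pixel: a linear map from K inputs to C outputs."""
    return [sum(w * p for w, p in zip(row, pixel)) + b
            for row, b in zip(weight, bias)]


def guided_attention(features, prediction, weight, bias):
    """Re-weight decoder features with a mask derived from a semantic prediction.

    features:   H x W x C nested lists (decoder features)
    prediction: H x W x K nested lists (first-stage per-class scores)
    weight:     C x K nested lists, bias: length-C list
                (hypothetical 1x1 convolution parameters)
    """
    out = []
    for feat_row, pred_row in zip(features, prediction):
        out_row = []
        for feat, pred in zip(feat_row, pred_row):
            # One mask value in (0, 1) per channel at this pixel.
            mask = [sigmoid(z) for z in conv1x1(pred, weight, bias)]
            # Assumed form: identity path plus excitation, f + f * mask.
            out_row.append([f + f * m for f, m in zip(feat, mask)])
        out.append(out_row)
    return out


# Toy usage: one pixel, two feature channels, three classes.
features = [[[1.0, 2.0]]]
prediction = [[[0.0, 0.0, 0.0]]]
weight = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
bias = [0.0, 0.0]
print(guided_attention(features, prediction, weight, bias))  # [[[1.5, 3.0]]]
```

With all-zero weights the mask is 0.5 everywhere, so every feature is scaled by 1.5. In a trained network the mask would vary per pixel and per channel according to the first-stage prediction.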

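Experimental results

Research questions

RQ1: Does the SPG module, guided by pixel-wise semantic prediction, improve feature re-weighting and segmentation accuracy?
RQ2: With similar parameters and computation, can a two-stage encoder-decoder with SPG outperform its single-stage counterpart?
RQ3: On Cityscapes, how does SPGNet compare with DenseASPP and DANet in accuracy and efficiency?
RQ4: How much does each SPG component (supervision, identity mapping, excitation mechanism) contribute to overall performance?
RQ5: Do multi-stage encoder-decoder networks benefit semantic segmentation when combined with SPG?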

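Main findings

SPGNet reaches 81.1% mean IoU on the Cityscapes test set using only 'fine' annotations.
On the Cityscapes test set, SPGNet outperforms DenseASPP on most classes while requiring roughly 22.7% of DANet's computation.
The two-stage SPGNet with a ResNet-50 backbone maintains strong accuracy with significantly fewer FLOPs and parameters than many top-performing methods.
Ablations show that sigmoid-based SPG excitation, combined with supervision and the identity path, achieves the best validation mIoU (77.67% with ResNet-18).
Guided attention maps provide interpretable re-weighting: they visualize object localization and the discrimination between similar classes.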
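
The mean IoU figures above follow the standard per-class intersection-over-union definition. As a rough sketch of how the metric is computed on flattened label lists (the function name and the ignore label value of 255 are assumptions here, and benchmark servers accumulate counts over the whole dataset rather than per image):

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over the classes that occur in the prediction or ground truth.

    pred, target: flat sequences of integer class ids of equal length.
    Pixels whose target equals ignore_label are skipped.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            # True positive: counts once in both intersection and union.
            inter[t] += 1
            union[t] += 1
        else:
            # False positive for class p, false negative for class t.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy usage: four pixels, two classes, last pixel ignored.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2))  # 0.5
```

In the toy example each class has one correct pixel and a union of two pixels, so both per-class IoUs are 0.5.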


This review was generated by AI and checked by human editors.