QUICK REVIEW

[Paper Review] Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Zhen-Liang Ni, Xinghao Chen|arXiv (Cornell University)|May 10, 2024

Advanced Image and Video Retrieval Techniques|Cited by 6

One-Sentence Summary

CGRSeg introduces a Rectangular Self-Calibration Module (RCM) and a Dynamic Prototype Guided (DPG) head for efficient semantic segmentation with pyramid context, reaching 43.6% mIoU on ADE20K at 4.0 GFLOPs.

ABSTRACT

Semantic segmentation is an important task for numerous applications but it is still quite challenging to achieve advanced performance with limited computational costs. In this paper, we present CGRSeg, an efficient yet competitive segmentation framework based on context-guided spatial feature reconstruction. A Rectangular Self-Calibration Module is carefully designed for spatial feature reconstruction and pyramid context extraction. It captures the axial global context in both horizontal and vertical directions to explicitly model rectangular key areas. A shape self-calibration function is designed to make the key areas closer to foreground objects. Besides, a lightweight Dynamic Prototype Guided head is proposed to improve the classification of foreground objects by explicit class embedding. Our CGRSeg is extensively evaluated on ADE20K, COCO-Stuff, and Pascal Context benchmarks, and achieves state-of-the-art semantic performance. Specifically, it achieves $43.6\%$ mIoU on ADE20K with only $4.0$ GFLOPs, which is $0.9\%$ and $2.5\%$ mIoU better than SeaFormer and SegNeXt but with about $38.0\%$ fewer GFLOPs. Code is available at https://github.com/nizhenliang/CGRSeg.

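Motivation and Objectives

Advance efficient semantic segmentation under a limited compute budget.
Design modules that improve foreground localization and pyramid context extraction.
Develop lightweight components that sharpen boundary delineation and class discrimination.
Demonstrate state-of-the-art performance at reduced FLOPs on ADE20K, COCO-Stuff, and Pascal Context.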

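Proposed Method

Propose CGRSeg, a framework that combines pyramid context extraction, spatial feature reconstruction, and a lightweight head.
Introduce the Rectangular Self-Calibration Module (RCM), which captures axial global context through horizontal and vertical pooling and performs shape self-calibration with large-kernel strip convolutions.
Apply a shape self-calibration function so that the attended regions align with foreground objects.
Fuse the attention features with the input features through a fusion path that enhances local detail.
Develop the Dynamic Prototype Guided (DPG) head, which embeds class information and computes dynamic prototypes for image-specific class discrimination.
Stack RCMs for pyramid feature interaction, and build the pyramid context (P) from downsampled multi-scale features.
Project the decoder features and the class embedding to refine pixel-level representations and improve foreground classification.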
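The axial pooling step in the RCM can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it works on a single H x W channel, it replaces the learned large-kernel strip convolutions of the shape self-calibration function with a plain sigmoid, and the names `rectangular_attention` and `reweight` are hypothetical. It only shows why adding a per-row context to a per-column context highlights a rectangular region.

```python
import math


def rectangular_attention(feature):
    """Axial pooling + broadcast addition + sigmoid for one H x W channel.

    Simplification (assumption): the learned strip convolutions described
    in the paper are omitted; a sigmoid is applied directly to the sum.
    """
    height, width = len(feature), len(feature[0])
    # Pool along the width: one average value per row.
    row_context = [sum(row) / width for row in feature]
    # Pool along the height: one average value per column.
    col_context = [
        sum(feature[y][x] for y in range(height)) / height
        for x in range(width)
    ]
    # Broadcast addition: a position scores high only when both its row
    # and its column carry strong context, which yields a rectangle.
    return [
        [1.0 / (1.0 + math.exp(-(row_context[y] + col_context[x])))
         for x in range(width)]
        for y in range(height)
    ]


def reweight(feature, attention):
    """Element-wise product of the input feature and the attention map."""
    return [
        [f * a for f, a in zip(feature_row, attention_row)]
        for feature_row, attention_row in zip(feature, attention)
    ]


# Toy input: a bright 2 x 2 "foreground" block inside a 4 x 4 map.
feature = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 4.0, 0.0],
    [0.0, 4.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
attention = rectangular_attention(feature)
reweighted = reweight(feature, attention)
```

In this toy input the attention is highest inside the 2 x 2 block, intermediate along the rows and columns that cross it, and lowest in the corners. In the module as described above, the strip convolutions would further adjust the shape of that region toward the foreground object.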

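Experimental Results

Research Questions

RQ1: How can foreground-centric context be modeled efficiently in a lightweight segmentation backbone?
RQ2: Does rectangular, axially guided attention capture pyramid context more effectively than conventional attention blocks?
RQ3: Do dynamic class prototypes improve per-pixel discrimination without adding significant computational overhead?
RQ4: How does combining pyramid context extraction with spatial feature reconstruction affect results on standard benchmarks?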

Key Findings

Method	mIoU (%)	FLOPs (G)	Params (M)	Throughput (img/s)
DeeplabV3+ (ECCV’18)	34.0	69.4	15.4	63.0
Segformer-B0 (NeurIPS’21)	37.4	8.4	3.8	117.1
FeedFormer-B0 (AAAI’23)	39.2	7.8	4.5	110.3
SegNeXt-T (NeurIPS’22)	41.1	6.6	4.3	123.5
SeaFormer-L (ICLR’23)	42.7	6.5	14.0	142.3
PEM-STDC1 (CVPR’24)	39.6	16.0	17.0	-
CGRSeg-T (Ours)	43.6	4.0	9.4	138.4
DeeplabV3+ (ECCV’18)	44.1	255.1	62.7	21.6
EncNet (CVPR’18)	44.7	218.8	68.6	23.4
CCNet (ICCV’19)	45.2	278.4	68.9	23.2
Segformer-B1 (NeurIPS’21)	42.2	15.9	13.7	96.0
SegNeXt-S (NeurIPS’22)	44.3	15.9	13.9	91.1
FeedFormer-B1 (AAAI’23)	41.0	10.0	4.6	87.2
PEM-STDC2 (CVPR’24)	45.0	19.3	21.0	-
CGRSeg-B (Ours)	45.5	7.6	18.1	98.4
Segformer-B2 (NeurIPS’21)	46.5	62.4	27.5	70.4
SegNeXt-B (NeurIPS’22)	47.7	74.0	63.0	-
FeedFormer-B2 (AAAI’23)	48.0	42.7	29.1	56.9
LRFormer-T (arXiv’23)	46.7	17.0	13.0	-
CGRSeg-L (Ours)	48.3	14.9	35.7	73.0

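CGRSeg-T, the tiny model, reaches 43.6% mIoU on ADE20K at 4.0 GFLOPs.
On ADE20K, CGRSeg-T outperforms SeaFormer and SegNeXt by 0.9% and 2.5% mIoU, respectively, while using about 38% fewer GFLOPs.
CGRSeg-B and CGRSeg-L reach higher mIoU (45.5% and 48.3%, respectively) with competitive FLOPs (7.6 and 14.9 GFLOPs).
On COCO-Stuff, CGRSeg-T reaches 42.2% mIoU at 4.0 GFLOPs, and CGRSeg-L reaches 46.0% mIoU at 14.9 GFLOPs.
On Pascal Context, CGRSeg-T reaches 54.1% mIoU at 4.0 GFLOPs, and CGRSeg-L reaches 58.5% mIoU at 14.9 GFLOPs.
The ablation study shows that the RCM and the DPG head contribute cumulative gains: the baseline reaches 40.86% mIoU, and adding RCM (PCE), RCM (SFR), and the DPG head raises it to 43.60% mIoU.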
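The headline comparisons can be re-derived from the table with a few lines of arithmetic. The sketch below copies the ADE20K mIoU and GFLOPs values for CGRSeg-T, SeaFormer-L, and SegNeXt-T from the table, plus the two ablation endpoints; the dictionary layout and the helper name `compare` are illustrative choices, not part of the paper.

```python
# (mIoU in %, GFLOPs) on ADE20K, copied from the comparison table.
results = {
    "CGRSeg-T": (43.6, 4.0),
    "SeaFormer-L": (42.7, 6.5),
    "SegNeXt-T": (41.1, 6.6),
}


def compare(ours, baseline):
    """Return (mIoU gain in points, FLOPs saving in percent)."""
    miou_gain = round(ours[0] - baseline[0], 1)
    flops_saving = round(100.0 * (1.0 - ours[1] / baseline[1]), 1)
    return miou_gain, flops_saving


for name in ("SeaFormer-L", "SegNeXt-T"):
    gain, saving = compare(results["CGRSeg-T"], results[name])
    print(f"vs {name}: +{gain} mIoU, {saving}% fewer GFLOPs")

# Ablation: full model (43.60) minus baseline (40.86).
ablation_gain = round(43.60 - 40.86, 2)
print(f"ablation gain: +{ablation_gain} mIoU")
```

The FLOPs savings come out slightly above 38% for both baselines, in line with the "about 38.0% fewer GFLOPs" stated in the abstract, and the full model improves on the ablation baseline by 2.74 mIoU points.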

This review was generated by AI and checked by a human editor.