QUICK REVIEW

[Paper Review] Spatial-Assistant Encoder-Decoder Network for Real Time Semantic Segmentation

Yalun Wang, Shidong Chen | arXiv (Cornell University) | Sep 19, 2023

Advanced Neural Network Applications | Cited by 8

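One-Sentence Summary

SANet combines an encoder-decoder with a two-pathway design, proposing APPPM for multi-scale context and SAD for efficient attention in the decoder; it achieves competitive mIoU at high FPS on Cityscapes and CamVid.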

ABSTRACT

Semantic segmentation is an essential technology that lets self-driving cars comprehend their surroundings. Current real-time semantic segmentation networks commonly employ either an encoder-decoder architecture or a two-pathway architecture. Generally speaking, encoder-decoder models tend to be faster, whereas two-pathway models are more accurate. To leverage both strengths, we present the Spatial-Assistant Encoder-Decoder Network (SANet), which fuses the two architectures. Overall, we retain the encoder-decoder design while keeping the feature maps from the middle section of the encoder and using atrous convolution branches for same-resolution feature extraction. Toward the end of the encoder, we integrate the asymmetric pooling pyramid pooling module (APPPM) to improve the semantic extraction of the feature maps; this module uses asymmetric pooling layers that extract features at multiple resolutions. In the decoder, we present a hybrid attention module, SAD, which integrates horizontal and vertical attention to facilitate combining the various branches. To verify the effectiveness of our approach, we evaluated SANet on the real-time CamVid and Cityscapes benchmarks, where it achieved competitive results. Using a single 2080Ti GPU, SANet achieved 78.4% mIoU at 65.1 FPS on the Cityscapes test set and 78.8% mIoU at 147 FPS on the CamVid test set. The training code and model for SANet are available at https://github.com/CuZaoo/SANet-main
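The abstract relies on two shape facts: atrous (dilated) convolutions can extract features without changing resolution, and asymmetric pooling windows shrink each axis by a different amount. The sketch below checks both with the standard floor-mode output-size formula used by common frameworks (e.g. PyTorch `Conv2d`, and pooling layers with dilation fixed at 1). The 128x256 feature map and every kernel, stride, padding, and dilation value are hypothetical examples, not SANet's actual configuration.

```python
# Output-size arithmetic for conv/pooling windows (standard floor-mode formula).
# All shapes and hyperparameters below are hypothetical, not SANet's real settings.

def out_size(n: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output length along one axis: floor((n + 2p - d*(k - 1) - 1) / s) + 1."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def out_shape(hw, kernel, stride=(1, 1), padding=(0, 0), dilation=(1, 1)):
    """Apply out_size to the (height, width) axes independently."""
    return tuple(
        out_size(n, k, s, p, d)
        for n, k, s, p, d in zip(hw, kernel, stride, padding, dilation)
    )


if __name__ == "__main__":
    feat = (128, 256)  # hypothetical 1/8-scale map of a 1024x2048 image

    # Atrous 3x3 conv: padding = dilation keeps the resolution unchanged.
    for d in (1, 2, 4):
        print("atrous d =", d, out_shape(feat, (3, 3), padding=(d, d), dilation=(d, d)))  # -> (128, 256)

    # Asymmetric 1x3 and 3x1 kernels (the shapes SAD uses for attention) also keep size.
    print("1x3:", out_shape(feat, (1, 3), padding=(0, 1)))  # -> (128, 256)
    print("3x1:", out_shape(feat, (3, 1), padding=(1, 0)))  # -> (128, 256)

    # An asymmetric pooling window/stride shrinks each axis by a different factor.
    print("pool 5x9 / stride 2x4:", out_shape(feat, (5, 9), stride=(2, 4), padding=(2, 4)))  # -> (64, 64)
```

Here the wider stride on the width axis turns the 1:2 feature map into a square 64x64 pooled grid, which is one way non-square pooling can sample context at different scales per axis.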

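Motivation and Goals

Advance real-time semantic segmentation that balances accuracy and speed.
Develop a hybrid architecture that merges the encoder-decoder and two-pathway designs.
Introduce modules that preserve spatial information and multi-scale context without sacrificing speed.
Demonstrate competitive performance on the Cityscapes and CamVid datasets.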

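Proposed Method

SANet is built on a main encoder-decoder backbone, augmented with a spatial branch of atrous (dilated) convolutions that keeps high-resolution features.
Toward the end of the encoder, APPPM (Asymmetric Pooling Pyramid Pooling Module) captures multi-scale context through asymmetric pooling shapes, each followed by a 1x1 convolution.
SAD (Simple Attention Decoder) fuses high- and low-resolution features using horizontal and vertical attention learned with asymmetric 1x3 and 3x1 convolutions.
A single lightweight decoder exploits semantic information from multiple branches while keeping inference fast.
The network is pretrained on ImageNet before segmentation training; training uses a polynomial learning-rate policy, with auxiliary and boundary losses for supervision.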
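The polynomial ("poly") learning-rate policy decays the rate from its base value to zero over training. A minimal sketch, assuming the widely used form lr = base_lr * (1 - step / max_steps) ** power with power = 0.9; the base rate, step count, and exponent are illustrative assumptions, not values confirmed by this review.

```python
def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    """Polynomial decay: base_lr * (1 - step / max_steps) ** power.

    power = 0.9 is a common default in segmentation work; it is an assumption here.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not 0 <= step <= max_steps:
        raise ValueError("step must lie in [0, max_steps]")
    return base_lr * (1.0 - step / max_steps) ** power


if __name__ == "__main__":
    base_lr, max_steps = 0.01, 1000  # hypothetical settings
    for step in (0, 250, 500, 750, 1000):
        print(f"step {step:4d}: lr = {poly_lr(base_lr, step, max_steps):.6f}")
```

With power below 1 the rate falls slowly at first and more steeply near the end; the schedule reaches exactly zero at `max_steps`.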

Experimental Results

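Research Questions

RQ1: Does the hybrid SANet architecture outperform pure encoder-decoder or two-pathway models in real-time semantic segmentation?
RQ2: Does APPPM improve multi-scale context extraction compared with standard PPM/ASPP modules?
RQ3: Can the Simple Attention Decoder effectively fuse high- and low-resolution features to raise mIoU without hurting FPS?
RQ4: How does SANet compare with other methods in accuracy and inference speed on Cityscapes and CamVid?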

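Key Findings

SANet achieves competitive mIoU at high FPS on Cityscapes and CamVid (test sets, single 2080Ti GPU: Cityscapes 78.4% mIoU at 65.1 FPS; CamVid 78.8% mIoU at 147 FPS).
APPPM extracts multi-scale features better than standard pooling-based modules and yields higher mIoU (ablations show APPPM outperforms PPM).
SAD fuses features from multiple branches through horizontal and vertical attention, reducing information loss and improving accuracy.
Ablation studies show that combining APPPM and SAD gives higher mIoU than the baseline encoder-plus-branch design while keeping real-time speed.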
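The reported throughputs can be converted into an average per-frame latency with 1000 / FPS. This is a back-of-the-envelope sketch, assuming the FPS figures reflect sequential single-image inference on the stated 2080Ti; it does not reproduce the paper's benchmark setup.

```python
# Convert reported throughput (FPS) into average milliseconds per frame.
# Assumes frames are processed one at a time, so latency is simply 1000 / FPS.

def ms_per_frame(fps: float) -> float:
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Test-set results as reported in the abstract (single 2080Ti GPU).
REPORTED = {
    "Cityscapes": (78.4, 65.1),  # (mIoU %, FPS)
    "CamVid": (78.8, 147.0),
}

if __name__ == "__main__":
    for name, (miou, fps) in REPORTED.items():
        print(f"{name}: {miou}% mIoU at {fps} FPS -> {ms_per_frame(fps):.1f} ms/frame")
    # Cityscapes -> 15.4 ms/frame, CamVid -> 6.8 ms/frame
```

Both figures sit well under the roughly 33 ms per-frame budget of a 30 FPS camera stream, which is one common (assumed) reading of "real time".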

This review was generated by AI and checked by human editors.