QUICK REVIEW

[Paper Review] Pyramid Attention Network for Semantic Segmentation

Hanchao Li, Pengfei Xiong|arXiv (Cornell University)|May 25, 2018

Advanced Neural Network Applications | References: 26 | Cited by: 235

One-Sentence Summary

PAN combines Feature Pyramid Attention and Global Attention Upsample to exploit global context and multi-scale features, reaching state-of-the-art mIoU on VOC 2012 (84.0%) and Cityscapes without COCO pre-training.

ABSTRACT

A Pyramid Attention Network(PAN) is proposed to exploit the impact of global contextual information in semantic segmentation. Different from most existing works, we combine attention mechanism and spatial pyramid to extract precise dense features for pixel labeling instead of complicated dilated convolution and artificially designed decoder networks. Specifically, we introduce a Feature Pyramid Attention module to perform spatial pyramid attention structure on high-level output and combining global pooling to learn a better feature representation, and a Global Attention Upsample module on each decoder layer to provide global context as a guidance of low-level features to select category localization details. The proposed approach achieves state-of-the-art performance on PASCAL VOC 2012 and Cityscapes benchmarks with a new record of mIoU accuracy 84.0% on PASCAL VOC 2012, while training without COCO dataset.

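Research Motivation and Objectives

Improve semantic segmentation by exploiting global contextual information, without relying on dilated convolution or a complex decoder.
Introduce a lightweight decoder that uses high-level context to guide low-level localization.
Design and integrate a Feature Pyramid Attention module that fuses multi-scale context with pixel-level attention.
Develop a Global Attention Upsample module that reconstructs high-resolution predictions under the guidance of global context.
Demonstrate state-of-the-art performance on VOC 2012 and Cityscapes without COCO pre-training.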

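Proposed Method

Introduce Feature Pyramid Attention (FPA), which fuses multi-scale context through a pyramid structure with 3x3, 5x5, and 7x7 convolutions plus a global pooling branch, and multiplies the resulting attention with the original features to preserve localization information.
Propose Global Attention Upsample (GAU) as the decoder: global context from high-level features weights the low-level features before progressive upsampling.
Use ResNet-101 with dilated convolution as the encoder backbone (dilation rate 2 on res5b).
Replace the 7x7 convolution layer of ResNet-101 with three 3x3 convolution layers to reduce the parameter count.
Train with standard cross-entropy loss, SGD, and a polynomial learning-rate policy, with data augmentation (flipping and scaling).
Show that, at the same output stride, FPA outperforms PSPNet and DeepLabv3, and that GAU improves localization when combined with FPA.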
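
The GAU decoder step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the low-level and high-level feature maps already have the same number of channels, it uses a sigmoid on the globally pooled high-level features as a stand-in for the learned layers that would produce the channel weights, and it upsamples with nearest-neighbour repetition. The function names (`global_avg_pool`, `upsample_nearest`, `gau`) are hypothetical.

```python
import math

# Feature maps are nested lists shaped [channels][height][width].

def global_avg_pool(feat):
    """Return one average value per channel (the global context vector)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of each channel by an integer factor."""
    out = []
    for ch in feat:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(factor)]
            rows.extend([list(wide) for _ in range(factor)])
        out.append(rows)
    return out

def gau(low, high):
    """Weight low-level features by high-level global context, then add
    the upsampled high-level features.

    Assumption: a sigmoid replaces the learned layers that would normally
    turn the pooled vector into per-channel weights.
    """
    factor = len(low[0]) // len(high[0])
    weights = [1.0 / (1.0 + math.exp(-g)) for g in global_avg_pool(high)]
    high_up = upsample_nearest(high, factor)
    return [
        [[w * l + h for l, h in zip(lrow, hrow)]
         for lrow, hrow in zip(lch, hch)]
        for w, lch, hch in zip(weights, low, high_up)
    ]

# One channel: a 2x2 high-level map guiding a 4x4 low-level map.
high = [[[0.0, 0.0], [0.0, 0.0]]]
low = [[[1.0] * 4 for _ in range(4)]]
out = gau(low, high)
```

In this toy call the pooled context is 0, so the sigmoid weight is 0.5 and every low-level value is halved before the (all-zero) upsampled high-level map is added. The point of the design is that the channel weights come from the coarse, semantically rich features, while the spatial detail comes from the fine ones.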

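Experimental Results

Research Questions

RQ1: Can a pyramid-attention module provide pixel-level multi-scale context without heavy dilated convolution or a complex decoder?
RQ2: Does an upsampling decoder guided by global context improve boundary localization at minimal computational cost?
RQ3: How do FPA and GAU, separately and together, affect performance on standard benchmarks such as VOC 2012 and Cityscapes?
RQ4: What is the effect of training the PAN architecture on VOC 2012 and Cityscapes without COCO pre-training?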

Key Findings

Method	MS	Flip	mean IoU (%)	Pixel Acc. (%)
PAN			79.38	95.25
PAN	Yes		80.77	95.65
PAN	Yes	Yes	81.19	95.75
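
The MS and Flip columns refer to evaluating with multi-scale inputs and with left-right flipped inputs. A minimal sketch of that kind of test-time averaging follows. It is an illustration only: `predict` is a hypothetical callable standing in for the trained network, the scale values are arbitrary (the source does not list the ones used), and nearest-neighbour resizing is chosen for brevity.

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grid (a list of rows)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def hflip(grid):
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in grid]

def predict_tta(image, predict, scales=(1.0,), flip=False):
    """Sum per-pixel class scores over scales and optional flips,
    then return the arg-max label map at the original resolution.

    predict(img) must return an [H][W] grid of per-class score lists.
    """
    h, w = len(image), len(image[0])
    total = None
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scaled = resize_nearest(image, sh, sw)
        variants = [(scaled, False)]
        if flip:
            variants.append((hflip(scaled), True))
        for img, flipped in variants:
            scores = predict(img)
            if flipped:
                scores = hflip(scores)  # undo the flip before merging
            scores = resize_nearest(scores, h, w)
            if total is None:
                total = [[[0.0] * len(px) for px in row] for row in scores]
            for y in range(h):
                for x in range(w):
                    for c, v in enumerate(scores[y][x]):
                        total[y][x][c] += v
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in total]

def toy_predict(img):
    """Hypothetical two-class 'model': class 1 score equals pixel value."""
    return [[[1.0 - v, v] for v in row] for row in img]

image = [[0.0, 1.0], [0.0, 1.0]]
labels = predict_tta(image, toy_predict, scales=(1.0, 2.0), flip=True)
```

Each extra scale or flip adds another forward pass, which is the cost paid for the gains shown in the second and third rows of the table.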

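FPA with average pooling, 3x3/5x5/7x7 kernels, and a global pooling branch gives a clear gain over the ResNet-101 baseline, reaching 78.37% mean IoU on the VOC 2012 validation set in its best configuration.
Adding Global Attention Upsample to the ResNet-101 baseline raises VOC 2012 validation mean IoU from 72.60% to 77.84%; combining GAU with FPA gives 79.38%, the single-scale PAN result in the table above.
Without COCO pre-training, PAN reaches 84.0% mean IoU on the VOC 2012 test set, ahead of several methods listed under comparable settings (such as EncNet, PSPNet, and DeepLabv3).
On Cityscapes, PAN reaches 78.6% mean IoU on the test set without using the coarse annotations, slightly above several earlier methods.
PAN remains competitive without COCO pre-training against methods that rely on COCO data for training (such as Global Convolution Network).
Ablation studies show that average pooling beats max pooling in FPA, and that adding the global pooling branch and using larger kernels (3x3, 5x5, 7x7) improves performance.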
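
For readers less familiar with the two metrics quoted throughout, the sketch below shows how mean IoU and pixel accuracy are commonly derived from a confusion matrix. It is a simplified illustration: labels are flat lists of class indices, the handling of ignored or void pixels that benchmarks apply is omitted, and the function names are hypothetical.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels by (true class, predicted class) from flat label lists."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the true class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes that occur at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true other
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)

truth = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
cm = confusion_matrix(truth, pred, num_classes=2)
acc = pixel_accuracy(cm)  # 3 of 4 pixels correct
miou = mean_iou(cm)       # mean of 1/2 (class 0) and 2/3 (class 1)
```

Because IoU penalizes both missed and spurious pixels per class, mean IoU is usually much lower than pixel accuracy, which is why the table reports values near 80% and 95% for the same predictions.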


This review was generated by AI and reviewed by human editors.