QUICK REVIEW

[Paper Review] Interlaced Sparse Self-Attention for Semantic Segmentation

Lang Huang, Yuhui Yuan|arXiv (Cornell University)|Jul 29, 2019

Advanced Neural Network Applications | References: 60 | Cited by: 53

One-Sentence Summary

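This paper proposes Interlaced Sparse Self-Attention (IANet) to capture long-range context in semantic segmentation and related tasks, and reports consistent improvements over baselines and non-local methods on Cityscapes, ADE20K, LIP, and COCO.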

ABSTRACT

In this paper, we present a so-called interlaced sparse self-attention approach to improve the efficiency of the self-attention mechanism for semantic segmentation. The main idea is that we factorize the dense affinity matrix as the product of two sparse affinity matrices. There are two successive attention modules each estimating a sparse affinity matrix. The first attention module is used to estimate the affinities within a subset of positions that have long spatial interval distances and the second attention module is used to estimate the affinities within a subset of positions that have short spatial interval distances. These two attention modules are designed so that each position is able to receive the information from all the other positions. In contrast to the original self-attention module, our approach decreases the computation and memory complexity substantially especially when processing high-resolution feature maps. We empirically verify the effectiveness of our approach on six challenging semantic segmentation benchmarks.
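The two-stage factorization described in the abstract can be illustrated with a minimal sketch on a flattened 1D sequence of positions. This is not the authors' implementation: it assumes plain dot-product attention with no learned query/key/value projections, and the names `block_size`, `long_range_groups`, `short_range_groups`, `group_attention`, `interlaced_attention`, and `receptive_field` are hypothetical. Long-range groups collect positions that are `block_size` apart; short-range groups are contiguous blocks.

```python
import math


def long_range_groups(n, block_size):
    # Positions sharing the same offset inside their block.
    # Members of one group are block_size apart (long spatial interval).
    return [list(range(offset, n, block_size)) for offset in range(block_size)]


def short_range_groups(n, block_size):
    # Contiguous blocks of neighbouring positions (short spatial interval).
    return [list(range(start, start + block_size)) for start in range(0, n, block_size)]


def group_attention(features, groups):
    # Softmax dot-product attention restricted to each group.
    # Assumption: no learned projections, features act as query, key and value.
    dim = len(features[0])
    out = [None] * len(features)
    for group in groups:
        for i in group:
            scores = [sum(a * b for a, b in zip(features[i], features[j])) for j in group]
            peak = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            out[i] = [
                sum(w * features[j][d] for w, j in zip(weights, group)) / total
                for d in range(dim)
            ]
    return out


def interlaced_attention(features, block_size):
    # Stage 1: long-range attention, stage 2: short-range attention.
    n = len(features)
    if n % block_size != 0:
        raise ValueError("sequence length must be divisible by block_size")
    mixed = group_attention(features, long_range_groups(n, block_size))
    return group_attention(mixed, short_range_groups(n, block_size))


def receptive_field(n, block_size):
    # For each output position, the set of input positions that can reach it
    # after both stages.
    after_long = {i: set(g) for g in long_range_groups(n, block_size) for i in g}
    field = {}
    for group in short_range_groups(n, block_size):
        reachable = set().union(*(after_long[j] for j in group))
        for i in group:
            field[i] = reachable
    return field
```

Calling `receptive_field(12, 4)` gives every position the full set of 12 inputs, which matches the abstract's statement that each position can receive information from all the others even though each stage only looks at a small group.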

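Motivation and Objectives

Improve semantic segmentation by modeling long-range contextual dependencies effectively.
Propose an interlaced attention mechanism that alternates between long-range and short-range context aggregation.
Demonstrate that the approach generalizes across semantic segmentation, object detection, and instance segmentation.
Compare against baselines, non-local methods, and related attention methods through extensive ablations and benchmarks.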

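Proposed Method

Propose Interlaced Sparse Self-Attention (IANet), which cascades a long-range attention block and a short-range attention block.
Replace or augment the self-attention blocks in the backbone with interlaced attention (IA) to capture global context.
Run ablations on multiple datasets comparing IA with the baseline, non-local (NL), RCCA, and CGNL.
Evaluate segmentation on Cityscapes, ADE20K, and LIP, and detection/instance segmentation on COCO with Mask-RCNN.
Use ImageNet-pretrained backbones with dilated convolutions and an auxiliary loss; train with a polynomial learning-rate policy and synchronized batch normalization.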
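The polynomial learning-rate policy mentioned in the training setup is commonly written as base_lr * (1 - step / max_steps) ** power. A minimal sketch follows; the function name `poly_learning_rate` is hypothetical, and the default `power=0.9` is an assumption based on common segmentation practice, since this review does not state the exponent or the base learning rate.

```python
def poly_learning_rate(base_lr, step, max_steps, power=0.9):
    # Polynomial decay from base_lr at step 0 down to 0 at max_steps.
    # power=0.9 is an assumed default, not a value taken from this review.
    if not 0 <= step <= max_steps:
        raise ValueError("step must lie in [0, max_steps]")
    return base_lr * (1.0 - step / max_steps) ** power


if __name__ == "__main__":
    # Print a few points of the schedule for an illustrative run length.
    for step in (0, 250, 500, 750, 1000):
        print(step, poly_learning_rate(0.01, step, 1000))
```

The schedule decays smoothly and reaches exactly zero at the final step, so a training loop would typically query it once per iteration.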

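Experimental Results

Research Questions

RQ1: Does interlaced attention improve segmentation performance over baselines and non-local/self-attention methods across diverse datasets?
RQ2: How does interlaced attention compare with other context-modeling methods (NL, RCCA, CGNL) in accuracy and efficiency?
RQ3: How do the partition size (L) and the order of long-range and short-range attention affect performance?
RQ4: Does the proposed IA technique generalize to object detection and instance segmentation, such as COCO with Mask-RCNN?
RQ5: How does adding multiple IA blocks affect performance on other tasks, such as CUB-200-2011 classification?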

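Key Findings

On semantic segmentation benchmarks, interlaced attention yields significant gains over baselines and non-local methods.
IANet achieves state-of-the-art or competitive results on Cityscapes, ADE20K, and LIP compared with prior methods using similar backbones.
Adding a single interlaced attention block yields consistent gains over the Mask-RCNN baseline on COCO object detection and instance segmentation.
IA achieves better Top-1/Top-5 accuracy than CGNL and NL on CUB-200-2011, and outperforms RCCA in the Cityscapes ablations.
Partition size and the order of the attention stages both affect performance; larger partitions and the long-range-then-short-range order work best.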
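To see why the partition size matters for cost as well as accuracy, the number of affinity entries each scheme has to compute can be counted directly. The sketch below treats the feature map as a flattened sequence of `n` positions split into contiguous blocks of `block_size`; both names and both functions are hypothetical, and the relation between `block_size` and the paper's partition size (L) is an assumption, since the paper partitions along height and width separately.

```python
def dense_affinity_entries(n):
    # Full self-attention compares every position with every position.
    return n * n


def interlaced_affinity_entries(n, block_size):
    # Long-range stage: block_size groups, each with n // block_size members.
    # Short-range stage: n // block_size groups, each with block_size members.
    if n % block_size != 0:
        raise ValueError("n must be divisible by block_size")
    group_len = n // block_size
    long_range = block_size * group_len ** 2
    short_range = group_len * block_size ** 2
    return long_range + short_range


if __name__ == "__main__":
    n = 128 * 128  # a 128 x 128 feature map, flattened (illustrative size)
    for block_size in (16, 64, 128, 256):
        sparse = interlaced_affinity_entries(n, block_size)
        print(block_size, sparse, dense_affinity_entries(n) / sparse)
```

Under these assumptions the total is n * n / block_size + n * block_size, which is smallest when `block_size` is close to the square root of `n`. This count covers affinity entries only and says nothing about which setting gives the best accuracy.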


This review was generated by AI and reviewed by a human editor.