QUICK REVIEW

[Paper Review] Expectation-Maximization Attention Networks for Semantic Segmentation

Xia Li, Zhisheng Zhong|arXiv (Cornell University)|Jul 31, 2019

Advanced Neural Network Applications | References: 38 | Cited by: 113

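ONE-SENTENCE SUMMARY

EMA recasts attention as an EM iteration that learns a compact set of bases for pixel-level representations, yielding a lightweight, robust semantic segmentation module (EMAU) that improves performance on standard benchmarks while reducing computation and memory usage.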

ABSTRACT

The self-attention mechanism has been widely used for various tasks. It is designed to compute the representation of each position by a weighted sum of the features at all positions. Thus, it can capture long-range relations for computer vision tasks. However, it is computationally expensive, since the attention maps are computed w.r.t. all other positions. In this paper, we formulate the attention mechanism into an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation upon these bases, the resulting representation is low-rank and deprecates noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of input and is also friendly in memory and computation. Moreover, we set up the bases maintenance and normalization methods to stabilize its training procedure. We conduct extensive experiments on popular semantic segmentation benchmarks including PASCAL VOC, PASCAL Context and COCO Stuff, on which we set new records.

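RESEARCH MOTIVATION AND GOALS

Exploit long-range dependencies for semantic segmentation while reducing the computational cost of attention.
Reformulate self-attention as an EM procedure that learns a compact set of bases on which the attention maps are computed.
Develop a lightweight EMAU module that can be easily integrated into CNN backbones.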

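PROPOSED METHOD

Reformulate attention as an expectation-maximization (EM) procedure in which the attention maps are the latent variables and the bases are the parameters to be learned.
Use EMA to iteratively estimate the responsibilities (E-step) and update the bases (M-step), obtaining a low-rank reconstruction of the input features.
Introduce data re-estimation, which generates a compact, noise-robust representation from the learned bases.
Embed EMA in a neural network module (EMAU) that wraps the EMA core with two 1x1 convolutions and a residual connection.
Maintain the bases by updating the initial bases with a moving average across mini-batches, and apply Euclidean (L2) normalization to the bases to stabilize training.
Provide ablation studies on the number of iterations, the maintenance strategy, and the normalization to validate the design choices.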
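
The E-step, M-step, and data re-estimation listed above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the responsibilities are a softmax over inner products between pixel features and bases, and that the bases are L2-normalized after each M-step. The function names (`ema_attention`, `l2_normalize`, `softmax`) and the toy sizes are hypothetical.

```python
import numpy as np

def softmax(a, axis):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(mu, eps=1e-6):
    # Scale each basis (row) to unit Euclidean norm.
    return mu / (np.linalg.norm(mu, axis=1, keepdims=True) + eps)

def ema_attention(x, mu0, num_iters=3, eps=1e-6):
    """x: (N, C) pixel features; mu0: (K, C) initial bases (assumed layout)."""
    mu = l2_normalize(mu0)
    for _ in range(num_iters):
        # E-step: responsibility of each basis for each pixel, shape (N, K).
        # The softmax-over-inner-products kernel is an assumption of this sketch.
        z = softmax(x @ mu.T, axis=1)
        # M-step: each basis becomes the responsibility-weighted mean of the pixels.
        weights = z / (z.sum(axis=0, keepdims=True) + eps)
        mu = l2_normalize(weights.T @ x)
    # Data re-estimation: rebuild the features from K bases, so rank <= K.
    x_hat = z @ mu
    return x_hat, z, mu

# Toy usage with hypothetical sizes: N=100 pixels, C=16 channels, K=4 bases.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 16))
mu0 = rng.normal(size=(4, 16))
x_hat, z, mu = ema_attention(x, mu0, num_iters=3)
print(x_hat.shape, z.shape, mu.shape)
```

Because `x_hat` is the product of an (N, K) matrix and a (K, C) matrix, its rank cannot exceed K, which is the low-rank property the summary refers to. In the full EMAU, this core would sit between two 1x1 convolutions with a residual connection; those layers are omitted here.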

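EXPERIMENTAL RESULTS

Research Questions

RQ1: Can EM-style iterative attention learn a compact, robust set of bases that reduces computation compared with standard self-attention or Non-local blocks?
RQ2: Does the EMAU module improve segmentation accuracy on standard benchmarks while reducing FLOPs and memory usage?
RQ3: How do the initialization, maintenance (moving average), and normalization (L2Norm) of the bases affect training stability and performance?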
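
To make the maintenance and normalization choices in RQ3 concrete, here is a minimal sketch of a moving-average update of the initial bases and of L2 normalization. The momentum value, the array layout, and the function names are assumptions for illustration, and where exactly normalization is applied in the authors' pipeline is not specified in this review.

```python
import numpy as np

def l2_normalize(mu, eps=1e-6):
    # Scale each basis (row) to unit Euclidean norm.
    return mu / (np.linalg.norm(mu, axis=1, keepdims=True) + eps)

def update_initial_bases(mu_init, mu_batch, momentum=0.9):
    """Moving-average maintenance of the initial bases.

    mu_init:  (K, C) bases used to start the EM iterations.
    mu_batch: (B, K, C) bases after the EM iterations, one set per image.
    momentum: hypothetical value; higher means slower change.
    """
    # Average the per-image bases over the mini-batch.
    mu_mean = mu_batch.mean(axis=0)
    # Blend the old initial bases with the new batch average.
    return momentum * mu_init + (1.0 - momentum) * mu_mean

# Toy usage with hypothetical sizes: B=2 images, K=4 bases, C=8 channels.
rng = np.random.default_rng(0)
mu_init = l2_normalize(rng.normal(size=(4, 8)))
mu_batch = rng.normal(size=(2, 4, 8))
updated = update_initial_bases(mu_init, mu_batch, momentum=0.9)
print(np.linalg.norm(l2_normalize(updated), axis=1))
```

The moving average changes the initial bases without backpropagating through them, which is the contrast with a gradient-based update that the ablations examine.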

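Key Findings

EMAU achieves competitive or state-of-the-art mean IoU on PASCAL VOC, PASCAL Context, and COCO Stuff, at lower computational cost than some baselines.
EM-style attention reduces the complexity from O(N^2) to O(NK), where K << N, and typically converges within a few iterations (T ≈ 3).
Moving-average bases maintenance and L2 normalization outperform alternative strategies (such as gradient-based updates or Layer Normalization) in training stability and performance.
Compared with Non-local and A^2 blocks, EMAU delivers similar or better performance while using less memory and fewer FLOPs.
Visualizations show that the learned bases correspond to meaningful semantic concepts beyond a simple foreground/background split.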
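
The O(N^2) versus O(NK) claim above can be illustrated with a rough count of attention-map entries. The sketch below counts entries only, not actual FLOPs or memory, and the feature-map size (65x65) and number of bases (K = 64) are hypothetical values chosen for illustration; only T = 3 comes from the findings above.

```python
def attention_entries(height, width, num_bases, num_iters):
    """Count attention-map entries for full self-attention vs. EM attention."""
    n = height * width               # N: number of spatial positions
    full = n * n                     # full self-attention: one N x N map
    ema = n * num_bases * num_iters  # EM attention: one N x K map per iteration
    return n, full, ema

# Hypothetical 65x65 feature map with K=64 bases and T=3 iterations.
n, full, ema = attention_entries(65, 65, num_bases=64, num_iters=3)
print(f"N={n}, full={full}, ema={ema}, ratio={full / ema:.1f}")
```

Even after multiplying by the T iterations, the EM variant computes far fewer entries whenever K * T is much smaller than N.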

This review was generated by AI and reviewed by a human editor.