QUICK REVIEW

[Paper Review] Hierarchical Multi-Scale Attention for Semantic Segmentation

Andrew Tao, Karan Sapra|arXiv (Cornell University)|May 21, 2020

Advanced Neural Network Applications | References: 42 | Cited by: 346

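One-Sentence Summary

This paper proposes a hierarchical multi-scale attention mechanism for fusing multi-scale semantic segmentation predictions that is memory-efficient to train, improves accuracy, and allows the set of scales to be changed at inference time; combined with hard auto-labelling for Cityscapes, it achieves state-of-the-art results on Cityscapes and Mapillary Vistas.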

ABSTRACT

Multi-scale inference is commonly used to improve the results of semantic segmentation. Multiple image scales are passed through a network and then the results are combined with averaging or max pooling. In this work, we present an attention-based approach to combining multi-scale predictions. We show that predictions at certain scales are better at resolving particular failure modes, and that the network learns to favor those scales for such cases in order to generate better predictions. Our attention mechanism is hierarchical, which enables it to be roughly 4x more memory efficient to train than other recent approaches. In addition to enabling faster training, this allows us to train with larger crop sizes, which leads to greater model accuracy. We demonstrate the result of our method on two datasets: Cityscapes and Mapillary Vistas. For Cityscapes, which has a large number of weakly labelled images, we also leverage auto-labelling to improve generalization. Using our approach we achieve new state-of-the-art results in both Mapillary (61.1 IOU val) and Cityscapes (85.1 IOU test).

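Motivation and Objectives

Motivate and address the trade-off between fine detail and global context across scales in semantic segmentation.
Develop a memory-efficient attention mechanism that learns, per pixel, how to weight adjacent scales.
Enable flexible inference with a variable set of scales, without retraining.
Improve generalization on Cityscapes by auto-labelling the coarsely annotated images.
Demonstrate state-of-the-art performance on Cityscapes and Mapillary Vistas.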

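Proposed Method

Introduce a hierarchical attention mechanism that predicts a relative attention mask between adjacent scales, rather than a full attention mask for every scale.
Train with pairs of adjacent scales (e.g. r=1.0 and r=0.5), and chain the learned attention at inference time to fuse N scales.
Use a shared network trunk with separate semantic and attention heads; fuse the multi-scale predictions by applying the attention mask with pixel-wise multiplication and addition.
Apply hard auto-labelling to the coarsely annotated Cityscapes images to produce dense, efficient labels and thereby improve generalization.
Use a DeepLab V3+ style network with a ResNet-50 or HRNet-OCR trunk, trained with random-scaling augmentation and class-balanced sampling.
Evaluate on Cityscapes and Mapillary Vistas against average-pooling fusion and explicit attention baselines.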
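
The pairwise fusion and its chaining at inference time can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names (upsample2x, fuse_pair, fuse_chain) are hypothetical, adjacent scales are assumed to differ by exactly 2x, and nearest-neighbour upsampling stands in for whatever interpolation the real network uses.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the last two (H, W) axes."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)


def fuse_pair(logits_lo, attn_lo, logits_hi):
    """Fuse a lower-scale prediction into the next higher scale.

    logits_lo: (C, H/2, W/2) class scores at the lower scale
    attn_lo:   (1, H/2, W/2) attention in [0, 1] predicted at the lower scale
    logits_hi: (C, H, W) class scores at the higher scale
    """
    a = upsample2x(attn_lo)
    # Pixel-wise multiply and add: the attention weights the lower scale,
    # and its complement weights the higher scale.
    return a * upsample2x(logits_lo) + (1.0 - a) * logits_hi


def fuse_chain(logits, attns):
    """Chain pairwise fusion over scales ordered from coarse to fine.

    attns[i] is the attention predicted alongside logits[i]; the finest
    scale needs no attention map of its own (assumption of this sketch).
    """
    fused = logits[0]
    for i in range(1, len(logits)):
        fused = fuse_pair(fused, attns[i - 1], logits[i])
    return fused
```

Because each step only ever combines two neighbouring scales, the same learned attention can in principle be reused when an extra scale is added to the chain at inference time, which is the flexibility the method claims.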

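Experimental Results

Research Questions

RQ1: Can hierarchical attention between adjacent scales effectively replace full multi-scale attention masks while maintaining or improving segmentation accuracy?
RQ2: Does adding scales at inference time that were not used in training improve performance without retraining?
RQ3: How does hard auto-labelling affect generalization and IoU on Cityscapes?
RQ4: How does hierarchical attention compare with explicit multi-scale attention in memory and training efficiency?
RQ5: How much does combining hierarchical attention with auto-labelling improve performance on Cityscapes and Mapillary Vistas?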
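
For readers unfamiliar with the term in RQ3, "hard" auto-labelling means keeping a single class id per pixel from a teacher model's prediction instead of storing the full per-class probabilities. A minimal sketch follows; the 0.9 confidence threshold, the ignore id of 255, and the name hard_labels are assumptions made for illustration and are not stated in this review.

```python
import numpy as np

IGNORE = 255  # assumed "ignore" class id, excluded from the training loss


def hard_labels(probs, threshold=0.9):
    """Turn teacher probabilities of shape (C, H, W) into an (H, W) label map.

    Pixels whose top-class probability falls below the (assumed) threshold
    are marked IGNORE rather than given a possibly wrong label.
    """
    labels = probs.argmax(axis=0)
    confident = probs.max(axis=0) >= threshold
    return np.where(confident, labels, IGNORE).astype(np.uint8)
```

Storing one byte per pixel rather than C floats per pixel is what makes this kind of label compact.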

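Key Findings

Hierarchical multi-scale attention achieves higher IoU than the single-scale and average-pooling baselines on Mapillary (51.6) and Cityscapes (85.1 on the test set).
With hierarchical attention, adding a 0.25x scale at inference time raises Mapillary IoU by 0.6; on Cityscapes, extra scales likewise improve finer details without retraining.
The hierarchical approach is more memory-efficient, requiring only 1.25x the training FLOPs of single-scale training, and supports flexible inference with additional scales.
Hard auto-labelling of the coarsely annotated Cityscapes images improves Cityscapes IoU by about 1.1 points over the baseline, and yields a further overall gain when combined with hierarchical attention.
The method achieves state-of-the-art results on Cityscapes (85.1 IoU, test) and Mapillary Vistas (61.1 IoU, validation).
Ablations show that multi-scale attention outperforms the standard HRNet-OCR baseline, and that combining MS attention with auto-labelling gives the best results.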
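
The 1.25x figure above and the "roughly 4x" saving quoted in the abstract can be reproduced with back-of-the-envelope arithmetic. The sketch assumes that compute and activation memory grow with pixel count, i.e. with the square of the scale factor, and that the explicit-attention baseline trains on scales 0.5, 1.0 and 2.0; that baseline scale set is an assumption, as this review does not list it.

```python
def relative_cost(scales):
    """Cost relative to one pass at scale 1.0, assuming cost ~ scale squared."""
    return sum(r * r for r in scales)


hierarchical = relative_cost([0.5, 1.0])   # 0.25 + 1.0 = 1.25
explicit = relative_cost([0.5, 1.0, 2.0])  # 0.25 + 1.0 + 4.0 = 5.25 (assumed scale set)

print(f"hierarchical: {hierarchical:.2f}x")
print(f"explicit:     {explicit:.2f}x")
print(f"ratio:        {explicit / hierarchical:.1f}x")
```

Under these assumptions the ratio comes out a little above 4, in line with the abstract's claim.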

This review was generated by AI and reviewed by a human editor.