QUICK REVIEW

[Paper Review] Reverse Attention for Salient Object Detection

Shuhan Chen, Xiuli Tan|arXiv (Cornell University)|Jul 26, 2018

Visual Attention and Saliency Detection|References: 45|Cited by: 57

One-Sentence Summary

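The paper proposes a compact salient object detection network that progressively refines the saliency map through side-output residual learning guided by reverse attention, achieving high accuracy with a small model (about 81 MB) at real-time speed (about 45 FPS).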
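To put the two headline numbers in concrete terms, the short calculation below converts them into a per-image time budget and a rough parameter count. It is only a back-of-the-envelope sketch: it assumes the 81 MB file holds nothing but 32-bit float weights and that "MB" means 10^6 bytes, neither of which is stated in this review.

```python
# Back-of-the-envelope numbers implied by the reported 45 FPS and 81 MB.
# ASSUMPTION: weights stored as 32-bit floats (4 bytes each) and
# "MB" meaning 10**6 bytes; neither is stated in this review.

def frame_latency_ms(fps: float) -> float:
    """Average time budget per frame, in milliseconds."""
    return 1000.0 / fps


def approx_param_count(model_bytes: float, bytes_per_param: int = 4) -> float:
    """Rough parameter count if the file holds only float weights."""
    return model_bytes / bytes_per_param


latency = frame_latency_ms(45)            # about 22.2 ms per image
params = approx_param_count(81 * 10**6)   # about 20.25 million parameters
print(f"{latency:.1f} ms/frame, ~{params / 1e6:.2f}M parameters")
```

If the weights were stored in half precision, the same file size would correspond to roughly twice as many parameters, so the count is an order-of-magnitude guide only.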

ABSTRACT

Benefiting from the quick development of deep learning techniques, salient object detection has achieved remarkable progress recently. However, there still exist the following two major challenges that hinder its application in embedded devices: low-resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keeping accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details, which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).
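The erasing step described in the abstract can be illustrated with a toy, single-channel version of one top-down refinement stage. This is a minimal sketch, not the authors' code: it assumes the reverse attention weight is 1 minus the sigmoid of the upsampled coarse prediction, uses nearest-neighbour upsampling as a stand-in for whatever upsampling the network actually uses, and replaces the small convolutional residual branch with a hypothetical `residual_fn` applied per pixel. In a real network the weight map would multiply every channel of the side-output feature.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2-D list (illustrative stand-in
    for the upsampling layer of a real network)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def reverse_attention(coarse_logits):
    """Reverse attention weight: 1 - sigmoid(upsampled coarse prediction).
    Close to 1 where the current map says 'background', close to 0 on
    regions that are already predicted as salient."""
    up = upsample2x(coarse_logits)
    return [[1.0 - sigmoid(v) for v in row] for row in up]


def refine(coarse_logits, side_feature, residual_fn):
    """One top-down refinement step (sketch):
    1. weight the side-output feature by the reverse attention map,
       erasing regions already predicted as salient;
    2. let a residual branch (hypothetical residual_fn) predict a
       correction from the erased feature;
    3. add that correction to the upsampled coarse prediction."""
    ra = reverse_attention(coarse_logits)
    erased = [[f * w for f, w in zip(frow, wrow)]
              for frow, wrow in zip(side_feature, ra)]
    up = upsample2x(coarse_logits)
    return [[u + residual_fn(e) for u, e in zip(urow, erow)]
            for urow, erow in zip(up, erased)]


# Usage: a 1x2 coarse map (left salient, right not) refined with a 2x4 feature.
coarse = [[4.0, -4.0]]
side = [[1.0] * 4 for _ in range(2)]
refined = refine(coarse, side, lambda e: 2.0 * e)  # hypothetical residual branch
# Left cells stay near 4.04; right cells move from -4.0 to about -2.04.
print(refined)
```

The design point the sketch is meant to show is that the residual branch only "sees" feature responses outside the current prediction, so each stage spends its few parameters on what the previous stage missed rather than re-predicting the whole object.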

Motivation and Goals

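Produce high-resolution saliency maps while keeping the accuracy needed for embedded or real-time applications.
Develop a lightweight architecture with limited parameters that can rival state-of-the-art methods.
Introduce reverse attention to guide residual learning so that it covers complete object parts and boundaries.
Demonstrate real-time performance and a smaller model size across multiple datasets.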

Proposed Method

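Builds on an HED-style VGG-16 backbone with five side-output stages, refined top-down so that the resolution of the prediction increases progressively.
Introduces side-output residual learning, which refines the saliency map step by step using only a small number of parameters.
Embeds a top-down reverse attention block that erases the current prediction from the side-output features, steering residual learning toward the missing regions.
Applies deep supervision at every side output with a pixel-wise, class-balanced cross-entropy loss.
Uses no fusion layer; the first side output (the shallowest, highest-resolution one), after a sigmoid activation, serves as the final prediction.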
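The deep-supervision loss listed above can be sketched as follows. This assumes the HED-style form of class-balanced cross-entropy, where the weight beta is the fraction of background pixels so that the usually rarer salient pixels count more; the exact weighting and normalization used in the paper are not given in this review and are an assumption here. Each side output is also assumed to be already upsampled to ground-truth resolution and flattened to a list of probabilities.

```python
import math


def class_balanced_bce(probs, labels, eps=1e-7):
    """HED-style class-balanced cross-entropy for one saliency map.
    probs:  predicted foreground probabilities (after sigmoid), flattened.
    labels: 0/1 ground truth of the same length.
    beta = fraction of background pixels, so salient pixels (usually the
    minority) receive the larger weight."""
    n = len(labels)
    n_pos = sum(labels)
    beta = (n - n_pos) / n
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            loss -= beta * math.log(p)
        else:
            loss -= (1.0 - beta) * math.log(1.0 - p)
    return loss


def deep_supervision_loss(side_outputs, labels):
    """Sum the balanced loss over every supervised side output."""
    return sum(class_balanced_bce(p, labels) for p in side_outputs)


# Usage: one salient pixel out of four, two side outputs of different quality.
gt = [1, 0, 0, 0]
uncertain = [0.5, 0.5, 0.5, 0.5]
confident = [0.9, 0.1, 0.1, 0.1]
print(deep_supervision_loss([uncertain, confident], gt))
```

Summing over all side outputs gives every refinement stage its own training signal, which matches the review's note that supervision is applied at each side output even though only the first one is used at test time.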

Experimental Results

Research Questions

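RQ1: Can a lightweight residual refinement strategy improve saliency maps without heavy multi-scale fusion?
RQ2: Can reverse attention effectively guide residual learning to recover undetected object parts and boundaries?
RQ3: How does the residual depth (D) affect accuracy and efficiency?
RQ4: How does the proposed method compare with state-of-the-art methods in F-measure and MAE across diverse benchmarks?
RQ5: Can the method achieve real-time performance with low memory requirements?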

Key Findings

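The proposed model achieves performance competitive with state-of-the-art methods while remaining lightweight (81 MB).
Compared with a baseline without reverse attention (RA), adding RA clearly improves both F-measure and MAE; in the ablation study, F-measure rises by about 1.4% on average and MAE drops by about 0.5%.
The ablation study shows that performance improves as more side-output residuals are introduced, with D=2 giving the best results on the key datasets.
The model runs at about 45 FPS on a standard GPU, faster than several comparable methods, while still producing high-quality saliency maps.
Experiments on six benchmark datasets (MSRA-B, HKU-IS, ECSSD, PASCAL-S, SOD, DUT-OMRON) show favorable quantitative and qualitative results even without post-processing such as CRF.
The method emphasizes simplicity and efficiency, offering a practical option for real-time salient object detection on embedded devices.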
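The two metrics used throughout the findings can be computed as below. This is a minimal sketch of the conventions common in salient object detection, not the paper's evaluation code: MAE is the mean absolute difference between a saliency map in [0, 1] and a binary mask, and the F-measure uses beta^2 = 0.3, the weighting customary in this field. Whether the paper reports the maximum F-measure over thresholds, an adaptive-threshold F-measure, or another variant is not stated in this review, so the max-over-256-thresholds helper is an illustrative assumption.

```python
def mae(saliency, gt):
    """Mean absolute error between a [0, 1] saliency map and a 0/1 mask."""
    return sum(abs(s - g) for s, g in zip(saliency, gt)) / len(gt)


def f_measure(saliency, gt, threshold=0.5, beta2=0.3):
    """F-measure of the map binarized at `threshold`.
    beta2 = 0.3 weights precision more heavily than recall."""
    pred = [1 if s >= threshold else 0 for s in saliency]
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / sum(pred)
    recall = tp / sum(gt)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


def max_f_measure(saliency, gt, beta2=0.3):
    """Best F-measure over 256 evenly spaced thresholds (assumed variant)."""
    return max(f_measure(saliency, gt, t / 255, beta2) for t in range(256))


# Usage on a 4-pixel toy example.
sal = [0.9, 0.8, 0.3, 0.1]
gt = [1, 0, 1, 0]
print(mae(sal, gt), f_measure(sal, gt), max_f_measure(sal, gt))
```

Note that MAE rewards maps that are confidently close to 0 or 1 everywhere, while the F-measure only looks at the binarized map, which is why the two are usually reported together.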


This review was generated by AI and checked by human editors.