QUICK REVIEW

[Paper Summary] Towards High-Resolution Salient Object Detection

Yi Zeng, Pingping Zhang|arXiv (Cornell University)|Aug 20, 2019

Visual Attention and Saliency Detection | References: 53 | Cited by: 37

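One-Sentence Summary

This paper introduces HRSOD, the first high-resolution salient object detection dataset. It also proposes a three-branch network (GSN, LRN, GLFN) that detects salient objects directly in very high-resolution images without post-processing. The method achieves state-of-the-art performance on HRSOD and remains competitive on standard low-resolution benchmarks.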

ABSTRACT

Deep neural network based methods have made a significant breakthrough in salient object detection. However, they are typically limited to input images with low resolutions ($400\times400$ pixels or less). Little effort has been made to train deep neural networks to directly handle salient object detection in very high-resolution images. This paper pushes forward high-resolution saliency detection, and contributes a new dataset, named High-Resolution Salient Object Detection (HRSOD). To our best knowledge, HRSOD is the first high-resolution saliency detection dataset to date. As another contribution, we also propose a novel approach, which incorporates both global semantic information and local high-resolution details, to address this challenging task. More specifically, our approach consists of a Global Semantic Network (GSN), a Local Refinement Network (LRN) and a Global-Local Fusion Network (GLFN). GSN extracts the global semantic information based on down-sampled entire image. Guided by the results of GSN, LRN focuses on some local regions and progressively produces high-resolution predictions. GLFN is further proposed to enforce spatial consistency and boost performance. Experiments illustrate that our method outperforms existing state-of-the-art methods on high-resolution saliency datasets by a large margin, and achieves comparable or even better performance than them on widely-used saliency benchmarks. The HRSOD dataset is available at https://github.com/yi94code/HRSOD.
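The abstract's global-to-local flow depends on one piece of bookkeeping. A location picked on the down-sampled GSN map has to become a crop in the full-resolution image, and the local predictions then have to be written back onto a full-resolution canvas. Below is a minimal sketch of that bookkeeping only, with no networks. It makes several assumptions: the function names `gsn_to_full_box` and `paste_patches` are hypothetical, the GSN input size and patch size are free parameters (384 is used purely for illustration), and averaging overlapping patches is a placeholder chosen for this sketch. It is not the paper's fusion rule, which GLFN is meant to handle.

```python
from typing import List, Optional, Sequence, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def gsn_to_full_box(cy: int, cx: int, gsn_size: Tuple[int, int],
                    full_size: Tuple[int, int], patch: int) -> Box:
    """Map a point (cy, cx) on the down-sampled GSN map to a patch x patch
    crop in the full-resolution image, shifted so it stays inside the image.
    Hypothetical helper; sizes are (height, width)."""
    gh, gw = gsn_size
    fh, fw = full_size
    # Scale the low-resolution coordinate up to full resolution.
    fy = int(cy * fh / gh)
    fx = int(cx * fw / gw)
    # Center the crop on that point, then clamp it into the image.
    top = min(max(fy - patch // 2, 0), max(fh - patch, 0))
    left = min(max(fx - patch // 2, 0), max(fw - patch, 0))
    return top, left, min(top + patch, fh), min(left + patch, fw)


def paste_patches(full_size: Tuple[int, int],
                  patches: Sequence[Tuple[Box, List[List[float]]]]
                  ) -> List[List[Optional[float]]]:
    """Write per-patch predictions back into one full-resolution map.
    Overlaps are averaged (an assumption of this sketch); pixels that no
    patch covers are returned as None."""
    fh, fw = full_size
    acc = [[0.0] * fw for _ in range(fh)]
    cnt = [[0] * fw for _ in range(fh)]
    for (top, left, bottom, right), pred in patches:
        for y in range(top, bottom):
            for x in range(left, right):
                acc[y][x] += pred[y - top][x - left]
                cnt[y][x] += 1
    return [[acc[y][x] / cnt[y][x] if cnt[y][x] else None for x in range(fw)]
            for y in range(fh)]
```

As an example, take a 1080 x 1920 image, a 384 x 384 GSN map, and 384-pixel patches. The GSN point (100, 200) maps to the full-resolution crop `(89, 808, 473, 1192)`. A point near the bottom-right corner is shifted back inside the image, not cut off.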

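Motivation and Goals

Close the gap in high-resolution salient object detection by enabling direct training and inference on very high-resolution images.
Provide a large-scale, richly annotated high-resolution dataset (HRSOD) to facilitate research.
Propose a global-to-local architectural paradigm that exploits global context while preserving high-resolution detail.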

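Proposed Method

A three-branch architecture: a Global Semantic Network (GSN) for coarse global saliency, a Local Refinement Network (LRN) for high-resolution local refinement, and a Global-Local Fusion Network (GLFN) for high-resolution fusion and spatial consistency.
GSN takes a down-sampled input to capture global semantics, and attended patch sampling (APS) selects the uncertain regions that LRN should refine.
GSN's semantic guidance is injected into LRN by concatenating the corresponding GSN features along LRN's decoding path.
A lightweight GLFN built from densely connected convolutions fuses the high-resolution input with the GSN/LRN outputs while preserving detail.
Attended Patch Sampling (APS) uses the GSN output as guidance to concentrate LRN on uncertain regions.
An optional GSN+APS+LRN+CRF variant is also provided for comparison against post-processing refinement.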
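The summary describes APS only as "focus LRN on the uncertain regions indicated by GSN." Below is a minimal sketch of that idea under explicit assumptions, and it is not the authors' sampling procedure. A pixel counts as uncertain when its GSN saliency lies strictly between two thresholds (0.1 and 0.9 here, both assumed). Patch windows are then placed by a greedy coverage rule, which is swapped in for illustration: each new window is centered on the uncertain pixel that covers the most still-uncovered uncertain pixels. The names `uncertain_mask` and `attended_patch_centers` are hypothetical.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def uncertain_mask(sal: List[List[float]], low: float = 0.1,
                   high: float = 0.9) -> List[List[bool]]:
    """Mark pixels whose GSN saliency is neither clearly background (<= low)
    nor clearly foreground (>= high). Thresholds are assumptions."""
    return [[low < v < high for v in row] for row in sal]


def attended_patch_centers(sal: List[List[float]], half: int, max_patches: int,
                           low: float = 0.1, high: float = 0.9) -> List[Point]:
    """Greedily choose up to max_patches square windows (side 2*half+1) on the
    GSN map, each covering as many still-uncovered uncertain pixels as possible.
    Greedy coverage is a stand-in chosen for this sketch."""
    mask = uncertain_mask(sal, low, high)
    h, w = len(mask), len(mask[0])
    uncovered: Set[Point] = {(y, x) for y in range(h) for x in range(w) if mask[y][x]}
    centers: List[Point] = []
    while uncovered and len(centers) < max_patches:
        best, best_gain = None, 0
        # Candidate centers are the remaining uncertain pixels (sorted for determinism).
        for cy, cx in sorted(uncovered):
            gain = sum(1 for y, x in uncovered
                       if abs(y - cy) <= half and abs(x - cx) <= half)
            if gain > best_gain:
                best, best_gain = (cy, cx), gain
        centers.append(best)
        cy, cx = best
        # Drop every uncertain pixel the chosen window now covers.
        uncovered = {(y, x) for y, x in uncovered
                     if abs(y - cy) > half or abs(x - cx) > half}
    return centers
```

Windows land only where GSN is undecided, and a confidently classified map yields no patches. A uniform sampler, the kind of baseline RQ3 compares against, would place windows without consulting the mask at all.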

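Experimental Results

Research Questions

RQ1: Can a neural network learn high-resolution saliency directly, without post-processing?
RQ2: Does global semantic guidance improve local refinement in high-resolution saliency detection?
RQ3: Is focusing refinement on uncertain regions via APS more effective than uniform patch sampling?
RQ4: How well does the proposed Global-Local Fusion Network (GLFN) preserve high-resolution detail and spatial consistency?
RQ5: How does the method perform on the high-resolution dataset (HRSOD) compared with standard low-resolution saliency benchmarks?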

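Key Findings

The proposed method outperforms state-of-the-art methods by a large margin on the new high-resolution dataset HRSOD.
On widely used low-resolution saliency benchmarks, it performs comparably to or better than state-of-the-art methods.
APS improves refinement markedly over random patch sampling and is robust to the number of patches.
GLFN provides strong high-resolution fusion with a very small model (11.9 KB) and runs fast on high-resolution inputs.
Compared with CRF-based post-processing, LRN combined with APS and GLFN yields better boundary quality (lower boundary displacement error, BDE).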
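Boundary displacement error measures, roughly, how far the predicted object boundary lies from the ground-truth boundary, so lower is better. Below is a minimal sketch of one common symmetric formulation; the paper's exact evaluation code may differ. The sketch extracts boundary pixels from each binary mask, takes the mean distance from each boundary pixel to the nearest boundary pixel of the other mask, and averages the two directions. The 4-neighbourhood boundary definition, the choice to treat pixels outside the image as background, and the helper names `boundary` and `bde` are all assumptions made here.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[int, int]


def boundary(mask: List[List[int]]) -> Set[Point]:
    """Foreground pixels with at least one 4-neighbour that is background.
    Pixels outside the image are treated as background (an assumption)."""
    h, w = len(mask), len(mask[0])

    def fg(y: int, x: int) -> bool:
        return 0 <= y < h and 0 <= x < w and mask[y][x] == 1

    return {(y, x) for y in range(h) for x in range(w)
            if fg(y, x) and not all(fg(y + dy, x + dx)
                                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))}


def mean_nearest(src: Set[Point], dst: Set[Point]) -> float:
    """Mean Euclidean distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def bde(pred: List[List[int]], gt: List[List[int]]) -> float:
    """Symmetric boundary displacement error between two binary masks."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp or not bg:
        raise ValueError("both masks need a non-empty boundary")
    return 0.5 * (mean_nearest(bp, bg) + mean_nearest(bg, bp))
```

Identical masks score 0. Shifting a two-pixel segment one pixel sideways (`[[1, 1, 0, 0, 0]]` against `[[0, 1, 1, 0, 0]]`) scores 0.5. The nearest-neighbour search is brute force, which is fine for illustration. A full-resolution evaluation would normally use a distance transform instead.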

This summary was generated by AI and reviewed by human editors.