QUICK REVIEW

[Paper Review] Suppress and Balance: A Simple Gated Network for Salient Object Detection

Xiaoqi Zhao, Youwei Pang | arXiv (Cornell University) | Jul 16, 2020
Visual Attention and Saliency Detection | References: 73 | Cited by: 40
One-Sentence Summary

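GateNet introduces multilevel gate units that balance and suppress encoder contributions and adds Fold-ASPP within a dual-branch decoder, achieving state-of-the-art salient object detection at real-time speed on five datasets.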

ABSTRACT

Most salient object detection approaches use U-Net or feature pyramid networks (FPN) as their basic structures. These methods ignore two key problems when the encoder exchanges information with the decoder: one is the lack of interference control between them, the other is without considering the disparity of the contributions of different encoder blocks. In this work, we propose a simple gated network (GateNet) to solve both issues at once. With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder. We design a novel gated dual branch structure to build the cooperation among different levels of features and improve the discriminability of the whole network. Through the dual branch design, more details of the saliency map can be further restored. In addition, we adopt the atrous spatial pyramid pooling based on the proposed "Fold" operation (Fold-ASPP) to accurately localize salient objects of various scales. Extensive experiments on five challenging datasets demonstrate that the proposed model performs favorably against most state-of-the-art methods under different evaluation metrics.

Motivation and Objectives

  • Address two problems in U-Net/FPN-based salient object detection (SOD) models: uncontrolled interference between encoder and decoder, and the unequal contributions of different encoder blocks.
  • Propose a simple gated network (GateNet) with multilevel gate units to balance information flow from encoder to decoder.
  • Introduce a dual-branch decoder architecture to recover details and improve saliency map quality.
  • Develop Fold-ASPP (Folded ASPP) to capture multiscale context while maintaining local correlations.
  • Demonstrate state-of-the-art performance on five challenging SOD datasets and show real-time inference speed.

Proposed Method

  • Build GateNet on a feature pyramid (FPN) backbone with five gate units inserted between transition layers and decoder blocks.
  • Compute two gate values per level from the concatenated encoder and decoder (or transition) features, then use one gate to weight the features passed to the FPN branch and the other to weight the features passed to the parallel branch.
  • Introduce a dual-branch decoder: an FPN-based branch for main saliency prediction and a parallel branch that fuses gated encoder features to recover details.
  • Propose Fold-ASPP: a folded atrous spatial pyramid pooling module that uses a Fold operation to create local 2x2 regions before applying dilated convolutions, enhancing multiscale context.
  • Use a residual parallel connection to combine the FPN and parallel branches into the final saliency map with a sigmoid output.
  • Train with multi-supervision using cross-entropy losses on the FPN branch output and the final fused output.
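
The steps listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`gate_values`, `fold_2x2`, `multi_supervision_loss`) are hypothetical, the 1x1 convolution, the sigmoid-then-average order, and the equal loss weights are assumptions, and only the 2x2 Fold step of Fold-ASPP is shown (the dilated convolutions that follow it are omitted).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_values(enc, dec, weight, bias):
    """Return two scalar gates for one level (hypothetical helper).

    enc: (C_e, H, W) encoder or transition feature.
    dec: (C_d, H, W) decoder feature at the same resolution.
    weight: (2, C_e + C_d), used as a 1x1 convolution (an assumption).
    bias: (2,)
    """
    x = np.concatenate([enc, dec], axis=0)                 # (C_e + C_d, H, W)
    logits = np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]
    # Assumption: squash to (0, 1), then average over space to get one scalar per gate.
    return sigmoid(logits).mean(axis=(1, 2))               # shape (2,)

def fold_2x2(feat):
    """Pack every non-overlapping 2x2 patch into channels.

    (C, H, W) -> (4C, H/2, W/2), so each output position keeps its local 2x2 region.
    """
    c, h, w = feat.shape
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    x = feat.reshape(c, h // 2, 2, w // 2, 2)               # (c, i, di, j, dj)
    return x.transpose(0, 2, 4, 1, 3).reshape(4 * c, h // 2, w // 2)

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a probability map and a binary mask."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def multi_supervision_loss(fpn_pred, final_pred, target):
    """Supervise both the FPN-branch output and the fused output (equal weights assumed)."""
    return bce(fpn_pred, target) + bce(final_pred, target)

# Usage: gate one level's feature before sending it to each branch.
rng = np.random.default_rng(0)
enc = rng.standard_normal((8, 4, 4))
dec = rng.standard_normal((8, 4, 4))
w = 0.1 * rng.standard_normal((2, 16))
g_fpn, g_par = gate_values(enc, dec, w, np.zeros(2))
to_fpn_branch = g_fpn * enc        # weighted feature for the FPN branch
to_parallel_branch = g_par * enc   # weighted feature for the parallel branch
folded = fold_2x2(enc)             # shape (32, 2, 2)
```

Because each gate is a single scalar in (0, 1), a level whose features are judged unhelpful can be scaled toward zero as a whole, which is one way to read the "suppress and balance" idea in the title.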

Experimental Results

Research Questions

  • RQ1: How can inter-block interference between encoder and decoder be controlled in salient object detection models?
  • RQ2: Can gate-based information flow modulation improve the utilization of encoder features for saliency prediction?
  • RQ3: Does a dual-branch decoder plus Fold-ASPP better capture multiscale context and fine details than single-branch decoders?
  • RQ4: What is the impact of multilevel gate units and Fold-ASPP on accuracy and boundary quality across standard SOD datasets?

Key Findings

  • GateNet consistently outperforms seventeen state-of-the-art SOD methods across five challenging datasets on metrics including F-measure, S-measure, and MAE.
  • Multilevel gate units balance contributions from encoder blocks and suppress background interference, improving saliency discrimination.
  • Fold-ASPP provides richer multiscale context and better localization, outperforming standard ASPP in ablations.
  • A dual-branch decoder with a parallel residual path restores details and preserves boundaries, enhancing boundary accuracy.
  • With stronger backbones, GateNet achieves further performance gains, and the model runs at real-time speeds (~30 fps) on standard hardware.
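
Two of the metrics named above, MAE and F-measure, can be sketched as follows. This is a simplified illustration rather than the paper's evaluation code: the fixed threshold of 0.5 is an assumption, `beta2=0.3` follows the common SOD convention rather than anything stated in this summary, and S-measure is left out because it needs a longer structural comparison.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    return float(np.mean(np.abs(pred - gt)))

def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure at one fixed threshold (threshold and beta2 are assumptions)."""
    binary = pred >= threshold
    mask = gt > 0.5
    tp = np.logical_and(binary, mask).sum()
    precision = tp / max(binary.sum(), 1)   # guard against an empty prediction
    recall = tp / max(mask.sum(), 1)        # guard against an empty ground truth
    denom = beta2 * precision + recall
    if denom == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / denom)

# Usage on a toy 2x2 prediction.
pred = np.array([[0.9, 0.8], [0.2, 0.1]])
gt = np.array([[1.0, 1.0], [0.0, 0.0]])
print(mae(pred, gt), f_measure(pred, gt))
```

Lower MAE is better, while higher F-measure is better; with `beta2` below 1, precision is weighted more heavily than recall.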


This review was generated by AI and reviewed by human editors.