QUICK REVIEW

[Paper Review] Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective

Jing Zhang, Tong Zhang|arXiv (Cornell University)|Mar 29, 2018
Visual Attention and Saliency Detection | 42 references | 65 citations
TL;DR

The paper proposes an end-to-end deep saliency detector trained without human annotations by learning from multiple noisy unsupervised saliency maps, using joint latent saliency prediction and explicit noise modeling.

ABSTRACT

The success of current deep saliency detection methods heavily depends on the availability of large-scale supervision in the form of per-pixel labeling. Such supervision, while labor-intensive and not always possible, tends to hinder the generalization ability of the learned models. By contrast, traditional handcrafted features based unsupervised saliency detection methods, even though have been surpassed by the deep supervised methods, are generally dataset-independent and could be applied in the wild. This raises a natural question that "Is it possible to learn saliency maps without using labeled data while improving the generalization ability?". To this end, we present a novel perspective to unsupervised saliency detection through learning from multiple noisy labeling generated by "weak" and "noisy" unsupervised handcrafted saliency methods. Our end-to-end deep learning framework for unsupervised saliency detection consists of a latent saliency prediction module and a noise modeling module that work collaboratively and are optimized jointly. Explicit noise modeling enables us to deal with noisy saliency maps in a probabilistic way. Extensive experimental results on various benchmarking datasets show that our model not only outperforms all the unsupervised saliency methods with a large margin but also achieves comparable performance with the recent state-of-the-art supervised deep saliency methods.

Motivation & Objective

  • Learn saliency detection without pixel-level labels, with the aim of improving generalization.
  • Leverage multiple unsupervised saliency maps as noisy labels to train a deep model.
  • Jointly optimize a latent saliency predictor and a noise model in an end-to-end framework.

Proposed method

  • Two-module architecture: a latent saliency prediction module (FCN/DeepLab-based) and a noise modeling module.
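  • Model each handcrafted unsupervised label as y_i^j = y_bar_i + n_i^j, where y_bar_i is the latent saliency map for image i, j indexes the handcrafted methods, and the noise n_i^j is drawn from a pixel-wise zero-mean Gaussian q_i(Σ).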
  • Loss combines saliency prediction loss (cross-entropy between predicted and noisy labels) and noise loss (KL divergence between q_i and empirical noise).
  • Noise variances are updated per image via KL-based updates, enabling iterative refinement across rounds.
  • Training uses DeepLab/ResNet-101 with end-to-end optimization; testing uses the latent predicted saliency map without the noise module.
  • Theoretical and practical design choices include truncating outputs to [0, 1], round-based noise updates, and SGD with momentum.
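
The loss structure described in the list above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the weight `lam`, the direction of the KL term, and the residual-based estimate of the empirical noise are all hypothetical choices made for this example, and a fixed array stands in for the DeepLab/ResNet-101 predictor.

```python
import numpy as np

def noisy_prediction(latent, sigma, rng):
    """Add pixel-wise zero-mean Gaussian noise to the latent map, then truncate to [0, 1]."""
    noise = rng.normal(0.0, 1.0, size=latent.shape) * sigma
    return np.clip(latent + noise, 0.0, 1.0)

def cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a predicted map and a (soft) noisy label."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

def empirical_sigma(noisy_labels, latent):
    """Per-pixel noise std estimated from label residuals (assumes zero-mean noise)."""
    residual = noisy_labels - latent[None]
    return np.sqrt(np.mean(residual ** 2, axis=0))

def gaussian_kl(sigma_q, sigma_p, eps=1e-7):
    """KL(N(0, sigma_q^2) || N(0, sigma_p^2)), averaged over pixels."""
    vq = sigma_q ** 2 + eps
    vp = sigma_p ** 2 + eps
    return float(np.mean(0.5 * (np.log(vp / vq) + vq / vp - 1.0)))

# Toy usage: one 4x4 "image" with three noisy labels (all values are made up).
rng = np.random.default_rng(0)
latent = rng.uniform(0.2, 0.8, size=(4, 4))      # stand-in for the network output
sigma = np.full((4, 4), 0.1)                     # current per-pixel noise std
noisy_labels = np.clip(latent[None] + rng.normal(0.0, 0.15, size=(3, 4, 4)), 0.0, 1.0)

pred = noisy_prediction(latent, sigma, rng)
lam = 0.01                                       # hypothetical loss weight
pred_loss = np.mean([cross_entropy(pred, y) for y in noisy_labels])
noise_loss = gaussian_kl(sigma, empirical_sigma(noisy_labels, latent))
total_loss = pred_loss + lam * noise_loss
```

In a round-based scheme, one plausible reading is that `sigma` is re-estimated from the residuals after each round of predictor training; the exact update rule is not given in this review.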

Experimental results

Research questions

  • RQ1: Can saliency maps be learned from multiple noisy, unsupervised labels without human annotations?
  • RQ2: Does explicit noise modeling improve the quality of unsupervised deep saliency detection compared to naive fusion or weak supervision?
  • RQ3: How many iterative rounds are needed for convergence between the latent saliency predictor and the noise model?
  • RQ4: How does the proposed unsupervised method compare to supervised deep saliency methods and traditional unsupervised methods across benchmark datasets?

Key findings

  • The method outperforms existing unsupervised saliency methods by a wide margin.
  • It achieves performance highly competitive with state-of-the-art supervised saliency detectors on benchmark datasets.
  • Ablation shows that alternating updates of the latent predictor and noise model improve performance over rounds, converging after several iterations.
  • The approach yields strong results across seven benchmark datasets under several evaluation metrics (MAE, F-measure, and precision-recall curves).
  • Qualitative results illustrate robust salient object recovery in challenging scenarios (low contrast, complex backgrounds).
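
For reference, the MAE and F-measure named in the findings above can be computed roughly as follows. This is a sketch following common saliency-evaluation conventions, not necessarily the paper's exact protocol: the fixed threshold of 0.5, the weight `beta_sq = 0.3`, and the function names are assumptions for this example.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and ground truth, both in [0, 1]."""
    return float(np.mean(np.abs(pred - gt)))

def f_measure(pred, gt, threshold=0.5, beta_sq=0.3, eps=1e-8):
    """Weighted F-measure after binarizing the prediction at a fixed threshold."""
    binary = pred >= threshold
    gt_mask = gt >= 0.5
    tp = np.logical_and(binary, gt_mask).sum()
    precision = tp / (binary.sum() + eps)
    recall = tp / (gt_mask.sum() + eps)
    return float((1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall + eps))

# Toy usage on a 4x4 map with a 2x2 salient square (values are made up).
gt = np.zeros((4, 4))
gt[1:3, 1:3] = 1.0
pred = np.clip(gt * 0.9 + 0.05, 0.0, 1.0)   # a slightly "soft" but correct prediction
scores = {"mae": mae(pred, gt), "f": f_measure(pred, gt)}
```

Sweeping `threshold` over a range of values and recording precision and recall at each step would give the points of a precision-recall curve.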

This review was created by AI and reviewed by human editors.