QUICK REVIEW

[Paper Review] Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

Dan Xu, Wanli Ouyang|arXiv (Cornell University)|Jan 1, 2018
Advanced Image Fusion Techniques | References: 42 | Cited by: 103
One-Sentence Summary

Introduces AMH-Net, a two-level hierarchical CNN with Attention-Gated CRFs that fuses multi-scale features for contour detection and achieves state-of-the-art results on BSDS500 and NYUDv2.

ABSTRACT

Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection. This paper presents a novel approach for predicting contours which advances the state of the art in two fundamental aspects, i.e. multi-scale feature generation and fusion. Different from previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, we introduce a hierarchical deep model which produces more rich and complementary representations. Furthermore, to refine and robustly fuse the representations learned at different scales, the novel Attention-Gated Conditional Random Fields (AG-CRFs) are proposed. The experiments ran on two publicly available datasets (BSDS500 and NYUDv2) demonstrate the effectiveness of the latent AG-CRF model and of the overall hierarchical framework.

Motivation and Objectives

  • Motivate the use of richer, more complementary multi-scale representations for contour prediction beyond simple concatenation or averaging.
  • Propose Attention-Gated Conditional Random Fields (AG-CRFs) to robustly fuse and refine multi-scale features.
  • Integrate AG-CRFs into a two-level hierarchical CNN (AMH-Net) and train end-to-end with deep supervision.
  • Demonstrate improvements over state-of-the-art methods on BSDS500 and NYUDv2 datasets.

Proposed Method

  • Define a set of S multi-scale feature maps from a front-end CNN.
  • Learn latent multi-scale representations h_s with gates g that control cross-scale information flow (AG-CRFs).
  • Use a Gaussian unary potential linking h_s to observed features f_s and a bilinear gated pairwise potential between scales.
  • Infer g and H via mean-field updates; gate expectations act as attention to modulate inter-scale message passing.
  • Provide two variants: fully-latent FLAG-CRFs and partially-latent PLAG-CRFs where attention can be derived from observed features or latent variables.
  • Implement AG-CRF updates as neural network layers with convolutional message passing, attention estimation, and gated fusion steps.
  • Construct AMH-Net by fusing three representations per layer (D upsampling, C same size, M downsampling) to obtain richer intra-layer and inter-layer multi-scale features.
  • Train end-to-end with deep supervision and a class-imbalance aware cross-entropy loss.
  • Fuse scale predictions during testing by averaging the outputs from multiple AG-CRF classifiers.
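
The AG-CRF steps above (message passing, attention estimation, gated fusion), together with the class-balanced loss and test-time averaging, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: messages are pointwise (a `C x C` matrix `K` stands in for a spatial convolution), the gate is taken as the sigmoid of the bilinear compatibility with an assumed sign convention, the loss uses the class-balanced form common in HED-style edge detectors, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_mean_field_step(f_r, h_r, h_e, K, a=1.0):
    """One simplified gated update of a receiving scale from an emitting scale.

    f_r : observed features at the receiving scale, shape (H, W, C)
    h_r : current latent estimate at the receiving scale, shape (H, W, C)
    h_e : current latent estimate at the emitting scale, shape (H, W, C)
    K   : (C, C) kernel of the bilinear pairwise term (assumed pointwise)
    a   : weight of the Gaussian unary term (assumed scalar)
    """
    # (i) Message passing: per-pixel message K @ h_e.
    msg = h_e @ K.T
    # (ii) Attention estimation: bilinear compatibility h_r^T K h_e,
    # squashed to (0, 1). The sign convention here is an assumption.
    compat = np.sum(h_r * msg, axis=-1, keepdims=True)
    gate = sigmoid(compat)
    # (iii) Gated fusion: observed features plus the attended message.
    h_new = f_r + (gate * msg) / a
    return h_new, gate


def balanced_bce(p, y, eps=1e-7):
    """Class-balanced cross-entropy (assumed HED-style weighting).

    p : predicted contour probabilities, y : binary ground truth.
    Contour pixels are weighted by the fraction of non-contour pixels.
    """
    p = np.clip(p, eps, 1.0 - eps)
    beta = 1.0 - y.mean()
    loss = -(beta * y * np.log(p) + (1.0 - beta) * (1.0 - y) * np.log(1.0 - p))
    return loss.sum()


def fuse_predictions(preds):
    """Test-time fusion: average the maps from several classifiers."""
    return np.mean(np.stack(preds, axis=0), axis=0)
```

With `K` set to zero no message is passed, so `gated_mean_field_step` returns the observed features unchanged and every gate equals 0.5, which makes a quick sanity check for the gating logic.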

Experimental Results

Research Questions

  • RQ1: Can modeling and leveraging complex relationships between multi-scale CNN features via an attention-gated CRF improve contour prediction over simple fusion strategies?
  • RQ2: Does a two-level hierarchical multi-scale network paired with AG-CRFs yield richer representations and better contour accuracy on standard benchmarks?
  • RQ3: How do different AG-CRF variants (FLAG-CRFs vs PLAG-CRFs) impact contour detection performance?
  • RQ4: What is the contribution of deep supervision and ablations to the final performance?

Key Findings

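Dataset | Method | ODS | OIS | AP
BSDS500 | AMH-Net (fusion) | 0.798 | 0.829 | 0.869
BSDS500 | HED (RGB) | 0.788 | 0.808 | 0.840
BSDS500 | COB | 0.793 | 0.820 | 0.859
BSDS500 | DeepContour | 0.756 | 0.773 | 0.797
BSDS500 | AMH-Net (FLAG-CRFs) | ??? | ??? | ???
NYUDv2 | AMH-Net RGB | 0.744 | 0.758 | 0.765
NYUDv2 | AMH-Net HHA | 0.716 | 0.729 | 0.734
NYUDv2 | AMH-Net RGB+HHA | 0.771 | 0.786 | 0.802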
  • AMH-Net (fusion) achieves an ODS of 0.798, outperforming prior methods on BSDS500.
  • On NYUDv2, AMH-Net with RGB+HHA input achieves an ODS of 0.771, an OIS of 0.786, and an AP of 0.802, its best result on that dataset.
  • FLAG-CRFs consistently outperform PLAG-CRFs and non-attention CRF baselines in ODS, OIS, and AP.
  • Ablation studies show that removing AG-CRFs or deep supervision degrades performance, confirming the effectiveness of hierarchical multi-scale fusion and attention.
  • AMH-Net surpasses traditional features and previous CNN-based contour detectors on both datasets (using RGB+HHA input on NYUDv2).
  • The proposed approach uses only three scales yet achieves state-of-the-art results, suggesting room for further gains with additional scales.
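
The ODS and OIS columns above differ only in how the binarization threshold is chosen, which a short NumPy sketch may make concrete. It assumes the usual BSDS-style definitions (ODS: one threshold for the whole dataset; OIS: the best threshold per image, with counts pooled afterwards) and simplifies matching to a single true-positive count per image and threshold; the source does not spell out the evaluation protocol, and the function names and array layout are hypothetical. AP (area under the precision-recall curve) is not covered here.

```python
import numpy as np


def f_measure(tp, n_pred, n_gt):
    """F-measure from counts; returns 0 where precision + recall is 0."""
    tp = np.asarray(tp, dtype=float)
    precision = tp / np.maximum(n_pred, 1)
    recall = tp / np.maximum(n_gt, 1)
    denom = precision + recall
    return np.where(denom > 0, 2 * precision * recall / np.maximum(denom, 1e-12), 0.0)


def ods_ois(tp, n_pred, n_gt):
    """Compute ODS and OIS F-measures.

    All inputs have shape (n_images, n_thresholds):
    tp     : matched (true-positive) contour pixels
    n_pred : predicted contour pixels
    n_gt   : ground-truth contour pixels (constant across thresholds)
    """
    # ODS: pool counts over images, then keep the single best threshold.
    ods = f_measure(tp.sum(axis=0), n_pred.sum(axis=0), n_gt.sum(axis=0)).max()
    # OIS: pick the best threshold per image, then pool those counts.
    best = f_measure(tp, n_pred, n_gt).argmax(axis=1)
    rows = np.arange(tp.shape[0])
    ois = f_measure(tp[rows, best].sum(), n_pred[rows, best].sum(), n_gt[rows, best].sum())
    return float(ods), float(ois)


# Toy example: 2 images, 2 thresholds (made-up counts, not paper data).
tp = np.array([[8, 5], [9, 2]])
n_pred = np.array([[16, 5], [10, 2]])
n_gt = np.array([[10, 10], [10, 10]])
ods, ois = ods_ois(tp, n_pred, n_gt)
```

In this toy case the two images prefer different thresholds, so OIS comes out higher than ODS, the same ordering seen in every row of the table.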


This review was generated by AI and checked by human editors.