QUICK REVIEW

[Paper Review] Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

Dan Xu, Wanli Ouyang|arXiv (Cornell University)|Jan 1, 2018

Advanced Image Fusion Techniques | References: 42 | Cited by: 103

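One-Sentence Summary

Introduces AMH-Net, a two-level hierarchical CNN with Attention-Gated CRFs that fuses multi-scale features for contour detection, achieving state-of-the-art results on BSDS500 and NYUDv2.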

ABSTRACT

Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection. This paper presents a novel approach for predicting contours which advances the state of the art in two fundamental aspects, i.e. multi-scale feature generation and fusion. Different from previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, we introduce a hierarchical deep model which produces more rich and complementary representations. Furthermore, to refine and robustly fuse the representations learned at different scales, the novel Attention-Gated Conditional Random Fields (AG-CRFs) are proposed. The experiments ran on two publicly available datasets (BSDS500 and NYUDv2) demonstrate the effectiveness of the latent AG-CRF model and of the overall hierarchical framework.

Motivation and Objectives

Motivate the use of richer, more complementary multi-scale representations for contour prediction beyond simple concatenation or averaging.
Propose Attention-Gated Conditional Random Fields (AG-CRFs) to robustly fuse and refine multi-scale features.
Integrate AG-CRFs into a two-level hierarchical CNN (AMH-Net) and train end-to-end with deep supervision.
Demonstrate improvements over state-of-the-art methods on BSDS500 and NYUDv2 datasets.

Proposed Method

Define a set of S multi-scale feature maps from a front-end CNN.
Learn latent multi-scale representations h_s with gates g that control cross-scale information flow (AG-CRFs).
Use a Gaussian unary potential linking h_s to observed features f_s and a bilinear gated pairwise potential between scales.
Infer g and H via mean-field updates; gate expectations act as attention to modulate inter-scale message passing.
Provide two variants: fully-latent FLAG-CRFs and partially-latent PLAG-CRFs where attention can be derived from observed features or latent variables.
Implement AG-CRF updates as neural network layers with convolutional message passing, attention estimation, and gated fusion steps.
Construct AMH-Net by fusing three representations per layer, labeled D (upsampled), C (same size), and M (downsampled), to obtain richer intra-layer and inter-layer multi-scale features.
Train end-to-end with deep supervision and a class-imbalance aware cross-entropy loss.
Fuse scale predictions during testing by averaging the outputs from multiple AG-CRF classifiers.
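
The gated fusion, loss, and test-time averaging steps above can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the paper's exact update rules: the function names (`gated_fusion`, `class_balanced_bce`, `average_predictions`) are hypothetical, message passing uses a 1x1 channel-mixing matrix instead of a spatial convolution, sender features are assumed to be already resized to the receiver's resolution and are held fixed, and the loss weighting follows the HED-style class balancing, which the summary does not spell out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_recv, f_send_list, kernels, n_iters=2):
    """Refine one receiving scale with attention-gated messages (sketch).

    f_recv:      observed features at the receiving scale, shape (C, H, W)
    f_send_list: observed features of the sending scales, each (C, H, W)
    kernels:     one (C, C) channel-mixing matrix per sender (assumed 1x1 kernel)
    """
    h_recv = f_recv.copy()
    for _ in range(n_iters):
        total = np.zeros_like(f_recv)
        for h_send, K in zip(f_send_list, kernels):
            # Message from sender to receiver: 1x1 "convolution" over channels.
            msg = np.einsum("dc,chw->dhw", K, h_send)
            # Per-pixel bilinear compatibility h_r^T K h_e drives the gate.
            compat = np.sum(h_recv * msg, axis=0, keepdims=True)
            gate = sigmoid(compat)          # attention in (0, 1), shape (1, H, W)
            total += gate * msg             # gated message
        # Unary term pulls h toward the observation; gated messages are added.
        h_recv = f_recv + total
    return h_recv

def class_balanced_bce(prob, label, eps=1e-7):
    """HED-style class-balanced cross-entropy (assumed weighting)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    beta = 1.0 - label.mean()               # fraction of non-contour pixels
    pos = -beta * label * np.log(prob)
    neg = -(1.0 - beta) * (1.0 - label) * np.log(1.0 - prob)
    return float((pos + neg).sum())

def average_predictions(prob_maps):
    """Test-time fusion: average the contour maps of all classifiers."""
    return np.mean(np.stack(prob_maps, axis=0), axis=0)
```

In this sketch the gate plays the role of the attention described above: where the receiving and sending scales disagree (low compatibility), the gate shrinks toward zero and the message is suppressed. Because contour pixels are rare, `beta` is close to 1, so errors on contour pixels are weighted far more heavily than errors on background pixels.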

Experimental Results

Research Questions

RQ1: Can modeling and leveraging complex relationships between multi-scale CNN features via an attention-gated CRF improve contour prediction over simple fusion strategies?
RQ2: Does a two-level hierarchical multi-scale network paired with AG-CRFs yield richer representations and better contour accuracy on standard benchmarks?
RQ3: How do different AG-CRF variants (FLAG-CRFs vs PLAG-CRFs) impact contour detection performance?
RQ4: What is the contribution of deep supervision and ablations to the final performance?

Key Findings

Dataset	Method	ODS	OIS	AP
BSDS500	AMH-Net (fusion)	.798	.829	.869
BSDS500	HED (RGB)	.788	.808	.840
BSDS500	COB	.793	.820	.859
BSDS500	DeepContour	.756	.773	.797
BSDS500	AMH-Net (FLAG-CRFs)	.???	???	???
NYUDv2	AMH-Net RGB	.744	.758	.765
NYUDv2	AMH-Net HHA	.716	.729	.734
NYUDv2	AMH-Net RGB+HHA	.771	.786	.802
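
The ODS and OIS columns in the table above are both F-measures: ODS (optimal dataset scale) uses the single binarization threshold that works best across the whole dataset, while OIS (optimal image scale) picks the best threshold separately for each image. The sketch below shows one common way to compute them from precomputed match counts. It is a simplification under assumptions: the function names and input layout are hypothetical, a single true-positive count is used for both precision and recall, and the boundary matching step that produces the counts is omitted. AP (average precision over the precision-recall curve) is not computed here.

```python
import numpy as np

def f_measure(tp, n_pred, n_gt):
    """F-measure from matched, predicted, and ground-truth pixel counts."""
    precision = tp / np.maximum(n_pred, 1e-12)
    recall = tp / np.maximum(n_gt, 1e-12)
    return 2 * precision * recall / np.maximum(precision + recall, 1e-12)

def ods_ois(tp, n_pred, n_gt):
    """Compute ODS and OIS F-measures (sketch).

    tp, n_pred: arrays of shape (n_images, n_thresholds)
    n_gt:       array of shape (n_images,)
    """
    # ODS: pool counts over images per threshold, then take the best threshold.
    f_dataset = f_measure(tp.sum(axis=0), n_pred.sum(axis=0), n_gt.sum())
    ods = f_dataset.max()

    # OIS: pick the best threshold per image, then pool counts at those picks.
    f_image = f_measure(tp, n_pred, n_gt[:, None])
    best = f_image.argmax(axis=1)
    rows = np.arange(tp.shape[0])
    ois = f_measure(tp[rows, best].sum(), n_pred[rows, best].sum(), n_gt.sum())
    return float(ods), float(ois)

# Toy example: 2 images, 2 thresholds.
tp = np.array([[8.0, 5.0], [4.0, 4.0]])
n_pred = np.array([[10.0, 5.0], [10.0, 5.0]])
n_gt = np.array([10.0, 5.0])
ods, ois = ods_ois(tp, n_pred, n_gt)
```

In the toy example each image prefers a different threshold, so choosing per image (OIS) scores higher than any single shared threshold (ODS), which matches the pattern in the table where OIS exceeds ODS in every row.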

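AMH-Net (fusion) achieves an ODS of 0.798 on BSDS500, above COB (0.793), HED (0.788), and DeepContour (0.756); it also leads on OIS (0.829) and AP (0.869).
On NYUDv2, AMH-Net with RGB+HHA reaches ODS 0.771, OIS 0.786, and AP 0.802, above its RGB-only (ODS 0.744) and HHA-only (ODS 0.716) variants.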
FLAG-CRFs consistently outperform PLAG-CRFs and non-attention CRF baselines in ODS, OIS, and AP.
Ablation studies show that removing AG-CRFs or deep supervision degrades performance, confirming the effectiveness of hierarchical multi-scale fusion and attention.
AMH-Net surpasses traditional-feature methods and previous CNN-based contour detectors on both datasets; on NYUDv2 its strongest results use combined RGB+HHA input.
The proposed approach uses only three scales yet achieves state-of-the-art results, suggesting room for further gains with additional scales.

This review was generated by AI and reviewed by human editors.