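[Paper Summary] Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction
Introduces AMH-Net, a two-level hierarchical CNN with Attention-Gated CRFs that fuses multi-scale features for contour detection and achieves state-of-the-art results on BSDS500 and NYUDv2.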
Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection. This paper presents a novel approach for predicting contours which advances the state of the art in two fundamental aspects, i.e. multi-scale feature generation and fusion. Different from previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, we introduce a hierarchical deep model which produces more rich and complementary representations. Furthermore, to refine and robustly fuse the representations learned at different scales, the novel Attention-Gated Conditional Random Fields (AG-CRFs) are proposed. The experiments ran on two publicly available datasets (BSDS500 and NYUDv2) demonstrate the effectiveness of the latent AG-CRF model and of the overall hierarchical framework.
Research Motivation and Objectives
- Motivate the use of richer, more complementary multi-scale representations for contour prediction beyond simple concatenation or averaging.
- Propose Attention-Gated Conditional Random Fields (AG-CRFs) to robustly fuse and refine multi-scale features.
- Integrate AG-CRFs into a two-level hierarchical CNN (AMH-Net) and train end-to-end with deep supervision.
- Demonstrate improvements over state-of-the-art methods on BSDS500 and NYUDv2 datasets.
Proposed Method
- Define a set of S multi-scale feature maps from a front-end CNN.
- Learn latent multi-scale representations h_s with gates g that control cross-scale information flow (AG-CRFs).
- Use a Gaussian unary potential linking h_s to observed features f_s and a bilinear gated pairwise potential between scales.
- Infer g and H via mean-field updates; gate expectations act as attention to modulate inter-scale message passing.
- Provide two variants: fully-latent FLAG-CRFs and partially-latent PLAG-CRFs where attention can be derived from observed features or latent variables.
- Implement AG-CRF updates as neural network layers with convolutional message passing, attention estimation, and gated fusion steps.
- Construct AMH-Net by fusing three representations per layer (D upsampling, C same size, M downsampling) to obtain richer intra-layer and inter-layer multi-scale features.
- Train end-to-end with deep supervision and a class-imbalance aware cross-entropy loss.
- Fuse scale predictions during testing by averaging the outputs from multiple AG-CRF classifiers.
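The steps above can be illustrated with a minimal, single-pixel sketch in plain Python. It is not the authors' implementation: the function names (`gated_mean_field`, `balanced_bce`, `average_predictions`), the per-pixel matrix kernels, the sign convention inside the gate, and the fixed iteration count are all assumptions made for illustration. The paper realises these updates as convolutional layers over full feature maps. The loss shown is the common class-balanced cross-entropy in which the rarer contour pixels receive the larger weight; whether it matches the paper's exact weighting is also an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(K, v):
    # Multiply matrix K (list of rows) by vector v.
    return [sum(k * x for k, x in zip(row, v)) for row in K]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gated_mean_field(feats, kernels, n_iters=3):
    """Simplified attention-gated refinement for one pixel.

    feats:   list of S observed feature vectors f_s, one per scale.
    kernels: dict mapping (sender, receiver) to a C x C matrix; a hypothetical
             stand-in for the learned bilinear kernel between two scales.
    Returns the refined latent vectors h_s.
    """
    h = [list(f) for f in feats]  # initialise latent features with observations
    for _ in range(n_iters):
        new_h = []
        for r, f_r in enumerate(feats):       # receiving scale
            update = list(f_r)                # unary term anchors h_r to f_r
            for e, h_e in enumerate(h):       # sending scale
                if e == r:
                    continue
                msg = matvec(kernels[(e, r)], h_e)
                # Gate expectation: sigmoid of the bilinear compatibility
                # (sign convention assumed), acting as per-pixel attention.
                gate = sigmoid(dot(h[r], msg))
                update = [u + gate * m for u, m in zip(update, msg)]
            new_h.append(update)
        h = new_h
    return h

def balanced_bce(probs, labels, eps=1e-7):
    """Class-balanced cross-entropy: contour pixels are weighted by the
    fraction of non-contour pixels, and vice versa."""
    beta = 1.0 - sum(labels) / len(labels)
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)       # avoid log(0)
        loss -= beta * math.log(p) if y == 1 else (1.0 - beta) * math.log(1.0 - p)
    return loss

def average_predictions(preds):
    # Test-time fusion: element-wise mean over the per-scale predictions.
    return [sum(vals) / len(vals) for vals in zip(*preds)]
```

With all kernels set to zero, the messages vanish and `gated_mean_field` returns the observed features unchanged, which is a quick sanity check that refinement comes only from the gated cross-scale terms.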
Experimental Results
Research Questions
- RQ1: Can modeling and leveraging complex relationships between multi-scale CNN features via an attention-gated CRF improve contour prediction over simple fusion strategies?
- RQ2: Does a two-level hierarchical multi-scale network paired with AG-CRFs yield richer representations and better contour accuracy on standard benchmarks?
- RQ3: How do different AG-CRF variants (FLAG-CRFs vs PLAG-CRFs) impact contour detection performance?
- RQ4: What is the contribution of deep supervision and ablations to the final performance?
Key Findings
| Dataset | Method | ODS | OIS | AP |
|---|---|---|---|---|
| BSDS500 | AMH-Net (fusion) | .798 | .829 | .869 |
| BSDS500 | HED (RGB) | .788 | .808 | .840 |
| BSDS500 | COB | .793 | .820 | .859 |
| BSDS500 | DeepContour | .756 | .773 | .797 |
| BSDS500 | AMH-Net (FLAG-CRFs) | .??? | ??? | ??? |
| NYUDv2 | AMH-Net RGB | .744 | .758 | .765 |
| NYUDv2 | AMH-Net HHA | .716 | .729 | .734 |
| NYUDv2 | AMH-Net RGB+HHA | .771 | .786 | .802 |
- On BSDS500, AMH-Net (fusion) reaches an ODS of 0.798, ahead of COB (0.793), HED (0.788), and DeepContour (0.756).
- On NYUDv2, AMH-Net with RGB+HHA input reaches 0.771 ODS, 0.786 OIS, and 0.802 AP, ahead of its RGB-only (0.744 ODS) and HHA-only (0.716 ODS) variants.
- FLAG-CRFs consistently outperform PLAG-CRFs and non-attention CRF baselines in ODS, OIS, and AP.
- Ablation studies show that removing AG-CRFs or deep supervision degrades performance, confirming the effectiveness of hierarchical multi-scale fusion and attention.
- AMH-Net surpasses traditional features and previous CNN-based contour detectors on both datasets; on NYUDv2, the RGB+HHA variant gives the best results.
- The proposed approach uses only three scales yet achieves state-of-the-art results, suggesting room for further gains with additional scales.
This summary was generated by AI and reviewed by human editors.