QUICK REVIEW

[Paper Review] Fully Convolutional Attention Networks for Fine-Grained Recognition

Xiao Liu, Tian Xia | arXiv (Cornell University) | Mar 22, 2016
Domain Adaptation and Few-Shot Learning | References: 35 | Cited by: 128
One-Sentence Summary

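FCANs combine reinforcement learning with a fully convolutional network to localize multiple discriminative parts without part annotations, which makes training and testing faster and achieves competitive accuracy on fine-grained benchmarks.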

ABSTRACT

Fine-grained recognition is challenging due to its subtle local inter-class differences versus large intra-class variations such as poses. A key to address this problem is to localize discriminative parts to extract pose-invariant features. However, ground-truth part annotations can be expensive to acquire. Moreover, it is hard to define parts for many fine-grained classes. This work introduces Fully Convolutional Attention Networks (FCANs), a reinforcement learning framework to optimally glimpse local discriminative regions adaptive to different fine-grained domains. Compared to previous methods, our approach enjoys three advantages: 1) the weakly-supervised reinforcement learning procedure requires no expensive part annotations; 2) the fully-convolutional architecture speeds up both training and testing; 3) the greedy reward strategy accelerates the convergence of the learning. We demonstrate the effectiveness of our method with extensive experiments on four challenging fine-grained benchmark datasets, including CUB-200-2011, Stanford Dogs, Stanford Cars and Food-101.

Research Motivation and Goals

  • Address fine-grained recognition, where inter-class differences are small and intra-class variations (such as pose) are large.
  • Eliminate dependence on expensive ground-truth part annotations by using weakly supervised learning.
  • Propose a fully convolutional attention framework that reuses feature maps for efficiency during training and testing.
  • Enable localization of multiple discriminative parts with greedy, step-wise rewards to accelerate training.

Proposed Method

  • Propose FCANs consisting of a shared feature network, an attention network producing multiple part score maps, and a per-part classification network.
  • Use a Markov Decision Process formulation where actions are attention locations and rewards reflect classification quality.
  • Train with REINFORCE-based policy gradients using a greedy reward strategy that grants intermediate rewards when accuracy improves.
  • Reuse convolutional feature maps across time steps to avoid recomputing features (Fast R-CNN-style sharing).
  • Crop high-resolution regions around attended locations for final classification while keeping a shared representation for efficiency.
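The MDP and REINFORCE training summarized above can be illustrated with a toy NumPy sketch. It assumes each attention step owns one flattened score map whose softmax is the policy over locations, and it replaces the classification network with a hypothetical `gain_maps` table giving how much attending each location changes the true-class score. The helper names, the 4x4 map size, and the choice to credit each step only with its own intermediate reward are our simplifications, not the paper's exact formulation.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a flattened score map."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def greedy_rewards(scores):
    """Reward 1.0 at every step that raises the true-class score, else 0.0.

    scores[0] is the score before any attention; scores[t] is the score after step t.
    """
    return [1.0 if scores[t] > scores[t - 1] else 0.0 for t in range(1, len(scores))]


def end_only_rewards(scores):
    """Comparison strategy: a single reward at the final step only."""
    rewards = [0.0] * (len(scores) - 1)
    rewards[-1] = 1.0 if scores[-1] > scores[0] else 0.0
    return rewards


def train_attention_policy(gain_maps, episodes=500, lr=1.0, seed=0):
    """Plain REINFORCE over one score map per attention step.

    gain_maps[t, k] is a hypothetical stand-in for the classifier: how much
    attending location k at step t changes the true-class score.
    """
    rng = np.random.default_rng(seed)
    num_steps, num_locations = gain_maps.shape
    logits = np.zeros((num_steps, num_locations))  # learnable score maps
    for _ in range(episodes):
        scores, actions = [0.0], []
        for t in range(num_steps):
            a = rng.choice(num_locations, p=softmax(logits[t]))  # sample a location
            actions.append(a)
            scores.append(scores[-1] + gain_maps[t, a])
        rewards = greedy_rewards(scores)
        for t, a in enumerate(actions):
            grad_log_pi = -softmax(logits[t])
            grad_log_pi[a] += 1.0  # gradient of log pi(a) w.r.t. logits: onehot(a) - pi
            # Simplification: step t is credited with its own greedy reward only.
            logits[t] += lr * rewards[t] * grad_log_pi
    return np.stack([softmax(row) for row in logits])


# Toy 4x4 score maps for two attentions: one helpful location per part,
# and every other location slightly lowers the true-class score.
H = W = 4
gains = np.full((2, H * W), -0.1)
gains[0, 5] = 1.0   # hypothetical discriminative region for attention 1
gains[1, 10] = 1.0  # hypothetical discriminative region for attention 2
policy = train_attention_policy(gains)
print(policy.argmax(axis=1))  # locations the two policies have concentrated on
print(greedy_rewards([0.0, 1.0, 0.9]), end_only_rewards([0.0, 1.0, 0.9]))
```

The last line shows the difference between the two reward strategies: for scores 0.0 → 1.0 → 0.9, the greedy strategy credits the first attention for its improvement, while the end-only strategy gives a single reward at the last step and leaves the first step without a direct signal.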

Experimental Results

Research Questions

  • RQ1: Can weakly supervised attention learn discriminative parts for fine-grained recognition without part annotations?
  • RQ2: Does a fully convolutional attention architecture improve efficiency over recurrent attention models while maintaining accuracy?
  • RQ3: How many attentions and which reward strategy yield the best accuracy and training convergence across datasets?

Key Findings

Dataset          Accuracy (%)
CUB-200-2011     84.3
Stanford Dogs    88.9
Stanford Cars    91.5
Food-101         86.3
  • Achieves competitive fine-grained accuracy on four benchmarks without using part annotations in either training or testing.
  • Outperforms prior RL-based attention models in both accuracy and efficiency due to fully convolutional feature reuse.
  • Two attentions provide a good trade-off between accuracy and computational cost, with diminishing gains beyond two attentions.
  • The greedy reward strategy speeds up training convergence and improves final accuracy compared with granting a reward only at the final step.
  • Training with shared feature maps and Fast R-CNN-style region extraction significantly reduces computation and speeds up testing.
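The feature-sharing idea behind these savings can be sketched as follows: the backbone runs once per image, each attended cell on a score map is mapped back to an image crop, and per-part features are pooled from the same shared feature map instead of re-running the network per crop. This is a minimal NumPy sketch under stated assumptions: the helper names `location_to_box` and `roi_max_pool`, the stride of 32, the 64-pixel crop, and the 2x2 pooled output are all hypothetical, the image size is assumed to equal the feature-map size times the stride, and the pooling is generic Fast R-CNN-style RoI max pooling rather than the paper's exact layer.

```python
import numpy as np


def location_to_box(i, j, stride, crop, img_h, img_w):
    """Map score-map cell (i, j) to a square crop (x0, y0, x1, y1) in image pixels.

    The crop is centred on the cell centre projected by `stride` and shifted
    so it stays inside the image (assumes crop <= img_h and crop <= img_w).
    """
    cy, cx = (i + 0.5) * stride, (j + 0.5) * stride
    half = crop / 2
    x0 = int(round(min(max(cx - half, 0), img_w - crop)))
    y0 = int(round(min(max(cy - half, 0), img_h - crop)))
    return x0, y0, x0 + crop, y0 + crop


def roi_max_pool(feature_map, box, stride, out_size=2):
    """Fast R-CNN-style RoI max pooling from a shared (C, H, W) feature map."""
    channels, h, w = feature_map.shape
    x0, y0, x1, y1 = box
    # Project the pixel box onto feature-map cells (floor for start, ceil for end).
    fx0, fy0 = x0 // stride, y0 // stride
    fx1 = min(w, -(-x1 // stride))
    fy1 = min(h, -(-y1 // stride))
    ys = np.linspace(fy0, fy1, out_size + 1)
    xs = np.linspace(fx0, fx1, out_size + 1)
    pooled = np.empty((channels, out_size, out_size), dtype=feature_map.dtype)
    for a in range(out_size):
        for b in range(out_size):
            # Each output bin takes the max over the cells it overlaps.
            r0, r1 = int(np.floor(ys[a])), int(np.ceil(ys[a + 1]))
            c0, c1 = int(np.floor(xs[b])), int(np.ceil(xs[b + 1]))
            pooled[:, a, b] = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled


# The backbone runs once per image; every attention step reuses its output.
stride, img_size = 32, 224  # hypothetical backbone stride and input size
feature_map = np.arange(2 * 7 * 7, dtype=float).reshape(2, 7, 7)  # stand-in for conv features
for i, j in [(3, 3), (0, 6)]:  # two attended cells, e.g. the argmax of each score map
    box = location_to_box(i, j, stride, crop=64, img_h=img_size, img_w=img_size)
    print(box, roi_max_pool(feature_map, box, stride).shape)
```

The same `box` values could instead be used to crop the high-resolution input image for the final per-part classifier; only the cheap pooling step is repeated per attention, while the expensive convolutional pass is shared.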


This review was generated by AI and reviewed by human editors.