QUICK REVIEW

[Paper Review] Fully Convolutional Attention Networks for Fine-Grained Recognition

Xiao Liu, Tian Xia|arXiv (Cornell University)|Mar 22, 2016

Domain Adaptation and Few-Shot Learning | References: 35 | Cited by: 128

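One-Sentence Summary

FCANs use reinforcement learning with a fully convolutional network to localize multiple discriminative parts without part annotations, which speeds up training and testing while achieving competitive accuracy on fine-grained benchmarks.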

ABSTRACT

Fine-grained recognition is challenging due to its subtle local inter-class differences versus large intra-class variations such as poses. A key to address this problem is to localize discriminative parts to extract pose-invariant features. However, ground-truth part annotations can be expensive to acquire. Moreover, it is hard to define parts for many fine-grained classes. This work introduces Fully Convolutional Attention Networks (FCANs), a reinforcement learning framework to optimally glimpse local discriminative regions adaptive to different fine-grained domains. Compared to previous methods, our approach enjoys three advantages: 1) the weakly-supervised reinforcement learning procedure requires no expensive part annotations; 2) the fully-convolutional architecture speeds up both training and testing; 3) the greedy reward strategy accelerates the convergence of the learning. We demonstrate the effectiveness of our method with extensive experiments on four challenging fine-grained benchmark datasets, including CUB-200-2011, Stanford Dogs, Stanford Cars and Food-101.

Motivation and Objectives

Address fine-grained recognition, where inter-class differences are subtle and local while intra-class variations (such as pose) are large.
Eliminate dependence on expensive ground-truth part annotations by using weakly supervised learning.
Propose a fully convolutional attention framework that reuses feature maps for efficiency during training and testing.
Enable localization of multiple discriminative parts with greedy, step-wise rewards to accelerate training.

Proposed Method

Propose FCANs consisting of a shared feature network, an attention network producing multiple part score maps, and a per-part classification network.
Use a Markov Decision Process formulation where actions are attention locations and rewards reflect classification quality.
Train with REINFORCE-based policy gradients using a greedy reward strategy that grants intermediate rewards when accuracy improves.
Reuse convolutional feature maps across time steps to avoid recomputing features (Fast R-CNN-like sharing).
Crop high-resolution regions around attended locations for final classification while keeping a shared representation for efficiency.
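The attention and feature-reuse steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `sample_attention` and `crop_window`, the softmax over the score map, the fixed square window, and the use of a plain array in place of a real convolutional feature map are all assumptions made for the example.

```python
import numpy as np

def sample_attention(score_map, rng):
    """Sample one (row, col) location from a softmax over a 2-D part score map.

    Assumption: the policy is a softmax over all spatial positions.
    Returns the location and the log-probability of the sampled action.
    """
    flat = score_map.ravel()
    probs = np.exp(flat - flat.max())  # subtract max for numerical stability
    probs /= probs.sum()
    idx = rng.choice(flat.size, p=probs)
    row, col = np.unravel_index(idx, score_map.shape)
    return (int(row), int(col)), float(np.log(probs[idx]))

def crop_window(feature_map, center, size):
    """Crop a size x size window from a (C, H, W) feature map.

    The window is shifted as needed so it stays inside the map
    (requires size <= H and size <= W). The feature map is computed once
    and only sliced here, which is the feature-reuse idea.
    """
    _, h, w = feature_map.shape
    half = size // 2
    top = min(max(center[0] - half, 0), h - size)
    left = min(max(center[1] - half, 0), w - size)
    return feature_map[:, top:top + size, left:left + size]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((16, 14, 14))   # stand-in for shared conv features
    part_scores = rng.standard_normal((2, 14, 14))  # one score map per part (hypothetical)
    for part, score_map in enumerate(part_scores):
        loc, log_prob = sample_attention(score_map, rng)
        window = crop_window(features, loc, size=5)
        print(part, loc, window.shape, round(log_prob, 3))
```

Each part reuses the same `features` array, so adding an attention step costs one slice rather than another forward pass through the feature network.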

Experimental Results

Research Questions

RQ1: Can weakly supervised attention learn discriminative parts for fine-grained recognition without part annotations?
RQ2: Does a fully convolutional attention architecture improve efficiency over recurrent attention models while maintaining accuracy?
RQ3: How many attentions and what reward strategy yield the best accuracy and training convergence across datasets?

Key Findings

Dataset	Accuracy (%)
CUB-200-2011	84.3
Stanford Dogs	88.9
Stanford Cars	91.5
Food-101	86.3

Achieves competitive fine-grained accuracy on four benchmarks without using part annotations for training or testing.
Outperforms prior RL-based attention models in both accuracy and efficiency due to fully convolutional feature reuse.
Two attentions provide a good trade-off between accuracy and computational cost, with diminishing gains beyond two attentions.
The greedy reward strategy accelerates training convergence and improves final accuracy compared with giving a reward only at the final step.
Training with shared feature maps and Fast R-CNN-like region extraction significantly reduces computation and speeds up testing.
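To make the difference between greedy and end-only rewards concrete, here is a minimal NumPy sketch. The exact reward rule is an assumption for illustration: a step earns reward 1 when the prediction using the parts seen so far is correct and the classification loss is lower than at the previous step. The function names `greedy_rewards`, `end_only_rewards`, and `reinforce_surrogate` are hypothetical, and no baseline or discounting is included.

```python
import numpy as np

def greedy_rewards(step_probs, label):
    """Per-step rewards from (T, C) class probabilities after each attention step.

    Assumed rule: reward 1 if the step's prediction is correct AND its
    cross-entropy loss is lower than at the previous step, else 0.
    """
    rewards = []
    prev_loss = np.inf
    for probs in step_probs:
        loss = -np.log(probs[label])
        correct = int(np.argmax(probs)) == label
        rewards.append(1.0 if correct and loss < prev_loss else 0.0)
        prev_loss = loss
    return np.array(rewards)

def end_only_rewards(step_probs, label):
    """Reward only at the final step, 1 if the final prediction is correct."""
    rewards = np.zeros(len(step_probs))
    rewards[-1] = 1.0 if int(np.argmax(step_probs[-1])) == label else 0.0
    return rewards

def reinforce_surrogate(log_probs, rewards):
    """Negative sum of reward-weighted action log-probabilities.

    Differentiating this with respect to the policy parameters gives a
    REINFORCE-style gradient estimate.
    """
    return -float(np.sum(np.asarray(log_probs) * np.asarray(rewards)))

if __name__ == "__main__":
    # Toy example: 3 attention steps, 2 classes, true label 0.
    step_probs = np.array([[0.6, 0.4], [0.8, 0.2], [0.7, 0.3]])
    log_probs = [-1.0, -2.0, -3.0]  # log pi(a_t) of the sampled locations
    print(greedy_rewards(step_probs, 0))
    print(end_only_rewards(step_probs, 0))
    print(reinforce_surrogate(log_probs, greedy_rewards(step_probs, 0)))
```

In this toy example the greedy rule gives a learning signal at the first two steps, whereas the end-only rule attributes everything to the last step, which is one plausible reason intermediate rewards could help convergence.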


This review was generated by AI and reviewed by a human editor.