QUICK REVIEW

[Paper Review] Deep Learning for Fine-Grained Image Analysis: A Survey

Xiu-Shen Wei, Jianxin Wu | arXiv (Cornell University) | Jul 6, 2019

Advanced Image and Video Retrieval Techniques | References: 49 | Cited by: 76

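One-Sentence Summary

This survey reviews advances in deep learning for fine-grained image analysis (FGIA) across three main tasks (recognition, retrieval, and generation) and discusses datasets, methods, and future directions.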

ABSTRACT

Computer vision (CV) is the process of using machines to understand and analyze imagery, which is an integral branch of artificial intelligence. Among various research areas of CV, fine-grained image analysis (FGIA) is a longstanding and fundamental problem, and has become ubiquitous in diverse real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class variations and the large intra-class variations caused by the fine-grained nature make it a challenging problem. During the booming of deep learning, recent years have witnessed remarkable progress of FGIA using deep learning techniques. In this paper, we aim to give a survey on recent advances of deep learning based FGIA techniques in a systematic way. Specifically, we organize the existing studies of FGIA techniques into three major categories: fine-grained image recognition, fine-grained image retrieval and fine-grained image generation. In addition, we also cover some other important issues of FGIA, such as publicly available benchmark datasets and their related domain-specific applications. Finally, we conclude this survey by highlighting several directions and open problems which need to be further explored by the community in the future.

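Motivation and Objectives

Provide a comprehensive survey of deep-learning-based FGIA techniques, covering the problem background, datasets, and families of methods.
Provide a systematic, hierarchical overview of FGIA progress in recognition, retrieval, and generation.
Discuss domain-specific applications of FGIA and the practical challenges they raise.
Identify open problems and potential future directions for the FGIA community.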

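Methodology

Organizes FGIA recognition techniques into three paradigms: localization-classification subnetworks, end-to-end feature encoding, and the use of external information.
Discusses end-to-end feature encoding methods (e.g., bilinear CNNs and low-dimensional pooling) together with tailored loss functions.
Describes the use of external information, such as web data, multimodal data (text, knowledge graphs), and human-in-the-loop methods, to improve FGIA.
Summarizes fine-grained image retrieval methods, including supervised and weakly supervised losses and localization strategies.
Surveys fine-grained image generation with generative models (e.g., CVAE-GAN, AttnGAN) for category-specific and text-guided synthesis.
Reviews FGIA applications in fashion, retail, and re-identification.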
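
To make the end-to-end feature encoding paradigm more concrete, here is a minimal sketch of bilinear pooling, the operation behind bilinear CNNs. It is an illustration under stated assumptions and is not code from the survey: the function name `bilinear_pool` and the list-of-lists feature layout are hypothetical, and the post-processing (averaging over locations, signed square root, L2 normalization) follows the commonly described recipe rather than any specific implementation. Plain Python is used to keep the sketch dependency-free; a real model would use an array or tensor library.

```python
import math


def bilinear_pool(feat_a, feat_b, eps=1e-12):
    """Bilinear pooling of two feature maps (illustrative sketch).

    feat_a: list of C_a channels, each a list of N spatial activations.
    feat_b: list of C_b channels, each a list of N spatial activations.
    Returns a flat descriptor of length C_a * C_b.
    """
    n = len(feat_a[0])
    if any(len(channel) != n for channel in feat_a + feat_b):
        raise ValueError("all channels must cover the same N locations")

    # Average, over spatial locations, of the outer product of the two
    # per-location feature vectors: entry (i, j) pairs channel i of
    # feat_a with channel j of feat_b.
    pooled = [
        sum(x * y for x, y in zip(ch_a, ch_b)) / n
        for ch_a in feat_a
        for ch_b in feat_b
    ]

    # Signed square root dampens large second-order responses.
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]

    # L2 normalization puts descriptors on a common scale.
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / (norm + eps) for v in signed]


# Toy input: two channels and one channel, each over two locations.
descriptor = bilinear_pool([[1.0, 2.0], [0.0, -1.0]], [[1.0, 1.0]])
print(descriptor)
```

The descriptor length is the product of the two channel counts, which grows quickly for real networks. That growth is the motivation for the low-dimensional pooling variants mentioned above.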

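Experimental Results

Research Questions

RQ1: What are the main deep-learning-based approaches to fine-grained image recognition, retrieval, and generation?
RQ2: How do benchmark datasets and types of supervision affect progress in FGIA?
RQ3: Which kinds of external information and multimodal signals improve FGIA performance most effectively?
RQ4: What are the current challenges and future directions for deep-learning-based FGIA?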

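Key Findings

Deep learning has driven significant progress in FGIA across recognition, retrieval, and generation.
The three major paradigms for FGIA recognition are localization-classification subnetworks, end-to-end feature encoding, and the use of external information.
External signals such as web data, text descriptions, and knowledge graphs can improve FGIA performance, but they also introduce noise and domain gaps that must be handled carefully.
The influence of multimodal descriptions and weak supervision in FGIA now extends beyond that of conventional image-label supervision.
Benchmark datasets such as CUB200-2011 and RPC enable systematic comparison and drive progress in FGIA.
Generative methods enable fine-grained image generation and text-to-image synthesis, extending FGIA beyond recognition.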


This review was generated by AI and checked by human editors.