QUICK REVIEW

[Paper Review] Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors

Lingqiao Liu, Chunhua Shen | arXiv (Cornell University) | Nov 24, 2014

Advanced Image and Video Retrieval Techniques | References: 23 | Cited by: 67

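ONE-SENTENCE SUMMARY

This paper proposes Sparse Coding based Fisher Vector Coding (SCFVC), which models high dimensional local features by sampling the Gaussian mean from a subspace, so that inference reduces to an efficient sparse coding problem. On high dimensional features SCFVC significantly outperforms traditional Gaussian mixture model (GMM) based Fisher vector coding, and it achieves state-of-the-art (SOTA) results in generic object, indoor scene, and fine-grained image classification.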

ABSTRACT

Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has shown to be sufficient for traditional low dimensional local features, e.g., SIFT; and typically, good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher dimensional local features, which have become popular recently. In order to improve the modeling capacity for high dimensional features, it turns out to be inefficient and computationally impractical to simply increase the number of Gaussians. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With certain approximation, this model can be converted to a sparse coding procedure and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new Fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed Deep Convolutional Neural Network (CNN) descriptor as a high dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms the traditional GMM based Fisher vector encoding but also achieves the state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.
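
The chain of reasoning in the abstract (Gaussian with a subspace mean, then an approximation, then sparse coding, then a gradient) can be written out as a short worked sketch. The notation is assumed here rather than taken from the abstract: B is a basis matrix spanning the subspace, u is the coefficient vector, σ² is an isotropic variance, and λ is the scale of a Laplacian prior on u. The prior, the isotropic covariance, and the point-estimate approximation are one plausible reading of "certain approximation", not a confirmed restatement of the paper's derivation.

```latex
% Sketch under stated assumptions: isotropic covariance \sigma^2 I,
% Laplacian prior on u, and a point-estimate approximation of the integral.
\begin{align}
u &\sim P(u) \propto \exp\!\left(-\lambda \lVert u \rVert_1\right), \qquad
x \mid u \sim \mathcal{N}\!\left(Bu,\ \sigma^{2} I\right), \\
P(x \mid B) &= \int P(x \mid u, B)\, P(u)\, du
  \;\approx\; P(x \mid u^{*}, B)\, P(u^{*}), \\
u^{*} &= \arg\min_{u}\ \frac{1}{2\sigma^{2}} \lVert x - Bu \rVert_2^{2}
  + \lambda \lVert u \rVert_1, \\
\frac{\partial \log P(x \mid B)}{\partial B}
  &\approx \frac{1}{\sigma^{2}}\,\left(x - Bu^{*}\right) u^{*\top}.
\end{align}
```

Under these assumptions, the third line is an ordinary sparse coding problem, and the last line shows why only the bases selected by the sparse code u* receive a nonzero gradient column.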

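RESEARCH MOTIVATION AND OBJECTIVES

Address the limited modeling capacity of Gaussian mixture model (GMM) based Fisher vector coding when it is applied to high dimensional local features.
Avoid the computational impracticality of covering a high dimensional feature space by simply increasing the number of GMM components.
Develop a scalable and efficient alternative to GMM-FVC that retains high discriminative power on high dimensional features.
Demonstrate the advantage of SCFVC across several image classification tasks, using deep convolutional neural network (CNN) activations as local features.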

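PROPOSED METHOD

Propose a generative model in which each local feature is drawn from a Gaussian distribution whose mean is sampled from a low dimensional subspace.
Approximate this model as a sparse coding problem, so that standard sparse coding solvers can be used for learning and inference.
Derive the Fisher vector encoding by computing the gradient of the log-likelihood with respect to the model parameters, which yields SCFVC.
Use pretrained deep convolutional neural network (CNN) features as high dimensional local descriptors for image representation.
Apply SCFVC to encode the CNN-based local features, forming a complete image classification pipeline.
Use efficient sparse coding algorithms (e.g., learned FISTA, orthogonal matching pursuit) to keep computation tractable.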
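
The encoding steps listed above can be illustrated with a minimal NumPy sketch. Several things here are assumptions made for illustration and are not confirmed by this review: the function names (`sparse_codes_ista`, `scfvc_encode`) are hypothetical, plain ISTA is swapped in as the solver instead of the algorithms named above, per-feature gradients are sum-pooled over the image, the constant 1/σ² scale is dropped, and no normalization is applied to the final vector. The toy sizes (64-D features, 10 bases) are likewise arbitrary.

```python
import numpy as np


def soft_threshold(z, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_codes_ista(X, B, lam=0.1, n_iter=200):
    """Solve min_u 0.5 * ||x - B u||^2 + lam * ||u||_1 for every row x of X.

    X: (n, d) local features, B: (d, m) bases. Returns U with shape (n, m).
    Plain ISTA is used here as a stand-in solver (an assumption of this sketch).
    """
    step = 1.0 / (np.linalg.norm(B, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    U = np.zeros((X.shape[0], B.shape[1]))
    for _ in range(n_iter):
        grad = (U @ B.T - X) @ B              # row i holds B^T (B u_i - x_i)
        U = soft_threshold(U - step * grad, lam * step)
    return U


def scfvc_encode(X, B, lam=0.1, n_iter=200):
    """Encode one image's local features as a sum of residual-times-code outer products.

    Returns the flattened (d * m,) vector and the sparse codes U.
    Sum pooling and the absence of normalization are assumptions of this sketch.
    """
    U = sparse_codes_ista(X, B, lam, n_iter)
    R = X - U @ B.T                           # residuals x_i - B u_i, shape (n, d)
    G = R.T @ U                               # sum_i (x_i - B u_i) u_i^T, shape (d, m)
    return G.ravel(), U


# Toy usage with random data standing in for CNN local descriptors.
rng = np.random.default_rng(0)
B = rng.standard_normal((64, 10))
B /= np.linalg.norm(B, axis=0, keepdims=True)  # unit-norm bases
X = rng.standard_normal((200, 64))             # 200 local features of dimension 64
fv, codes = scfvc_encode(X, B)
print(fv.shape)  # (640,)
```

The output dimension is the feature dimension times the number of bases, so a column of the gradient matrix is nonzero only when at least one local feature in the image selects the corresponding basis.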

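EXPERIMENTAL RESULTS

Research Questions

RQ1: Can traditional GMM based Fisher vector coding effectively model high dimensional local features such as deep CNN activations?
RQ2: Does modeling the Gaussian mean as a point in a subspace improve the representation of high dimensional features compared with a standard GMM?
RQ3: Can the proposed model be reformulated as a sparse coding problem to enable efficient learning and inference?
RQ4: Does SCFVC outperform GMM-FVC at encoding high dimensional features across multiple image classification benchmarks?
RQ5: Can SCFVC reach state-of-the-art performance in generic object, indoor scene, and fine-grained image classification?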

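Key Findings

With 100 bases and 1000-dimensional CNN features, SCFVC reaches 68.1% accuracy on the MIT-67 dataset, clearly ahead of GMM-FVC with 400 mixture components on 300-dimensional features (64.0%) and GMM-FVC with 1000 mixture components on 100-dimensional features (60.8%).
For low dimensional features (e.g., 100-D), SCFVC and GMM-FVC perform comparably, but moving from 100-D to 1000-D improves SCFVC by 7% while GMM-FVC gains only 4%.
Reducing high dimensional features with principal component analysis (PCA) and increasing the number of GMM components does not recover the lost discriminative power, which indicates that the high dimensional features retain essential information.
On fine-grained bird classification, SCFVC outperforms a method that uses part information (DPD+CNN+LogReg), suggesting that deep features encoded with SCFVC are more effective than part-based models.
The method maintains strong performance even with a small number of bases (e.g., 100), reflecting its efficiency and scalability.
Using approximate sparse coding algorithms preserves computational efficiency, so SCFVC remains practical despite the added complexity of high dimensional modeling.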

This review was generated by AI and checked by a human editor.