QUICK REVIEW

[Paper Review] Compact Bilinear Pooling

Yang Gao, Oscar Beijbom, Ning Zhang, Trevor Darrell|arXiv (Cornell University)|Nov 19, 2015

Advanced Neural Network Applications|43 references|46 citations

TL;DR

This paper proposes two compact bilinear pooling methods, Tensor Sketch (TS) and Random Maclaurin (RM), that reduce high-dimensional bilinear features (roughly 250,000 dimensions) to just 8,192 dimensions with minimal performance loss. By building on a kernelized analysis that links bilinear pooling to the second-order polynomial kernel, and by supporting end-to-end back-propagation, the method matches the discriminative power of full bilinear pooling in image classification, helps in few-shot learning, and makes storage and deployment more efficient.

ABSTRACT

Bilinear models have been shown to achieve impressive performance on a wide range of visual tasks, such as semantic segmentation, fine-grained recognition and face recognition. However, bilinear features are high dimensional, typically on the order of hundreds of thousands to a few million, which makes them impractical for subsequent analysis. We propose two compact bilinear representations with the same discriminative power as the full bilinear representation but with only a few thousand dimensions. Our compact representations allow back-propagation of classification errors, enabling an end-to-end optimization of the visual recognition system. The compact bilinear representations are derived through a novel kernelized analysis of bilinear pooling, which provides insights into the discriminative power of bilinear pooling and a platform for further research in compact pooling methods. Experiments illustrate the utility of the proposed representations for image classification and few-shot learning across several datasets.

Motivation & Objective

To address the high dimensionality of bilinear pooling features, which exceeds 250,000 dimensions and hinders practical deployment in classification, retrieval, and few-shot learning.
To develop compact bilinear representations that preserve the discriminative power of full bilinear pooling while drastically reducing feature dimensionality.
To enable end-to-end back-propagation through the compact pooling layer, supporting joint optimization of the entire recognition pipeline.
To provide a kernelized theoretical framework for bilinear pooling that motivates and justifies the proposed compact methods.
To demonstrate the utility of compact bilinear pooling in real-world scenarios such as image retrieval, embedded deployment, and few-shot learning.
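
The kernelized framework mentioned in the list above can be summarized in three lines. This is a sketch in assumed notation, not a transcription of the paper's equations: x_s and y_u stand for local CNN descriptors in R^c, S and U index the spatial locations of two images X and Y, and phi is any low-dimensional feature map whose inner product approximates the second-order polynomial kernel.

```latex
% Notation is assumed for illustration (see the paragraph above).
\begin{align}
B(\mathcal{X}) &= \sum_{s \in \mathcal{S}} x_s x_s^{\top} \\
\langle B(\mathcal{X}), B(\mathcal{Y}) \rangle
  &= \sum_{s \in \mathcal{S}} \sum_{u \in \mathcal{U}}
     \langle x_s x_s^{\top},\, y_u y_u^{\top} \rangle
   = \sum_{s \in \mathcal{S}} \sum_{u \in \mathcal{U}}
     \langle x_s, y_u \rangle^{2} \\
C(\mathcal{X}) &= \sum_{s \in \mathcal{S}} \phi(x_s),
  \qquad \langle \phi(x), \phi(y) \rangle \approx \langle x, y \rangle^{2},
  \qquad \phi(x) \in \mathbb{R}^{d},\ d \ll c^{2}
\end{align}
```

The second line shows that comparing two full bilinear descriptors amounts to comparing every pair of local descriptors with a second-order polynomial kernel, so any compact approximation of that kernel yields a compact pooled descriptor C(X).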

Proposed method

The method uses two randomized feature maps, Tensor Sketch (TS) and Random Maclaurin (RM), to approximate high-dimensional bilinear features in a low-dimensional space of 8,192 dimensions.
It leverages the connection between bilinear pooling and polynomial kernels, specifically the second-order polynomial kernel, to derive explicit feature maps that are computationally efficient.
The approach adapts randomized polynomial-kernel approximations to the bilinear pooling setting: RM follows Kar and Karnick (2012), and TS follows Pham and Pagh (2013).
Back-propagation through the compact bilinear layer is efficiently computed using the gradient of the randomized projection, enabling end-to-end training of deep networks.
The global compact descriptor is obtained by applying the randomized feature map to the local descriptor at each spatial location and then sum-pooling the resulting compact features across locations.
The method is implemented in Caffe and MatConvNet, with public code available for reproducibility and integration.
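
The steps listed above can be illustrated with a small NumPy sketch. It is not the authors' Caffe or MatConvNet code: the function names, the dense Count Sketch matrix, the toy sizes (c = 128 channels, d = 4096 compact dimensions, 49 locations) and the random non-negative inputs are all assumptions chosen for readability. It compares the exact bilinear kernel with its Tensor Sketch and Random Maclaurin approximations.

```python
import numpy as np


def full_bilinear(X):
    """Full bilinear pooling: sum of outer products x x^T over locations.

    X has shape (num_locations, c); the result has c * c entries.
    """
    return (X.T @ X).reshape(-1)


def make_count_sketch(c, d, rng):
    """Random Count Sketch projection as a dense (c, d) matrix.

    Each input channel is sent to one random bucket with a random sign.
    (Hypothetical helper; a practical version would store only the
    bucket indices and signs instead of a dense matrix.)
    """
    h = rng.integers(0, d, size=c)
    s = rng.choice([-1.0, 1.0], size=c)
    P = np.zeros((c, d))
    P[np.arange(c), h] = s
    return P


def tensor_sketch_pool(X, P1, P2):
    """Second-order Tensor Sketch features, sum-pooled over locations.

    The circular convolution of two Count Sketches of x is a Count Sketch
    of the outer product x x^T; it is computed as a product in the FFT domain.
    """
    d = P1.shape[1]
    f1 = np.fft.rfft(X @ P1, axis=1)
    f2 = np.fft.rfft(X @ P2, axis=1)
    per_location = np.fft.irfft(f1 * f2, n=d, axis=1)
    return per_location.sum(axis=0)


def random_maclaurin_pool(X, W1, W2):
    """Random Maclaurin features for the second-order kernel, sum-pooled.

    W1 and W2 are (d, c) matrices with independent +1/-1 entries.
    """
    d = W1.shape[0]
    per_location = (X @ W1.T) * (X @ W2.T) / np.sqrt(d)
    return per_location.sum(axis=0)


rng = np.random.default_rng(0)
c, d, n = 128, 4096, 49        # channels, compact dimension, locations (toy sizes)
X = rng.random((n, c))         # stand-ins for non-negative CNN activations
Y = rng.random((n, c))

# Inner product of the two full bilinear descriptors (c * c = 16384 dimensions).
exact = full_bilinear(X) @ full_bilinear(Y)

P1, P2 = make_count_sketch(c, d, rng), make_count_sketch(c, d, rng)
ts = tensor_sketch_pool(X, P1, P2) @ tensor_sketch_pool(Y, P1, P2)

W1 = rng.choice([-1.0, 1.0], size=(d, c))
W2 = rng.choice([-1.0, 1.0], size=(d, c))
rm = random_maclaurin_pool(X, W1, W2) @ random_maclaurin_pool(Y, W1, W2)

print(f"exact bilinear kernel : {exact:.1f}")
print(f"Tensor Sketch         : {ts:.1f}")
print(f"Random Maclaurin      : {rm:.1f}")
```

Both compact maps are built only from matrix products, element-wise products and FFTs, all of which are differentiable with respect to X. That is why a layer of this kind can sit inside a network trained end to end.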

Experimental results

Research questions

RQ1: Can bilinear pooling features be compressed to a few thousand dimensions without significant loss in discriminative power?
RQ2: Can compact bilinear pooling be integrated into deep neural networks with end-to-end back-propagation for joint optimization?
RQ3: Does the kernelized interpretation of bilinear pooling provide a principled basis for deriving compact representations?
RQ4: How does compact bilinear pooling compare to state-of-the-art methods like Fisher vectors and fully connected pooling in image classification and few-shot learning?
RQ5: Can compact bilinear pooling improve performance in low-data regimes such as few-shot learning?

Key findings

The compact bilinear pooling method using Tensor Sketch (TS) achieves 32.29% error rate on the CUB-200-2011 texture classification dataset, outperforming Fisher vectors and matching the performance of full bilinear pooling with only 8,192 dimensions.
On the MIT Indoor scene dataset, TS achieved 1.06% error rate, outperforming Fisher vectors by 2.09% and matching full bilinear pooling with 96.5% compression.
In few-shot learning with one sample per class on CUB, TS achieved 15.5% accuracy versus 12.7% for full bilinear pooling, an absolute gain of nearly 3 percentage points that points to better generalization in low-data regimes.
The performance gap between full bilinear pooling and TS remained stable at 2.5% even with three shots per class, indicating consistent gains from lower-dimensional features.
Fine-tuning degraded performance for full and compact bilinear pooling, suggesting that high-dimensional representations may be more sensitive to overfitting in small datasets.
The method enables a 96.5% reduction in feature dimensionality (from 250,000D to 8,192D), drastically reducing model parameters and storage requirements for deployment and retrieval.


This review was created by AI and reviewed by human editors.