QUICK REVIEW

[Paper Review] Learning to Hash with Binary Deep Neural Network

Thanh-Toan Do, Anh-Dzung Doan|arXiv (Cornell University)|Jul 18, 2016

Advanced Image and Video Retrieval Techniques|34 citations

TL;DR

This paper proposes Binary Deep Neural Networks (BDNN) for unsupervised and supervised hashing. A constrained hidden layer outputs the binary codes directly, and the learning objective requires the codes to be binary, independent, and balanced. Rather than binarizing a continuous output after training, the method solves the constrained problem with alternating optimization and careful relaxation. It achieves state-of-the-art performance on CIFAR10, MNIST, and NUS-WIDE, outperforming prior deep hashing and CNN-based methods in mAP and precision@2.

ABSTRACT

This work proposes deep network models and learning algorithms for unsupervised and supervised binary hashing. Our novel network design constrains one hidden layer to directly output the binary codes. This addresses a challenging issue in some previous works: optimizing non-smooth objective functions due to binarization. Moreover, we incorporate independence and balance properties in the direct and strict forms in the learning. Furthermore, we include similarity preserving property in our objective function. Our resulting optimization with these binary, independence, and balance constraints is difficult to solve. We propose to attack it with alternating optimization and careful relaxation. Experimental results on three benchmark datasets show that our proposed methods compare favorably with the state of the art.
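The independence and balance properties named in the abstract can be made concrete with a small check on a code matrix. The following is a minimal sketch, assuming codes are stored as an m-by-L NumPy array with entries in {-1, +1}; the function names `balance_violation` and `independence_violation` are hypothetical and are not taken from the paper.

```python
import numpy as np

def balance_violation(codes):
    """Largest absolute per-bit mean; 0 means each bit is +1 on exactly half the samples."""
    return float(np.abs(codes.mean(axis=0)).max())

def independence_violation(codes):
    """Frobenius norm of (1/m) B^T B - I; 0 means the bits are pairwise uncorrelated."""
    m, n_bits = codes.shape
    gram = codes.T @ codes / m
    return float(np.linalg.norm(gram - np.eye(n_bits)))

# Four samples, two bits: each column is balanced and the columns are orthogonal.
good = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
# The two bits always agree: still balanced, but fully correlated.
bad = np.array([[1, 1], [1, 1], [-1, -1], [-1, -1]])

print(balance_violation(good), independence_violation(good))
print(balance_violation(bad), independence_violation(bad))
```

Both example matrices are balanced, but only the first has uncorrelated bits, which is why the two properties are listed as separate constraints.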

Motivation & Objective

To address the challenge of optimizing non-smooth, NP-hard binary hashing problems by directly constraining a network layer to output binary codes.
To enforce strict independence and balance of binary codes during training without relaxation or approximation.
To integrate similarity preservation into the objective function for effective visual retrieval.
To develop a scalable optimization framework using alternating optimization and relaxation for the hard discrete constraints.
To extend the unsupervised method to supervised hashing by leveraging label information for semantic similarity preservation.

Proposed method

The network architecture includes a dedicated hidden layer that directly outputs binary codes in {-1, 1}^L, eliminating the need for post-binarization via sgn or thresholding.
The objective function includes terms for similarity preservation, strict independence (uncorrelated bits), and balance (50% probability for each bit to be ±1).
An alternating optimization strategy updates the network weights and the binary codes in turn, handling the binary constraint through a continuous relaxation of the sign function.
The method employs a reparameterization trick to handle the non-differentiable sgn function by approximating it during backpropagation.
For supervised hashing, the method incorporates label-based pairwise similarity into the loss function to preserve semantic relationships.
The training process alternates between updating network weights via stochastic gradient descent and projecting the continuous outputs to the nearest binary codes.
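The alternation described above (a weight update followed by a projection to the nearest binary codes) can be illustrated on a toy problem. This is a sketch under strong assumptions, not the paper's algorithm: the encoder is a single linear map instead of a deep network, the weight step uses least squares as a stand-in for gradient-based training, and the similarity, independence, and balance terms are left out. The function name `alternate_hashing` is hypothetical.

```python
import numpy as np

def to_binary(values):
    """Project real values to the nearest code in {-1, +1} (zeros map to +1)."""
    return np.where(values >= 0, 1.0, -1.0)

def alternate_hashing(X, n_bits, n_iters=20, seed=0):
    """Toy alternating scheme: minimize ||X W - B||^2 over W and binary B in turn."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_bits))
    B = to_binary(X @ W)
    history = []
    for _ in range(n_iters):
        # Weight step: with B fixed, fit W by least squares
        # (an assumption standing in for the network training step).
        W, *_ = np.linalg.lstsq(X, B, rcond=None)
        # Code step: with W fixed, the closest binary code is the elementwise sign.
        B = to_binary(X @ W)
        history.append(float(np.sum((X @ W - B) ** 2)))
    return W, B, history

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 16))
W, B, history = alternate_hashing(X, n_bits=8)
print(B.shape, history[0], history[-1])
```

Each step minimizes the same objective over one block of variables while the other is held fixed, so the recorded objective cannot increase from one iteration to the next. That monotonicity is the main appeal of alternating schemes for problems with discrete variables.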

Experimental results

Research questions

RQ1: Can direct binary output from a deep network layer improve hashing performance compared to relaxed or post-processed binarization?
RQ2: How can independence and balance of binary codes be enforced in a differentiable and strict manner during end-to-end training?
RQ3: Does incorporating all three properties (similarity preservation, independence, and balance) simultaneously lead to better retrieval accuracy?
RQ4: Can the proposed optimization framework effectively handle the NP-hard nature of discrete hashing with binary, independent, and balanced constraints?
RQ5: How does the proposed method compare to state-of-the-art deep hashing and CNN-based hashing methods in terms of mAP and precision@2?

Key findings

On CIFAR10, SH-BDNN achieves 69.62% mAP at L=32, outperforming SDH (67.63%), KSH (65.76%), and BRE (44.89%) at the same code length.
On MNIST, SH-BDNN achieves 95.51% precision@2 at L=32, outperforming SDH (94.43%) and ITQ-CCA (84.57%) at the same code length.
In the separate comparison with the CNN-based methods DSRH and DRSCH on CIFAR10, SH-BDNN achieves 66.22% mAP at L=32, exceeding DRSCH (63.05%) and DSRH (61.77%).
The proposed method achieves 54.12% mAP at L=8 on CIFAR10, outperforming SDH (31.60%) and BRE (23.84%) at the same low bit length.
The method maintains strong performance across all code lengths, with consistent superiority in both mAP and precision@2 on all benchmark datasets.
The ablation study confirms that enforcing strict independence and balance improves retrieval accuracy, especially at low bit lengths.
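For readers unfamiliar with the two metrics quoted above, the sketch below shows one common way to compute them for {-1, +1} codes. It assumes that precision@2 means precision among database items within Hamming radius 2 of the query, and that mAP is the mean over queries of the average precision of the Hamming ranking; the review does not spell out either definition. All function names are hypothetical.

```python
import numpy as np

def hamming_distances(query, database):
    """Hamming distance between one {-1, +1} code and each row of a code matrix."""
    n_bits = database.shape[1]
    # For +/-1 codes, distance = (L - inner product) / 2.
    return (n_bits - database @ query) // 2

def precision_within_radius(query, database, relevant, radius=2):
    """Fraction of items within the radius that are relevant; 0.0 if none are retrieved."""
    retrieved = hamming_distances(query, database) <= radius
    if not retrieved.any():
        return 0.0
    return float(relevant[retrieved].mean())

def average_precision(query, database, relevant):
    """Average precision of the Hamming ranking for a single query."""
    order = np.argsort(hamming_distances(query, database), kind="stable")
    hits = relevant[order].astype(float)
    if hits.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

query = np.array([1, 1, 1, 1])
database = np.array([[1, 1, 1, 1], [1, 1, 1, -1], [-1, -1, -1, -1], [1, -1, -1, 1]])
relevant = np.array([True, False, True, True])

print(hamming_distances(query, database))
print(precision_within_radius(query, database, relevant))
print(average_precision(query, database, relevant))
```

Averaging `average_precision` over all queries gives mAP. Returning 0.0 when nothing falls inside the radius is one convention among several, and it affects the reported number at long code lengths, where few items land within a small radius.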

This review was created by AI and reviewed by human editors.