QUICK REVIEW

[Paper Review] Binary Generative Adversarial Networks for Image Retrieval

Jingkuan Song|arXiv (Cornell University)|Aug 8, 2017

Advanced Image and Video Retrieval Techniques|Cited by 84

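One-Sentence Summary

BGAN is an unsupervised binary hashing method that combines a GAN whose input is constrained to be binary with a continuation-based sign activation to learn binary codes for image retrieval, achieving strong mAP on CIFAR-10, NUS-WIDE, and Flickr.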

ABSTRACT

The most striking successes in image retrieval using deep hashing have mostly involved discriminative models, which require labels. In this paper, we use binary generative adversarial networks (BGAN) to embed images to binary codes in an unsupervised way. By restricting the input noise variable of generative adversarial networks (GAN) to be binary and conditioned on the features of each input image, BGAN can simultaneously learn a binary representation per image, and generate an image plausibly similar to the original one. In the proposed framework, we address two main problems: 1) how to directly generate binary codes without relaxation? 2) how to equip the binary representation with the ability of accurate image retrieval? We resolve these problems by proposing a new sign-activation strategy and a loss function steering the learning process, which consists of new models for adversarial loss, a content loss, and a neighborhood structure loss. Experimental results on standard datasets (CIFAR-10, NUS-WIDE, and Flickr) demonstrate that our BGAN significantly outperforms existing hashing methods by up to 107% in terms of mAP (see the mAP comparison table in the paper). Our anonymous code is available at: https://github.com/htconquer/BGAN.

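Research Motivation and Objectives

Motivate unsupervised binary hashing as a route to scalable image retrieval that needs no labels.
Propose BGAN, which learns L-bit binary codes directly while also generating plausible images.
Design a loss function that combines neighborhood structure, content, and adversarial objectives to optimize the binary codes for retrieval.
Show that direct binary optimization improves performance over relaxation-based hashing methods.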

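Proposed Method

Propose a four-part BGAN architecture: an encoder, a hashing layer, a generator, and a discriminator.
Use a KNN-based neighborhood structure to guide binary code learning in the unsupervised setting.
Use a sign activation with a continuous approximation (app) so that binary codes are produced directly, without relaxation.
Define the loss as a weighted sum of a neighborhood structure loss, a content (perceptual) loss, and an adversarial loss.
Train with SGD under a staged beta schedule, so that the approximation gradually approaches the sign function and converges to sgn(z).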
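The continuation idea and the weighted loss can be illustrated with a small sketch. This is not the authors' implementation: the clipped-linear form of `app`, the beta values in `BETA_STAGES`, the example pre-activations, and the loss weights are all assumptions chosen for illustration, and the paper's exact definitions may differ. The sketch only shows how a smooth surrogate can be pushed toward sgn(z) stage by stage, and how three loss terms combine into one weighted objective.

```python
def sgn(z):
    """Hard sign. Mapping 0 to +1 is a convention assumed here."""
    return 1.0 if z >= 0 else -1.0


def app(z, beta):
    """Assumed surrogate for sgn(z): clip(beta * z, -1, 1).

    For any z != 0 this equals sgn(z) once beta >= 1 / |z|.
    """
    return max(-1.0, min(1.0, beta * z))


def total_loss(neighbor, content, adversarial, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three loss terms; the weights are placeholders."""
    w_n, w_c, w_a = weights
    return w_n * neighbor + w_c * content + w_a * adversarial


# Hypothetical staged schedule: beta is raised between training stages.
BETA_STAGES = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]

# Made-up hash-layer pre-activations for a 4-bit code.
activations = [-0.9, -0.05, 0.2, 0.6]

for beta in BETA_STAGES:
    codes = [app(z, beta) for z in activations]
    # Largest distance between the surrogate output and the true sign.
    gap = max(abs(c - sgn(z)) for c, z in zip(codes, activations))
    print(f"beta={beta:>4}: codes={codes}, max gap to sgn={gap:.2f}")
```

In a real training loop the surrogate would sit inside an autodiff framework so that gradients flow through the hashing layer while beta is small; the loop above only prints how the outputs move toward exact binary values as beta grows.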

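Experimental Results

Research Questions

RQ1: How does each component of BGAN affect retrieval performance?
RQ2: Does direct binary optimization (without relaxation) improve hashing performance?
RQ3: Does BGAN significantly outperform state-of-the-art hashing methods?
RQ4: How efficient and practical is BGAN for large-scale retrieval?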

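Key Findings

BGAN outperforms existing unsupervised hashing methods on the standard datasets.
Combining all three loss components (neighborhood, content, adversarial) gives the best retrieval performance.
Direct binary optimization through the continuation-based sign activation improves on relaxation-based and two-step methods.
Compared with several baselines, the architecture shows strong mAP gains across code lengths on CIFAR-10, NUS-WIDE, and Flickr.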
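For readers unfamiliar with how binary codes are scored, the following is a minimal sketch of Hamming-ranking retrieval and mAP. The toy codes, the labels, and the helper names `hamming`, `average_precision`, and `mean_average_precision` are hypothetical. Relevance is taken to mean "same label", which is an assumption; multi-label datasets usually count any shared label as relevant, and the paper may truncate the ranking at a fixed depth.

```python
def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code, query_label, database):
    """AP for one query over the full ranking.

    Database items are (code, label) pairs, ranked by Hamming distance.
    sorted() is stable, so ties keep their database order.
    """
    ranked = sorted(database, key=lambda item: hamming(query_code, item[0]))
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries, database):
    """Mean of the per-query average precision values."""
    aps = [average_precision(code, label, database) for code, label in queries]
    return sum(aps) / len(aps)


# Toy 4-bit codes in {-1, +1} with made-up labels.
DATABASE = [
    ([1, 1, 1, 1], "cat"),
    ([1, 1, 1, -1], "cat"),
    ([-1, -1, -1, -1], "dog"),
    ([-1, -1, 1, -1], "dog"),
]
QUERIES = [
    ([1, 1, -1, 1], "cat"),
    ([-1, 1, -1, -1], "dog"),
]

print(f"mAP = {mean_average_precision(QUERIES, DATABASE):.4f}")
```

At scale, codes are normally packed into machine words and compared with XOR and popcount rather than element by element; the list-based version is kept here for readability.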

This review was generated by AI and checked by a human editor.