QUICK REVIEW

[Paper Review] Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination

Zhirong Wu, Yuanjun Xiong|arXiv (Cornell University)|May 5, 2018

Domain Adaptation and Few-Shot Learning|References: 46|Citations: 172

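One-line summary

The method learns image representations without supervision by discriminating each instance non-parametrically. Using memory-bank embeddings and noise-contrastive estimation, it achieves strong results on ImageNet and Places and transfers well to semi-supervised learning and object detection.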

ABSTRACT

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so. We study whether this observation can be extended beyond the conventional domain of supervised learning: Can we learn a good feature representation that captures apparent similarity among instances, instead of classes, by merely asking the feature to be discriminative of individual instances? We formulate this intuition as a non-parametric classification problem at the instance-level, and use noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes. Our experimental results demonstrate that, under unsupervised learning settings, our method surpasses the state-of-the-art on ImageNet classification by a large margin. Our method is also remarkable for consistently improving test performance with more training data and better network architectures. By fine-tuning the learned feature, we further obtain competitive results for semi-supervised learning and object detection tasks. Our non-parametric model is highly compact: With 128 features per image, our method requires only 600MB storage for a million images, enabling fast nearest neighbour retrieval at the run time.

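Motivation and objectives

Motivate learning image representations by discriminating individual instances rather than semantic classes.
Develop a scalable non-parametric softmax classifier for large-scale instance discrimination.
Stabilize training with noise-contrastive estimation and proximal regularization.
Show that the learned features generalize to semi-supervised learning and object detection.
Demonstrate the efficiency and compactness of the learned 128-dimensional embedding.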

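Proposed method

Formulate instance-level discrimination as a non-parametric softmax over all training instances, using L2-normalized features.
Instead of keeping separate class weights, maintain a memory bank V of instance embeddings and use it to compute P(i|v).
Approximate the softmax with noise-contrastive estimation (NCE) against a noise distribution, reducing the cost per sample from O(n) to O(1).
Stabilize optimization with proximal regularization, a penalty that discourages large changes in each representation between iterations.
Classify test images with k-nearest neighbors, using cosine similarity against the memory-bank embeddings, so that training and inference stay consistent.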
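The memory-bank softmax and the nearest-neighbor classifier described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the temperature value of 0.07, the overwrite-style memory update, the unweighted majority vote, and the 2-D toy data are all hypothetical choices made for readability. It computes the full O(n) softmax and omits both NCE and proximal regularization.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-norm vectors."""
    return sum(x * y for x, y in zip(a, b))


def instance_probabilities(feature, memory_bank, temperature=0.07):
    """Non-parametric softmax P(i | v) over every stored instance.

    The stored instance embeddings play the role of class weights.
    This full sum costs O(n) per sample; it is the term NCE approximates.
    The temperature is an assumed hyperparameter.
    """
    logits = [dot(feature, stored) / temperature for stored in memory_bank]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_memory_bank(memory_bank, index, feature):
    """Overwrite slot `index` with the latest embedding (assumed update rule)."""
    memory_bank[index] = l2_normalize(feature)


def knn_classify(feature, memory_bank, labels, k=3):
    """Majority vote among the k stored embeddings with highest cosine similarity."""
    sims = [dot(feature, stored) for stored in memory_bank]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    return Counter(labels[i] for i in top).most_common(1)[0][0]


# Toy 2-D memory bank with four training instances (hypothetical data).
bank = [l2_normalize(v) for v in ([1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9])]
labels = ["cat", "cat", "dog", "dog"]

query = l2_normalize([1.0, 0.15])
probs = instance_probabilities(query, bank)
prediction = knn_classify(query, bank, labels, k=3)
```

Because every feature is L2-normalized, the dot product used in training is the same cosine similarity used for the nearest-neighbor lookup, which is the consistency between training and inference that the method relies on.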

Figure 1: Supervised learning results that motivate our unsupervised approach. For an image from class leopard, the classes that get highest responses from a trained neural net classifier are all visually correlated, e.g., jaguar and cheetah. It is not the semantic labeling, but the apparent simi

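Experimental results

Research questions

RQ1: Can discriminating individual instances in an unsupervised setting learn a feature space that preserves apparent similarity among instances?
RQ2: Does a non-parametric softmax with a memory bank outperform a parametric softmax for unsupervised feature learning?
RQ3: How do NCE and proximal regularization affect training stability and feature quality?
RQ4: Do the learned features transfer well to semi-supervised tasks and object detection?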

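Key findings

The non-parametric softmax with an instance memory bank markedly improves CIFAR-10 classification over the parametric softmax.
On ImageNet, the method reaches 46.5% top-1 accuracy under linear evaluation and 41.0–46.5% with kNN across architectures, outperforming several unsupervised baselines.
The method also generalizes strongly to Places205, reaching 41.6–45.5% top-1 depending on the protocol and architecture.
With 128-dimensional embeddings, the representation is compact (about 600MB per million images) and nearest-neighbor search is fast (about 20 ms per image).
Semi-supervised learning with limited labeled data benefits substantially, often outperforming supervised learning trained on the same small labeled subset.
On object detection, the method achieves competitive mAP on PASCAL VOC 2007 and improves with deeper networks (e.g., 65.4% mAP with ResNet-50).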
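The storage figure can be sanity-checked with simple arithmetic. The sketch below assumes each of the 128 values is stored as a 4-byte float and that a megabyte is 10^6 bytes; neither assumption is stated in the review, and the helper name is hypothetical. Under those assumptions the estimate lands in the same range as the quoted 600MB, although the exact accounting behind that figure is not given here.

```python
def embedding_storage_mb(num_images, dim=128, bytes_per_value=4):
    """Storage in megabytes (10**6 bytes) for dense embeddings.

    bytes_per_value=4 assumes 32-bit floats, which is an assumption
    rather than something the review states.
    """
    return num_images * dim * bytes_per_value / 1e6


# Exactly one million images.
one_million = embedding_storage_mb(1_000_000)

# The ImageNet-1k training set has 1,281,167 images.
imagenet_train = embedding_storage_mb(1_281_167)
```

Here `one_million` evaluates to 512.0 and `imagenet_train` to roughly 656, so a figure of about 600MB is plausible for a dataset of around a million images.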

Figure 2: The pipeline of our unsupervised feature learning approach. We use a backbone CNN to encode each image as a feature vector, which is projected to a $128$-dimensional space and L2 normalized. The optimal feature embedding is learned via instance-level discrimination, which tries to maxima

This review was generated by AI and checked by a human editor.