QUICK REVIEW

[Paper Review] Do Deep Generative Models Know What They Don't Know?

Eric Nalisnick, Akihiro Matsukawa|arXiv (Cornell University)|Oct 22, 2018

Generative Adversarial Networks and Image Synthesis | References: 30 | Citations: 57

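Summary in Brief

The paper shows that state-of-the-art deep generative models, including flow-based models, VAEs, and PixelCNNs, can assign higher likelihood to data from a different distribution than to their own training data (for example, SVHN when the model is trained on CIFAR-10), which challenges the use of density as an out-of-distribution detector. It analyzes the cause for flow-based models and cautions against relying on density estimates for anomaly detection until this behavior is better understood.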

ABSTRACT

A neural network deployed in the wild may be asked to make predictions for inputs that were drawn from a different distribution than that of the training data. A plethora of work has demonstrated that it is easy to find or synthesize inputs for which a neural network is highly confident yet wrong. Generative models are widely viewed to be robust to such mistaken confidence as modeling the density of the input features can be used to detect novel, out-of-distribution inputs. In this paper we challenge this assumption. We find that the density learned by flow-based models, VAEs, and PixelCNNs cannot distinguish images of common objects such as dogs, trucks, and horses (i.e. CIFAR-10) from those of house numbers (i.e. SVHN), assigning a higher likelihood to the latter when the model is trained on the former. Moreover, we find evidence of this phenomenon when pairing several popular image data sets: FashionMNIST vs MNIST, CelebA vs SVHN, ImageNet vs CIFAR-10 / CIFAR-100 / SVHN. To investigate this curious behavior, we focus analysis on flow-based generative models in particular since they are trained and evaluated via the exact marginal likelihood. We find such behavior persists even when we restrict the flows to constant-volume transformations. These transformations admit some theoretical analysis, and we show that the difference in likelihoods can be explained by the location and variances of the data and the model curvature. Our results caution against using the density estimates from deep generative models to identify inputs similar to the training distribution until their behavior for out-of-distribution inputs is better understood.

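Motivation and Objectives

Evaluate whether deep generative models, whose densities are commonly assumed to be well calibrated, can detect inputs drawn from a distribution different from the training data.
Examine why flow-based models sometimes assign higher density to out-of-distribution images.
Decompose the likelihood into its contributing terms to identify the cause of the phenomenon.
Assess the role of the volume element and of constant-volume flows in the behavior of the likelihood.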

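Method

Train flow-based Glow models on FashionMNIST vs. MNIST and CIFAR-10 vs. SVHN (with additional CelebA and ImageNet comparisons).
Compute and compare log-likelihoods (in bits per dimension) on in-distribution and out-of-distribution test sets.
Decompose the change-of-variables likelihood into log p(z) and log|det df/dx| to identify which term drives the effect.
Investigate constant-volume (CV) and non-volume-preserving (NVP) transformations to control for volume effects.
Carry out a second-order analysis that relates the likelihood gap to the data covariance and the model curvature.
Test ensembles to assess the robustness of the phenomenon.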
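
The decomposition step can be illustrated with a minimal sketch. It assumes a toy elementwise affine flow z = scale * x + shift with a standard-normal prior, which is far simpler than the Glow models used in the paper; the function names `affine_flow_log_likelihood` and `bits_per_dim` are hypothetical and chosen only for this illustration. The bits-per-dimension conversion also ignores any preprocessing or discretization offsets that image models usually add.

```python
import numpy as np

def affine_flow_log_likelihood(x, scale, shift):
    """Change-of-variables log-likelihood for a toy flow z = scale * x + shift.

    Assumes a standard-normal prior on z (an illustrative choice).
    Returns (log_px, log_pz, log_det) so the two terms can be compared.
    """
    x = np.asarray(x, dtype=float)
    scale = np.broadcast_to(np.asarray(scale, dtype=float), x.shape)
    z = scale * x + shift                                # f(x), elementwise
    log_pz = -0.5 * np.sum(z**2 + np.log(2 * np.pi))     # log N(z; 0, I)
    log_det = np.sum(np.log(np.abs(scale)))              # log|det df/dx|, diagonal Jacobian
    return log_pz + log_det, log_pz, log_det

def bits_per_dim(log_px, num_dims):
    """Convert a log-likelihood in nats to bits per dimension (lower is better)."""
    return -log_px / (num_dims * np.log(2))

# Example: a 4-dimensional input at the origin, with every dimension scaled by 2.
log_px, log_pz, log_det = affine_flow_log_likelihood(np.zeros(4), scale=2.0, shift=0.0)
bpd = bits_per_dim(log_px, num_dims=4)
```

Returning the two terms separately makes it possible to see which one drives a likelihood difference between two data sets, which is the comparison the review describes.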

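Experimental Results

Research Questions

RQ1: Can modern deep generative models assign higher density to data from a distribution different from the training data?
RQ2: Which component of the likelihood (latent density vs. volume change) produces the higher density on out-of-distribution data?
RQ3: Do constant-volume flow transformations remove or reduce the out-of-distribution likelihood paradox?
RQ4: How do the data variance and the model curvature interact to produce higher likelihood on the out-of-distribution set?
RQ5: Do ensembles or graying of images affect the likelihood gap between in-distribution and out-of-distribution data?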

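Key Findings

Flow-based, VAE, and PixelCNN models can assign higher likelihood to out-of-distribution data (e.g., SVHN) than to their training data.
In flow models, the out-of-distribution effect arises mainly from the volume term rather than from the latent p(z) term.
Constant-volume flows do not eliminate the phenomenon; SVHN can still receive higher likelihood than CIFAR-10.
The second-order analysis predicts higher likelihood for SVHN, owing to the difference in data covariance together with the model curvature.
Graying images (which reduces their variance) increases the likelihood of both CIFAR-10 and SVHN, consistent with the curvature-based explanation.
Ensembles do not substantially close the likelihood gap between in-distribution and out-of-distribution data.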
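
The variance and curvature explanation can be made concrete with a toy calculation. This is not the paper's experiment: it assumes a one-dimensional zero-mean Gaussian model and two zero-mean data distributions that differ only in variance, and the function name `expected_log_likelihood_per_dim` is hypothetical. Under these assumptions the lower-variance "out-of-distribution" data receives a higher expected log-likelihood than data matching the model, and the gap equals half the curvature of the log-density times the difference in variances.

```python
import numpy as np

def expected_log_likelihood_per_dim(model_var, data_var):
    """E_{x ~ N(0, data_var)}[log N(x; 0, model_var)] for one dimension, in nats."""
    return -0.5 * np.log(2 * np.pi * model_var) - data_var / (2 * model_var)

model_var = 1.0   # variance of the toy model, matched to its "training" data
in_dist = expected_log_likelihood_per_dim(model_var, data_var=1.0)
low_var_ood = expected_log_likelihood_per_dim(model_var, data_var=0.25)

# Gap in expected log-likelihood (nats per dimension); positive means the
# lower-variance data set is scored as MORE likely than the training data.
gap = low_var_ood - in_dist

# Second-order view: 0.5 * curvature * (var_ood - var_train), where the
# curvature of log N(x; 0, model_var) is -1 / model_var everywhere.
# For a Gaussian the log-density is quadratic, so this expression is exact.
second_order = 0.5 * (-1.0 / model_var) * (0.25 - 1.0)
```

Because the curvature is negative, any data set with the same mean but smaller variance is pulled toward the high-density region, which is in line with the review's note that graying images raises their likelihood.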

This review was generated by AI and checked by a human editor.