QUICK REVIEW

[Paper Review] Do 2D GANs Know 3D Shape? Unsupervised 3D shape reconstruction from 2D Image GANs

Xingang Pan, Bo Dai | arXiv (Cornell University) | Nov 2, 2020

Generative Adversarial Networks and Image Synthesis | References: 43 | Cited by: 51

ONE-LINE SUMMARY

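This paper shows that off-the-shelf 2D GANs trained only on RGB images contain latent 3D geometric cues, and it exploits them for unsupervised 3D shape reconstruction from a single image by iteratively exploring the GAN image manifold with a weak shape prior and differentiable rendering.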

ABSTRACT

Natural images are projections of 3D objects on a 2D image plane. While state-of-the-art 2D generative models like GANs show unprecedented quality in modeling the natural image manifold, it is unclear whether they implicitly capture the underlying 3D object structures. And if so, how could we exploit such knowledge to recover the 3D shapes of objects in the images? To answer these questions, in this work, we present the first attempt to directly mine 3D geometric cues from an off-the-shelf 2D GAN that is trained on RGB images only. Through our investigation, we found that such a pre-trained GAN indeed contains rich 3D knowledge and thus can be used to recover 3D shape from a single 2D image in an unsupervised manner. The core of our framework is an iterative strategy that explores and exploits diverse viewpoint and lighting variations in the GAN image manifold. The framework does not require 2D keypoint or 3D annotations, or strong assumptions on object shapes (e.g. shapes are symmetric), yet it successfully recovers 3D shapes with high precision for human faces, cats, cars, and buildings. The recovered 3D shapes immediately allow high-quality image editing like relighting and object rotation. We quantitatively demonstrate the effectiveness of our approach compared to previous methods in both 3D shape reconstruction and face rotation. Our code is available at https://github.com/XingangPan/GAN2Shape.

MOTIVATION AND OBJECTIVES

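Investigate whether pre-trained 2D GANs inherently capture 3D geometry.
Develop an unsupervised pipeline that reconstructs 3D shape from a single 2D image without 2D keypoint or 3D annotations.
Enable 3D-aware image manipulation, such as rotation and relighting, using the GAN-derived 3D shapes.
Evaluate the approach against prior unsupervised 3D reconstruction methods and demonstrate that it applies across object categories.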

PROPOSED METHOD

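Initialize the depth of the target image with a weak convex shape prior (an ellipsoid).
Using the current depth and albedo, render pseudo samples under random viewpoints and lighting, then apply GAN inversion to these samples to obtain projected samples that lie on the GAN image manifold.
Learn the 3D shape through a photo-geometric autoencoding pipeline that jointly optimizes depth, albedo, viewpoint, and lighting so that the rendered output reconstructs the projected samples.
Repeat this cycle several times, iteratively refining the 3D shape and the networks to capture finer details.
Improve reconstruction quality with a latent offset strategy that is regularized to keep the GAN latent code within the intermediate latent space.
Optionally, extend the method to joint training with relative latent offsets for better generalization.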
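The first two steps above (ellipsoid initialization, then rendering pseudo samples under random lighting) can be illustrated with a minimal pure-Python sketch: an ellipsoid depth prior, camera-facing normals from finite differences, and ambient-plus-diffuse Lambertian shading in the style of Unsup3d-type photo-geometric renderers. Everything here is an assumption for illustration, not the GAN2Shape code: the function names, the `bump`/`ambient`/`diffuse` values, the light sampling range, and the orthographic camera are hypothetical. Only lighting is varied; viewpoint changes (which require re-projecting the surface) and GAN inversion are omitted.

```python
import math
import random


def ellipsoid_depth(h, w, bump=0.5, background=1.0):
    """Hypothetical weak shape prior: a half-ellipsoid bulging towards the camera.

    Coordinates are normalised to [-1, 1] on both axes (requires h, w >= 2), so the
    unit circle in these coordinates is an ellipse filling the image.
    Smaller depth means closer to the camera.
    """
    depth = []
    for i in range(h):
        v = 2.0 * i / (h - 1) - 1.0  # normalised row coordinate
        row = []
        for j in range(w):
            u = 2.0 * j / (w - 1) - 1.0  # normalised column coordinate
            r2 = u * u + v * v
            if r2 < 1.0:
                row.append(background - bump * math.sqrt(1.0 - r2))
            else:
                row.append(background)  # flat background outside the ellipse
        depth.append(row)
    return depth


def depth_to_normals(depth):
    """Unit normals facing the camera, from finite differences (orthographic camera)."""
    h, w = len(depth), len(depth[0])
    sx, sy = 2.0 / (w - 1), 2.0 / (h - 1)  # pixel spacing in normalised units
    normals = []
    for i in range(h):
        i0, i1 = max(i - 1, 0), min(i + 1, h - 1)
        row = []
        for j in range(w):
            j0, j1 = max(j - 1, 0), min(j + 1, w - 1)
            dzdx = (depth[i][j1] - depth[i][j0]) / ((j1 - j0) * sx)
            dzdy = (depth[i1][j] - depth[i0][j]) / ((i1 - i0) * sy)
            # The camera looks along +z, so a camera-facing normal has negative z.
            nx, ny, nz = dzdx, dzdy, -1.0
            n = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / n, ny / n, nz / n))
        normals.append(row)
    return normals


def random_light(rng):
    """Hypothetical sampling: direction from the surface towards a light on the camera side."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), -1.0)


def render_pseudo_sample(depth, albedo, light, ambient=0.3, diffuse=0.7):
    """Grayscale image = albedo * (ambient + diffuse * max(0, <n, l>))."""
    lx, ly, lz = light
    ln = math.sqrt(lx * lx + ly * ly + lz * lz)
    lx, ly, lz = lx / ln, ly / ln, lz / ln
    image = []
    for n_row, a_row in zip(depth_to_normals(depth), albedo):
        image.append([
            a * (ambient + diffuse * max(0.0, nx * lx + ny * ly + nz * lz))
            for (nx, ny, nz), a in zip(n_row, a_row)
        ])
    return image


if __name__ == "__main__":
    rng = random.Random(0)
    depth = ellipsoid_depth(33, 33)
    albedo = [[0.8] * 33 for _ in range(33)]  # placeholder constant albedo
    sample = render_pseudo_sample(depth, albedo, random_light(rng))
    print(round(depth[16][16], 3), round(sample[16][16], 3))
```

In the actual pipeline, such renderings would then be projected onto the GAN manifold by inversion; the sketch stops at the rendering step because the GAN and inversion procedure are outside its scope.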

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

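RQ1: Do 2D GANs trained on RGB images implicitly encode the 3D geometry of objects such as faces, cars, and buildings?
RQ2: Can an unsupervised pipeline that leverages the GAN image manifold and a weak shape prior accurately reconstruct 3D shape from a single 2D image?
RQ3: How do the shape prior and latent regularization affect 3D reconstruction quality and the reliance on a symmetry assumption?
RQ4: How well do the recovered shapes enable 3D-aware manipulations such as rotation and relighting?
RQ5: How does the proposed method compare with existing unsupervised 3D shape learning methods on standard benchmarks?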

KEY FINDINGS

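Results on the BFM benchmark (SIDE: scale-invariant depth error, in units of 10⁻²; MAD: mean angle deviation of surface normals, in degrees; ↓ = lower is better; "/" = not applicable)
No.	Method	Symmetry assumption	SIDE (×10⁻²) ↓	MAD (deg.) ↓
1	Supervised	N	0.419	10.83
2	Const. null depth	/	2.723	43.22
3	Average g.t. depth	/	1.978	22.99
4	Unsup3d (Wu et al., 2020)	Y	0.807	16.34
5	Ours (w/o regularize)	Y	0.925	16.42
6	Ours	Y	0.756	14.81
7	Unsup3d (Wu et al., 2020)	N	1.334	33.79
8	Ours	N	1.023	17.09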
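For reference, the two metrics in the table can be computed roughly as follows. SIDE is the scale-invariant depth error of Eigen et al., as used in the Unsup3d benchmark: with Δ = log d − log d* per pixel, it is the standard deviation of Δ, so a prediction that is off only by a global scale factor scores zero (the table reports it in units of 10⁻²). MAD is the mean angle, in degrees, between predicted and ground-truth surface normals. This sketch assumes flattened lists of positive depths and already-computed normals; object masking and the exact way normals are derived from depth are omitted and are not taken from the reviewed paper.

```python
import math


def side(pred_depth, gt_depth):
    """Scale-invariant depth error: std. dev. of log(pred) - log(gt) over pixels.

    Inputs are flattened sequences of positive depths (object masking omitted).
    """
    deltas = [math.log(p) - math.log(g) for p, g in zip(pred_depth, gt_depth)]
    n = len(deltas)
    mean = sum(deltas) / n
    mean_sq = sum(d * d for d in deltas) / n
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def mad_degrees(pred_normals, gt_normals):
    """Mean angle deviation (degrees) between corresponding 3D normal vectors."""
    total = 0.0
    for a, b in zip(pred_normals, gt_normals):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        cos = max(-1.0, min(1.0, dot / (na * nb)))  # guard acos against round-off
        total += math.degrees(math.acos(cos))
    return total / len(pred_normals)


# A prediction correct up to a global scale has zero SIDE (up to round-off).
print(side([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
# Mean of a 0-degree and a 90-degree normal deviation.
print(mad_degrees([(0, 0, -1), (1, 0, 0)], [(0, 0, -1), (0, 1, 0)]))
```

The scale invariance of SIDE matters here because monocular unsupervised methods recover depth only up to an unknown scale.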

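The method reconstructs high-fidelity 3D shapes of faces, cats, cars, and buildings from a single image.
Compared with Unsup3d, it handles asymmetric objects and large viewpoint changes better, and its accuracy degrades far less when the symmetry assumption is removed.
On BFM, it outperforms the unsupervised baselines on both SIDE and MAD without keypoint or 3D supervision: against Unsup3d, 0.756 vs. 0.807 SIDE (×10⁻²) and 14.81° vs. 16.34° MAD with the symmetry assumption, and 1.023 vs. 1.334 and 17.09° vs. 33.79° without it. The supervised model remains best (0.419, 10.83°).
The recovered 3D shapes enable realistic 3D-aware manipulations such as rotation and relighting that remain consistent with the underlying 3D geometry.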

This review was generated by AI and checked by a human editor.