QUICK REVIEW

[Paper Review] BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images

Thu Nguyen-Phuoc, Christian Richardt|arXiv (Cornell University)|Feb 20, 2020

Generative Adversarial Networks and Image Synthesis | References: 69 | Cited by: 93

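One-line summary

BlockGAN learns object-aware 3D scene representations directly from unlabelled 2D images by generating per-object 3D features and composing them into a 3D scene, which makes it possible to manipulate each object's pose and identity while keeping lighting and shadows realistic.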

ABSTRACT

We present BlockGAN, an image generative model that learns object-aware 3D scene representations directly from unlabelled 2D images. Current work on scene representation learning either ignores scene background or treats the whole scene as one object. Meanwhile, work that considers scene compositionality treats scene objects only as image patches or 2D layers with alpha maps. Inspired by the computer graphics pipeline, we design BlockGAN to learn to first generate 3D features of background and foreground objects, then combine them into 3D features for the whole scene, and finally render them into realistic images. This allows BlockGAN to reason over occlusion and interaction between objects' appearance, such as shadow and lighting, and provides control over each object's 3D pose and identity, while maintaining image realism. BlockGAN is trained end-to-end, using only unlabelled single images, without the need for 3D geometry, pose labels, object masks, or multiple views of the same scene. Our experiments show that using explicit 3D features to represent objects allows BlockGAN to learn disentangled representations both in terms of objects (foreground and background) and their properties (pose and identity).

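Motivation and objectives

Learn scene representations from unlabelled 2D images that respect 3D compositionality and the interactions between objects.
Disentangle the background from multiple foreground objects so that each object's pose and identity can be controlled.
Manipulate the number, pose, and appearance of objects at test time while preserving image realism.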

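Proposed method

Generate 3D features for each object from a noise vector and the object's pose parameters.
Transform each object's 3D features with a 3D similarity transform, then combine them into unified 3D scene features.
Render the 3D scene features into a 2D image through a differentiable, learned perspective projection module.
Use a scene composer that forms the scene features by taking the element-wise maximum of the object features.
Train end-to-end with an adversarial loss on unlabelled images, aided by a style discriminator for datasets with cluttered backgrounds.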
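
The transform and composition steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `similarity_matrix` and `compose_scene`, the (C, D, H, W) feature layout, and the restriction to rotation about the vertical axis are hypothetical choices made for the example. The sketch applies the transform to a single point instead of resampling a feature grid, and it leaves out the learned generators and the projection module.

```python
import numpy as np


def similarity_matrix(scale, azimuth_rad, translation):
    """Build a 4x4 homogeneous similarity transform.

    Assumption for this sketch: uniform scale, rotation about the
    vertical (y) axis only, followed by a translation.
    """
    c, s = np.cos(azimuth_rad), np.sin(azimuth_rad)
    rot_y = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])
    mat = np.eye(4)
    mat[:3, :3] = scale * rot_y
    mat[:3, 3] = np.asarray(translation, dtype=float)
    return mat


def compose_scene(object_features):
    """Combine per-object feature grids with an element-wise maximum.

    object_features: sequence of arrays with identical shape,
    assumed here to be (C, D, H, W).
    """
    return np.stack(object_features, axis=0).max(axis=0)


# Move a point that belongs to one object: scale by 2, turn 90 degrees, shift in x.
pose = similarity_matrix(2.0, np.pi / 2, [1.0, 0.0, 0.0])
moved_point = pose @ np.array([1.0, 0.0, 0.0, 1.0])

# Compose a background grid with two foreground grids (random stand-ins).
rng = np.random.default_rng(0)
background = rng.normal(size=(4, 8, 8, 8))
foreground_a = rng.normal(size=(4, 8, 8, 8))
foreground_b = rng.normal(size=(4, 8, 8, 8))
scene = compose_scene([background, foreground_a, foreground_b])
```

Because the maximum is taken over however many feature grids are supplied, a composer of this form accepts a different number of objects at test time than it saw during training.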

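Experimental results

Research questions

RQ1 Can unsupervised GANs learn disentangled, object-aware 3D representations directly from 2D images?
RQ2 Do explicit 3D object representations enable manipulation of the pose and identity of individual objects in a scene?
RQ3 Can the model support adding and removing objects at test time, even with cluttered backgrounds?
RQ4 How does BlockGAN compare with 2D-based and purely 3D-aware baselines in image fidelity and object disentanglement?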

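Key findings

BlockGAN achieves visual fidelity (KID score) competitive with or better than baseline GANs while providing explicit object-level control.
The model disentangles the background from the foreground and multiple foreground objects from one another, supporting per-object manipulation of pose and identity.
At test time, objects can be added to or removed from a scene, or geometrically modified, with realistic shadows and occlusion.
BlockGAN can add foreground objects at test time even when trained on scenes with fewer objects, which indicates that the learned representation is compositional.
Compared with LR-GAN, BlockGAN provides explicit object-level control and avoids the entanglement in which background and foreground change together.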
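
For readers unfamiliar with the KID score mentioned above: Kernel Inception Distance is commonly computed as an unbiased estimate of the squared maximum mean discrepancy between feature vectors of real and generated images, using the polynomial kernel k(x, y) = (x·y/d + 1)^3, and lower values are better. The sketch below shows that estimator under stated assumptions. Random vectors stand in for Inception features, the function names are hypothetical, and it computes a single estimate instead of averaging over subsets as evaluation code often does. It is not the evaluation code used in the paper.

```python
import numpy as np


def polynomial_kernel(x, y):
    """Cubic polynomial kernel (x.y / d + 1)^3 between rows of x and y."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3


def kid_estimate(real_feats, fake_feats):
    """Unbiased squared-MMD estimate between two sets of feature vectors."""
    m, n = len(real_feats), len(fake_feats)
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)
    # Within-set terms exclude the diagonal (each sample paired with itself).
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return float(term_rr + term_ff - 2.0 * k_rf.mean())


# Stand-in features: in practice these would come from an Inception network.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 16))
fake_close = rng.normal(size=(200, 16))          # same distribution as real
fake_far = rng.normal(loc=1.0, size=(200, 16))   # shifted distribution

score_close = kid_estimate(real, fake_close)
score_far = kid_estimate(real, fake_far)
```

With these stand-in features, the shifted set scores noticeably higher than the matched set, which is the behaviour the metric relies on when ranking generators.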


This review was generated by AI and checked by a human editor.