QUICK REVIEW

[Paper Review] 3D Shape Induction from 2D Views of Multiple Objects

Matheus Gadelha, Subhransu Maji | arXiv (Cornell University) | Dec 18, 2016

Advanced Vision and Imaging | 31 references | 19 citations

TL;DR

This paper proposes Projective GANs (PrGANs), a method to learn a generative model of 3D shapes from 2D silhouettes of multiple objects without 3D annotations or viewpoint information. By integrating a differentiable projection module into a GAN framework, PrGANs infer disentangled 3D shape and viewpoint distributions, enabling unsupervised 3D reconstruction and novel view generation from single images with performance comparable to 3D-GANs trained on real 3D data.

ABSTRACT

In this paper we investigate the problem of inducing a distribution over three-dimensional structures given two-dimensional views of multiple objects taken from unknown viewpoints. Our approach called "projective generative adversarial networks" (PrGANs) trains a deep generative model of 3D shapes whose projections match the distributions of the input 2D views. The addition of a projection module allows us to infer the underlying 3D shape distribution without using any 3D, viewpoint information, or annotation during the learning phase. We show that our approach produces 3D shapes of comparable quality to GANs trained on 3D data for a number of shape categories including chairs, airplanes, and cars. Experiments also show that the disentangled representation of 2D shapes into geometry and viewpoint leads to a good generative model of 2D shapes. The key advantage is that our model allows us to predict 3D, viewpoint, and generate novel views from an input image in a completely unsupervised manner.

Motivation & Objective

To learn a probabilistic distribution over 3D shapes from 2D silhouettes of multiple objects without 3D annotations or viewpoint labels.
To enable unsupervised inference of 3D shape and viewpoint from a single 2D image using a single trained model.
To develop a framework that generalizes across shape categories with variable topology, such as chairs, airplanes, and cars.
To disentangle geometry and viewpoint in 2D shape representations for improved generative modeling.
To enable 3D shape generation and novel view synthesis from 2D inputs in a completely unsupervised manner.

Proposed method

A deep generative model of 3D shapes is trained using a GAN framework, with a differentiable projection module that renders 3D voxel grids into 2D silhouettes.
The projection module approximates the rendering pipeline and enables backpropagation from 2D images to 3D voxel representations.
3D shapes are represented as occupancy grids on a fixed-resolution voxel grid, a representation that accommodates shapes of varying topology across instances.
The generator produces 3D shapes from random noise, and the projection module renders them from random viewpoints to form synthetic 2D images for adversarial training.
The discriminator distinguishes between real 2D images and generated 2D projections, encouraging the generator to produce 3D shapes whose projections match the input data distribution.
The model is trained end-to-end using adversarial loss, allowing disentangled representations of geometry and viewpoint to emerge implicitly.
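
The projection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the silhouette intensity at each pixel takes the form 1 - exp(-sum of occupancy along the line of sight), which should be checked against the paper. It also restricts the viewpoint to multiples of 90 degrees about the vertical axis so that no resampling is needed; a trainable version would need a differentiable rotation at arbitrary angles. The names `project` and `quarter_turns` are hypothetical.

```python
import numpy as np


def project(voxels: np.ndarray, quarter_turns: int = 0) -> np.ndarray:
    """Render a soft silhouette from an occupancy grid of shape (X, Y, Z).

    Assumed form: image[i, j] = 1 - exp(-sum_k rotated[i, j, k]).
    Pixels whose line of sight crosses no occupied voxel come out as 0;
    pixels crossing many occupied voxels approach 1.
    """
    # Assumption: the viewpoint is a multiple of 90 degrees about the
    # vertical (Y) axis, i.e. a rotation in the plane of axes 0 and 2.
    rotated = np.rot90(voxels, k=quarter_turns, axes=(0, 2))
    # Accumulate occupancy along the viewing direction (Z) and squash
    # the result into the range [0, 1).
    return 1.0 - np.exp(-rotated.sum(axis=2))


# Hypothetical example: a slab that is one voxel thick along X.
slab = np.zeros((4, 4, 4))
slab[1, :, :] = 1.0

front = project(slab)                   # slab seen edge-on: one bright row
side = project(slab, quarter_turns=1)   # slab seen face-on: uniform image
```

Because the output is a smooth function of the occupancy values, gradients from a discriminator that sees only the 2D image can flow back to the 3D grid, which is the property the method relies on.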

Experimental results

Research questions

RQ1: Can a generative model learn a disentangled 3D shape distribution from 2D silhouettes without any 3D annotations or viewpoint labels?
RQ2: Can PrGANs produce 3D shapes of quality comparable to GANs trained on real 3D data, even when trained on 2D views alone?
RQ3: Can the model generalize to shape categories with variable topology, such as chairs and airplanes, when trained on mixed data?
RQ4: Can the model perform unsupervised 3D reconstruction and novel view generation from a single 2D image?
RQ5: How well can the model infer depth and viewpoint from a single input image in the absence of ground-truth supervision?

Key findings

PrGANs produce 3D shapes of comparable quality to GANs trained on real 3D data across multiple categories, including chairs, airplanes, and cars.
The model successfully induces a rich and diverse 3D shape distribution even when trained on a mixed set of objects from multiple categories.
The disentangled representation of geometry and viewpoint enables accurate unsupervised 3D reconstruction and novel view synthesis from a single 2D image.
The model generalizes well to unseen categories and generates plausible 3D shapes with consistent topology across instances.
Although silhouettes cannot reveal hidden internal structure, the method outperforms traditional view-based methods in generative capability and generalization.
The approach is robust to unknown viewpoints and object identities, learning a joint distribution over 3D shapes and viewing angles without supervision.


This review was created by AI and reviewed by human editors.