QUICK REVIEW

[Paper Review] Accelerating 3D Deep Learning with PyTorch3D

Nikhila Ravi, Jeremy Reizenstein | arXiv (Cornell University) | Jul 16, 2020

3D Shape Modeling and Analysis | 60 references | 121 citations

TL;DR

PyTorch3D provides modular, differentiable 3D operators and a fast differentiable renderer to accelerate 3D deep learning, enabling unsupervised 3D shape prediction and scalable batching for meshes and point clouds. The library achieves speedups of up to 10x and improves the state of the art on ShapeNet without 3D supervision.

ABSTRACT

Deep learning has significantly improved 2D image recognition. Extending into 3D may advance many new applications including autonomous vehicles, virtual and augmented reality, authoring 3D content, and even improving 2D recognition. However despite growing interest, 3D deep learning remains relatively underexplored. We believe that some of this disparity is due to the engineering challenges involved in 3D deep learning, such as efficiently processing heterogeneous data and reframing graphics operations to be differentiable. We address these challenges by introducing PyTorch3D, a library of modular, efficient, and differentiable operators for 3D deep learning. It includes a fast, modular differentiable renderer for meshes and point clouds, enabling analysis-by-synthesis approaches. Compared with other differentiable renderers, PyTorch3D is more modular and efficient, allowing users to more easily extend it while also gracefully scaling to large meshes and images. We compare the PyTorch3D operators and renderer with other implementations and demonstrate significant speed and memory improvements. We also use PyTorch3D to improve the state-of-the-art for unsupervised 3D mesh and point cloud prediction from 2D images on ShapeNet. PyTorch3D is open-source and we hope it will help accelerate research in 3D deep learning.

Motivation & Objective

Address engineering challenges in 3D deep learning due to heterogeneous data formats (meshes, point clouds, voxels) and differentiability requirements.
Provide a modular, efficient, differentiable 3D operator library built on PyTorch.
Introduce a fast, modular differentiable renderer for meshes and point clouds to enable analysis-by-synthesis and inverse rendering.
Demonstrate speedups and memory efficiency over existing implementations and improve state-of-the-art for unsupervised 3D shape prediction from 2D images.

Proposed method

Introduce PyTorch3D data structures that support batches of 3D data with varying sizes and topologies.
Develop a modular, differentiable rendering engine for meshes and point clouds with two-stage rasterization and K-nearest-face influence per pixel.
Expose rendering intermediates so users can plug in custom shaders and components via PyTorch autograd.
Implement custom CUDA kernels for key 3D operators (Chamfer loss, graph convolution, KNN) to improve speed and memory usage.
Benchmark against pure PyTorch and other libraries to show speed/memory improvements up to 10x.
Demonstrate unsupervised 3D shape prediction on ShapeNet using differentiable renderers with 2D supervision.
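
To illustrate what batching 3D data of varying sizes involves, here is a minimal plain-Python sketch of two common layouts: a "packed" layout that concatenates all elements and records which batch item each came from, and a "padded" layout that pads every item to the longest length. The helper names `to_packed` and `to_padded` are hypothetical and chosen for illustration only; this is not PyTorch3D's API, and it ignores faces, tensors, and GPU placement.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def to_packed(clouds: List[List[Point]]) -> Tuple[List[Point], List[int]]:
    """Concatenate all clouds and record the batch index of every point."""
    packed: List[Point] = []
    cloud_idx: List[int] = []
    for i, cloud in enumerate(clouds):
        packed.extend(cloud)
        cloud_idx.extend([i] * len(cloud))
    return packed, cloud_idx


def to_padded(
    clouds: List[List[Point]], pad: Point = (0.0, 0.0, 0.0)
) -> Tuple[List[List[Point]], List[int]]:
    """Pad every cloud to the longest length and return the true lengths."""
    lengths = [len(c) for c in clouds]
    max_len = max(lengths, default=0)
    padded = [list(c) + [pad] * (max_len - len(c)) for c in clouds]
    return padded, lengths


# Two point clouds of different sizes in one batch (illustrative data).
batch = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
]
packed, cloud_idx = to_packed(batch)
padded, lengths = to_padded(batch)
```

The packed layout wastes no memory on padding and suits per-element operators, while the padded layout gives a regular shape at the cost of storing and masking the padding entries; keeping the true lengths lets downstream code ignore them.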

Experimental results

Research questions

RQ1: Can a modular, differentiable 3D rendering engine scale to large meshes and heterogeneous batches without sacrificing performance?
RQ2: Do custom CUDA implementations for 3D operators provide substantial speed/memory advantages over existing approaches?
RQ3: Can differentiable rendering with PyTorch3D improve unsupervised 3D mesh and point cloud prediction from 2D images on ShapeNet?
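
To make RQ2 concrete, the sketch below shows a brute-force Chamfer distance, the kind of naive baseline a specialized kernel would be compared against. It is an illustration under stated assumptions, not the paper's implementation: it uses squared Euclidean distance and averages over each point set, whereas definitions in the literature vary (squared or unsquared, sum or mean). The function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Brute-force symmetric Chamfer distance, O(len(xs) * len(ys)) time."""
    if not xs or not ys:
        raise ValueError("both point sets must be non-empty")
    # Mean squared distance from each point in xs to its nearest point in ys.
    x_to_y = sum(min(sq_dist(x, y) for y in ys) for x in xs) / len(xs)
    # Same in the other direction, which makes the measure symmetric.
    y_to_x = sum(min(sq_dist(y, x) for x in xs) for y in ys) / len(ys)
    return x_to_y + y_to_x


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
d_ab = chamfer_distance(a, b)
```

Every point is compared against every point in the other set, so cost grows with the product of the set sizes; a vectorized version of the same idea would also materialize the full pairwise distance matrix, which is the sort of time and memory overhead a dedicated nearest-neighbor kernel aims to avoid.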

Key findings

PyTorch3D achieves significant speed and memory improvements versus naïve PyTorch and other libraries, with up to 10x improvements.
A modular renderer that separates rasterization from shading and considers only the K nearest faces per pixel improves efficiency without sacrificing task performance.
Experiments on ShapeNet improve on the state of the art in unsupervised 3D mesh and point cloud prediction from 2D images when using PyTorch3D renderers.
The point-cloud and mesh renderers support heterogeneous batches and still achieve results competitive with or superior to SoftRas in several settings.
The approach enables larger image resolutions and more complex meshes to be used in unsupervised 3D learning without prohibitive compute.
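
The separation of rasterization from shading with K nearest faces per pixel can be pictured with a toy single-pixel sketch. It assumes the candidate faces covering the pixel and their depths are already known; a real rasterizer computes that coverage geometrically. `keep_k_nearest` and `hard_shader` are hypothetical names for illustration, not the library's API, and the shader here simply picks the nearest face rather than blending.

```python
import heapq
from typing import Dict, List, Tuple

# (depth, face_id); a smaller depth means the face is closer to the camera.
Fragment = Tuple[float, int]


def keep_k_nearest(candidates: List[Fragment], k: int) -> List[Fragment]:
    """Rasterization stage: keep the k closest fragments, sorted near to far."""
    return heapq.nsmallest(k, candidates)


def hard_shader(
    fragments: List[Fragment], face_colors: Dict[int, str], background: str = "bg"
) -> str:
    """Shading stage: color the pixel with the nearest face, if there is one."""
    if not fragments:
        return background
    _, face_id = fragments[0]
    return face_colors[face_id]


# Candidate faces overlapping one pixel (illustrative data).
pixel_candidates = [(2.5, 7), (0.8, 3), (1.9, 4), (4.0, 1)]
fragments = keep_k_nearest(pixel_candidates, k=2)
color = hard_shader(fragments, {1: "gray", 3: "red", 4: "green", 7: "blue"})
```

Because the shader only consumes the per-pixel fragment list, it can be swapped for a different one, for example a soft blend over all K fragments, without touching the rasterization step, and bounding K caps the per-pixel work regardless of how many faces the mesh has.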

This review was created by AI and reviewed by human editors.