QUICK REVIEW

[Paper Review] Self-Supervised Intrinsic Image Decomposition

Michael Janner, Jiajun Wu | arXiv (Cornell University) | Nov 10, 2017

Advanced Vision and Imaging | 20 references | 67 citations

TL;DR

The paper introduces Rendered Intrinsics Network (RIN), a deep autoencoder that decomposes images into reflectance, shape, and lighting and uses a differentiable shader with a reconstruction loss to leverage unlabeled data for improved intrinsic representations and transfer to unseen categories.

ABSTRACT

Intrinsic decomposition from a single image is a highly challenging task, due to its inherent ambiguity and the scarcity of training data. In contrast to traditional fully supervised learning approaches, in this paper we propose learning intrinsic image decomposition by explaining the input image. Our model, the Rendered Intrinsics Network (RIN), joins together an image decomposition pipeline, which predicts reflectance, shape, and lighting conditions given a single image, with a recombination function, a learned shading model used to recompose the original input based off of intrinsic image predictions. Our network can then use unsupervised reconstruction error as an additional signal to improve its intermediate representations. This allows large-scale unlabeled data to be useful during training, and also enables transferring learned knowledge to images of unseen object categories, lighting conditions, and shapes. Extensive experiments demonstrate that our method performs well on both intrinsic image decomposition and knowledge transfer.
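
The recombination step in the abstract can be illustrated with the classic intrinsic-image model: the image is reflectance multiplied pixel-wise by shading (I = R * S), and shading depends on shape and lighting. The sketch below is a minimal stand-in, assuming a Lambertian shader with one directional light. RIN learns its shading function with a network instead, so `lambertian_shading`, `recombine`, and `reconstruction_mse` are hypothetical helpers, not the authors' code. The point it shows is that the reconstruction error needs only the input image, not ground-truth intrinsic images.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


def lambertian_shading(normal: Vec3, light_dir: Vec3, ambient: float = 0.0) -> float:
    """Hypothetical stand-in for RIN's learned shader: ambient + max(0, n . l)."""
    n, light = normalize(normal), normalize(light_dir)
    return ambient + max(0.0, sum(a * b for a, b in zip(n, light)))


def recombine(reflectance: List[Vec3], normals: List[Vec3], light_dir: Vec3,
              ambient: float = 0.0) -> List[Vec3]:
    """Recompose RGB pixels as reflectance times scalar shading (I = R * S)."""
    image = []
    for rgb, normal in zip(reflectance, normals):
        s = lambertian_shading(normal, light_dir, ambient)
        image.append((rgb[0] * s, rgb[1] * s, rgb[2] * s))
    return image


def reconstruction_mse(pred: List[Vec3], target: List[Vec3]) -> float:
    """Mean squared error over all pixels and channels; needs no intrinsic labels."""
    errors = [(p - t) ** 2 for pp, tt in zip(pred, target) for p, t in zip(pp, tt)]
    return sum(errors) / len(errors)


# Two pixels: one facing the light, one perpendicular to it.
reflectance = [(0.8, 0.2, 0.2), (0.5, 0.5, 0.5)]
normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
light = (0.0, 0.0, 1.0)
recon = recombine(reflectance, normals, light)
print(recon)                              # [(0.8, 0.2, 0.2), (0.0, 0.0, 0.0)]
print(reconstruction_mse(recon, recon))   # 0.0
```

In training, `reconstruction_mse(recombine(...), input_image)` would be the unsupervised signal; because every step is differentiable in a real framework, its gradient can flow back into the predicted reflectance, shape, and lighting.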

Motivation & Objective

Motivate intrinsic image decomposition as a challenging, underconstrained problem requiring robust representations.
Propose a deep structured autoencoder (RIN) that disentangles reflectance, shape, and lighting and reconstructs the input via a learned shader.
Enable learning from unlabeled data through a reconstruction loss to improve intermediate intrinsic representations.
Demonstrate transfer of learned representations to unseen shapes, objects, and lighting distributions without ground-truth intrinsic images.
Show that self-supervised transfer can adapt predictions across categories and conditions while the learned, differentiable shader is kept fixed.

Proposed method

Introduce Rendered Intrinsics Network (RIN) with a shared encoder and three decoders for reflectance, shape, and lighting.
Incorporate a differentiable shading function that renders the intrinsic predictions to reconstruct the input image.
Use a two-network architecture: a decomposition network that predicts the intrinsic images and a shading network that renders shading from the predicted shape and lighting; skip connections keep the outputs sharp.
Train initially with supervised intrinsic image labels, then continue with unlabeled data using reconstruction loss (self-supervised transfer).
Allow decoders to be updated independently during transfer to accommodate mismatches between labeled and unlabeled data.
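
The two-phase schedule and the selective decoder updates can be sketched as masking parameter groups during an optimizer step. This is a minimal sketch, assuming plain SGD on lists of floats, stand-in gradients, and hypothetical group names (`encoder`, `reflectance_decoder`, `shape_decoder`, `lighting_decoder`, `shader`); it is not the authors' training code. Which groups adapt during transfer (for example, whether the shared encoder is included) is left to the caller here as an assumption; only the frozen shader is enforced.

```python
from __future__ import annotations

# Hypothetical parameter-group names mirroring RIN's components.
GROUPS = ("encoder", "reflectance_decoder", "shape_decoder", "lighting_decoder", "shader")


def trainable_groups(phase: str, update: set[str] | None = None) -> set[str]:
    """Pick which parameter groups receive updates in a given training phase."""
    if phase == "supervised":
        # Phase 1: every component, including the shader, learns from labeled intrinsics.
        return set(GROUPS)
    if phase == "self_supervised":
        # Phase 2: only the chosen groups adapt to the reconstruction loss;
        # the shader stays frozen to avoid degenerate solutions.
        chosen = set(update or ())
        if "shader" in chosen:
            raise ValueError("the shader is kept frozen during self-supervised transfer")
        return chosen
    raise ValueError(f"unknown phase: {phase}")


def sgd_step(params: dict, grads: dict, trainable: set[str], lr: float = 0.1) -> dict:
    """Plain SGD on lists of floats; groups outside `trainable` are copied unchanged."""
    return {
        name: ([p - lr * g for p, g in zip(values, grads[name])]
               if name in trainable else list(values))
        for name, values in params.items()
    }


params = {name: [1.0, 1.0] for name in GROUPS}
grads = {name: [0.5, -0.5] for name in GROUPS}  # stand-in gradients
updated = sgd_step(params, grads, trainable_groups("self_supervised", {"shape_decoder"}))
print(updated["shader"])  # [1.0, 1.0]: frozen
```

For a shape-transfer setting, for example, passing `{"shape_decoder"}` lets only that decoder move while the shader and all other groups stay fixed.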

Experimental results

Research questions

RQ1: Can a deep model jointly predict reflectance, shape, and lighting and still reconstruct the input accurately?
RQ2: Does incorporating a differentiable shader and input reconstruction provide a useful supervisory signal from unlabeled data?
RQ3: Can self-supervised (reconstruction-based) learning improve intrinsic representations and enable transfer to new shapes, lighting, and object categories without ground-truth intrinsic images?
RQ4: To what extent can RIN adapt to mismatches between labeled and unlabeled data across transfer tasks?
RQ5: What is the impact of updating individual decoders during transfer in cross-domain scenarios?

Key findings

RIN enables self-supervised transfer by improving intermediate intrinsic predictions using input reconstruction as a supervisory signal.
Shape transfer on unseen objects yields up to 29% improvement in shape predictions after self-supervised updates (average across tested shapes).
Lighting transfer improves lighting predictions, e.g., an 18% reduction in lighting MSE after adaptation to new lighting distributions.
Category transfer (cars vs. airplanes) yields substantial gains in shading predictions (about 32%) and moderate gains in reflectance (about 21%).
The learned shader generalizes to real-world objects even when trained only on synthetic shapes; freezing the shader's parameters during transfer prevents degenerate solutions.
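
The key findings report improvements as percentages. Assuming these are relative reductions in an error metric such as MSE (the review does not spell out the exact definition), the arithmetic is a one-liner; the input values below are hypothetical and are not results from the paper.

```python
def relative_reduction(error_before: float, error_after: float) -> float:
    """Fraction by which an error metric (e.g., MSE) drops after adaptation."""
    if error_before <= 0:
        raise ValueError("error_before must be positive")
    return (error_before - error_after) / error_before


# Hypothetical MSE values, not results from the paper.
print(f"{relative_reduction(0.050, 0.041):.0%}")  # 18%
```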

This review was created by AI and reviewed by human editors.