[Paper Review] Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks
The paper introduces the deep decoder, an underparameterized, untrained non-convolutional network that generates natural images from few weights, enabling concise representations and competitive denoising, super-resolution, and inpainting without training.
Deep neural networks, in particular convolutional neural networks, have become highly effective tools for compressing images and solving inverse problems including denoising, inpainting, and reconstruction from few and noisy measurements. This success can be attributed in part to their ability to represent and generate natural images well. Contrary to classical tools such as wavelets, image-generating deep neural networks have a large number of parameters (typically a multiple of their output dimension) and need to be trained on large datasets. In this paper, we propose an untrained simple image model, called the deep decoder, which is a deep neural network that can generate natural images from very few weight parameters. The deep decoder has a simple architecture with no convolutions and fewer weight parameters than the output dimensionality. This underparameterization enables the deep decoder to compress images into a concise set of network weights, which we show is on par with wavelet-based thresholding. Further, underparameterization provides a barrier to overfitting, allowing the deep decoder to have state-of-the-art performance for denoising. The deep decoder is simple in the sense that each layer has an identical structure that consists of only one upsampling unit, pixel-wise linear combination of channels, ReLU activation, and channelwise normalization. This simplicity makes the network amenable to theoretical analysis, and it sheds light on the aspects of neural networks that enable them to form effective signal representations.
Motivation & Objective
- Introduce an underparameterized image model that represents natural images with few parameters.
- Propose a simple, untrained network architecture without convolutions that can generate high-quality images.
- Demonstrate the deep decoder as a regularizer/structure prior for inverse problems (denoising, super-resolution, inpainting).
- Provide theoretical insight into why underparameterization helps avoid overfitting and how upsampling induces locality.
Proposed method
- Define the deep decoder G that maps a fixed random input B0 through d layers to an image. Each layer computes B(i+1) = cn(relu(Ui Bi Ci)), where Ci is a channel-wise linear combination (1×1 mixing), Ui is an upsampling operator, relu is the ReLU activation, and cn is channel normalization. The only parameters are the weights C = {Ci}.
- Use upsampling to introduce spatial coupling without traditional convolutions; final output is x = sigmoid(Bd Cd).
- Fit only the network weights C to a single observation y by minimizing L(C) = ||f(G(C)) − y||_2^2 for a given forward model f, using Adam or gradient descent. No training dataset is involved, which is the sense in which the network is "untrained".
- Show that underparameterization (N ≪ n, where N is the number of weights and n is the output dimension of the image) enables a concise representation of images and limits how much noise the model can fit.
- Compare with wavelet thresholding for compression; with d = 6 and k = 64 or 128 channels, N ≈ 25k–100k weights versus an output of 512×512×3 = 786,432 values.
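The layer structure and parameter count described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: nearest-neighbour upsampling is used as a simple stand-in for the upsampling operator, the channel normalization has no learned scale or shift, and the helper names (`channel_norm`, `upsample2x`, `deep_decoder`, `num_weights`) and the small demo sizes are assumptions made for this example. The weight count covers only the channel-mixing matrices.

```python
import numpy as np

def channel_norm(z, eps=1e-5):
    """Normalize each channel over all spatial positions (no learned scale/shift)."""
    mean = z.mean(axis=(0, 1), keepdims=True)
    var = z.var(axis=(0, 1), keepdims=True)
    return (z - mean) / np.sqrt(var + eps)

def upsample2x(z):
    """Nearest-neighbour 2x upsampling of an (h, w, k) array; a stand-in for Ui."""
    return z.repeat(2, axis=0).repeat(2, axis=1)

def deep_decoder(B0, Cs):
    """Map a fixed input B0 of shape (h0, w0, k) to an image.

    Cs holds d mixing matrices of shape (k, k) followed by one output
    matrix of shape (k, k_out). No spatial convolution is used anywhere.
    """
    B = B0
    for C in Cs[:-1]:
        # B(i+1) = cn(relu(Ui Bi Ci)): upsample, mix channels per pixel, ReLU, normalize.
        B = channel_norm(np.maximum(upsample2x(B) @ C, 0.0))
    # Final output x = sigmoid(Bd Cd).
    return 1.0 / (1.0 + np.exp(-(B @ Cs[-1])))

def num_weights(k, d, k_out=3):
    """Count entries of the mixing matrices only: d matrices (k x k) plus one (k x k_out)."""
    return d * k * k + k * k_out

# Small demo (hypothetical sizes, chosen to run quickly).
rng = np.random.default_rng(0)
k, d, k_out = 8, 3, 3
B0 = rng.uniform(size=(4, 4, k))  # fixed random input
Cs = [rng.standard_normal((k, k)) for _ in range(d)]
Cs.append(rng.standard_normal((k, k_out)))
x = deep_decoder(B0, Cs)
print("output shape:", x.shape)

# Parameter counts for the configurations mentioned in the review.
n_out = 512 * 512 * 3
for kk in (64, 128):
    N = num_weights(kk, 6)
    print(f"k={kk}: N={N}, output/weights ratio = {n_out / N:.1f}")
```

Under this counting convention, d = 6 gives 24,768 weights for k = 64 and 98,688 for k = 128, consistent with the N ≈ 25k–100k range quoted above; including any normalization parameters would raise these numbers slightly.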
Experimental results
Research questions
- RQ1: Can an underparameterized, untrained, non-convolutional network serve as an effective image model for compression?
- RQ2: How does the deep decoder perform on inverse problems like denoising, super-resolution, and inpainting without training?
- RQ3: What architectural choices (upsampling, 1×1 channel mixing, normalization) are essential for its effectiveness?
- RQ4: Why does the deep decoder resist fitting noise, and how does this relate to its denoising capabilities?
- RQ5: How does it compare to trained models and to the Deep Image Prior (DIP) in untrained settings?
Key findings
- The deep decoder can compress natural images using a number of parameters that is only a small fraction of the output dimension, performing on par with wavelet-based thresholding.
- As an untrained, underparameterized model, it provides strong denoising performance without requiring training data and without heavy regularization like early stopping.
- Using only 1×1 channel mixing combined with upsampling, rather than spatial convolutions, the architecture still yields concise representations and enables effective inversion for denoising, super-resolution, and inpainting.
- Theoretical analysis shows the model can fit only a small portion of noise, which gives an explanation for its denoising capability in addition to the empirical evidence.
- Empirical comparisons show competitive denoising performance with untrained methods (including DIP) and favorable results against BM3D in certain settings, while also supporting competitive super-resolution and inpainting outcomes.
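The intuition that a model with few parameters can absorb only a small share of noise can be illustrated with a linear analogue. The sketch below is not the paper's analysis and does not use the deep decoder (which is nonlinear); it assumes a random N-dimensional linear subspace of an n-dimensional space, with hypothetical sizes n and N, and measures how much of the energy of Gaussian noise the best fit within that subspace captures. For such a subspace the expected fraction is N/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4096, 128  # hypothetical output dimension and number of parameters

# Orthonormal basis of a random N-dimensional subspace of R^n.
Q, _ = np.linalg.qr(rng.standard_normal((n, N)))

# Pure Gaussian noise and its least-squares fit within the subspace.
noise = rng.standard_normal(n)
fitted = Q @ (Q.T @ noise)

# Share of the noise energy that the low-dimensional model captures.
fraction = (fitted @ fitted) / (noise @ noise)
print(f"fraction of noise energy fitted: {fraction:.4f} (N/n = {N / n:.4f})")
```

In this linear setting, shrinking N relative to n directly shrinks the share of noise the model can reproduce, which mirrors the role that underparameterization plays in the findings above.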
This review was created by AI and reviewed by human editors.