QUICK REVIEW

[Paper Review] Learning to See in the Dark

Chen Chen, Qifeng Chen | arXiv (Cornell University) | May 4, 2018

Advanced Image Processing Techniques | 26 references | 43 citations

TL;DR

The paper introduces the See-in-the-Dark (SID) dataset for extreme low-light imaging and trains end-to-end fully convolutional networks that operate on raw sensor data to improve noise suppression and color accuracy in single-image low-light photography, outperforming traditional pipelines and post-hoc denoising baselines.

ABSTRACT

Imaging in low light is challenging due to low photon count and low SNR. Short-exposure images suffer from noise, while long exposure can induce blur and is often impractical. A variety of denoising, deblurring, and enhancement techniques have been proposed, but their effectiveness is limited in extreme conditions, such as video-rate imaging at night. To support the development of learning-based pipelines for low-light image processing, we introduce a dataset of raw short-exposure low-light images, with corresponding long-exposure reference images. Using the presented dataset, we develop a pipeline for processing low-light images, based on end-to-end training of a fully-convolutional network. The network operates directly on raw sensor data and replaces much of the traditional image processing pipeline, which tends to perform poorly on such data. We report promising results on the new dataset, analyze factors that affect performance, and highlight opportunities for future work. The results are shown in the supplementary video at https://youtu.be/qWKUFK7MWvg

Motivation & Objective

Motivate fast, high-quality imaging in extremely low light where traditional pipelines fail.
Provide a real, publicly available dataset of raw low-light images with long-exposure ground truth (SID).
Develop an end-to-end learnable pipeline that processes raw sensor data to produce perceptually pleasing low-light images.
Evaluate how end-to-end raw-data processing compares to conventional denoising and burst-imaging approaches.
Explore generalization across cameras and potential for real-time or near-real-time processing.

Proposed method

Train end-to-end fully-convolutional networks (FCNs) that operate directly on raw sensor data, replacing traditional processing modules (demosaicing, denoising, color conversion).
Pack Bayer and X-Trans sensor data into multi-channel inputs (4 channels for Bayer, 9 for X-Trans), subtract the black level and scale by an externally supplied amplification ratio (analogous to an ISO setting) before network processing, and apply a sub-pixel layer to the network output to recover full resolution.
Evaluate two core architectures (CAN and U-net), with U-net providing better color and PSNR in experiments.
Train networks with L1 loss using ground-truth long-exposure references, with data augmentation and camera-specific models.
Assess performance against traditional pipelines, BM3D denoising, and idealized burst denoising via perceptual A/B testing (MTurk).
Investigate design choices (input color packing, loss functions, and absence of histogram stretching in training) and their impact on image quality.
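The input packing and the sub-pixel output step described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes a 2x2 Bayer tile with even image dimensions, and the function names (`pack_bayer`, `depth_to_space`), the channel order, and the example black and white levels are hypothetical choices made for this sketch.

```python
import numpy as np


def pack_bayer(raw, black_level, white_level, ratio):
    """Pack an H x W Bayer mosaic into an (H/2) x (W/2) x 4 array.

    Assumptions for this sketch: 2x2 Bayer tile, even H and W, and a
    channel order of (row 0, col 0), (0, 1), (1, 0), (1, 1) in each tile.
    """
    raw = raw.astype(np.float32)
    # Subtract the black level and normalize to [0, 1].
    norm = np.maximum(raw - black_level, 0.0) / (white_level - black_level)
    # One channel per position inside the 2x2 tile (halves the resolution).
    packed = np.stack(
        [norm[0::2, 0::2], norm[0::2, 1::2], norm[1::2, 0::2], norm[1::2, 1::2]],
        axis=-1,
    )
    # Scale by the amplification ratio and clip to the valid range.
    return np.minimum(packed * ratio, 1.0)


def depth_to_space(x, block=2):
    """Rearrange an h x w x (c * block^2) array into (h*block) x (w*block) x c."""
    h, w, channels = x.shape
    out_c = channels // (block * block)
    x = x.reshape(h, w, block, block, out_c)
    x = x.transpose(0, 2, 1, 3, 4)  # -> h, block_row, w, block_col, out_c
    return x.reshape(h * block, w * block, out_c)


# Usage: a toy 4x4 mosaic with hypothetical black/white levels.
raw = np.arange(16, dtype=np.uint16).reshape(4, 4)
packed = pack_bayer(raw, black_level=0, white_level=16, ratio=1.0)
restored = depth_to_space(packed, block=2)[..., 0]

# A half-resolution 12-channel network output maps to a full-resolution RGB image.
rgb = depth_to_space(np.zeros((2, 2, 12), dtype=np.float32), block=2)
```

With this channel order, `depth_to_space` is the exact inverse of the packing step, which is why the toy mosaic is recovered unchanged. In the pipeline the review describes, a network sits between the two functions and emits 12 channels, so the sub-pixel step yields a three-channel image at the original resolution.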

Experimental results

Research questions

RQ1: Can end-to-end FCNs operating on raw low-light sensor data recover perceptually high-quality images from 1/30 to 1/10 second exposures at illuminance below 0.1 lux?
RQ2: How does learning the entire pipeline on raw data compare to traditional pipelines and post-hoc denoising or burst methods in terms of perceptual quality and quantitative metrics?
RQ3: Which network architecture and data representations best preserve color and detail in extreme low-light conditions?
RQ4: Does raw-data processing transfer across cameras (sensor types), or does it require camera-specific models?
RQ5: Which factors (amplification ratio, packing scheme, loss function) most affect performance and generalization?

Key findings

SID provides 5094 raw short-exposure images with long-exposure ground truth across indoor and outdoor scenes.
End-to-end FCN-based processing of raw data improves over traditional pipelines, enabling significant noise suppression and correct color transformation.
In perceptual tests on the challenging x300 data, the pipeline trained on SID significantly outperforms both BM3D and idealized burst denoising.
U-net architecture yields better color recovery and PSNR than CAN on SID data.
Operating on raw sensor data is more effective than working on sRGB outputs for extreme low-light conditions.
Limitations include the lack of dynamic scenes in SID and the need for camera-specific models; real-time full-resolution processing remains challenging.
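For readers unfamiliar with the PSNR metric used in the architecture comparison above, the standard definition is PSNR = 10 * log10(peak^2 / MSE). The sketch below is a generic implementation, assuming images scaled to [0, 1]; the function name `psnr` and the `peak` parameter are illustrative and are not taken from the paper's code.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))


# Usage: a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
reference = np.zeros((4, 4, 3))
estimate = np.full((4, 4, 3), 0.1)
value = psnr(reference, estimate)
```

Higher values mean the output is closer to the long-exposure reference; each tenfold reduction in mean squared error adds 10 dB.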

This review was created by AI and reviewed by human editors.