QUICK REVIEW

[Paper Review] Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks

Alessandro Giusti, Dan Cireşan|arXiv (Cornell University)|Feb 7, 2013
Image and Signal Denoising Methods|3 references|53 citations
TL;DR

This paper proposes a dynamic programming-based image scanning method that accelerates deep max-pooling convolutional neural networks by eliminating the redundant convolutions performed during sliding window inference. Each max-pooling layer's output is split into fragments, one per pooling offset, so that every possible patch position stays represented while the whole image is propagated forward only once. In practice the approach runs about 32× faster than an optimized GPU-based patch-by-patch implementation, and in theory it cuts the operation count by nearly three orders of magnitude on large networks.

ABSTRACT

Deep Neural Networks now excel at image classification, detection and segmentation. When used to scan images by means of a sliding window, however, their high computational complexity can bring even the most powerful hardware to its knees. We show how dynamic programming can speed up the process by orders of magnitude, even when max-pooling layers are present.

Motivation & Objective

  • Address the high computational cost of sliding window inference in deep convolutional neural networks for image segmentation and object detection.
  • Overcome the inefficiency of naive patch-by-patch evaluation, which redundantly recomputes convolutions across overlapping image patches.
  • Develop a general-purpose method that handles arbitrary interleavings of convolutional and max-pooling layers without sacrificing accuracy.
  • Enable real-time or near-real-time inference on large images by minimizing redundant computation through image-wide computation of feature maps.
  • Demonstrate significant speedups over both CPU-based patch methods and highly optimized GPU-based patch implementations.

Proposed method

  • Propose a novel image-based forward-propagation strategy that computes convolutions once per input image, rather than per overlapping patch.
  • Introduce a fragmentation technique for max-pooling layer outputs to ensure that all patches in the input image are represented in the extended feature maps.
  • Treat each fragment of a max-pooled map as an independent submap, preserving spatial information for all possible patch positions.
  • Ensure that the union of all fragments across all max-pooling layers contains complete information for every possible patch in the input image.
  • Apply the method to the entire input image in a single pass, propagating features through all convolutional and max-pooling layers without recomputation.
  • Use dynamic programming principles to systematically compute and propagate feature maps across the full image, avoiding redundant operations.
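
The fragmentation idea in the bullets above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a toy network with one valid convolution followed by one non-overlapping k×k max-pooling layer, and the helper names (`conv2d_valid`, `max_pool`, `fragments`, `check_equivalence`) and all sizes are hypothetical. The sketch convolves the image once, builds one pooled fragment per pooling offset, and checks that every patch's pooled output can be read from a single fragment.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D cross-correlation (no padding, stride 1)."""
    windows = np.lib.stride_tricks.sliding_window_view(img, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def max_pool(fmap, k):
    """Non-overlapping k x k max-pooling; rows/cols that do not fill a block are dropped."""
    h, w = (fmap.shape[0] // k) * k, (fmap.shape[1] // k) * k
    return fmap[:h, :w].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def fragments(fmap, k):
    """One pooled fragment per (row, col) pooling offset in [0, k)."""
    return {(oy, ox): max_pool(fmap[oy:, ox:], k)
            for oy in range(k) for ox in range(k)}

def patch_output(img, kernel, k, y, x, w):
    """Patch-based reference: convolve and pool the w x w patch at (y, x)."""
    return max_pool(conv2d_valid(img[y:y + w, x:x + w], kernel), k)

def image_output(frags, k, y, x, out):
    """Image-based lookup: the same result, read from one fragment."""
    f = frags[(y % k, x % k)]
    return f[y // k:y // k + out, x // k:x // k + out]

def check_equivalence(seed=0):
    """Compare both approaches at every patch position of a random image."""
    rng = np.random.default_rng(seed)
    img = rng.standard_normal((20, 20))   # hypothetical image size
    kernel = rng.standard_normal((3, 3))  # hypothetical kernel
    k, w = 2, 8                           # pooling size and patch size (assumed)
    out = (w - kernel.shape[0] + 1) // k  # pooled output size per patch
    frags = fragments(conv2d_valid(img, kernel), k)  # convolution runs once
    for y in range(img.shape[0] - w + 1):
        for x in range(img.shape[1] - w + 1):
            ref = patch_output(img, kernel, k, y, x, w)
            if not np.allclose(ref, image_output(frags, k, y, x, out)):
                return False
    return True
```

Calling `check_equivalence()` compares the two approaches at every patch position. With deeper networks each further pooling layer would multiply the number of fragments again, which this single-layer sketch does not show.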

Experimental results

Research questions

  • RQ1: How can we eliminate redundant convolutional computations when scanning large images with deep CNNs using a sliding window?
  • RQ2: What is an efficient way to handle max-pooling layers in a sliding window setting without losing information about all possible patches?
  • RQ3: Can a general-purpose, non-patch-based forward-propagation method be designed for arbitrary architectures of interleaved convolutional and max-pooling layers?
  • RQ4: To what extent can image-wide computation reduce FLOPS and inference time compared to patch-based approaches?
  • RQ5: How does the proposed method compare in speed to highly optimized GPU-based patch implementations, especially in terms of practical performance gains?

Key findings

  • The proposed image-based approach reduces FLOPS by a factor of 779.8 compared to the patch-based method on a large network used for neuronal membrane segmentation.
  • A simple MATLAB implementation of the image-based method achieves a 32.8× speedup over a highly optimized GPU-based patch implementation on a 512×512 image.
  • The theoretical speedup approaches three orders of magnitude on large networks such as those used in the ISBI Electron Microscopy Segmentation Challenge.
  • The approach is exact and produces identical results to the patch-based method, ensuring no loss of accuracy despite the optimization.
  • Max-pooling layers, which traditionally hinder image-wide computation, are handled effectively via fragment-based processing that maintains full spatial coverage.
  • The speedup is robust even when using a high-level language like MATLAB, indicating that the performance gain is due to algorithmic efficiency rather than low-level optimizations.
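
To see where a large operation-count ratio can come from, the following rough sketch counts multiply-accumulates for a first convolutional layer only, once per patch and once per image. It does not reproduce the paper's 779.8 figure: the function names and the patch and kernel sizes in the usage note are hypothetical, and it assumes a square image, one patch per valid position, and no padding.

```python
def conv_layer_macs(out_h, out_w, kernel, in_maps=1, out_maps=1):
    """Multiply-accumulate count for one valid convolutional layer."""
    return out_h * out_w * kernel * kernel * in_maps * out_maps

def first_layer_speedup(image_size, patch_size, kernel):
    """Patch-based cost divided by image-based cost, first conv layer only.

    Assumptions (not from the paper): square image, one patch per valid
    position, no padding, and identical map counts in both approaches.
    """
    n_patches = (image_size - patch_size + 1) ** 2
    patch_out = patch_size - kernel + 1   # output side length per patch
    image_out = image_size - kernel + 1   # output side length for the image
    patch_cost = n_patches * conv_layer_macs(patch_out, patch_out, kernel)
    image_cost = conv_layer_macs(image_out, image_out, kernel)
    return patch_cost / image_cost
```

For example, `first_layer_speedup(512, 65, 5)` gives the ratio for a 512×512 image with an assumed 65×65 patch and 5×5 kernel. A first-layer ratio should not be expected to match a whole-network figure, since deeper layers operate on fragmented maps and contribute different costs.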

This review was created by AI and reviewed by human editors.