[Paper Review] LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation
LiteFlowNet is a compact CNN that outperforms FlowNet2 on challenging benchmarks while using about 30x fewer parameters and running about 1.36x faster. It achieves this through cascaded flow inference, feature warping, and a novel flow regularization layer.
FlowNet2, the state-of-the-art convolutional neural network (CNN) for optical flow estimation, requires over 160M parameters to achieve accurate flow estimation. In this paper we present an alternative network that outperforms FlowNet2 on the challenging Sintel final pass and KITTI benchmarks, while being 30 times smaller in the model size and 1.36 times faster in the running speed. This is made possible by drilling down to architectural details that might have been missed in the current frameworks: (1) We present a more effective flow inference approach at each pyramid level through a lightweight cascaded network. It not only improves flow estimation accuracy through early correction, but also permits seamless incorporation of descriptor matching in our network. (2) We present a novel flow regularization layer to ameliorate the issue of outliers and vague flow boundaries by using a feature-driven local convolution. (3) Our network owns an effective structure for pyramidal feature extraction and embraces feature warping rather than image warping as practiced in FlowNet2. Our code and trained models are available at https://github.com/twhui/LiteFlowNet .
Motivation & Objective
- Design a lightweight yet accurate CNN for optical flow estimation.
- Develop pyramidal feature extraction and feature warping to reduce search space and improve efficiency.
- Introduce cascaded flow inference with descriptor matching for progressive refinement.
- Incorporate a feature-driven local convolution regularization to reduce outliers and sharpen boundaries.
- Demonstrate end-to-end training and competitive performance on standard benchmarks.
Proposed method
- Two sub-networks: NetC for pyramidal feature extraction and NetE for pyramidal flow estimation.
- Feature warping (f-warp) is applied to CNN features rather than to images, reducing the feature-space distance between the features of the two input images.
- Cascaded flow inference at each pyramid level with a descriptor matching unit M and a sub-pixel refinement unit S.
- Cost-volume-based descriptor matching with short-range search and sparse sampling to reduce computation.
- Flow regularization via a feature-driven local convolution (f-lconv) whose filters are adapted to features, flow, and occlusion cues.
- Training proceeds stage-wise across pyramid levels, adding the M, S, and R (flow regularization) units in stages; the network is trained end-to-end with an L2 loss and the Adam optimizer.
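The feature warping and short-range descriptor matching steps listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `warp_features` and `cost_volume`, the border clamping, the zero padding, and the `radius` parameter are all assumptions made for illustration, and the sparse sampling mentioned above is omitted.

```python
import numpy as np

def warp_features(feat, flow):
    """Bilinearly sample feat (H, W, C) at positions x + flow(x).

    flow is (H, W, 2) holding (dx, dy) in pixels. Samples that fall outside
    the feature map are clamped to the border (an assumption of this sketch).
    """
    h, w, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom

def cost_volume(feat1, feat2_warped, radius=1):
    """Correlate feat1 with feat2_warped over a (2*radius+1)^2 search window.

    Returns (H, W, (2*radius+1)^2); each channel is the channel-averaged dot
    product for one candidate displacement. Zero padding is an assumption.
    """
    h, w, c = feat1.shape
    pad = ((radius, radius), (radius, radius), (0, 0))
    padded = np.pad(feat2_warped, pad)
    costs = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            costs.append((feat1 * shifted).sum(axis=-1) / c)
    return np.stack(costs, axis=-1)
```

Because the second feature map is warped by the current flow estimate before matching, the search window only has to cover the residual displacement, which is why a small `radius` can be enough at each pyramid level.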
Experimental results
Research questions
- RQ1: Can a compact CNN architecture achieve state-of-the-art or near state-of-the-art optical flow accuracy with significantly fewer parameters?
- RQ2: Does feature warping in the CNN feature space improve matching efficiency and accuracy over image warping?
- RQ3: Does a cascaded flow inference strategy with descriptor matching and sub-pixel refinement improve large-displacement flow estimation?
- RQ4: Can a learned, feature-driven regularization layer reduce artifacts and sharpen flow boundaries while maintaining efficiency?
Key findings
- LiteFlowNet achieves competitive or superior results to FlowNet2 on Sintel final pass and KITTI benchmarks while using ~30x fewer parameters and running ~1.36x faster.
- A 6-level pyramid with separate NetC (features) and NetE (flow) enables effective coarse-to-fine estimation.
- Feature warping of CNN features (f-warp) reduces the residual flow to be estimated, improving accuracy and efficiency.
- Cascaded flow inference with descriptor matching (M) and sub-pixel refinement (S) progressively improves flow, aiding large-displacement cases.
- A novel feature-driven local convolution (f-lconv) provides image- and flow-aware regularization, stabilizing boundaries and reducing artifacts.
- LiteFlowNet and its variants outperform SPyNet and several FlowNet2 variants, while being significantly more parameter-efficient; LiteFlowNet-ft (fine-tuned) excels on Sintel and KITTI when trained with task-specific data.
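To make the feature-driven local convolution (f-lconv) finding more concrete, here is a rough NumPy sketch of a convolution whose filter differs at every pixel. It rests on assumptions not stated in this review: the per-pixel distances are taken as a given input (in the network they would be predicted from features, flow, and occlusion cues), the weights are formed as normalized `exp(-d^2)`, and the names `local_filters` and `local_conv` are hypothetical.

```python
import numpy as np

def local_filters(distance):
    """Turn per-pixel distances (H, W, k*k) into normalized filter weights.

    Assumption of this sketch: weights are exp(-d^2), normalized so that
    each pixel's k*k weights sum to 1.
    """
    weights = np.exp(-distance ** 2)
    return weights / weights.sum(axis=-1, keepdims=True)

def local_conv(flow, filters, k=3):
    """Apply a different k*k filter at every pixel of flow (H, W, 2)."""
    h, w, _ = flow.shape
    r = k // 2
    # Replicate border values so edge pixels have a full neighborhood.
    padded = np.pad(flow, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(flow, dtype=float)
    idx = 0
    for dy in range(k):
        for dx in range(k):
            # Weight the neighbor at offset (dy - r, dx - r) per pixel.
            out += filters[..., idx:idx + 1] * padded[dy:dy + h, dx:dx + w]
            idx += 1
    return out
```

With small distances everywhere the filter behaves like a local average and smooths the flow; a large distance to a neighbor drives its weight toward zero, so flow values are not averaged across that neighbor, which is one way boundaries can stay sharp.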
This review was created by AI and reviewed by human editors.