QUICK REVIEW

[Paper Review] CornerNet-Lite: Efficient Keypoint Based Object Detection

Hei Law, Yun Teng|arXiv (Cornell University)|Apr 18, 2019
Advanced Neural Network Applications | 66 references | 164 citations
TL;DR

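CornerNet-Lite comprises two efficient variants of the keypoint-based detector CornerNet. CornerNet-Saccade targets offline processing and runs 6.0x faster than CornerNet with 1.0% higher AP on COCO. CornerNet-Squeeze targets real-time detection and is both faster and more accurate than YOLOv3 (34.4% AP at 30ms versus 33.0% AP at 39ms on COCO).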

ABSTRACT

Keypoint-based methods are a relatively new paradigm in object detection, eliminating the need for anchor boxes and offering a simplified detection framework. Keypoint-based CornerNet achieves state of the art accuracy among single-stage detectors. However, this accuracy comes at high processing cost. In this work, we tackle the problem of efficient keypoint-based object detection and introduce CornerNet-Lite. CornerNet-Lite is a combination of two efficient variants of CornerNet: CornerNet-Saccade, which uses an attention mechanism to eliminate the need for exhaustively processing all pixels of the image, and CornerNet-Squeeze, which introduces a new compact backbone architecture. Together these two variants address the two critical use cases in efficient object detection: improving efficiency without sacrificing accuracy, and improving accuracy at real-time efficiency. CornerNet-Saccade is suitable for offline processing, improving the efficiency of CornerNet by 6.0x and the AP by 1.0% on COCO. CornerNet-Squeeze is suitable for real-time detection, improving both the efficiency and accuracy of the popular real-time detector YOLOv3 (34.4% AP at 30ms for CornerNet-Squeeze compared to 33.0% AP at 39ms for YOLOv3 on COCO). Together these contributions for the first time reveal the potential of keypoint-based detection to be useful for applications requiring processing efficiency.

Motivation & Objective

  • Address the speed-accuracy tradeoff in anchor-free, keypoint-based object detection.
  • Propose two efficient variants of CornerNet to improve offline and real-time performance.
  • Demonstrate that saccades and a compact backbone can yield significant speedups with minimal AP loss or even gains.
  • Evaluate CornerNet-Lite on COCO to compare against YOLOv3 and CornerNet.
  • Highlight practical training and architectural adaptations that enable real-time or near-real-time inference.

Proposed method

  • Introduce CornerNet-Saccade that uses an attention-based downscaled pass to propose object locations and then processes selected high-resolution crops in parallel.
  • Develop CornerNet-Squeeze with a compact hourglass backbone inspired by SqueezeNet and MobileNets to reduce per-pixel computation.
  • Employ an Hourglass-54 backbone in CornerNet-Saccade to balance depth and efficiency.
  • Use Soft-NMS and boundary-crop suppression to handle partial objects and overlapping crops.
  • Train using the same CornerNet losses for corner heatmaps, embeddings, and offsets across variants.
  • Compare inference time and accuracy on COCO using a consistent hardware setup.
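
The crop post-processing step in the list above (boundary-crop suppression followed by Soft-NMS) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian decay with `sigma=0.5`, the `score_thresh` cutoff, and the one-pixel `margin` are all hypothetical choices made for the example. Boxes are `(x1, y1, x2, y2, score)` tuples in full-image coordinates, and crops are `(x1, y1, x2, y2)` rectangles.

```python
import math

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def touches_crop_boundary(box, crop, margin=1.0):
    """True if the box lies within `margin` pixels of any edge of its crop.

    Such boxes probably describe objects cut off by the crop, so they are dropped.
    (Assumption: crops that coincide with the image border are not special-cased.)
    """
    return (box[0] - crop[0] <= margin or box[1] - crop[1] <= margin or
            crop[2] - box[2] <= margin or crop[3] - box[3] <= margin)

def soft_nms(boxes, sigma=0.5, score_thresh=0.05):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them."""
    remaining = [list(b) for b in boxes]
    kept = []
    while remaining:
        best = max(remaining, key=lambda b: b[4])
        remaining.remove(best)
        kept.append(tuple(best))
        for b in remaining:
            # The higher the overlap with the selected box, the stronger the decay.
            b[4] *= math.exp(-(iou(best, b) ** 2) / sigma)
        remaining = [b for b in remaining if b[4] >= score_thresh]
    return kept

def merge_crop_detections(per_crop, sigma=0.5):
    """Merge detections from several crops; per_crop is a list of (crop, boxes) pairs."""
    candidates = [box for crop, boxes in per_crop
                  for box in boxes if not touches_crop_boundary(box, crop)]
    return soft_nms(candidates, sigma=sigma)
```

In this sketch, a box truncated by one crop is discarded in the expectation that an overlapping crop contains the whole object, and Soft-NMS then down-weights duplicate detections that survive from overlapping crops.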

Experimental results

Research questions

  • RQ1: Can a saccade-like attention mechanism reduce the number of pixels processed without sacrificing CornerNet accuracy?
  • RQ2: Can a compact backbone (CornerNet-Squeeze) provide real-time performance while maintaining or improving AP?
  • RQ3: How do CornerNet-Saccade and CornerNet-Squeeze compare against the original CornerNet and YOLOv3 in terms of speed and accuracy on COCO?
  • RQ4: Is combining saccades with the ultra-compact backbone beneficial or detrimental for real-time detection?
  • RQ5: What are the trade-offs in training efficiency and memory usage for these variants?

Key findings

  • CornerNet-Saccade achieves a 6.0x speedup over CornerNet while increasing AP on COCO by 1.0% (from 42.2% to 43.2%).
  • CornerNet-Squeeze achieves 34.4% AP at 30ms, outperforming YOLOv3 (33.0% at 39ms) on COCO.
  • CornerNet-Lite covers both use cases: CornerNet-Saccade improves offline efficiency without sacrificing accuracy, and CornerNet-Squeeze improves accuracy at real-time speed.
  • CornerNet-Saccade uses a downsized image to predict attention maps for multiple object locations across sizes (small, medium, large).
  • Combining saccades with the compact CornerNet-Squeeze backbone yields worse performance, which is attributed to the limited capacity of the compact network; saccades help only when the attention maps are sufficiently accurate.
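
The headline numbers above can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in this review; the helper name `speedup` is hypothetical, and the throughput line assumes the reported latencies are per-image times.

```python
def speedup(baseline_ms, new_ms):
    """Ratio of baseline latency to new latency (values above 1 mean faster)."""
    return baseline_ms / new_ms

# Figures quoted in the review (COCO).
yolo_ms, yolo_ap = 39.0, 33.0          # YOLOv3
squeeze_ms, squeeze_ap = 30.0, 34.4    # CornerNet-Squeeze
cornernet_ap, saccade_ap = 42.2, 43.2  # CornerNet vs. CornerNet-Saccade

print(f"Squeeze vs YOLOv3: {speedup(yolo_ms, squeeze_ms):.2f}x faster, "
      f"{squeeze_ap - yolo_ap:+.1f} AP")
# prints: Squeeze vs YOLOv3: 1.30x faster, +1.4 AP

# Assumes 30ms is the time for a single image.
print(f"Squeeze throughput: {1000 / squeeze_ms:.1f} images/s")
# prints: Squeeze throughput: 33.3 images/s

print(f"Saccade vs CornerNet: {saccade_ap - cornernet_ap:+.1f} AP")
# prints: Saccade vs CornerNet: +1.0 AP
```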

This review was created by AI and reviewed by human editors.