QUICK REVIEW

[Paper Review] DSSD : Deconvolutional Single Shot Detector

Cheng-Yang Fu, Wei Liu | arXiv (Cornell University) | Jan 23, 2017
Advanced Neural Network Applications | 3 references | 1,636 citations
TL;DR

DSSD combines SSD with a Residual-101 backbone and adds deconvolution layers that inject large-scale context. With 513 × 513 input it reaches 81.5% mAP on VOC2007 test and 33.2% mAP on COCO, outperforming R-FCN on each dataset.

ABSTRACT

The main contribution of this paper is an approach for introducing additional context into state-of-the-art general object detection. To achieve this we first combine a state-of-the-art classifier (Residual-101[14]) with a fast detection framework (SSD[18]). We then augment SSD+Residual-101 with deconvolution layers to introduce additional large-scale context in object detection and improve accuracy, especially for small objects, calling our resulting system DSSD for deconvolutional single shot detector. While these two contributions are easily described at a high-level, a naive implementation does not succeed. Instead we show that carefully adding additional stages of learned transformations, specifically a module for feed-forward connections in deconvolution and a new output module, enables this new approach and forms a potential way forward for further detection research. Results are shown on both PASCAL VOC and COCO detection. Our DSSD with $513 \times 513$ input achieves 81.5% mAP on VOC2007 test, 80.0% mAP on VOC2012 test, and 33.2% mAP on COCO, outperforming a state-of-the-art method R-FCN[3] on each dataset.

Motivation & Objective

  • Improve general object detection by injecting larger-scale contextual information.
  • Investigate replacing VGG with a deeper backbone (Residual-101) in SSD for higher accuracy.
  • Develop a deconvolution-based hourglass module to pass semantic context to later prediction layers.
  • Introduce a prediction module and a deconvolution module to stabilize training and improve small-object detection.

Proposed method

  • Replace VGG with Residual-101 as the base network in SSD to improve feature quality.
  • Add a prediction module with residual blocks to enhance prediction layers and stabilize training.
  • Attach deconvolution layers after SSD to form an asymmetric encoder-decoder (hourglass) network.
  • Incorporate a deconvolution module with batch normalization and learned upsampling, combined via element-wise product for context fusion.
  • Use skip connections to pass high-level context to finer-resolution feature maps, creating DSSD.
  • Train in two stages: first freeze the SSD layers and train only the deconvolution side, then fine-tune the entire network; adopt SSD-like data augmentation and adjusted aspect ratios for default boxes.
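
The fusion step in the deconvolution module can be illustrated with a minimal, single-channel sketch in plain Python. It is not the authors' implementation: the function names `deconv2x2_stride2` and `fuse`, the 2 × 2 stride-2 kernel, and all numeric values are assumptions chosen for illustration, and the batch normalization mentioned above is omitted. The sketch only shows how a coarse, high-level map is upsampled by a (normally learned) transposed convolution and then combined with a finer map by element-wise product, with element-wise sum included for comparison.

```python
def deconv2x2_stride2(x, kernel):
    """Transposed convolution of a 2D map with a 2x2 kernel and stride 2.

    Each input value is spread over its own 2x2 output patch, scaled by the
    kernel, so an H x W map becomes 2H x 2W. In a trained network the kernel
    would be learned; here it is a fixed, hypothetical value.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += x[i][j] * kernel[di][dj]
    return out


def fuse(deep, shallow, kernel, mode="product"):
    """Upsample the coarse `deep` map and fuse it with the finer `shallow` map."""
    up = deconv2x2_stride2(deep, kernel)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("upsampled map and skip map must have the same shape")
    if mode == "product":
        combine = lambda a, b: a * b
    elif mode == "sum":
        combine = lambda a, b: a + b
    else:
        raise ValueError("mode must be 'product' or 'sum'")
    return [[combine(a, b) for a, b in zip(row_up, row_skip)]
            for row_up, row_skip in zip(up, shallow)]


# Hypothetical toy inputs: a 1x2 coarse map and a 2x4 fine map.
deep = [[1.0, 2.0]]
shallow = [[2.0, 2.0, 2.0, 2.0],
           [4.0, 4.0, 4.0, 4.0]]
kernel = [[1.0, 0.5],
          [0.5, 0.25]]

fused_product = fuse(deep, shallow, kernel, mode="product")
fused_sum = fuse(deep, shallow, kernel, mode="sum")
```

With the product, the upsampled context acts as a per-location gate on the finer features, whereas the sum only shifts them; the review reports that the product variant gave the best accuracy among the fusion methods tested.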

Experimental results

Research questions

  • RQ1: Can adding a deconvolution-based encoder-decoder (hourglass) structure to SSD improve accuracy, especially for small objects?
  • RQ2: Does replacing VGG with Residual-101 and introducing a dedicated prediction module improve VOC/COCO detection performance without sacrificing speed?
  • RQ3: What is the impact of different feature fusion strategies (sum vs product) in the deconvolution module on detection accuracy?
  • RQ4: How does the training strategy (two-stage training with the SSD layers frozen first, followed by full fine-tuning) affect convergence and final performance?
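
The two-stage schedule asked about in RQ4 can be sketched on a toy problem. This is only an illustration under stated assumptions, not the paper's training code: the model is reduced to two scalar weights (`w_ssd` standing in for the frozen SSD layers, `w_deconv` for the deconvolution side), the loss is a squared error, and the names `train_step` and `two_stage_train`, the learning rate, and the step counts are all hypothetical.

```python
def train_step(params, trainable, x, y, lr=0.1):
    """One gradient step on loss = (w_ssd * w_deconv * x - y) ** 2.

    Only the parameters named in `trainable` are updated; the rest stay frozen.
    """
    err = params["w_ssd"] * params["w_deconv"] * x - y
    grads = {
        "w_ssd": 2 * err * params["w_deconv"] * x,
        "w_deconv": 2 * err * params["w_ssd"] * x,
    }
    return {name: value - lr * grads[name] if name in trainable else value
            for name, value in params.items()}


def two_stage_train(params, x, y, steps_per_stage=20, lr=0.1):
    """Stage 1 trains only the deconvolution side; stage 2 fine-tunes everything."""
    # Stage 1: the SSD stand-in is frozen, only w_deconv is updated.
    for _ in range(steps_per_stage):
        params = train_step(params, {"w_deconv"}, x, y, lr)
    after_stage1 = dict(params)
    # Stage 2: all parameters are unfrozen and fine-tuned together.
    for _ in range(steps_per_stage):
        params = train_step(params, {"w_ssd", "w_deconv"}, x, y, lr)
    return after_stage1, params


# Hypothetical starting point: a "pretrained" SSD weight and an untrained deconvolution weight.
stage1, final = two_stage_train({"w_ssd": 1.0, "w_deconv": 0.0}, x=1.0, y=2.0)
```

In this toy setting the frozen weight is untouched during stage 1, so the new weight is fitted against stable features before everything is adjusted jointly in stage 2.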

Key findings

  • DSSD with Residual-101 and deconvolution layers achieves higher accuracy than SSD and than competing state-of-the-art methods such as R-FCN on VOC and COCO.
  • Prediction modules and deconvolution modules significantly improve mAP, especially for small objects and context-specific classes.
  • Element-wise product fusion in the deconvolution module yields best accuracy among tested fusion methods.
  • On VOC2007 test, DSSD with 513 × 513 input achieves 81.5% mAP, outperforming prior single-network detectors such as R-FCN and SSD variants.
  • On VOC2012 test, DSSD achieves 80.0% mAP, and on COCO, DSSD with 513 × 513 input reaches 33.2% mAP, demonstrating strong cross-dataset performance.

This review was created by AI and reviewed by human editors.