QUICK REVIEW

[Paper Review] FSSD: Feature Fusion Single Shot Multibox Detector

Zuoxin Li, Yang, Lu|arXiv (Cornell University)|Dec 4, 2017

Advanced Neural Network Applications | References: 7 | Cited by: 391

One-Sentence Summary

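FSSD adds a lightweight feature fusion module that concatenates features from multiple layers to build a new feature pyramid. This improves detection accuracy, especially for small objects, at the cost of a slight drop in speed.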

ABSTRACT

SSD (Single Shot Multibox Detector) is one of the best object detection algorithms with both high accuracy and fast speed. However, SSD's feature pyramid detection method makes it hard to fuse the features from different scales. In this paper, we proposed FSSD (Feature Fusion Single Shot Multibox Detector), an enhanced SSD with a novel and lightweight feature fusion module which can improve the performance significantly over SSD with just a little speed drop. In the feature fusion module, features from different layers with different scales are concatenated together, followed by some down-sampling blocks to generate new feature pyramid, which will be fed to multibox detectors to predict the final detection results. On the Pascal VOC 2007 test, our network can achieve 82.7 mAP (mean average precision) at the speed of 65.8 FPS (frame per second) with the input size 300×300 using a single Nvidia 1080Ti GPU. In addition, our result on COCO is also better than the conventional SSD with a large margin. Our FSSD outperforms a lot of state-of-the-art object detection algorithms in both aspects of accuracy and speed. Code is available at https://github.com/lzx1413/CAFFE_SSD/tree/fssd.

Motivation and Objectives

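Address the challenge of multi-scale object detection in SSD-based detectors.
Propose a lightweight feature fusion module that concatenates features from different layers and down-samples them.
Generate a new feature pyramid from the fused features and feed it to the multibox detectors.
Evaluate FSSD on PASCAL VOC and MS COCO to quantify the gains in accuracy and speed.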

Proposed Method

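Define a feature fusion framework in which features from selected layers are projected with 1x1 convolutions, resized to a common spatial size, and then concatenated.
Fuse features from conv3_3, conv4_3, fc7 and conv7_2 by concatenation rather than element-wise summation. In the SSD300 backbone, conv3_3 can optionally be excluded.
Apply Batch Normalization after fusion to normalize the scales of the fused features.
Build the pyramid feature extractor by applying down-sampling blocks (stride-2 convolutions) to the fused feature map.
Train FSSD from VGG16/SSD pretrained weights, or fine-tune it from a COCO-pretrained model, using the SSD-style loss with hard negative mining.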
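
The fusion steps above can be illustrated with a small plain-Python sketch on toy feature maps. This is a minimal illustration under several assumptions, not the authors' Caffe implementation. Feature maps are nested lists indexed [channel][row][col] with batch size 1, and the helper names (`conv1x1`, `resize_nearest`, `batch_norm`, `fuse`) are hypothetical. Nearest-neighbour resizing stands in for the interpolation used in practice (for example, bilinear up-sampling). Batch Normalization is shown in its training-mode form, without a learned scale or shift.

```python
import math

# Assumption: feature maps are nested lists [channel][row][col], batch size 1.

def conv1x1(fmap, weights):
    """1x1 convolution: each output channel is a weighted sum of the input channels."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wt[c] * fmap[c][i][j] for c in range(len(fmap))) for j in range(w)]
         for i in range(h)]
        for wt in weights  # one weight vector per output channel
    ]

def resize_nearest(fmap, size):
    """Nearest-neighbour resize of every channel to size x size (stand-in for bilinear)."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[i * h // size][j * w // size] for j in range(size)] for i in range(size)]
        for ch in fmap
    ]

def batch_norm(fmap, eps=1e-5):
    """Per-channel normalisation over spatial positions (BN with batch size 1, gamma=1, beta=0)."""
    out = []
    for ch in fmap:
        vals = [v for row in ch for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        out.append([[(v - mean) / math.sqrt(var + eps) for v in row] for row in ch])
    return out

def fuse(features, projections, size):
    """Project each source map, resize it to a common size, concatenate channels, then apply BN."""
    fused = []
    for fmap, weights in zip(features, projections):
        fused.extend(resize_nearest(conv1x1(fmap, weights), size))  # channel-wise concatenation
    return batch_norm(fused)

# Hypothetical toy inputs: a 2-channel 4x4 map and a 3-channel 2x2 map.
a = [[[float(i + j + k) for j in range(4)] for i in range(4)] for k in range(2)]
b = [[[float((c + 1) * (2 * i + j)) for j in range(2)] for i in range(2)] for c in range(3)]
fused = fuse([a, b], [[[1.0, 1.0]], [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]], size=4)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 4 4
```

Concatenation keeps each source's channels separate: here 1 + 2 = 3 output channels. Element-wise summation would instead require every projected map to have the same channel count and would merge them into a single set of channels. This is the design choice compared in the concatenation-versus-summation ablation.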
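
The spatial sizes of the down-sampling pyramid follow from ordinary convolution arithmetic: output = floor((size + 2*padding - kernel) / stride) + 1. The sketch below assumes 3x3 stride-2 blocks and treats the 38x38 fused map (the conv4_3 resolution in SSD300) as the first pyramid level. The padding schedule is a hypothetical choice that reproduces SSD300's 38/19/10/5/3/1 grid sizes, and the actual FSSD blocks may be configured differently. The function names are illustrative.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def pyramid_sizes(base, paddings, kernel=3, stride=2):
    """Apply one stride-2 down-sampling block per padding value, starting from `base`."""
    sizes = [base]
    for p in paddings:
        sizes.append(conv_out(sizes[-1], kernel, stride, p))
    return sizes

# Assumed schedule: padding 1 for the first four blocks, padding 0 for the last one.
print(pyramid_sizes(38, [1, 1, 1, 1, 0]))  # [38, 19, 10, 5, 3, 1]
```

With padding 1 throughout, the last block yields 2x2 instead of 1x1 ([38, 19, 10, 5, 3, 2]). The padding of the final blocks therefore determines whether the coarsest level matches SSD's 1x1 map.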
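
In the original SSD training recipe, hard negative mining keeps every positive anchor. Among the negatives, it keeps only those with the highest classification loss, capped at a 3:1 negative-to-positive ratio. The sketch below assumes this SSD default carries over to FSSD, since the source says only "SSD-style". The function name and toy losses are hypothetical.

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Return the sorted indices of anchors that contribute to the classification loss.

    All positives are kept. Negatives are ranked by confidence loss (highest first),
    and at most neg_pos_ratio negatives per positive are kept.
    """
    positives = [i for i, pos in enumerate(is_positive) if pos]
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    num_neg = min(len(negatives), neg_pos_ratio * len(positives))
    return sorted(positives + negatives[:num_neg])

# Hypothetical per-anchor losses: one positive (index 0) and seven negatives.
losses = [0.2, 2.5, 0.1, 1.7, 0.9, 3.0, 0.05, 1.1]
positive = [True, False, False, False, False, False, False, False]
print(hard_negative_mining(losses, positive))  # [0, 1, 3, 5]
```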

Experimental Results

Research Questions

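RQ1: Can a single, lightweight feature fusion module improve SSD by exploiting multi-scale features?
RQ2: For multi-scale feature fusion, does concatenation-based fusion outperform summation-based fusion?
RQ3: How does the fused-feature design affect accuracy and speed on the VOC and COCO datasets?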

Key Findings

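On VOC2007 test with a 300x300 input, FSSD reaches 82.7 mAP at 65.8 FPS on a single 1080Ti GPU (COCO-pretrained model).
On VOC2012 with COCO pretraining, FSSD300 reaches 82.0% mAP and FSSD512 reaches 84.2% mAP, both above the SSD baseline.
On COCO test-dev, FSSD300 achieves 27.1% AP, higher than SSD300* (25.1%), and FSSD512 achieves 31.8% AP.
Ablation studies show that concatenation outperforms element-wise summation. Adding Batch Normalization after fusion raises mAP by about 0.7%.
The fused pyramid design brings notable gains on small objects. Compared with standard SSD, it also reduces cases where parts of one object are detected as separate objects.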
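
The reported numbers can be turned into a few derived quantities with simple arithmetic. This sketch assumes latency is simply 1/FPS, ignoring batching and pre- and post-processing, and its helper names are hypothetical. VOC mAP and COCO AP are different metrics, so only differences within the COCO benchmark are computed.

```python
def latency_ms(fps):
    """Per-image latency implied by a throughput figure, in milliseconds."""
    return 1000.0 / fps

def ap_gain(new_ap, baseline_ap):
    """Absolute AP improvement, in percentage points."""
    return new_ap - baseline_ap

# COCO test-dev AP values as reported in the findings.
coco_ap = {"SSD300*": 25.1, "FSSD300": 27.1, "FSSD512": 31.8}

print(f"FSSD300 at 65.8 FPS: {latency_ms(65.8):.1f} ms/image")  # 15.2 ms/image
print(f"FSSD300 vs SSD300*: +{ap_gain(coco_ap['FSSD300'], coco_ap['SSD300*']):.1f} AP")  # +2.0 AP
print(f"FSSD512 vs FSSD300: +{ap_gain(coco_ap['FSSD512'], coco_ap['FSSD300']):.1f} AP")  # +4.7 AP
```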

This review was generated by AI and reviewed by human editors.