QUICK REVIEW

[Paper Review] DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

Wanli Ouyang, Xiaogang Wang|arXiv (Cornell University)|Dec 17, 2014

Advanced Neural Network Applications | References: 66 | Cited by: 78

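One-Sentence Summary

This paper proposes DeepID-Net, a deformable deep convolutional neural network for generic object detection. It introduces a deformation constrained pooling (def-pooling) layer that models part-level deformation under geometric constraints. By combining a new object-level pre-training scheme, model averaging, and pipeline refinements, the framework raises mean average precision (mAP) on the ILSVRC2014 detection benchmark from RCNN's 31.0% to 50.3%, outperforming both RCNN and GoogLeNet.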

ABSTRACT

In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures, training strategies, adding and removing some key components in the detection pipeline, a set of models with large diversity are obtained, which significantly improves the effectiveness of model averaging. The proposed approach improves the mean averaged precision obtained by RCNN (Girshick et al., 2014), which was the state-of-the-art, from 31% to 50.3% on the ILSVRC2014 detection test set. It also outperforms the winner of ILSVRC2014, GoogLeNet, by 6.1%. Detailed component-wise analysis is also provided through extensive experimental evaluation, which provides a global view for people to understand the deep learning object detection pipeline.

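Research Motivation and Objectives

To bridge the domain gap between image-level pre-training, which is designed for image classification, and object detection, which requires sensitivity to localization.
To model deformable visual patterns shared across object categories and semantic levels through a differentiable, constrained pooling layer.
To improve detection performance with a unified pipeline that integrates pre-training, deformation modeling, contextual information, and model averaging.
To provide a comprehensive, component-wise analysis of deep learning object detection under standardized evaluation.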

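Proposed Method

Introduces a deformation constrained pooling (def-pooling) layer that models part deformation with a quadratic penalty on spatial displacement.
Proposes a new pre-training strategy that uses object-level annotations rather than image-level labels, to better match the needs of the detection task.
Adopts a multi-scale, multi-class pre-training scheme to improve feature generalization for detection.
Applies model averaging across several architectures (A-net, Z-net, O-net, G-net) to improve robustness and performance.
Integrates contextual modeling from image classification scores and bounding-box regression to refine the results.
Generates region proposals with selective search and EdgeBoxes, and uses a bounding-box rejection step to discard low-quality candidates.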
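
The def-pooling idea in the first item can be illustrated with a minimal sketch. It assumes a single 2D part-score map, one anchor position, and a penalty of the form c_x·dx² + c_y·dy²; the function name `def_pool` and the parameters `radius`, `c_x`, and `c_y` are hypothetical names chosen for illustration, and the paper's actual layer (with learned penalty terms inside a CNN) is more general than this.

```python
def def_pool(score_map, anchor, radius, c_x, c_y):
    """Return (best penalized score, chosen (dy, dx) offset) around `anchor`.

    Illustrative assumption: the penalty is c_x*dx^2 + c_y*dy^2, so a part
    may move away from its anchor only if the score gain outweighs the cost.
    """
    ay, ax = anchor
    height, width = len(score_map), len(score_map[0])
    best_score, best_offset = float("-inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = ay + dy, ax + dx
            if not (0 <= y < height and 0 <= x < width):
                continue  # skip offsets that fall outside the map
            penalized = score_map[y][x] - (c_x * dx * dx + c_y * dy * dy)
            if penalized > best_score:
                best_score, best_offset = penalized, (dy, dx)
    return best_score, best_offset


part_scores = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 5.0],
    [0.0, 0.0, 0.0],
]
# Zero penalty reduces to ordinary max pooling over the window.
print(def_pool(part_scores, (1, 1), 1, 0.0, 0.0))
# A mild penalty still prefers the displaced peak; a harsh one stays at the anchor.
print(def_pool(part_scores, (1, 1), 1, 1.0, 1.0))
print(def_pool(part_scores, (1, 1), 1, 10.0, 10.0))
```

With zero penalty coefficients the sketch behaves like plain max pooling, which is one way to see def-pooling as a generalization of it under these assumptions.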

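Experimental Results

Research Questions

RQ1: Does object-level pre-training improve feature representations for object detection compared with image-level pre-training?
RQ2: How does a differentiable, deformation constrained pooling layer improve detection performance when modeling deformable parts?
RQ3: How much does each component (pre-training, def-pooling, contextual modeling, model averaging) contribute to the overall mAP gain?
RQ4: Can a unified detection pipeline that combines multiple models outperform single models and existing state-of-the-art methods?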
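
All four questions are measured in mAP, so a rough sketch of the metric may help. It assumes detections have already been matched to ground truth (each one flagged as a true or false positive) and uses plain non-interpolated average precision; the official ILSVRC evaluation involves overlap-based matching and may differ in detail. The function names are hypothetical.

```python
def average_precision(detections, num_ground_truth):
    """Non-interpolated AP for one class.

    detections: list of (score, is_true_positive) pairs, assumed to be
    already matched against ground-truth boxes.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_ground_truth


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, num_ground_truth)."""
    aps = [average_precision(d, n) for d, n in per_class.values()]
    return sum(aps) / len(aps)


toy = {
    "dog": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "cat": ([(0.5, True)], 1),
}
print(mean_average_precision(toy))
```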

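Key Findings

The proposed object-level pre-training scheme improves mAP by 2.6% over image-level pre-training, supporting its advantage for detection.
Replacing the standard pooling layer with a def-pooling layer in the Z-net architecture improves mAP by 2.5%, indicating that it models deformation effectively.
Model averaging across several architectures (A-net through G-net) improves performance significantly; the final ensemble is reported at 50.7% mAP on ILSVRC2014 (the abstract gives 50.3% on the test set, so this figure presumably refers to a validation split).
The full pipeline (def-pooling, multi-scale pre-training, and contextual modeling) raises mAP from 29.9% for the RCNN baseline to 50.3% (the abstract cites 31% for RCNN on the test set).
The method exceeds GoogLeNet, the ILSVRC2014 winner, by 6.1% mAP, setting a new state of the art at the time.
Component ablations show that pre-training with object-level annotations and training on multi-scale data yield some of the largest individual gains (2.6% and 2.2%, respectively).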
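
The model-averaging finding can be sketched in a few lines. This assumes every model scores the same list of candidate boxes and that all models are weighted equally; the paper's actual model selection and weighting may differ, and `average_scores` is a hypothetical helper name.

```python
def average_scores(per_model_scores):
    """Average detection scores box-by-box across models.

    per_model_scores: list of score lists, one per model, all aligned to
    the same candidate boxes (an assumption of this sketch).
    """
    if not per_model_scores:
        raise ValueError("need at least one model")
    n_boxes = len(per_model_scores[0])
    if any(len(scores) != n_boxes for scores in per_model_scores):
        raise ValueError("all models must score the same boxes")
    n_models = len(per_model_scores)
    return [
        sum(scores[i] for scores in per_model_scores) / n_models
        for i in range(n_boxes)
    ]


# Two hypothetical models disagree on box 0; averaging tempers both opinions.
ensemble = average_scores([[0.2, 0.8], [0.4, 0.6]])
print(ensemble)
```

Averaging tends to help most when the individual models make different mistakes, which is consistent with the abstract's emphasis on obtaining models with large diversity.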

This review was generated by AI and reviewed by a human editor.