QUICK REVIEW

[Paper Review] Deformable ConvNets v2: More Deformable, Better Results

Xizhou Zhu, Han Hu|arXiv (Cornell University)|Nov 27, 2018

Advanced Neural Network Applications | References: 34 | Cited by: 181

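One-Sentence Summary

This paper presents Deformable ConvNets v2 (DCNv2), which enriches deformable sampling by adding more deformable layers and a modulation mechanism, and introduces a feature-mimicking training objective, yielding significant gains on COCO object detection and instance segmentation.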

ABSTRACT

The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation.

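Research Motivation and Goals

Advance the modeling of object geometric deformation beyond what the original DCNv1 achieves.
Increase modeling power by stacking more deformable layers and introducing a modulation mechanism.
Guide effective training of the enhanced model with a teacher-guided feature-mimicking loss inspired by R-CNN features.
Demonstrate on COCO that DCNv2 is compatible with Faster R-CNN and Mask R-CNN across different backbones, and that it improves their performance.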

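Proposed Method

Replace more of the 3x3 convolution layers in the conv3–conv5 stages of ResNet-50 with deformable convolutions to deepen deformable modeling.
Introduce a modulation mechanism that assigns each sampling location a learned amplitude, so that individual sample points can be selectively emphasized or suppressed.
Extend deformable RoI pooling with modulation to better control context aggregation within each RoI.
Introduce an R-CNN feature-mimicking loss that pulls each RoI's features toward the focused representation that R-CNN learns from cropped image content.
Keep the deformable modules lightweight to preserve compatibility with existing architectures such as Faster R-CNN and Mask R-CNN.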
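
To make the modulation mechanism concrete, the following is a minimal sketch of a single 3x3 modulated deformable convolution evaluated at one output location. It is an illustration under stated assumptions, not the authors' implementation: the function names, the pure-Python list representation, the zero padding outside the feature map, and the use of a sigmoid to keep each amplitude in [0, 1] are all choices made here for clarity. In a real network the offsets and modulation values would be predicted by a separate convolution branch rather than passed in by hand.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly sample a 2-D list `feature` at fractional (y, x).

    Positions outside the map contribute zero (an assumption of this sketch).
    """
    height, width = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < height and 0 <= xx < width:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def modulated_deform_conv_at(feature, weights, offsets, mod_logits, center):
    """Output of one 3x3 modulated deformable convolution at `center`.

    weights:    9 kernel weights, row-major over the 3x3 grid
    offsets:    9 (dy, dx) offsets, one per sampling location
    mod_logits: 9 pre-sigmoid modulation values; the sigmoid maps each
                to an amplitude in [0, 1] that scales its sample
    center:     (row, col) of the output location
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    cy, cx = center
    out = 0.0
    for k, (gy, gx) in enumerate(grid):
        oy, ox = offsets[k]
        # Sample at the regular grid position shifted by the learned offset.
        sample = bilinear_sample(feature, cy + gy + oy, cx + gx + ox)
        # The modulation amplitude can suppress (near 0) or keep (near 1) a sample.
        out += weights[k] * sample * sigmoid(mod_logits[k])
    return out
```

With zero offsets and large positive modulation logits, the function reduces to an ordinary 3x3 convolution; driving a logit strongly negative effectively removes that sampling point from the sum, which is the selective suppression described in the list above.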

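Experimental Results

Research Questions

RQ1: Can expanding and modulating deformable sampling improve sensitivity to object geometry without the features being overly influenced by background content?
RQ2: Does stacking deformable layers across multiple ResNet stages yield consistent gains over DCNv1 on COCO?
RQ3: Does the feature-mimicking objective help DCNv2 learn more object-focused representations, similar to R-CNN features?
RQ4: How does DCNv2 perform on COCO detection and segmentation with common backbones such as ResNet-50/101 and ResNeXt-101?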

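Key Findings

Enriched deformation modeling brings significant accuracy gains for Faster R-CNN and Mask R-CNN on COCO, surpassing DCNv1.
The modulated deformable components provide additional improvement on top of the basic deformable modules, raising both bounding-box and mask performance.
R-CNN feature mimicking further improves per-RoI features, particularly for positive RoIs, by encouraging focus on the object foreground.
Applying modulated DCNv2 in the conv3–conv5 stages together with modulated RoI pooling yields significant gains over the original deformable setup on all backbones evaluated.
The gains come from a small number of additional parameters and a distillation-like training signal; the feature-mimicking branch is used only during training, so it adds no inference cost.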
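
The distillation-like signal mentioned in the findings above can be sketched as a per-RoI feature-mimicking loss. This sketch assumes a cosine-similarity form, summing 1 minus the cosine similarity between the R-CNN (teacher) feature and the detector's feature for each RoI; this review does not spell out the exact formula, so treat that form, the function names, and the small epsilon as assumptions. Which RoIs are included (for example, positive RoIs only) is left to the caller.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon (an assumption of this sketch) guards against zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def feature_mimic_loss(rcnn_feats, detector_feats):
    """Sum over RoIs of (1 - cosine similarity) between paired features.

    rcnn_feats:     list of teacher feature vectors, one per RoI
    detector_feats: list of detector feature vectors for the same RoIs
    """
    return sum(
        1.0 - cosine_similarity(teacher, student)
        for teacher, student in zip(rcnn_feats, detector_feats)
    )
```

Because the loss depends only on the direction of the feature vectors, it is zero when the detector's per-RoI feature is aligned with the R-CNN feature regardless of scale, and it grows as the two point in different directions.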

This review was generated by AI and reviewed by a human editor.