QUICK REVIEW

[Paper Review] Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization

Spyros Gidaris, Nikos Komodakis|arXiv (Cornell University)|Jun 14, 2016

Advanced Neural Network Applications | Cited by 77

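One-Sentence Summary

This paper proposes AttractioNet, an active box proposal generation method that uses an in-out localization strategy to focus on promising image regions and iteratively refine object proposals. It reports state-of-the-art average recall on COCO, PASCAL, ImageNet detection, and NYU-Depth V2, and on COCO its VGG16-based detector outperforms all other VGG16-based detectors while remaining competitive with a heavily tuned ResNet-101 detector.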

ABSTRACT

The problem of computing category agnostic bounding box proposals is utilized as a core component in many computer vision tasks and thus has lately attracted a lot of attention. In this work we propose a new approach to tackle this problem that is based on an active strategy for generating box proposals that starts from a set of seed boxes, which are uniformly distributed on the image, and then progressively moves its attention on the promising image areas where it is more likely to discover well localized bounding box proposals. We call our approach AttractioNet and a core component of it is a CNN-based category agnostic object location refinement module that is capable of yielding accurate and robust bounding box predictions regardless of the object category. We extensively evaluate our AttractioNet approach on several image datasets (i.e. COCO, PASCAL, ImageNet detection and NYU-Depth V2 datasets) reporting on all of them state-of-the-art results that surpass the previous work in the field by a significant margin and also providing strong empirical evidence that our approach is capable to generalize to unseen categories. Furthermore, we evaluate our AttractioNet proposals in the context of the object detection task using a VGG16-Net based detector and the achieved detection performance on COCO manages to significantly surpass all other VGG16-Net based detectors while even being competitive with a heavily tuned ResNet-101 based detector. Code as well as box proposals computed for several datasets are available at: https://github.com/gidariss/AttractioNet.

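Research Motivation and Objectives

To address the challenge of generating high-recall, category-agnostic bounding box proposals across diverse object categories and complex scenes.
To overcome the limitations of uniform sampling and non-adaptive proposal generation by introducing an active, attention-based strategy that iteratively refines proposals in promising image regions.
To develop a CNN-based object location refinement module that localizes objects accurately and in a category-agnostic way, regardless of object identity.
To evaluate how well the method generalizes to unseen categories and to demonstrate its effectiveness in downstream detection.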

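Proposed Method

The method starts from a set of seed boxes distributed uniformly over the image; these form the initial proposal set.
An active search strategy uses an in-out localization mechanism to iteratively select and refine the most promising boxes, identifying regions likely to contain objects.
A CNN-based refinement module, inspired by LocNet, jointly predicts an objectness score and refined bounding box coordinates for each proposal.
Refinement is repeated over several iterations, with attention shifting progressively toward more informative image regions, so localization accuracy improves from one iteration to the next.
The model is trained end to end with a combination of a localization loss and an objectness classification loss, optimizing localization accuracy and proposal quality together.
The final proposals are produced by an IoU-based selection mechanism, so that ground-truth boxes are covered with high recall.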
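
The steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `seed_boxes`, `toy_refine`, and `active_proposals` are hypothetical names, the grid layout, iteration count, and thresholds are arbitrary, and `toy_refine` only stands in for the CNN module by pulling each box toward a known target box. The IoU-based selection is assumed here to be greedy non-maximum suppression.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, as [x0, y0, x1, y1]."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, thresh=0.7):
    """Greedy IoU-based selection: keep the best box, drop near-duplicates."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        order = rest[iou(boxes[best], boxes[rest]) <= thresh]
    return np.array(keep, dtype=int)

def seed_boxes(width, height, grid=4, size=0.5):
    """Seed boxes laid out on a uniform grid (hypothetical layout)."""
    w, h = size * width, size * height
    xs = np.linspace(0, width - w, grid)
    ys = np.linspace(0, height - h, grid)
    return np.array([[x, y, x + w, y + h] for y in ys for x in xs])

def toy_refine(boxes, target):
    """Stand-in for the CNN module: moves every box halfway toward `target`
    and scores the moved box by its IoU with `target`. A real module would
    predict both quantities from image features."""
    refined = 0.5 * (boxes + target)
    return refined, iou(target, refined)

def active_proposals(width, height, refine, iterations=5, top_k=10):
    """Refine the seed boxes repeatedly, pool every intermediate box,
    then select the final proposals with NMS."""
    boxes = seed_boxes(width, height)
    pooled_boxes, pooled_scores = [], []
    for _ in range(iterations):
        boxes, scores = refine(boxes)  # attend and refine
        pooled_boxes.append(boxes)
        pooled_scores.append(scores)
    candidates = np.concatenate(pooled_boxes)
    cand_scores = np.concatenate(pooled_scores)
    keep = nms(candidates, cand_scores)[:top_k]
    return candidates[keep], cand_scores[keep]

# Usage: one object at a known location, 200x150 image.
target = np.array([40.0, 30.0, 120.0, 100.0])
proposals, scores = active_proposals(200, 150, lambda b: toy_refine(b, target))
```

Pooling the boxes from every iteration before the final selection keeps early, coarse boxes available as fallbacks, while NMS removes the many near-duplicates that accumulate as boxes converge on the same object.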

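Experimental Results

Research Questions

RQ1: Does an active, iterative proposal generation strategy significantly improve average recall over static, uniformly sampled proposal methods?
RQ2: How well does the category-agnostic refinement module generalize to unseen object categories in a zero-shot setting?
RQ3: To what extent does the in-out localization mechanism improve the model's ability to focus on relevant image regions and localize objects accurately?
RQ4: Combined with a standard backbone such as VGG16-Net, does the method achieve state-of-the-art object detection performance?
RQ5: Does active refinement improve generalization and robustness in crowded or complex scenes?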

Key Findings

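AttractioNet achieves state-of-the-art average recall on COCO, significantly surpassing prior methods (the accompanying figure of 0.537 is reported as AP@0.5, which is a detection metric rather than an average recall value).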
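On PASCAL VOC, the method achieves an average recall of 0.524, outperforming the previous state of the art.
The method generalizes well to unseen categories, with strong empirical evidence of robust performance on object categories absent from training.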
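Combined with a VGG16-Net detector, AttractioNet-based detection reaches 0.537 AP@0.5 on COCO test-dev, surpassing all other VGG16-Net-based detectors.
Its detection performance is competitive with the heavily tuned ResNet-101-based Faster R-CNN++, which reaches 0.557 AP@0.5 on COCO.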
Qualitative results show that AttractioNet localizes most objects even in crowded scenes with significant object overlap.
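
Since several of the findings above are stated as average recall, a small worked example of that metric may help. The sketch below assumes the common proposal-evaluation convention of averaging recall over IoU thresholds from 0.5 to 0.95, and it simplifies matching by letting each ground-truth box take its best-overlapping proposal independently. The function names are illustrative, and this is not the evaluation code behind the reported numbers.

```python
import numpy as np

def pairwise_iou(gt, props):
    """IoU matrix of shape (G, P) for boxes given as [x0, y0, x1, y1]."""
    x0 = np.maximum(gt[:, None, 0], props[None, :, 0])
    y0 = np.maximum(gt[:, None, 1], props[None, :, 1])
    x1 = np.minimum(gt[:, None, 2], props[None, :, 2])
    y1 = np.minimum(gt[:, None, 3], props[None, :, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    gt_area = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    pr_area = (props[:, 2] - props[:, 0]) * (props[:, 3] - props[:, 1])
    return inter / (gt_area[:, None] + pr_area[None, :] - inter)

def average_recall(gt, props, thresholds=None):
    """Mean, over IoU thresholds, of the fraction of ground-truth boxes
    whose best proposal overlaps them by at least the threshold."""
    if thresholds is None:
        thresholds = np.linspace(0.5, 0.95, 10)  # assumed threshold grid
    best = pairwise_iou(gt, props).max(axis=1)   # best IoU per ground-truth box
    return float(np.mean([(best >= t).mean() for t in thresholds]))

# Usage: one perfectly localized object and one loosely localized object.
gt = np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]])
props = np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 26.2]])
ar = average_recall(gt, props)
```

In this toy case the second proposal overlaps its object with an IoU of 0.62, so it counts as a hit only at the three loosest thresholds. That is why average recall rewards tight localization and not merely a rough hit.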

This review was generated by AI and reviewed by a human editor.