QUICK REVIEW

[Paper Review] Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts

Xianjie Chen, Roozbeh Mottaghi|arXiv (Cornell University)|Jun 8, 2014

Advanced Image and Video Retrieval Techniques|References 21|Cited by 92

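One-Sentence Summary

This paper proposes a flexible part-based object detection model: a fully connected graphical model with switch variables decides, based on detectability, whether to detect the holistic object or its body parts, and decouples unreliable components. The method improves AP by 4.1% on the PASCAL VOC 2010 animal categories, adapts detection to deformation, occlusion, and low resolution, and achieves precise part localization with the help of a newly created, fully annotated dataset.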

ABSTRACT

Detecting objects becomes difficult when we need to deal with large shape deformation, occlusion and low resolution. We propose a novel approach to i) handle large deformations and partial occlusions in animals (as examples of highly deformable objects), ii) describe them in terms of body parts, and iii) detect them when their body parts are hard to detect (e.g., animals depicted at low resolution). We represent the holistic object and body parts separately and use a fully connected model to arrange templates for the holistic object and body parts. Our model automatically decouples the holistic object or body parts from the model when they are hard to detect. This enables us to represent a large number of holistic object and body part combinations to better deal with different "detectability" patterns caused by deformations, occlusion and/or low resolution. We apply our method to the six animal categories in the PASCAL VOC dataset and show that our method significantly improves state-of-the-art (by 4.1% AP) and provides a richer representation for objects. During training we use annotations for body parts (e.g., head, torso, etc), making use of a new dataset of fully annotated object parts for PASCAL VOC 2010, which provides a mask for each part.

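Research Motivation and Goals

Address the challenge of detecting animals that are highly deformed, occluded, or shown at low resolution.
Improve detection robustness by modeling the holistic object and its body parts separately and switching between them adaptively.
Provide a richer object representation than a bounding box by precisely localizing body parts such as the head, torso, and legs.
Allow the model to rely on the holistic object, or on the reliable parts only, when some body parts are hard to detect.
Develop and release a new dataset with pixel-level mask annotations for the six animal categories in PASCAL VOC 2010.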

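Proposed Method

Uses a fully connected graphical model in which nodes represent the holistic object and the body parts (head, torso, legs), and edges encode spatial and scale relationships.
Introduces a switch variable for each node that dynamically decouples the holistic object or a body part when it is not detectable.
Keeps inference on the loopy graph efficient by exploiting the nodes shared across different detectability patterns.
Trains the model with part-level annotations (masks) from the newly created, fully annotated PASCAL VOC 2010 dataset.
Adopts a discriminative learning framework that optimizes detection AP in the joint model while also localizing parts.
Bases decoupling on detectability rather than visibility, so the model can ignore components that are hard to detect (such as a deformed body or tiny parts).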
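The role of the switch variables can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring form (appearance score plus a quadratic pairwise penalty plus a fixed bias for each decoupled node), the names `pairwise_score`, `best_configuration`, and `off_bias`, and all numbers are assumptions made for illustration. The search is plain brute force, not the efficient shared-node inference described above.

```python
from itertools import product

NODES = ["object", "head", "torso", "legs"]

def pairwise_score(a, b, expected_offset, weight=0.1):
    """Quadratic penalty on deviation from the expected offset (assumed form)."""
    dx = (b[0] - a[0]) - expected_offset[0]
    dy = (b[1] - a[1]) - expected_offset[1]
    return -weight * (dx * dx + dy * dy)

def best_configuration(candidates, offsets, off_bias):
    """Brute-force search over switch patterns and placements.

    candidates: node -> list of ((x, y), appearance_score)
    offsets:    (node_i, node_j) -> expected (dx, dy) from node_i to node_j
    off_bias:   node -> score added when the node is decoupled (hypothetical)
    """
    best = (float("-inf"), None, None)
    for switches in product([0, 1], repeat=len(NODES)):
        if not any(switches):
            continue  # at least one component must stay active
        active = [n for n, s in zip(NODES, switches) if s]
        base = sum(off_bias[n] for n, s in zip(NODES, switches) if not s)
        for choice in product(*(candidates[n] for n in active)):
            placement = dict(zip(active, choice))
            score = base + sum(app for _, app in choice)
            # Fully connected: every pair of active nodes contributes an edge term.
            for i, a in enumerate(active):
                for b in active[i + 1:]:
                    score += pairwise_score(placement[a][0], placement[b][0],
                                            offsets[(a, b)])
            if score > best[0]:
                positions = {n: pos for n, (pos, _) in placement.items()}
                best = (score, dict(zip(NODES, switches)), positions)
    return best

# Toy input: the legs detector responds poorly, so decoupling it should win.
candidates = {
    "object": [((50, 50), 1.0)],
    "head": [((40, 30), 2.0), ((90, 90), 2.5)],
    "torso": [((50, 55), 1.5)],
    "legs": [((50, 80), -3.0)],
}
offsets = {
    ("object", "head"): (-10, -20), ("object", "torso"): (0, 5),
    ("object", "legs"): (0, 30), ("head", "torso"): (10, 25),
    ("head", "legs"): (10, 50), ("torso", "legs"): (0, 25),
}
off_bias = {n: -0.5 for n in NODES}

score, switches, placement = best_configuration(candidates, offsets, off_bias)
print(score, switches, placement)
```

In this toy input the highest-scoring pattern keeps the object, head, and torso active and switches the legs off, because paying the decoupling bias costs less than accepting a weak legs response.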

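Experimental Results

Research Questions

RQ1: How can object detection performance be improved under large deformation, partial occlusion, and low resolution?
RQ2: Can a unified model switch dynamically between detecting the holistic object and detecting body parts, based on detectability?
RQ3: Does modeling holistic and part-level representations together yield better detection performance and a richer object description?
RQ4: How effective is the model at part localization when body parts are small, occluded, or blurry?
RQ5: Does the detectability-based switching mechanism outperform visibility-based or fixed-part models?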

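Key Findings

The proposed method achieves an absolute gain of 4.1% in average precision (AP) over the state of the art on the PASCAL VOC 2010 animal categories.
Even when reduced to body parts only, without the holistic object, the model still outperforms DPM by 7.3% AP and Sup-DPM by 4.1% AP.
For extra-small (XS) objects, the holistic-only detectability pattern is the most effective: 66.7% of bird instances and 52.5% of sheep instances are detected through the holistic object alone, which shows how the model adapts to low resolution.
Head localization for cats reaches 73.5% POP and 77.3% PCP, and torso localization for sheep reaches 79.2% POP and 88.6% PCP, indicating high reliability for distinctive or stable parts.
Leg localization for dogs reaches 28.1% POP and 44.9% PCP, a moderate result attributed to truncation and small part size.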
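As a reading aid for the AP figures above, the following sketch computes average precision with all-points interpolation, a common choice for PASCAL VOC style evaluation, and reports an absolute gain in percentage points. This review does not describe the paper's exact evaluation protocol, so the function `average_precision` and the toy detections are assumptions and are not taken from the paper.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve with all-points interpolation."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    recalls, precisions = [], []
    for rank, i in enumerate(order, start=1):
        tp += is_true_positive[i]
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / rank)
    # Make precision monotonically non-increasing (the precision envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Toy detections (hypothetical): confidence scores and match flags, 4 ground-truth objects.
scores = [0.9, 0.8, 0.7, 0.6]
ap_baseline = average_precision(scores, [1, 0, 0, 1], num_ground_truth=4)
ap_improved = average_precision(scores, [1, 0, 1, 1], num_ground_truth=4)

# An "absolute" gain is a difference in percentage points, not a relative change.
gain_points = 100 * (ap_improved - ap_baseline)
print(ap_baseline, ap_improved, gain_points)
```

An absolute gain of 4.1% AP therefore means the AP value itself rises by 4.1 percentage points, not by 4.1% of the baseline.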

This review was generated by AI and reviewed by human editors.