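[Paper Walkthrough] Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation
This paper proposes the Complete-IoU (CIoU) loss and Cluster-NMS, which incorporate three geometric factors into bounding box regression and NMS, improving AP and AR for detection and segmentation models while preserving real-time inference.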
Deep learning-based object detection and instance segmentation have achieved unprecedented progress. In this paper, we propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS), leading to notable gains of average precision (AP) and average recall (AR), without the sacrifice of inference efficiency. In particular, we consider three geometric factors, i.e., overlap area, normalized central point distance and aspect ratio, which are crucial for measuring bounding box regression in object detection and instance segmentation. The three geometric factors are then incorporated into CIoU loss for better distinguishing difficult regression cases. The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted $\ell_n$-norm loss and IoU-based loss. Furthermore, we propose Cluster-NMS, where NMS during inference is done by implicitly clustering detected boxes and usually requires less iterations. Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR. In the experiments, CIoU loss and Cluster-NMS have been applied to state-of-the-art instance segmentation (e.g., YOLACT and BlendMask-RT), and object detection (e.g., YOLO v3, SSD and Faster R-CNN) models. Taking YOLACT on MS COCO as an example, our method achieves performance gains as +1.7 AP and +6.2 AR$_{100}$ for object detection, and +0.9 AP and +3.5 AR$_{100}$ for instance segmentation, with 27.1 FPS on one NVIDIA GTX 1080Ti GPU. All the source code and trained models are available at https://github.com/Zzh-tju/CIoU
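Motivation and Goals
- Motivation: identify the limitations of IoU-based losses for bounding box regression in detection and segmentation.
- Propose a complete geometric-factor loss (CIoU) with overlap, center distance, and aspect ratio terms.
- Develop Cluster-NMS, which speeds up non-maximum suppression while allowing geometric factors to be integrated.
- Demonstrate training and inference gains on state-of-the-art detectors and segmenters.
- Show real-time performance on GPU without sacrificing accuracy.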
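Proposed Method
- Define the CIoU loss as 1 - IoU plus a normalized center distance term and an aspect ratio term with an adaptive weight (alpha).
- Formulate the three geometric factors, S (overlap), D (distance), and V (aspect ratio), as scale-invariant and normalized to [0, 1].
- Provide analysis and simulations comparing CIoU against the IoU and GIoU losses, showing faster convergence and better regression in extreme cases.
- Introduce Cluster-NMS, which groups boxes into clusters and performs NMS on the GPU in fewer iterations.
- Integrate geometric factors into Cluster-NMS through score penalties and distance-based terms (Cluster-NMS_S, Cluster-NMS_S+D, Cluster-NMS_W, Cluster-NMS_W+D).
- Apply CIoU and Cluster-NMS to YOLACT, BlendMask-RT, YOLOv3, SSD, and Faster R-CNN to verify the gains.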
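To make the three terms of the loss concrete, here is a minimal sketch for a single pair of boxes. It is not the authors' implementation: the function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration. The adaptive weight uses the commonly cited form alpha = v / ((1 - IoU) + v); the exact trade-off used in the paper may differ.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU-style loss for two boxes given as (x1, y1, x2, y2).

    Illustrative sketch only; names and box format are assumptions.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap term: plain IoU.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Distance term: squared center distance divided by the squared
    # diagonal of the smallest box enclosing both boxes.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    d = rho2 / (cw ** 2 + ch ** 2 + eps)

    # Aspect ratio term, normalized to [0, 1) by the 4 / pi^2 factor.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))
    ) ** 2

    # Adaptive weight (assumed form): the aspect ratio term matters more
    # as the overlap improves.
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + d + alpha * v
```

For two non-overlapping unit squares, `ciou_loss((0, 0, 1, 1), (2, 0, 3, 1))` is about 1.4, and moving the second box farther away increases the loss. A plain IoU loss would stay at 1 in both cases, which is the kind of regression case the distance term is meant to distinguish.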
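Experimental Results
Research Questions
- RQ1: Compared with conventional losses and NMS variants, do the CIoU loss and Cluster-NMS improve bounding box regression and suppression quality?
- RQ2: How do the three geometric factors (overlap area, center distance, aspect ratio) affect training dynamics and convergence?
- RQ3: Does integrating CIoU and Cluster-NMS into state-of-the-art detectors and segmenters maintain or improve inference speed?
- RQ4: Are these methods effective for both object detection and instance segmentation?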
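Key Findings
- The CIoU loss yields consistent AP and AR improvements over the $\ell_n$-norm loss and IoU-based losses.
- Cluster-NMS delivers notable AP and AR gains while maintaining real-time inference.
- Applied to YOLACT on MS COCO, the method achieves +1.7 AP and +6.2 AR$_{100}$ for object detection and +0.9 AP and +3.5 AR$_{100}$ for instance segmentation, at 27.1 FPS on one NVIDIA GTX 1080Ti GPU.
- Gains are also observed when the method is applied to other models (YOLOv3, SSD, Faster R-CNN).
- CIoU converges faster than GIoU and handles extreme aspect ratios better.
- Cluster-NMS can be implemented purely on the GPU and reproduces the results of original NMS in fewer iterations.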
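The claim that matrix operations can reproduce original NMS is easier to see in code. The following is a CPU sketch in NumPy of the basic iteration as commonly described for Cluster-NMS, without the score penalty or distance variants. It is not the authors' GPU implementation, and the names `pairwise_iou`, `cluster_nms`, and `iou_thresh` are assumptions for illustration.

```python
import numpy as np


def pairwise_iou(boxes):
    """IoU matrix for boxes of shape (N, 4) in (x1, y1, x2, y2) format."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1) * (y2 - y1)
    iw = np.clip(np.minimum(x2[:, None], x2[None, :])
                 - np.maximum(x1[:, None], x1[None, :]), 0, None)
    ih = np.clip(np.minimum(y2[:, None], y2[None, :])
                 - np.maximum(y1[:, None], y1[None, :]), 0, None)
    inter = iw * ih
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-9)


def cluster_nms(boxes, scores, iou_thresh=0.5):
    """Return indices of kept boxes, ordered by descending score."""
    order = np.argsort(-scores)
    if order.size == 0:
        return order
    # Upper-triangular IoU: entry (i, j) is nonzero only if box i
    # scores higher than box j.
    iou = np.triu(pairwise_iou(boxes[order]), k=1)
    keep = np.ones(order.size, dtype=bool)
    for _ in range(order.size):
        # Rows of currently suppressed boxes are zeroed, so a suppressed
        # box can no longer suppress others.
        masked = iou * keep[:, None]
        new_keep = masked.max(axis=0) < iou_thresh
        if np.array_equal(new_keep, keep):
            break  # fixed point reached
        keep = new_keep
    return order[keep]
```

A chain of three boxes shows why the iteration matters. With A = (0, 0, 10, 10), B = (3, 0, 13, 10), C = (6, 0, 16, 10) and descending scores, A suppresses B, and B would suppress C in a single matrix pass. The second pass zeroes B's row, so C is restored and the result is A and C, the same answer greedy NMS gives.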
This walkthrough was generated by AI and reviewed by a human editor.