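[Paper Walkthrough] Gradient Harmonized Single-stage Detector
This paper proposes the Gradient Harmonizing Mechanism (GHM), which balances the gradient contributions of training examples in one-stage detectors. It introduces GHM-C for classification and GHM-R for regression, and reports state-of-the-art results on COCO without heavy hyper-parameter tuning.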
Despite the great success of two-stage detectors, single-stage detector is still a more elegant and efficient way, yet suffers from the two well-known disharmonies during training, i.e. the huge difference in quantity between positive and negative examples as well as between easy and hard examples. In this work, we first point out that the essential effect of the two disharmonies can be summarized in term of the gradient. Further, we propose a novel gradient harmonizing mechanism (GHM) to be a hedging for the disharmonies. The philosophy behind GHM can be easily embedded into both classification loss function like cross-entropy (CE) and regression loss function like smooth-$L_1$ ($SL_1$) loss. To this end, two novel loss functions called GHM-C and GHM-R are designed to balancing the gradient flow for anchor classification and bounding box refinement, respectively. Ablation study on MS COCO demonstrates that without laborious hyper-parameter tuning, both GHM-C and GHM-R can bring substantial improvement for single-stage detector. Without any whistles and bells, our model achieves 41.6 mAP on COCO test-dev set which surpasses the state-of-the-art method, Focal Loss (FL) + $SL_1$, by 0.8.
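Motivation and Objectives
- Identify the sources of disharmony in one-stage detector training: the imbalance between positive and negative examples and between easy and hard examples.
- Propose a gradient-based harmonizing mechanism that balances gradient contributions during training.
- Develop GHM-C for classification and GHM-R for regression so that both adapt to each mini-batch without heavy hyper-parameter tuning.
- Demonstrate the improvement on COCO with a RetinaNet-style one-stage detector and compare against focal loss and other baselines.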
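Proposed Method
- Define the gradient density $GD(g)$, which measures how densely the training examples are distributed around gradient norm $g$.
- For each example, compute the gradient harmonizing parameter $\beta_i = N / GD(g_i)$ and use it to reweight the loss, where $N$ is the number of examples.
- Form GHM-C by replacing the standard CE loss with $L_{GHM\text{-}C} = \frac{1}{N}\sum_{i=1}^{N} \beta_i L_{CE}(p_i, p_i^*)$.
- Extend GHM to regression by introducing $ASL_1$ (Authentic Smooth $L_1$) and its gradient norm $gr$, and applying $L_{GHM\text{-}R} = \frac{1}{N}\sum_{i=1}^{N} \beta_i \, ASL_1(d_i)$.
- Approximate the gradient density with unit regions of width $\epsilon$ and smooth the per-region counts with an exponential moving average (EMA) so that mini-batch updates stay stable.
- Show that GHM adapts to the data distribution of each batch and reduces the dominance of easy negatives and outliers.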
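The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. The classification gradient norm $g = |p - p^*|$, the form $ASL_1(d) = \sqrt{d^2 + \mu^2} - \mu$ with $gr = |d| / \sqrt{d^2 + \mu^2}$, and the default values `mu=0.02`, `momentum=0.75`, and 30 or 10 regions are taken from our reading of the paper and should be treated as assumptions here.

```python
import numpy as np


def gradient_density_weights(g, num_bins=30, ema_counts=None, momentum=0.75):
    """Return (beta, counts) with beta_i = N / GD(g_i), unit-region approximation.

    g          : 1-D array of gradient norms in [0, 1].
    ema_counts : per-region counts carried over from earlier batches, or None.
    momentum   : EMA factor (assumed default); only used if ema_counts is given.
    """
    g = np.asarray(g, dtype=np.float64)
    n = g.size
    # Split [0, 1] into num_bins unit regions of width epsilon = 1 / num_bins.
    idx = np.minimum((g * num_bins).astype(np.int64), num_bins - 1)
    counts = np.bincount(idx, minlength=num_bins).astype(np.float64)
    if ema_counts is not None:
        assert 0.0 <= momentum < 1.0
        # Smooth the per-region counts across mini-batches.
        counts = momentum * np.asarray(ema_counts, dtype=np.float64) \
            + (1.0 - momentum) * counts
    # Density = (examples in the region) / (region width epsilon).
    density = counts[idx] * num_bins
    beta = n / density
    return beta, counts


def ghm_c_loss(logits, targets, num_bins=30):
    """GHM-C sketch: (1/N) * sum_i beta_i * CE(p_i, p_i*), sigmoid outputs."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    p = np.exp(-np.logaddexp(0.0, -logits))      # numerically stable sigmoid
    g = np.abs(p - targets)                      # assumed gradient norm
    beta, _ = gradient_density_weights(g, num_bins)
    # Binary cross-entropy with logits, written in a stable form.
    ce = np.logaddexp(0.0, logits) - targets * logits
    # beta is used as a constant weight; no gradient would flow through it.
    return np.sum(beta * ce) / g.size


def ghm_r_loss(d, mu=0.02, num_bins=10):
    """GHM-R sketch: (1/N) * sum_i beta_i * ASL1(d_i)."""
    d = np.asarray(d, dtype=np.float64)
    root = np.sqrt(d * d + mu * mu)
    asl1 = root - mu                             # assumed ASL1 form
    gr = np.abs(d) / root                        # gradient norm, in [0, 1)
    beta, _ = gradient_density_weights(gr, num_bins)
    return np.sum(beta * asl1) / d.size
```

As a quick check of the behaviour: when gradient norms are spread evenly over the regions, every `beta` equals 1 and GHM-C reduces to the plain mean CE loss. With nine easy examples sharing one region and a single hard example alone in another (10 regions), the hard example's weight is nine times that of each easy example, which is the down-weighting of crowded regions that the method relies on.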
Experimental Results
Research Questions
- RQ1: Can gradient-density-based reweighting improve the training efficiency and accuracy of one-stage detectors?
- RQ2: How do GHM-C and GHM-R compare to cross-entropy and smooth $L_1$ losses, respectively, on COCO benchmarks?
- RQ3: Does the proposed EMA-based gradient density estimation provide stable and scalable training on large-scale datasets?
- RQ4: Can the GHM approach transfer to two-stage detectors and other backbones while maintaining or improving accuracy?
Key Findings
| Method | Network | AP | AP50 | AP75 | AP_S | AP_M | AP_L |
|---|---|---|---|---|---|---|---|
| Faster RCNN | FPN-ResNet-101 | 36.2 | 59.1 | 39.0 | 18.2 | 39.0 | 48.2 |
| Mask RCNN | FPN-ResNet-101 | 38.2 | 60.3 | 41.7 | 20.1 | 41.1 | 50.2 |
| Mask RCNN | FPN-ResNeXt-101 | 39.8 | 62.3 | 43.4 | 22.1 | 43.2 | 51.2 |
| YOLOv3 | DarkNet-53 | 33.0 | 57.9 | 34.4 | 18.3 | 35.4 | 41.9 |
| DSSD513 | DSSD-ResNet-101 | 33.2 | 53.3 | 35.2 | 13.0 | 35.4 | 51.1 |
| Focal Loss | RetinaNet-FPN-ResNet-101 | 39.1 | 59.1 | 42.3 | 21.8 | 42.7 | 50.2 |
| Focal Loss | RetinaNet-FPN-ResNeXt-101 | 40.8 | 61.1 | 44.1 | 24.1 | 44.2 | 51.2 |
| GHM-C + GHM-R | RetinaNet-FPN-ResNet-101 | 39.9 | 60.8 | 42.5 | 20.3 | 43.6 | 54.1 |
| GHM-C + GHM-R | RetinaNet-FPN-ResNeXt-101 | 41.6 | 62.8 | 44.2 | 22.3 | 45.1 | 55.3 |
- GHM-C substantially improves classification performance over standard CE and is competitive with, or better than, Focal Loss on COCO.
- GHM-R improves bounding box regression over $SL_1$ and $ASL_1$, particularly at higher IoU thresholds, indicating better localization.
- With the unit-region approximation (about M = 30 regions), training stays efficient and is substantially faster than computing the gradient density naively, while keeping the performance gains.
- The GHM approach extends to two-stage detectors, yielding improved AP over $SL_1$ baselines in Faster R-CNN variants.
- On COCO test-dev, GHM-C + GHM-R with RetinaNet achieves 39.9 AP with ResNet-101 and 41.6 AP with ResNeXt-101, each 0.8 AP above the corresponding Focal Loss baseline (39.1 and 40.8).
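For context on the efficiency finding, the naive density computation that the unit-region approximation replaces can be sketched as follows. This assumes the sliding-window definition from our reading of the paper: $GD(g)$ counts the examples whose gradient norm falls in $[g - \epsilon/2,\, g + \epsilon/2)$ and divides by the valid length of that window inside $[0, 1]$. The function name is hypothetical, and the pairwise comparison makes the cost quadratic in the number of examples.

```python
import numpy as np


def naive_gradient_density(g, eps):
    """Sliding-window gradient density for every example (O(N^2) sketch)."""
    g = np.asarray(g, dtype=np.float64)
    lo = g - eps / 2.0
    hi = g + eps / 2.0
    # in_window[i, k] is True when g[k] lies in [lo[i], hi[i]).
    in_window = (g[None, :] >= lo[:, None]) & (g[None, :] < hi[:, None])
    counts = in_window.sum(axis=1)
    # Only the part of the window that overlaps [0, 1] counts as its length.
    valid_len = np.minimum(hi, 1.0) - np.maximum(lo, 0.0)
    return counts / valid_len
```

The binned version needs only one histogram pass per batch, which is why a small number of regions keeps training fast.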
This walkthrough was generated by AI and reviewed by a human editor.