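[Paper Walkthrough] Mask-Guided Attention Network for Occluded Pedestrian Detection
This paper proposes MGAN, a mask-guided attention network that highlights visible pedestrian regions and suppresses occluded ones. Integrated into Faster R-CNN and supervised with coarse-level segmentation annotations, it achieves state-of-the-art results on the CityPersons and Caltech datasets.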
Pedestrian detection relying on deep convolution neural networks has made significant progress. Though promising results have been achieved on standard pedestrians, the performance on heavily occluded pedestrians remains far from satisfactory. The main culprits are intra-class occlusions involving other pedestrians and inter-class occlusions caused by other objects, such as cars and bicycles. These result in a multitude of occlusion patterns. We propose an approach for occluded pedestrian detection with the following contributions. First, we introduce a novel mask-guided attention network that fits naturally into popular pedestrian detection pipelines. Our attention network emphasizes on visible pedestrian regions while suppressing the occluded ones by modulating full body features. Second, we empirically demonstrate that coarse-level segmentation annotations provide reasonable approximation to their dense pixel-wise counterparts. Experiments are performed on CityPersons and Caltech datasets. Our approach sets a new state-of-the-art on both datasets. Our approach obtains an absolute gain of 9.5% in log-average miss rate, compared to the best reported results on the heavily occluded (HO) pedestrian set of CityPersons test set. Further, on the HO pedestrian set of Caltech dataset, our method achieves an absolute gain of 5.0% in log-average miss rate, compared to the best reported results. Code and models are available at: https://github.com/Leotju/MGAN.
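Motivation and Goals
- Make pedestrian detection robust under heavy occlusion, where occluders degrade the usefulness of full-body features.
- Propose a lightweight Mask-Guided Attention (MGA) branch that emphasizes visible regions and suppresses occluded ones inside each proposal.
- Integrate MGA into a standard Faster R-CNN based detector so that the whole network can be trained end to end.
- Use coarse-level visible-region annotations as a practical supervision signal for the MGA branch.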
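Proposed Method
- The architecture has two branches: a Standard Pedestrian Detector (SPD) branch and a Mask-Guided Attention (MGA) branch.
- MGA generates a pixel-wise spatial attention map from the RoI Align features and uses it to modulate the full-body features.
- The MGA mask is produced by a small CNN that outputs a per-pixel probability map, which is multiplied element-wise with every channel of the RoI features.
- Training uses the joint loss L = L0 + alpha * Lmask + beta * Locc, where L0 is the loss of the baseline SPD, so that detection and occlusion-aware supervision are optimized together.
- Lmask applies per-pixel binary cross-entropy against coarse (weak) targets derived from the visible-region bounding box.
- Locc weights the RCNN classification loss by occlusion level so that hard samples are emphasized.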
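The steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `(x0, y0, x1, y1)` box convention, the nested-list tensor layout `features[c][y][x]`, and the default values of `alpha` and `beta` are all hypothetical, and the small CNN that predicts the attention map is replaced by a hand-written probability map.

```python
import math

def coarse_mask(height, width, visible_box):
    """Binary target: 1 inside the visible-region box, 0 elsewhere.
    visible_box = (x0, y0, x1, y1) on the RoI grid, x1/y1 exclusive (assumed convention)."""
    x0, y0, x1, y1 = visible_box
    return [[1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0
             for x in range(width)] for y in range(height)]

def modulate(features, attention):
    """Multiply every channel of features[c][y][x] by the same map attention[y][x]."""
    return [[[v * a for v, a in zip(f_row, a_row)]
             for f_row, a_row in zip(channel, attention)]
            for channel in features]

def mask_bce(pred, target, eps=1e-7):
    """Mean per-pixel binary cross-entropy between predicted probabilities and a 0/1 target."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count

def joint_loss(l0, l_mask, l_occ, alpha=1.0, beta=1.0):
    """L = L0 + alpha * Lmask + beta * Locc; the default weights are placeholders."""
    return l0 + alpha * l_mask + beta * l_occ

# Toy 2x3 RoI with two channels; the visible region covers the left two columns.
target = coarse_mask(2, 3, (0, 0, 2, 2))
attention = [[0.9, 0.8, 0.1], [0.9, 0.7, 0.2]]  # stand-in for the small CNN's output
features = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
            [[6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]]
modulated = modulate(features, attention)
l_mask = mask_bce(attention, target)
```

In the toy example, the right-hand column of every channel is scaled down by the low attention values there, which is the intended suppression of the occluded part of the proposal.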
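Experimental Results
Research Questions
- RQ1: Does a mask-guided spatial attention branch improve the detection of occluded pedestrians in a conventional detector?
- RQ2: Are coarse-level visible-region annotations sufficient to supervise the pixel-wise attention mask, without dense pixel-wise annotations?
- RQ3: How does adding an occlusion-sensitive loss term affect detection at different occlusion levels?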
Key Findings
| Method | R, MR (%) | HO, MR (%) |
|---|---|---|
| Baseline SPD (L0) | 13.8 | 57.0 |
| MGAN (L0 + Lmask) | 11.9 | 52.7 |
| MGAN (L0 + Locc) | 13.2 | 55.6 |
| Final MGAN (L0 + Lmask + Locc) | 11.5 | 51.7 |
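To make the contribution of each loss term easier to read off the table, the absolute reductions relative to the baseline can be computed directly. The snippet below only re-uses the numbers from the table; the variable names are illustrative.

```python
# (R, HO) log-average miss rates in percent, copied from the table above.
results = {
    "Baseline SPD (L0)": (13.8, 57.0),
    "MGAN (L0 + Lmask)": (11.9, 52.7),
    "MGAN (L0 + Locc)": (13.2, 55.6),
    "Final MGAN (L0 + Lmask + Locc)": (11.5, 51.7),
}

base_r, base_ho = results["Baseline SPD (L0)"]

# Absolute reduction in miss rate versus the baseline (positive = better).
gains = {
    name: (round(base_r - r, 1), round(base_ho - ho, 1))
    for name, (r, ho) in results.items()
}

for name, (gain_r, gain_ho) in gains.items():
    print(f"{name}: R -{gain_r}, HO -{gain_ho}")
```

On these numbers, Lmask alone accounts for most of the improvement (1.9 on R, 4.3 on HO), Locc alone gives a smaller gain (0.6 on R, 1.4 on HO), and the two together reach 2.3 on R and 5.3 on HO.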
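- Compared with the Faster R-CNN baseline, MGAN lowers the log-average miss rate on the CityPersons heavy-occlusion (HO) set from 57.0 to 51.7, an absolute reduction of 5.3 percentage points.
- Adding Lmask alone already brings the HO miss rate down to 52.7; combining Lmask and Locc gives 11.5 on the R set and 51.7 on HO.
- Coarse-level segmentation annotations perform close to dense pixel-wise annotations when supervising MGA, which makes them a cost-effective alternative.
- On the CityPersons validation set and the Caltech dataset, MGAN outperforms several state-of-the-art occlusion-focused methods across multiple occlusion settings.
- On the CityPersons test set, MGAN achieves state-of-the-art results: R = 9.29 and HO = 40.97 (lower MR is better).
- Under heavy occlusion, MGAN performs strongly on small, medium, and large pedestrians alike.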
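All figures above are log-average miss rates. This summary does not spell out the evaluation protocol, so the following is a sketch of the commonly used Caltech-style definition, stated as an assumption: the miss rate is sampled at nine false-positives-per-image (FPPI) points evenly spaced in log space over [10^-2, 10^0] and the geometric mean is taken. The function name, the input layout, and the fallback for reference points below the curve are illustrative choices.

```python
import math

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate at log-spaced FPPI reference points.

    fppi must be sorted in ascending order, with miss_rate[i] the miss rate at fppi[i].
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Use the curve point with the largest FPPI not exceeding the reference.
        candidates = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        # Assumption: if the curve starts above the reference, count it as a full miss.
        sampled.append(candidates[-1] if candidates else 1.0)
    # Floor at a tiny value so log() stays finite when the miss rate is 0.
    return math.exp(sum(math.log(max(m, 1e-10)) for m in sampled) / len(sampled))

# A flat curve at 50% miss rate averages to 50%.
flat = log_average_miss_rate([0.001, 0.1, 1.0], [0.5, 0.5, 0.5])
```

Because the average is taken in log space, improvements at low FPPI weigh as much as improvements at high FPPI, which is why the metric is sensitive to detectors that stay accurate at strict operating points.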
This walkthrough was generated by AI and reviewed by a human editor.