QUICK REVIEW

[Paper Review] Beta R-CNN: Looking into Pedestrian Detection from Another Perspective

Zixuan Xu, Banghuai Li|arXiv (Cornell University)|Oct 23, 2022
Advanced Neural Network Applications|References: 18|Cited by: 23
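One-Sentence Summary

This paper models pedestrians with the Beta Representation, which is built on a 2D beta distribution, and proposes Beta R-CNN (with BetaHead and BetaMask) together with BetaNMS to improve detection in occluded and crowded scenes.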

ABSTRACT

Recently significant progress has been made in pedestrian detection, but it remains challenging to achieve high performance in occluded and crowded scenes. It could be attributed mostly to the widely used representation of pedestrians, i.e., 2D axis-aligned bounding box, which just describes the approximate location and size of the object. Bounding box models the object as a uniform distribution within the boundary, making pedestrians indistinguishable in occluded and crowded scenes due to much noise. To eliminate the problem, we propose a novel representation based on 2D beta distribution, named Beta Representation. It pictures a pedestrian by explicitly constructing the relationship between full-body and visible boxes, and emphasizes the center of visual mass by assigning different probability values to pixels. As a result, Beta Representation is much better for distinguishing highly-overlapped instances in crowded scenes with a new NMS strategy named BetaNMS. What's more, to fully exploit Beta Representation, a novel pipeline Beta R-CNN equipped with BetaHead and BetaMask is proposed, leading to high detection performance in occluded and crowded scenes.

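Motivation and Goals

  • Improve pedestrian detection in occluded and crowded scenes by moving beyond the conventional 2D bounding box.
  • Propose the Beta Representation, which combines full-body information and the visible pattern in a single probabilistic model.
  • Develop a detector (Beta R-CNN) with BetaHead and BetaMask that exploits the Beta Representation for better localization and discrimination.
  • Introduce a beta-distribution-based NMS (BetaNMS) that uses KL divergence to tell highly overlapped instances apart.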

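Proposed Method

  • Define a 2D beta distribution as the Beta Representation, parameterized by eight values [l, t, r, b, alpha_x, beta_x, alpha_y, beta_y] derived from the full-body and visible boxes.
  • Compute the mean and variance along x and y to obtain the regression targets [l, t, r, b, mu_x, mu_y, sigma_x, sigma_y], normalized as described in the paper.
  • Introduce BetaHead, which regresses the eight Beta parameters (four boundary parameters and four shape parameters) with a SmoothL1 loss.
  • Introduce BetaMask, which modulates RoI features with a mask sampled from the predicted 2D beta distribution to highlight visible regions, and is trained with a KL-divergence loss against the ground-truth Beta mask.
  • Use KL divergence as the distance metric for Beta-based NMS (BetaNMS): the symmetrized KL divergence between beta distributions is used to suppress highly overlapped detections, which makes it more effective than IoU-based NMS.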
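
The steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the eight parameters describe two independent 1D beta distributions stretched over the full-body box, that sigma means the standard deviation, and that the symmetrized KL divergence can be approximated numerically on a shared pixel grid. The function names, the `eps` floor, the grid size, and the threshold `1.0` are all hypothetical choices made for illustration, and the target normalization mentioned in the paper is omitted.

```python
import math

def beta_moments(alpha, beta):
    """Mean and variance of a 1D Beta(alpha, beta) distribution on [0, 1]."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, var

def regression_targets(rep):
    """Map [l, t, r, b, alpha_x, beta_x, alpha_y, beta_y] to
    [l, t, r, b, mu_x, mu_y, sigma_x, sigma_y] in pixel units.
    Assumptions: sigma is the standard deviation; no normalization."""
    l, t, r, b, ax, bx, ay, by = rep
    mean_x, var_x = beta_moments(ax, bx)
    mean_y, var_y = beta_moments(ay, by)
    w, h = r - l, b - t
    return [l, t, r, b, l + w * mean_x, t + h * mean_y,
            w * math.sqrt(var_x), h * math.sqrt(var_y)]

def beta_pdf(u, alpha, beta):
    """Density of Beta(alpha, beta) at u; zero outside the open interval (0, 1)."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1.0) * math.log(u)
                    + (beta - 1.0) * math.log(1.0 - u))

def beta_mask(rep, width, height, eps=1e-9):
    """Evaluate the 2D beta density at pixel centers and normalize it to sum to 1.
    The eps floor (an assumption) keeps the KL divergence finite outside the box."""
    l, t, r, b, ax, bx, ay, by = rep
    mask = []
    for j in range(height):
        py = beta_pdf((j + 0.5 - t) / (b - t), ay, by)
        for i in range(width):
            px = beta_pdf((i + 0.5 - l) / (r - l), ax, bx)
            mask.append(px * py + eps)
    total = sum(mask)
    return [m / total for m in mask]

def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p) for two discrete distributions on the same grid."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def beta_nms(reps, scores, width, height, kl_threshold):
    """Greedy NMS: keep a detection only if its divergence from every
    already-kept detection exceeds kl_threshold (hypothetical value)."""
    masks = [beta_mask(rep, width, height) for rep in reps]
    order = sorted(range(len(reps)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        if all(symmetric_kl(masks[k], masks[j]) > kl_threshold for j in keep):
            keep.append(k)
    return keep

reps = [
    (10, 5, 30, 45, 4.0, 4.0, 2.0, 6.0),  # mass concentrated in the upper body
    (11, 5, 31, 45, 4.0, 4.0, 2.0, 6.0),  # near-duplicate of the first detection
    (14, 5, 34, 45, 6.0, 2.0, 6.0, 2.0),  # overlapping box, mass lower and to the right
]
scores = [0.9, 0.8, 0.7]
keep = beta_nms(reps, scores, width=40, height=50, kl_threshold=1.0)
print(keep)  # [0, 2]
```

In this toy example the first and third full-body boxes have an IoU of 640/960 ≈ 0.67, so an IoU-based NMS with a threshold of 0.5 would discard the third detection. Because its probability mass sits in a different part of the box, the divergence from the first detection is large and the sketch keeps it, while the near-duplicate second detection is still suppressed.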

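Experimental Results

Research Questions

  • RQ1: Can a beta-distribution-based representation distinguish occluded and highly overlapped pedestrians better than the conventional bounding box?
  • RQ2: Do BetaHead and BetaMask improve localization and recognition when pedestrians are occluded or in crowded scenes?
  • RQ3: Is BetaNMS more effective than IoU-based NMS in crowded pedestrian scenes?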

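Key Findings

  • The Beta Representation focuses on the center of visual mass and handles changes in visibility better than a uniform box, which helps discrimination in occluded and crowded scenes.
  • BetaHead and BetaMask improve MR^-2 and AP on crowded datasets, showing gains in scenes dominated by occlusion and crowding.
  • BetaNMS, which uses KL divergence, outperforms IoU-based NMS and other alternatives on highly overlapped instances.
  • Beta R-CNN, which combines the proposed components, achieves state-of-the-art or competitive results on the CrowdHuman and CityPersons benchmarks, particularly on heavy-occlusion subsets.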

This review was generated by AI and reviewed by a human editor.