QUICK REVIEW

[Paper Review] Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation

Zhengxiong Luo, Zhicheng Wang|arXiv (Cornell University)|Dec 30, 2020

Human Pose and Action Recognition | References: 35 | Cited by: 23

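One-Sentence Summary

This paper proposes Scale-Adaptive Heatmap Regression (SAHR) and Weight-Adaptive Heatmap Regression (WAHR) to improve bottom-up human pose estimation. SAHR uses a scale-map head to learn a standard deviation for each keypoint, so the Gaussian kernels used for keypoint supervision adapt to human scale and labeling uncertainty, while WAHR reweights the loss to mitigate the foreground-background imbalance. The method reaches 72.0 AP on COCO test-dev2017, +1.5 AP over the previous state of the art.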

ABSTRACT

Heatmap regression has become the most prevalent choice for nowadays human pose estimation methods. The ground-truth heatmaps are usually constructed via covering all skeletal keypoints by 2D gaussian kernels. The standard deviations of these kernels are fixed. However, for bottom-up methods, which need to handle a large variance of human scales and labeling ambiguities, the current practice seems unreasonable. To better cope with these problems, we propose the scale-adaptive heatmap regression (SAHR) method, which can adaptively adjust the standard deviation for each keypoint. In this way, SAHR is more tolerant of various human scales and labeling ambiguities. However, SAHR may aggravate the imbalance between fore-background samples, which potentially hurts the improvement of SAHR. Thus, we further introduce the weight-adaptive heatmap regression (WAHR) to help balance the fore-background samples. Extensive experiments show that SAHR together with WAHR largely improves the accuracy of bottom-up human pose estimation. As a result, we finally outperform the state-of-the-art model by +1.5AP and achieve 72.0AP on COCO test-dev2017, which is comparable with the performances of most top-down methods. Source codes are available at https://github.com/greatlog/SWAHR-HumanPose.

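Research Motivation and Objectives

Address the limitations of heatmap regression with a fixed standard deviation in bottom-up human pose estimation, in particular scale variation and labeling ambiguity.
Propose a method that adaptively adjusts the standard deviation of each keypoint's Gaussian kernel according to human scale and uncertainty, making the model more robust.
Mitigate the foreground-background sample imbalance that adaptive regression introduces, which can hinder convergence and limit the performance gain.
Achieve state-of-the-art bottom-up pose estimation without relying on a person detector or multi-scale testing.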
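To give a feel for the foreground-background imbalance mentioned above, the sketch below measures what fraction of a heatmap a single Gaussian keypoint target occupies. The map size (128×128), the 0.1 threshold used to call a pixel "foreground", and the helper name `foreground_fraction` are illustrative assumptions and do not come from the paper. It is not a reproduction of the authors' analysis.

```python
import numpy as np

def foreground_fraction(sigma, size=128, threshold=0.1):
    """Fraction of pixels whose Gaussian target value exceeds `threshold`.

    `size` and `threshold` are illustrative choices, not values from the paper.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    c = size // 2  # place the single keypoint at the map center
    heatmap = np.exp(-((xs - c) ** 2 + (ys - c) ** 2) / (2.0 * sigma ** 2))
    return float((heatmap > threshold).mean())

# Foreground share of the map for a few kernel widths.
for sigma in (1.0, 2.0, 4.0):
    print(f"sigma={sigma}: foreground fraction = {foreground_fraction(sigma):.4%}")
```

Under these assumptions the foreground stays well below a few percent of the map for a single keypoint, and it changes with the standard deviation, so any scheme that alters the kernel width also shifts the balance between the two kinds of samples.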

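Proposed Method

Introduces a scale-map head that predicts a standard-deviation multiplier for each keypoint, so the spread of the Gaussian kernels used in heatmap supervision adapts per keypoint.
Changes how ground-truth heatmaps are built: the base standard deviation σ₀ is multiplied by the predicted scale map s, giving σ = σ₀ · s for each keypoint.
Makes the standard deviation learnable and spatially varying for each keypoint, which better models semantically discriminative regions and labeling uncertainty.
Proposes a learnable, spatially varying loss weighting inspired by focal loss, which reduces the influence of easy samples (background) and concentrates training on hard samples (foreground).
Unifies SAHR and WAHR into a joint training objective, improving generalization and accuracy in multi-person scenes with large scale variation.
Uses a standard backbone (e.g., HrHRNet-W48) extended with a scale-map head and a weight-map head, trained end to end with an L2 loss on the adaptive heatmaps.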
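The relation σ = σ₀ · s can be checked with a small NumPy sketch. Re-rendering a Gaussian with the adapted standard deviation gives the same result as raising the fixed-σ₀ heatmap to the power 1/s², which is one way an existing ground-truth heatmap can be rescaled by a predicted scale value. The helper name `gaussian_heatmap`, the map size, σ₀ = 2, and s = 1.5 are illustrative assumptions. In the actual method s comes from the scale-map head, which is not modeled here.

```python
import numpy as np

def gaussian_heatmap(height, width, center, sigma):
    """Render one 2D Gaussian peak (maximum value 1) centered at (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

sigma0 = 2.0  # base standard deviation (illustrative value)
s = 1.5       # hypothetical predicted scale for this keypoint

base = gaussian_heatmap(64, 64, center=(32, 20), sigma=sigma0)

# Option A: re-render the kernel with the adapted standard deviation.
adapted_direct = gaussian_heatmap(64, 64, center=(32, 20), sigma=sigma0 * s)

# Option B: rescale the fixed-sigma heatmap element-wise, using
# exp(-d^2 / (2 (sigma0 * s)^2)) == exp(-d^2 / (2 sigma0^2)) ** (1 / s^2)
adapted_scaled = base ** (1.0 / s ** 2)

print(np.allclose(adapted_direct, adapted_scaled))
```

Option B is convenient when s varies per pixel: `s` can then be an array with the same shape as `base`, and the same expression applies element-wise.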

Figure 1: Top row: the noses of different persons are covered by gaussian kernels with the same standard deviation. Bottom row: the standard deviations for keypoints of different persons are adaptively adjusted in SAHR.

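Experimental Results

Research Questions

RQ1: Does an adaptive standard deviation in heatmap regression improve bottom-up human pose estimation when human scale varies widely?
RQ2: How does modeling keypoint uncertainty through a variable Gaussian kernel spread affect localization accuracy and robustness?
RQ3: Does introducing an adaptive standard deviation aggravate the foreground-background imbalance in heatmap supervision?
RQ4: Can a learnable adaptive weighting scheme mitigate this imbalance and further improve performance?
RQ5: How close to state-of-the-art performance does the combination of SAHR and WAHR get in bottom-up pose estimation, especially in crowded scenes?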

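Key Findings

Combined with WAHR, the proposed SAHR method reaches 72.0 AP on COCO test-dev2017, +1.5 AP over the HrHRNet-W48 baseline.
On the more challenging CrowdPose dataset, the method reaches 71.6 AP without multi-scale testing and 73.8 AP with it, outperforming top-down methods in crowded scenes.
Ablation experiments show that the adaptive standard deviation markedly improves performance on large-scale persons: AP on large instances rises from 66.6 to 75.1.
The learnable weighted loss (WAHR) effectively reduces the influence of easy samples (background) and improves performance on hard samples, especially in crowded scenes.
The method generalizes well to crowded scenes: top-down methods break down there because of occlusion and detection errors, whereas bottom-up methods such as HrHRNet-W48 + SWAHR achieve state-of-the-art results.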

Figure 2: During training, the ground-truth heatmaps are firstly scaled according to predicted scale maps and then are used to supervise the whole model via weight-adaptive loss. During testing, the predicted heatmaps and associative embeddings are used for grouping of individual persons.
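The weight-adaptive loss named in the Figure 2 caption can be illustrated with a focal-loss-style weighted L2 loss. This review does not give the exact weighting formula, so the form below is an assumption: each pixel is softly assigned to the foreground by `target ** gamma`, foreground pixels are weighted by their distance from 1, and background pixels by their distance from 0. The function name `weight_adaptive_l2`, the default `gamma=0.01`, and the truncation of the target at 3σ are likewise illustrative choices.

```python
import numpy as np

def weight_adaptive_l2(pred, target, gamma=0.01):
    """Focal-style weighted L2 loss over a heatmap (assumed, illustrative form)."""
    pred = np.clip(pred, 0.0, 1.0)
    fg = target ** gamma  # soft foreground indicator: ~1 where target > 0, 0 elsewhere
    # Well-fit pixels (pred near 1 on foreground, near 0 on background) get small weights.
    weight = fg * (1.0 - pred) + (1.0 - fg) * pred
    return float(np.mean(weight * (pred - target) ** 2))

# One Gaussian target, truncated to zero outside a 3-sigma radius.
sigma = 2.0
ys, xs = np.mgrid[0:64, 0:64]
d2 = (xs - 32) ** 2 + (ys - 20) ** 2
target = np.where(d2 <= (3 * sigma) ** 2, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)

# Prediction A: a small error (0.05) spread over every pixel.
# Prediction B: the peak is missed entirely, the background is perfect.
pred_noisy_background = np.clip(target + 0.05, 0.0, 1.0)
pred_missed_peak = np.zeros_like(target)

for name, pred in [("noisy background", pred_noisy_background),
                   ("missed peak", pred_missed_peak)]:
    plain = float(np.mean((pred - target) ** 2))
    weighted = weight_adaptive_l2(pred, target)
    print(f"{name}: plain L2 = {plain:.6f}, weighted = {weighted:.6f}")
```

With this weighting, the many slightly wrong background pixels contribute far less than under plain L2, while the missed keypoint keeps nearly its full penalty, which is the balancing effect the weight-adaptive loss is meant to provide.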

This review was generated by AI and reviewed by a human editor.