QUICK REVIEW

[Paper Review] HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

Bowen Cheng, Bin Xiao|arXiv (Cornell University)|Aug 27, 2019

Human Pose and Action Recognition | References: 42 | Cited by: 75

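One-Sentence Summary

HigherHRNet introduces a scale-aware high-resolution feature pyramid, combined with multi-resolution supervision and heatmap aggregation, to improve bottom-up multi-person pose estimation. It achieves state-of-the-art results on COCO test-dev and strong performance on CrowdPose.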

ABSTRACT

Bottom-up human pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and localize keypoints more precisely, especially for small person. The feature pyramid in HigherHRNet consists of feature map outputs from HRNet and upsampled higher-resolution outputs through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium person on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scene. The code and models are available at https://github.com/HRNet/Higher-HRNet-Human-Pose-Estimation.

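Research Motivation and Goals

Address scale variation in bottom-up multi-person pose estimation, especially for small persons.
Develop a high-resolution feature pyramid that preserves spatial detail across scales.
Train with multi-resolution supervision and aggregate multi-resolution heatmaps at inference.
Demonstrate improved keypoint localization accuracy on COCO and robustness in crowded scenes (CrowdPose).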

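Proposed Method

Build a high-resolution feature pyramid on top of HRNet, starting at 1/4 resolution, and generate higher-resolution heatmaps by upsampling with deconvolution (transposed convolution).
Implement multi-resolution supervision by mapping the ground-truth keypoints to each pyramid resolution and generating Gaussian heatmaps at each resolution.
Predict heatmaps at multiple resolutions and aggregate them at inference to form scale-aware heatmaps.
Use associative embedding to group keypoints into individual person instances.
Optionally add residual blocks to the deconvolution module to refine the features and heatmaps.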
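The multi-resolution supervision and heatmap aggregation steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the value sigma = 2, and the nearest-neighbour upsampling to the finest pyramid level are assumptions made for brevity, and the ground-truth targets stand in for predicted heatmaps in the aggregation step.

```python
import math


def gaussian_heatmap(size, center, sigma=2.0):
    """Return a size x size heatmap with an unnormalized Gaussian at center=(x, y)."""
    cx, cy = center
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]


def multi_resolution_targets(keypoint_xy, input_size, strides=(4, 2), sigma=2.0):
    """Build one target heatmap per pyramid level.

    The keypoint is rescaled to each level's grid; sigma is kept fixed
    across levels (an assumption of this sketch).
    """
    x, y = keypoint_xy
    return {
        s: gaussian_heatmap(input_size // s, (x / s, y / s), sigma)
        for s in strides
    }


def upsample_nearest(heatmap, factor):
    """Nearest-neighbour upsampling: repeat every pixel factor x factor times."""
    return [
        [v for v in row for _ in range(factor)]
        for row in heatmap
        for _ in range(factor)
    ]


def aggregate(heatmaps_by_stride):
    """Upsample every level to the finest resolution and average element-wise."""
    finest = min(heatmaps_by_stride)
    upsampled = [
        upsample_nearest(h, s // finest) for s, h in heatmaps_by_stride.items()
    ]
    size = len(upsampled[0])
    n = len(upsampled)
    return [
        [sum(u[y][x] for u in upsampled) / n for x in range(size)]
        for y in range(size)
    ]


def argmax_2d(heatmap):
    """Return the (x, y) position of the largest heatmap value."""
    _, x, y = max(
        (v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row)
    )
    return x, y
```

For a keypoint at (32, 20) in a 64 x 64 input, argmax_2d(aggregate(multi_resolution_targets((32, 20), 64))) returns (16, 10), which is the keypoint location on the stride-2 grid. Averaging keeps the coarse level's response for large persons while the finer level sharpens the peak.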

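Experimental Results

Research Questions

RQ1: Does a scale-aware high-resolution feature pyramid improve keypoint localization for small persons in bottom-up pose estimation?
RQ2: Do multi-resolution supervision and heatmap aggregation improve performance without post-processing refinement?
RQ3: How does HigherHRNet perform on COCO and CrowdPose compared with existing bottom-up and top-down methods?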

Key Findings

Method	Feat. stride/resolution	AP	AP^M	AP^L
HigherHRNet (Ours)	2/256	66.9	61.0	75.7
HigherHRNet (Ours)	1/512	66.5	61.1	74.9
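
The stride/resolution pairs in the table follow from simple arithmetic, sketched below. The sketch assumes a 512-pixel input (consistent with the 2/256 and 1/512 rows), an HRNet output at stride 4, and that each deconvolution module doubles the resolution. The function names are hypothetical.

```python
def stride_after_deconv(num_deconv, base_stride=4):
    """Feature stride after num_deconv upsampling modules (assumed 2x each)."""
    stride = base_stride // (2 ** num_deconv)
    if stride < 1:
        raise ValueError("too many deconvolution modules for this base stride")
    return stride


def heatmap_resolution(input_size, stride):
    """Side length of the heatmap produced at a given feature stride."""
    return input_size // stride


INPUT_SIZE = 512  # assumption inferred from the table rows
for n in range(3):
    s = stride_after_deconv(n)
    print(f"{n} deconv module(s): stride {s}, heatmap {heatmap_resolution(INPUT_SIZE, s)} px")
```

Under these assumptions, one deconvolution module corresponds to the 2/256 row and two modules to the 1/512 row.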

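HigherHRNet reaches 66.4 AP on COCO2017 test-dev, improving over the HRNet baseline, and 70.5 AP with multi-scale testing, surpassing previous bottom-up methods.
The gain is larger for medium-sized persons than for large persons (higher AP^M), indicating better handling of scale variation.
On COCO2017 test-dev, HigherHRNet-W48 with multi-scale testing achieves 70.5 AP without refinement, surpassing all existing bottom-up methods.
On the CrowdPose test set, HigherHRNet-W48 achieves 67.6 AP, surpassing top-down methods and previous bottom-up methods, which shows its robustness in crowded scenes.
Ablation studies show that deconvolution, feature concatenation, heatmap aggregation, and greater backbone capacity all contribute to AP, and that a single deconvolution module generally gives the best COCO performance.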


This review was generated by AI and reviewed by a human editor.