QUICK REVIEW

[Paper Review] Deep High-Resolution Representation Learning for Human Pose Estimation

Ke Sun, Bin Xiao|arXiv (Cornell University)|Feb 25, 2019

Human Pose and Action Recognition | References: 72 | Cited by: 57

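One-Sentence Summary

This paper proposes HRNet, a high-resolution network that maintains high-resolution representations throughout processing and repeatedly fuses multi-scale features, achieving state-of-the-art pose estimation results on the COCO, MPII, and PoseTrack datasets.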

ABSTRACT

This is an official PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. The code and models are publicly available at https://github.com/leoxiaobin/deep-high-resolution-net.pytorch.

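Motivation and Objectives

Learn reliable, spatially precise high-resolution representations for human pose estimation.
Design a network that maintains high-resolution representations through every stage, rather than recovering resolution from low-resolution features.
Introduce repeated multi-scale fusion across parallel high-to-low resolution subnetworks to enrich the high-resolution representations.
Demonstrate more accurate keypoint heatmaps on COCO and MPII, and improved pose tracking on PoseTrack.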

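Proposed Method

Propose the High-Resolution Network (HRNet), which starts from a high-resolution subnetwork and gradually adds parallel high-to-low resolution subnetworks.
Connect the multi-resolution subnetworks in parallel and perform repeated multi-scale fusion through exchange units, both within and across stages.
Regress K heatmaps (one per keypoint) from the final high-resolution representation, using a mean squared error loss against Gaussian ground-truth heatmaps.
Instantiate HRNet at a small width (W32) and a large width (W48), each with four stages and eight exchange units.
Train with standard data augmentation, the Adam optimizer, and an ImageNet-pretrained backbone to improve performance.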
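The two mechanics in the list above, the cross-resolution exchange and the heatmap regression loss, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The function names are hypothetical, feature maps are single-channel nested lists, and the learned convolutions that resize features in the real network are replaced here by fixed average pooling and nearest-neighbour upsampling as stand-ins. The default `sigma` and the 48x64 heatmap size are illustrative assumptions.

```python
import math


def upsample_nearest(fmap, factor):
    """Repeat every pixel `factor` times along both axes."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]


def downsample_mean(fmap, factor):
    """Average each factor x factor window (stand-in for a strided convolution)."""
    out_h, out_w = len(fmap) // factor, len(fmap[0]) // factor
    return [[sum(fmap[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(out_w)]
            for y in range(out_h)]


def exchange_unit(branches):
    """Fuse parallel branches; branches[i] is assumed to be 1/2**i the size of branches[0].

    Every output branch is the element-wise sum of all input branches after
    each has been resized to that output's resolution.
    """
    outputs = []
    for i, target_branch in enumerate(branches):
        fused = [[0.0] * len(target_branch[0]) for _ in target_branch]
        for j, source in enumerate(branches):
            if j > i:
                resized = upsample_nearest(source, 2 ** (j - i))
            elif j < i:
                resized = downsample_mean(source, 2 ** (i - j))
            else:
                resized = source
            for y, row in enumerate(resized):
                for x, value in enumerate(row):
                    fused[y][x] += value
        outputs.append(fused)
    return outputs


def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Ground-truth heatmap: a 2D Gaussian centred on the keypoint (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def mse_loss(pred, target):
    """Mean squared error averaged over all pixels of one heatmap."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Fuse a 4x4 high-resolution branch with a 2x2 low-resolution branch.
high = [[1.0] * 4 for _ in range(4)]
low = [[2.0] * 2 for _ in range(2)]
fused_high, fused_low = exchange_unit([high, low])

# Loss of an all-zero prediction against the target for a keypoint at (20, 30).
target = gaussian_heatmap(width=48, height=64, cx=20, cy=30)
blank = [[0.0] * 48 for _ in range(64)]
loss = mse_loss(blank, target)
```

In a full model the total loss would combine the K per-keypoint heatmaps, and repeating the exchange step is what lets the high-resolution branch keep receiving context from the coarser branches.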

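Experimental Results

Research Questions

RQ1: Does maintaining high-resolution representations throughout the network improve keypoint localization accuracy compared with the conventional high-to-low pipeline?
RQ2: Does repeated multi-scale fusion across the parallel subnetworks yield richer high-resolution features and better heatmaps?
RQ3: How much does HRNet improve over state-of-the-art methods on the COCO, MPII, and PoseTrack benchmarks?
RQ4: How do network width and input resolution affect the accuracy and efficiency of pose estimation?
RQ5: Is HRNet effective for video-based pose tracking, beyond single-image pose estimation?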

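Key Findings

HRNet-W32 (without pretraining) reaches 73.4 AP on the COCO validation set with 256x192 input, outperforming the similarly sized Hourglass while using fewer GFLOPs.
HRNet-W32 (with pretraining) reaches 74.4 AP on the COCO validation set, with AP50 90.5, AP75 81.9, and AR 79.8, outperforming the variant without pretraining.
HRNet-W48 (with pretraining) reaches 75.1 AP on the COCO validation set, with AP50 90.6, AP75 82.2, and AR 80.4, indicating that a wider network yields higher accuracy.
On COCO test-dev, HRNet-W32 and HRNet-W48 reach 74.9 AP and 75.5 AP respectively (single model, top-down approach).
On MPII, HRNet-W32 reaches 92.3 PCKh@0.5, surpassing several prior methods and matching the state of the art.
On PoseTrack 2017, HRNet-W48 obtains 74.9 mAP and 57.9 MOTA, outperforming several baselines and showing strong video pose tracking performance.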
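For readers unfamiliar with the MPII metric quoted above, the following minimal sketch shows how a PCKh@0.5 score could be computed for one person: a predicted joint counts as correct when its distance to the ground truth is at most 0.5 times the head size. The function name, argument layout, and sample coordinates are hypothetical. The official evaluation derives the head size from the annotated head box and aggregates over the whole dataset, which this sketch does not reproduce.

```python
import math


def pckh(pred, gt, head_size, visible=None, threshold=0.5):
    """Fraction of visible joints whose error is within threshold * head_size."""
    if visible is None:
        visible = [True] * len(gt)
    correct = 0
    counted = 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue  # joints without a usable annotation are skipped
        counted += 1
        if math.hypot(px - gx, py - gy) <= threshold * head_size:
            correct += 1
    return correct / counted if counted else 0.0


# Made-up coordinates: errors are 2, 0, 15 and 1 pixels with a head size of 10,
# so 3 of the 4 joints fall within 0.5 * head_size.
pred = [(10, 10), (20, 20), (30, 30), (40, 40)]
gt = [(10, 12), (20, 20), (30, 45), (40, 41)]
score = pckh(pred, gt, head_size=10.0)
```

Normalizing by head size makes the threshold scale with the person's apparent size in the image, so the same score is comparable across near and distant subjects.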

This review was generated by AI and reviewed by a human editor.