QUICK REVIEW

[Paper Review] Distribution-Aware Coordinate Representation for Human Pose Estimation

Feng Zhang, Xiatian Zhu|arXiv (Cornell University)|Oct 14, 2019

Human Pose and Action Recognition | References: 32 | Cited by: 43

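One-Sentence Summary

DARK introduces a distribution-aware coordinate representation for heatmap-based human pose estimation. It improves both coordinate decoding and encoding to raise accuracy, and works as a plug-and-play component across models and datasets (MPII, COCO).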

ABSTRACT

While being the de facto standard coordinate representation in human pose estimation, heatmap is never systematically investigated in the literature, to our best knowledge. This work fills this gap by studying the coordinate representation with a particular focus on the heatmap. Interestingly, we found that the process of decoding the predicted heatmaps into the final joint coordinates in the original image space is surprisingly significant for human pose estimation performance, which nevertheless was not recognised before. In light of the discovered importance, we further probe the design limitations of the standard coordinate decoding method widely used by existing methods, and propose a more principled distribution-aware decoding method. Meanwhile, we improve the standard coordinate encoding process (i.e. transforming ground-truth coordinates to heatmaps) by generating accurate heatmap distributions for unbiased model training. Taking the two together, we formulate a novel Distribution-Aware coordinate Representation of Keypoint (DARK) method. Serving as a model-agnostic plug-in, DARK significantly improves the performance of a variety of state-of-the-art human pose estimation models. Extensive experiments show that DARK yields the best results on two common benchmarks, MPII and COCO, consistently validating the usefulness and effectiveness of our novel coordinate representation idea.

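Research Motivation and Objectives

Highlight how coordinate representation (encoding and decoding) affects pose estimation performance.
Propose a principled, distribution-aware decoding method based on a Gaussian assumption and Taylor expansion.
Address quantization and heatmap distribution issues at the encoding stage to provide unbiased supervision.
Show that DARK, as a model-agnostic plug-in, improves state-of-the-art models on COCO and MPII.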

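Proposed Method

Identifies the importance of heatmap decoding and proposes a distribution-aware decoding method, based on a 2D Gaussian model, for sub-pixel localization.
Applies a Taylor expansion around the heatmap maximum, using first- and second-order derivatives to estimate the true joint center (μ).
Introduces heatmap distribution modulation: the predicted heatmap is smoothed with a Gaussian kernel so that it more closely matches the Gaussian distribution used in training.
Provides unbiased heatmap encoding by centering the Gaussian kernel on the sub-pixel ground-truth coordinate, which removes quantization bias.
Shows that DARK works as a plug-in compatible with existing models (e.g., HRNet, SimpleBaseline, Hourglass) without architectural changes.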
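
The encoding and decoding steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`encode_heatmap`, `decode_taylor`), the 3x3 finite-difference stencil, the `sigma` and `eps` defaults, and the fallback to the plain argmax at the border are all assumptions made for illustration. The decoder takes one Newton step on the log-heatmap at the argmax m, i.e. μ ≈ m − H⁻¹g, where g and H are the gradient and Hessian of the log-heatmap at m. The modulation (Gaussian smoothing) step is omitted because the synthetic heatmap here is already Gaussian.

```python
import numpy as np


def encode_heatmap(center_xy, shape, sigma=2.0):
    """Unbiased encoding: a Gaussian centred on the sub-pixel (x, y), with no rounding."""
    h, w = shape
    cx, cy = center_xy
    xs = np.arange(w, dtype=np.float64)            # column coordinates, shape (w,)
    ys = np.arange(h, dtype=np.float64)[:, None]   # row coordinates, shape (h, 1)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def decode_taylor(heatmap, eps=1e-10):
    """Distribution-aware decoding: refine the argmax with a second-order
    Taylor expansion of the log-heatmap (illustrative 3x3 stencil)."""
    h, w = heatmap.shape
    my, mx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Central differences need a one-pixel border; otherwise keep the argmax.
    if not (1 <= mx < w - 1 and 1 <= my < h - 1):
        return float(mx), float(my)

    logp = np.log(np.maximum(heatmap, eps))  # clamp to avoid log(0)

    # First-order derivatives (gradient) at the maximum.
    dx = 0.5 * (logp[my, mx + 1] - logp[my, mx - 1])
    dy = 0.5 * (logp[my + 1, mx] - logp[my - 1, mx])
    # Second-order derivatives (Hessian) at the maximum.
    dxx = logp[my, mx + 1] - 2.0 * logp[my, mx] + logp[my, mx - 1]
    dyy = logp[my + 1, mx] - 2.0 * logp[my, mx] + logp[my - 1, mx]
    dxy = 0.25 * (logp[my + 1, mx + 1] - logp[my + 1, mx - 1]
                  - logp[my - 1, mx + 1] + logp[my - 1, mx - 1])

    grad = np.array([dx, dy])
    hess = np.array([[dxx, dxy], [dxy, dyy]])
    if abs(np.linalg.det(hess)) < 1e-12:  # degenerate curvature: no refinement
        return float(mx), float(my)

    offset = -np.linalg.solve(hess, grad)  # mu - m = -H^{-1} g
    return float(mx + offset[0]), float(my + offset[1])


if __name__ == "__main__":
    hm = encode_heatmap((20.3, 30.7), shape=(64, 48))
    print(decode_taylor(hm))
```

On this noise-free heatmap the log is exactly quadratic, so the decoder recovers the sub-pixel center to within floating-point error. On real network outputs the heatmap is only approximately Gaussian, which is the situation the modulation step is meant to address.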

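Experimental Results

Research Questions

RQ1: How does coordinate decoding (and its conventional shifting step) affect pose estimation accuracy across models?
RQ2: Can distribution-aware decoding outperform standard shifting for sub-pixel localization?
RQ3: Does Gaussian-based heatmap distribution modulation improve decoding on real predictions?
RQ4: Does unbiased sub-pixel heatmap encoding yield measurable supervision benefits?
RQ5: Does DARK generalize as a model-agnostic plug-in across different pose estimation architectures?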

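Key Findings

For HRNet-W32 at 128x96, standard coordinate decoding with shifting raises AP by up to 5.7% over decoding without shifting; DARK adds further gains.
Applying distribution modulation (DM) to the heatmaps raises COCO val AP from 68.1 to 68.4 for HRNet-W32 at 128x96.
For HRNet-W32 at 128x96, unbiased heatmap encoding combined with DARK decoding reaches 70.7 AP on COCO val (versus 66.9 with biased encoding).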
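For HRNet-W32 at 128x96, DARK reaches 70.7 AP along with gains on the related metrics, and it further improves over the baseline at larger input sizes (256x192, 384x288), e.g., 74.4/75.8 versus 74.4/73.7, depending on the setting.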
On COCO test-dev, DARK with HRNet-W48 at 384x288 reaches 76.2 AP, 0.7 AP above the best competitor (76.2 vs. 75.5).
On MPII, DARK raises mean PCKh@0.5 to 90.6 and PCKh@0.1 to 42.0, outperforming the HRNet-W32 baseline.

This review was generated by AI and reviewed by human editors.