QUICK REVIEW

[Paper Review] Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

Zigang Geng, Ke Sun|arXiv (Cornell University)|Apr 6, 2021

Human Pose and Action Recognition | References: 77 | Cited by: 34

One-Sentence Summary

Introduces DEKR, a bottom-up pose estimation method that uses multi-branch adaptive convolutions to disentangle keypoint representations and directly regress keypoint positions, achieving state-of-the-art results on COCO and CrowdPose.

ABSTRACT

In this paper, we are interested in the bottom-up paradigm of estimating human poses from an image. We study the dense keypoint regression framework that is previously inferior to the keypoint detection and grouping framework. Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions. We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We adopt adaptive convolutions through pixel-wise spatial transformer to activate the pixels in the keypoint regions and accordingly learn representations from them. We use a multi-branch structure for separate regression: each branch learns a representation with dedicated adaptive convolutions and regresses one keypoint. The resulting disentangled representations are able to attend to the keypoint regions, respectively, and thus the keypoint regression is spatially more accurate. We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods and achieves superior bottom-up pose estimation results on two benchmark datasets, COCO and CrowdPose. The code and models are available at https://github.com/HRNet/DEKR.

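Research Motivation and Objectives

Improve bottom-up pose estimation by learning representations that focus on the keypoint regions, rather than relying on heatmap detection and grouping.
Propose a disentangled regression framework (DEKR) that uses adaptive convolutions and a multi-branch structure to regress each keypoint separately.
Show that direct keypoint regression with DEKR outperforms conventional keypoint detection and grouping methods on COCO and CrowdPose.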

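Proposed Method

Uses adaptive convolutions, inspired by the pixel-wise spatial transformer, to activate the pixels in the keypoint regions.
Adopts a multi-branch structure in which each branch learns a representation dedicated to one keypoint and regresses that keypoint's 2D offset.
Trains with a joint regression loss plus heatmap losses for the keypoints and the center, together with a weighted OKS-based evaluation.
At inference, applies center-based and pose-based non-maximum suppression, and a pose scoring network ranks the candidates.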
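The center-plus-offset decoding that these steps rely on can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `find_center_peaks` and `decode_poses`, the 3x3 local-maximum test, the threshold value, and the sign convention (offset points from the center pixel to the keypoint) are all assumptions made for the example. Pose-level NMS and the scoring network are left out.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def find_center_peaks(heatmap: List[List[float]],
                      threshold: float = 0.1) -> List[Tuple[float, int, int]]:
    """Return (score, x, y) for pixels that are 3x3 local maxima above threshold.

    Hypothetical helper: a simple stand-in for center-based NMS.
    Pixels tied with a neighbour are all kept; a real system would break ties.
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbourhood = [heatmap[ny][nx]
                             for ny in range(max(0, y - 1), min(h, y + 2))
                             for nx in range(max(0, x - 1), min(w, x + 2))]
            if v >= max(neighbourhood):
                peaks.append((v, x, y))
    # Highest-scoring centers first.
    return sorted(peaks, reverse=True)


def decode_poses(center_heatmap: List[List[float]],
                 offsets: List[List[List[Point]]],
                 threshold: float = 0.1) -> List[Tuple[float, List[Point]]]:
    """Turn a center heatmap and per-keypoint offset maps into pose candidates.

    offsets[k][y][x] is the (dx, dy) regressed at pixel (x, y) for keypoint k.
    Assumed convention: keypoint = center + offset.
    """
    poses = []
    for score, x, y in find_center_peaks(center_heatmap, threshold):
        keypoints = [(x + offsets[k][y][x][0], y + offsets[k][y][x][1])
                     for k in range(len(offsets))]
        poses.append((score, keypoints))
    return poses
```

Each detected center yields one full pose directly, so no grouping step is needed to assign keypoints to people; that is the property that distinguishes dense regression from detect-and-group pipelines.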

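Experimental Results

Research Questions

RQ1: Do disentangled per-keypoint representations improve the quality of bottom-up direct keypoint regression?
RQ2: Does combining adaptive activation with separate regression branches yield higher localization accuracy than single-branch regression or grouping methods?
RQ3: How much does DEKR improve over state-of-the-art bottom-up methods on the standard COCO and CrowdPose benchmarks?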

Key Findings

Method	Input size	AP	AP50	AP75	APM	APL	AR	AR-M	AR-L
DEKR (D-32 ss)	512	68.0	86.7	74.5	62.1	77.7	73.0	66.2	82.7
DEKR (D-48 ss)	640	71.0	88.3	77.4	66.7	78.5	76.0	70.6	84.0
DEKR (D-32 ms)	512	71.0	87.7	77.1	65.2	77.8	75.9	70.5	83.6
DEKR (D-48 ms)	640	71.0	89.2	78.0	67.1	76.9	76.7	71.5	83.9
DEKR (CrowdPose val, D-32 ss)	512	65.5	86.2	?	64.1	75.5	75.4	69.7	83.0
DEKR (CrowdPose val, D-48 ss)	640	67.0	88.0	?	66.6	75.8	76.9	71.5	83.9

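DEKR achieves state-of-the-art bottom-up results on COCO and CrowdPose.
Single-branch regression is outperformed by the multi-branch, disentangled design, in which each branch focuses on the region of one keypoint.
Adaptive activation and separate regression together bring a substantial AP gain (for example, 68.0 AP with HRNet-W32 and 71.0 AP with HRNet-W48 on COCO val).
On COCO val, DEKR reaches 68.0 AP with HRNet-W32 and 71.0 AP with HRNet-W48; multi-scale testing raises these to 71.0 AP (W32) and 72.8 AP (W48).
On COCO test-dev, DEKR reaches 67.3 AP with HRNet-W32 and 70.0 AP with HRNet-W48; with multi-scale testing, 69.8 and 71.0 respectively.
On CrowdPose val, DEKR reaches 65.5 AP (D-32 ss) and 67.0 AP (D-48 ss); multi-scale testing raises these to 67.5 and 68.3.
Ablation studies on COCO val with HRNet-W32 show that adaptive activation contributes about 3.5 AP and separate regression about 2.6 AP.
Compared with grouping or post-hoc matching strategies (such as the absorption-style matching scheme of CenterNet), DEKR gives consistent single-scale gains without heatmap-based post-processing to match keypoints.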
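The AP figures above are based on object keypoint similarity (OKS) under the standard COCO keypoint protocol: AP50 and AP75 use OKS thresholds of 0.50 and 0.75, and AP averages over ten thresholds from 0.50 to 0.95. The sketch below shows how OKS is computed for one predicted pose against one annotation. The function name `oks` and the sigma values used in the usage note are illustrative placeholders, not the official per-keypoint constants.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def oks(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[int],
        area: float, sigmas: Sequence[float]) -> float:
    """Object keypoint similarity between a predicted and a ground-truth pose.

    Per keypoint: exp(-d^2 / (2 * area * (2 * sigma)^2)), averaged over the
    keypoints that are labeled in the ground truth (visibility flag > 0).
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not affect the score
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0


# The ten OKS thresholds averaged by the headline AP metric.
OKS_THRESHOLDS: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]
```

For instance, `oks([(10.0, 0.0)], [(0.0, 0.0)], [2], area=10000.0, sigmas=[0.05])` scores a 10-pixel error on a person of area 10000 square pixels; with this placeholder sigma the match would count at the 0.50 threshold but not at 0.75. Because the error is normalized by object area, the same pixel error costs more on a small person than on a large one.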


This review was generated by AI and checked by a human editor.