QUICK REVIEW

[Paper Review] OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Zhe Cao, Gines Hidalgo|arXiv (Cornell University)|Dec 18, 2018

Human Pose and Action Recognition|References: 69|Cited by: 673

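One-Sentence Summary

OpenPose presents a realtime, bottom-up method for multi-person 2D pose estimation that uses Part Affinity Fields (PAFs) to associate body parts with individuals, and it releases an open-source library for body, foot, hand, and facial keypoints.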

ABSTRACT

Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that a PAF-only refinement rather than both PAF and body part location refinement results in a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an internal annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.

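Motivation and Goals

Enable realtime understanding of people in images and videos through accurate multi-person 2D pose estimation.
Address the challenges of an unknown number of people, occlusion, and runtime that grows with the number of people.
Introduce Part Affinity Fields (PAFs) as a bottom-up representation that couples detection with association.
Release OpenPose as an open-source system that handles body, foot, hand, and facial keypoints.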

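Proposed Method

A CNN predicts confidence maps for body part locations and PAFs that encode the orientation of limbs.
A multi-stage network with intermediate supervision strengthens learning; refining the PAFs is crucial, while refining the body part confidence maps matters much less.
Each 7x7 convolution is replaced by three consecutive 3x3 convolutions whose outputs are concatenated (DenseNet-style connections), which preserves the receptive field while improving speed.
Greedy parsing over the PAFs assembles the body poses of multiple people: candidate limbs are scored with a line integral along the PAF and then selected by bipartite matching.
The approach is extended to foot keypoints using a foot dataset that the authors annotated and publicly released, and a combined body+foot detector is demonstrated that sacrifices neither speed nor accuracy.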
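
The limb scoring and greedy matching steps above can be illustrated with a minimal NumPy sketch. This is not the OpenPose implementation: the function names `paf_score` and `greedy_match`, the sample count, the use of a mean instead of a sum, and the `min_score` threshold are all assumptions chosen for illustration. The sketch approximates the line integral by sampling the PAF at evenly spaced points between two candidate keypoints, then accepts limb candidates in descending score order so that each keypoint is used at most once.

```python
import numpy as np

def paf_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Approximate the PAF line integral along the segment p1 -> p2.

    paf_x, paf_y: (H, W) arrays with the x and y components of one limb's PAF.
    p1, p2: candidate keypoints given as (x, y).
    Returns the mean dot product between the field and the unit limb direction.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    direction = p2 - p1
    length = np.linalg.norm(direction)
    if length < 1e-6:
        return 0.0
    unit = direction / length
    # Evenly spaced sample points on the segment, rounded to pixel indices.
    ts = np.linspace(0.0, 1.0, num_samples)
    points = p1[None, :] + ts[:, None] * direction[None, :]
    xs = np.clip(np.rint(points[:, 0]).astype(int), 0, paf_x.shape[1] - 1)
    ys = np.clip(np.rint(points[:, 1]).astype(int), 0, paf_x.shape[0] - 1)
    dots = paf_x[ys, xs] * unit[0] + paf_y[ys, xs] * unit[1]
    return float(dots.mean())

def greedy_match(paf_x, paf_y, cands_a, cands_b, min_score=0.05):
    """Greedy bipartite matching for one limb type.

    min_score is a hypothetical threshold, not a value taken from the paper.
    Returns a list of (index_a, index_b, score) tuples.
    """
    scored = []
    for i, a in enumerate(cands_a):
        for j, b in enumerate(cands_b):
            s = paf_score(paf_x, paf_y, a, b)
            if s > min_score:
                scored.append((s, i, j))
    scored.sort(reverse=True)  # best-scoring candidate limbs first
    used_a, used_b, pairs = set(), set(), []
    for s, i, j in scored:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j, s))
    return pairs

# Synthetic example: two horizontal limbs on rows 5 and 15 of a 20x20 field.
paf_x = np.zeros((20, 20))
paf_y = np.zeros((20, 20))
paf_x[5, 2:13] = 1.0
paf_x[15, 2:13] = 1.0
starts = [(2, 5), (2, 15)]   # e.g. two elbow candidates
ends = [(12, 15), (12, 5)]   # e.g. two wrist candidates, listed in swapped order
pairs = greedy_match(paf_x, paf_y, starts, ends)
print(pairs)
```

In this toy setup the diagonal (wrong) pairings still receive a small positive score because their endpoints lie on a limb, but the greedy pass accepts the well-aligned horizontal pairings first, which is the behavior the PAF representation is meant to produce.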

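Experimental Results

Research Questions

RQ1: Can Part Affinity Fields enable accurate, realtime, bottom-up multi-person pose parsing without heavy dependence on a person detector?
RQ2: In multi-person parsing, how does refining the PAFs versus refining the body part confidence maps affect accuracy and speed?
RQ3: Does combining body and foot keypoint detection improve the performance and efficiency of pose estimation?
RQ4: How does OpenPose compare with existing methods (e.g., Mask R-CNN, Alpha-Pose) in runtime and accuracy on standard benchmarks?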

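Key Findings

Realtime multi-person 2D pose estimation is achieved across benchmarks with competitive accuracy.
Refining the PAFs is crucial for accuracy, whereas refining the body part confidence maps has much less effect.
A deeper network with the body part refinement stages removed is both faster and more accurate (roughly a 200% speedup and a 7% accuracy gain, as reported in the paper's evaluation sections).
An annotated foot dataset is introduced, and the combined body+foot keypoint detector reduces inference time compared with running the two detectors sequentially while maintaining body-part accuracy.
OpenPose is the first open-source realtime system for body, foot, hand, and facial keypoints (up to 135 keypoints), and it runs at about 22 FPS on a GTX 1080 Ti.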

This review was generated by AI and reviewed by human editors.