QUICK REVIEW

[Paper Review] Cascaded Pyramid Network for Multi-Person Pose Estimation

Yilun Chen, Zhicheng Wang|arXiv (Cornell University)|Nov 20, 2017
Human Pose and Action Recognition|References: 7|Cited by: 127
One-Sentence Summary

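This paper proposes the Cascaded Pyramid Network (CPN), which combines GlobalNet and RefineNet to handle hard keypoints in multi-person pose estimation, reaching 69.4 AP on COCO minival and 72.1 AP on COCO test-dev with a single model (state of the art at the time).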

ABSTRACT

The topic of multi-person pose estimation has been largely improved recently, especially with the development of convolutional neural network. However, there still exist a lot of challenging cases, such as occluded keypoints, invisible keypoints and complex background, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which targets to relieve the problem from these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the "simple" keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet tries explicitly handling the "hard" keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge. Code (https://github.com/chenyilun95/tf-cpn.git) and the detection results are publicly available for further research.

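Motivation and Objectives

  • Improve the localization of hard keypoints (occluded or invisible) in multi-person pose estimation.
  • Propose a cascaded architecture that integrates pyramid features to produce robust keypoint heatmaps.
  • Introduce online hard keypoints mining so that training focuses on difficult joints.
  • Evaluate how detector choice, data preprocessing, and input cropping strategy affect performance.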

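Proposed Method

  • Introduces the Cascaded Pyramid Network (CPN), composed of GlobalNet and RefineNet.
  • GlobalNet uses a feature pyramid structure to localize easy keypoints, drawing on features that offer both high spatial resolution and rich context.
  • RefineNet concatenates the pyramid features and applies online hard keypoints mining to focus on hard keypoints.
  • Training uses an L2 loss for GlobalNet and an online hard keypoints mining loss for RefineNet.
  • A top-down pipeline is adopted: a detector produces human bounding boxes, and CPN then localizes keypoints within each box.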
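
The two training losses named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `l2_loss` and `ohkm_l2_loss`, the `top_k` parameter and its default, and the heatmap layout `(batch, keypoints, height, width)` are all assumptions made for illustration. The idea shown is that the plain L2 loss averages over every keypoint, while the hard keypoint mining variant averages only the keypoints with the largest per-keypoint loss in each sample.

```python
import numpy as np

def l2_loss(pred, target):
    """Plain L2 heatmap loss averaged over all keypoints (GlobalNet-style)."""
    return float(((pred - target) ** 2).mean())

def ohkm_l2_loss(pred, target, top_k=8):
    """L2 loss restricted to the top_k hardest keypoints of each sample.

    pred, target: arrays of shape (batch, num_keypoints, height, width).
    top_k is a hypothetical parameter; its default here is illustrative only.
    """
    # One loss value per keypoint: mean squared error over that heatmap.
    per_keypoint = ((pred - target) ** 2).mean(axis=(2, 3))  # (batch, K)
    # Sort ascending and keep the top_k largest losses in each sample.
    hardest = np.sort(per_keypoint, axis=1)[:, -top_k:]
    return float(hardest.mean())

# Usage with random heatmaps for 17 keypoints (shapes are illustrative).
rng = np.random.default_rng(0)
pred = rng.random((2, 17, 64, 48))
target = rng.random((2, 17, 64, 48))
print(l2_loss(pred, target), ohkm_l2_loss(pred, target, top_k=8))
```

Because only the hardest keypoints contribute, easy joints that are already well localized stop dominating the loss. In a real training setup the same selection would be done with the framework's differentiable top-k operation rather than NumPy.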

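Experimental Results

Research Questions

  • RQ1: Does the cascaded pyramid approach improve localization of occluded or invisible keypoints in multi-person pose estimation?
  • RQ2: Does integrating multi-level pyramid features in RefineNet improve accuracy on hard keypoints without an excessive increase in computation?
  • RQ3: How does online hard keypoints mining affect pose estimation accuracy?
  • RQ4: How do detector quality and data preprocessing affect the final keypoint AP on COCO?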

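Key Findings

  • CPN, using GlobalNet and RefineNet on a ResNet-50 backbone, reaches 69.4 AP (OKS) on COCO minival.
  • RefineNet with online hard keypoints mining improves on the GlobalNet baseline by about 0.8 AP.
  • Input crop size and multi-level feature fusion both have a significant effect: larger crops and more pyramid levels yield higher AP.
  • On COCO test-dev, a single CPN model reaches 72.1 AP, and the ensemble (CPN+) reaches 73.0 AP without extra data (COCO only).
  • At the time, the method set the state of the art for COCO multi-person keypoints; the abstract reports a 19% relative improvement over the 60.5 AP of the COCO 2016 keypoint challenge.
  • In the ablation studies, Soft-NMS and stronger detector variants further improve keypoint detection performance.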
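
To make the Soft-NMS finding concrete, here is a minimal NumPy sketch of the Gaussian form of Soft-NMS applied to person boxes. This summary does not say which Soft-NMS variant or settings the paper used, so the Gaussian decay, the `sigma` and `score_thresh` values, and the helper names `iou_one_to_many` and `soft_nms` are assumptions. Boxes are `[x1, y1, x2, y2]` with positive area.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of
    discarding them. Returns kept indices and their (decayed) scores."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep, kept_scores = [], []
    while remaining.size > 0:
        # Select the highest-scoring box that is still a candidate.
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        kept_scores.append(float(scores[best]))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Decay the other candidates according to their overlap with it.
        overlap = iou_one_to_many(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(overlap ** 2) / sigma)
        # Drop candidates whose score has decayed below the threshold.
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, kept_scores

# Usage: two identical boxes and one separate box (values are illustrative).
boxes = [[0, 0, 10, 10], [0, 0, 10, 10], [20, 20, 30, 30]]
print(soft_nms(boxes, [0.9, 0.8, 0.7]))
```

Hard NMS with a typical IoU threshold would discard the duplicate box outright; here it survives with a reduced score. In a top-down pipeline this matters when two people overlap heavily, since a suppressed person box means that person's keypoints are never estimated.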


This review was generated by AI and reviewed by human editors.