QUICK REVIEW

[Paper Review] YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss

Debapriya Maji, Soyeb Nagori | arXiv (Cornell University) | Apr 14, 2022

Human Pose and Action Recognition | Cited by 39

ONE-SENTENCE SUMMARY

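YOLO-Pose is a heatmap-free, end-to-end trainable method that detects multiple people and their 2D poses in a single forward pass, using an OKS loss to directly optimize the pose evaluation metric. It achieves state-of-the-art AP50 on COCO val and test-dev without test-time augmentation.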

ABSTRACT

We introduce YOLO-pose, a novel heatmap-free approach for joint detection and 2D multi-person pose estimation in an image, based on the popular YOLO object detection framework. Existing heatmap-based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e. Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass, thus bringing in the best of both top-down and bottom-up approaches. The proposed approach doesn't require the postprocessing of bottom-up approaches to group detected keypoints into a skeleton, as each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are done away with since all persons are localized along with their pose in a single inference. YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test time augmentation. All experiments and results reported in this paper are without any test time augmentation, unlike traditional approaches that use flip-test and multi-scale testing to boost performance. Our training code will be made publicly available at https://github.com/TexasInstruments/edgeai-yolov5 and https://github.com/TexasInstruments/edgeai-yolox
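The "inherent grouping" argument can be made concrete with a small post-processing sketch. It assumes a hypothetical flat per-anchor output row of 6 box values (cx, cy, w, h, objectness, class score) followed by 17 × (x, y, confidence) keypoint values; this layout, the names `decode` and `postprocess`, and the thresholds are illustrative assumptions, not the API of the edgeai-yolov5 or edgeai-yolox repositories. Because each pose lives in the same record as its box, ordinary greedy box NMS keeps or discards the pose together with the box.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

NUM_KPTS = 17   # COCO person keypoints
BOX_FIELDS = 6  # ASSUMED layout: cx, cy, w, h, objectness, class score

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


@dataclass
class Detection:
    box: Box
    score: float
    keypoints: List[Tuple[float, float, float]]  # (x, y, conf) per joint


def decode(row: Sequence[float]) -> Detection:
    """Split one anchor's (hypothetical) output row into a box and its attached pose."""
    cx, cy, w, h, obj, cls = row[:BOX_FIELDS]
    kp = row[BOX_FIELDS:BOX_FIELDS + 3 * NUM_KPTS]
    keypoints = [(kp[3 * i], kp[3 * i + 1], kp[3 * i + 2]) for i in range(NUM_KPTS)]
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return Detection(box, obj * cls, keypoints)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(rows: Sequence[Sequence[float]], score_thr: float = 0.25,
                iou_thr: float = 0.65) -> List[Detection]:
    """Greedy box NMS; each surviving box carries its own 17 keypoints,
    so no separate keypoint-to-person grouping step is run."""
    dets = [d for d in map(decode, rows) if d.score >= score_thr]
    dets.sort(key=lambda d: d.score, reverse=True)
    kept: List[Detection] = []
    for d in dets:
        if all(iou(d.box, k.box) < iou_thr for k in kept):
            kept.append(d)
    return kept
```

By contrast, a bottom-up pipeline would first predict all joints in the image and then need an extra association step to decide which joints belong to which person.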

MOTIVATION AND GOALS

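Provide a heatmap-free, end-to-end trainable alternative to heatmap-based two-stage pose estimation.
Combine bounding-box detection and 2D pose estimation for multiple people in a single forward pass.
Optimize Object Keypoint Similarity (OKS) directly instead of a surrogate loss.
Eliminate the post-processing grouping step that bottom-up methods require, and avoid the multiple inference passes of top-down methods.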

PROPOSED METHOD

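Use the YOLO framework as the basis for joint person detection and pose estimation.
Adopt an Object Keypoint Similarity (OKS) loss that directly optimizes the evaluation metric.
For each detected person, output a bounding box with its associated 2D pose in a single forward pass.
Achieve competitive accuracy without heatmaps, post-processing grouping, or test-time augmentation.
Train end-to-end, with no flip testing or multi-scale test-time augmentation.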
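The OKS loss idea can be sketched in plain Python. This is a minimal, non-differentiable illustration assuming the standard COCO OKS definition: each visible keypoint contributes exp(-d² / (2 s² k²)), averaged over visible keypoints, where s² is the object area and k = 2σ uses the COCO per-keypoint sigmas; the loss is taken as 1 − OKS. The names `oks`, `oks_loss`, and `l1_error` are hypothetical, and the paper's actual training loss may differ in details such as how the scale s is obtained.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Per-keypoint sigmas for the 17 COCO person keypoints (COCO evaluation constants).
COCO_SIGMAS = (0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089)


def oks(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[bool],
        area: float, sigmas: Sequence[float] = COCO_SIGMAS) -> float:
    """Object Keypoint Similarity for one predicted / ground-truth pose pair."""
    total, count = 0.0, 0
    for (px, py), (gx, gy), vis, sigma in zip(pred, gt, visible, sigmas):
        if not vis:  # unlabeled keypoints do not contribute
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0


def oks_loss(pred: Sequence[Point], gt: Sequence[Point],
             visible: Sequence[bool], area: float) -> float:
    """Hypothetical loss 1 - OKS: a perfect pose gives 0."""
    return 1.0 - oks(pred, gt, visible, area)


def l1_error(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean per-keypoint L1 distance in pixels (a scale-blind surrogate)."""
    return sum(abs(px - gx) + abs(py - gy)
               for (px, py), (gx, gy) in zip(pred, gt)) / len(gt)


if __name__ == "__main__":
    gt = [(10.0 * i, 5.0 * i) for i in range(17)]
    pred = [(x + 3.0, y + 4.0) for x, y in gt]  # same pixel offset on every joint
    visible = [True] * 17
    print("L1 error:", l1_error(pred, gt))  # identical for both people below
    print("small person loss:", oks_loss(pred, gt, visible, area=32 * 64))
    print("large person loss:", oks_loss(pred, gt, visible, area=200 * 400))
```

The demo applies the same pixel error to a small and a large person: the L1 error is identical, while the OKS-based loss is much larger for the small person. This scale sensitivity is what a plain L1 surrogate does not capture.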

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

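RQ1: Can a heatmap-free, end-to-end trainable model jointly detect people and estimate their poses using OKS as the optimization target?
RQ2: Does integrating pose estimation into a YOLO-based detector improve efficiency by avoiding post-processing and multiple forward passes?
RQ3: Without test-time augmentation, how does YOLO-Pose perform in AP50 on COCO val and test-dev?
RQ4: Can a single forward pass surpass bottom-up methods on the pose estimation task?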

KEY FINDINGS

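Achieves state-of-the-art results on COCO val (90.2% AP50) and test-dev (90.3% AP50) without test-time augmentation.
Surpasses existing bottom-up methods with a single forward pass, without flip testing, multi-scale testing, or any other test-time augmentation.
Enables end-to-end training by optimizing OKS directly, avoiding a surrogate L1 loss.
Needs no post-processing to group keypoints into skeletons, because each bounding box has an associated pose.
Avoids the multiple forward passes required by top-down methods by merging detection and pose estimation into a single inference.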

This review was generated by AI and checked by human editors.