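[Paper Walkthrough] YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss
YOLO-pose is a heatmap-free, end-to-end trainable method that detects multiple people and their 2D poses in a single forward pass, using an Object Keypoint Similarity (OKS) loss to optimize pose estimation. It achieves state-of-the-art AP50 on the COCO validation and test-dev sets without any test-time augmentation.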
We introduce YOLO-pose, a novel heatmap-free approach for joint detection, and 2D multi-person pose estimation in an image based on the popular YOLO object detection framework. Existing heatmap based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e. Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass and thus bringing in the best of both top-down and bottom-up approaches. Proposed approach doesn't require the postprocessing of bottom-up approaches to group detected keypoints into a skeleton as each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are done away with since all persons are localized along with their pose in a single inference. YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test time augmentation. All experiments and results reported in this paper are without any test time augmentation, unlike traditional approaches that use flip-test and multi-scale testing to boost performance. Our training codes will be made publicly available at https://github.com/TexasInstruments/edgeai-yolov5 and https://github.com/TexasInstruments/edgeai-yolox
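Motivation and Goals
- Provide a heatmap-free, end-to-end trainable alternative to heatmap-based two-stage pose estimation.
- Combine bounding-box detection and 2D pose estimation for multiple people in a single forward pass.
- Optimize Object Keypoint Similarity (OKS) directly rather than a surrogate loss.
- Eliminate the keypoint-grouping post-processing that bottom-up methods require, and avoid multiple inference passes.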
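Proposed Method
- Build on the YOLO framework for joint person detection and pose estimation.
- Train with an Object Keypoint Similarity (OKS) loss, so that the evaluation metric itself is optimized.
- For each detected person, output a bounding box with its associated 2D pose in a single forward pass.
- Reach competitive accuracy without heatmaps or grouping post-processing.
- Train the model end to end; reported results use no flip test or multi-scale testing.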
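To make the OKS objective concrete, here is a minimal sketch in plain Python. It assumes the standard COCO definition of OKS (a Gaussian falloff of keypoint distance, scaled by object area and a per-keypoint constant) and assumes the loss takes the form 1 - OKS. The function names `oks` and `oks_loss` are hypothetical and are not taken from the authors' repositories, whose implementation operates on batched tensors and may differ in detail.

```python
import math

# Per-keypoint sigmas from the COCO keypoint evaluation (17 keypoints).
COCO_SIGMAS = [s / 10.0 for s in (0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79,
                                  0.72, 0.72, 0.62, 0.62, 1.07, 1.07,
                                  0.87, 0.87, 0.89, 0.89)]


def oks(pred, gt, visibility, area, sigmas=COCO_SIGMAS, eps=1e-9):
    """Object Keypoint Similarity between one predicted and one true pose.

    pred, gt:    sequences of (x, y) keypoint coordinates
    visibility:  per-keypoint flags; only keypoints with flag > 0 are scored
    area:        ground-truth object area (the squared object scale)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma  # COCO uses k_i = 2 * sigma_i
        total += math.exp(-d2 / (2.0 * (area + eps) * k * k))
        count += 1
    return total / count if count else 0.0


def oks_loss(pred, gt, visibility, area):
    """Assumed loss form: 1 - OKS, so a perfect pose gives zero loss."""
    return 1.0 - oks(pred, gt, visibility, area)
```

Because OKS normalizes each distance by object area and a keypoint-specific constant, the same pixel error costs more on a small person or on a tightly localized keypoint such as an eye than on a hip. That scale awareness is what a plain L1 distance on coordinates lacks.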
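Experimental Results
Research Questions
- RQ1: Can a heatmap-free, end-to-end trainable model jointly detect people and estimate their poses with OKS as the optimization objective?
- RQ2: Does integrating pose estimation into a YOLO-based detector improve efficiency by avoiding post-processing and multiple forward passes?
- RQ3: Without test-time augmentation, what AP50 does YOLO-Pose reach on the COCO validation and test-dev sets?
- RQ4: Can a model surpass bottom-up methods on pose estimation with a single forward pass?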
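Key Findings
- Achieves state-of-the-art results on the COCO validation set (90.2% AP50) and test-dev set (90.3% AP50) without test-time augmentation.
- Surpasses existing bottom-up methods in a single forward pass, with no flip test, multi-scale testing, or other test-time augmentation.
- Trains end to end by optimizing OKS directly, avoiding the surrogate L1 loss.
- Needs no post-processing to group keypoints into skeletons, because each bounding box carries its associated pose.
- Avoids the multiple forward passes that top-down methods require by merging detection and pose estimation into a single inference.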
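The inherent grouping described above can be illustrated with a small decoding sketch. It assumes, purely for illustration, that each detection is a flat vector of 6 box fields (center x, center y, width, height, objectness, class score) followed by 17 keypoints stored as (x, y, confidence) triples. The names `PersonDetection`, `decode_detection`, and `decode_all` are hypothetical, the layout is not confirmed against the authors' code, and non-maximum suppression is omitted.

```python
from typing import List, NamedTuple, Sequence, Tuple

NUM_KEYPOINTS = 17  # COCO person keypoints
BOX_FIELDS = 6      # assumed layout: cx, cy, w, h, objectness, class score


class PersonDetection(NamedTuple):
    box: Tuple[float, float, float, float]        # cx, cy, w, h
    score: float                                  # objectness * class score
    keypoints: List[Tuple[float, float, float]]   # (x, y, confidence)


def decode_detection(vec: Sequence[float]) -> PersonDetection:
    """Split one flat prediction vector into a box and its own keypoints."""
    expected = BOX_FIELDS + 3 * NUM_KEYPOINTS
    if len(vec) != expected:
        raise ValueError(f"expected {expected} values, got {len(vec)}")
    cx, cy, w, h, objectness, cls_score = vec[:BOX_FIELDS]
    keypoints = [(vec[i], vec[i + 1], vec[i + 2])
                 for i in range(BOX_FIELDS, expected, 3)]
    return PersonDetection((cx, cy, w, h), objectness * cls_score, keypoints)


def decode_all(rows: Sequence[Sequence[float]],
               score_threshold: float = 0.5) -> List[PersonDetection]:
    """Keep confident detections; each one already carries a full pose."""
    people = [decode_detection(row) for row in rows]
    return [p for p in people if p.score >= score_threshold]
```

Since the keypoints travel in the same vector as the box, there is no step that matches loose keypoints to people, which is the step bottom-up methods need. There is also no per-person crop and second network pass, which is what top-down methods need.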
This walkthrough was generated by AI and reviewed by a human editor.