QUICK REVIEW

[Paper Review] YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss

Debapriya Maji, Soyeb Nagori|arXiv (Cornell University)|2022. 04. 14.

Human Pose and Action Recognition | Citations: 39

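One-line summary

YOLO-pose is a heatmap-free, end-to-end trainable approach that detects multiple people and their 2D poses in a single forward pass and trains pose estimation with an OKS loss. Without any test-time augmentation, it reports state-of-the-art AP50 among bottom-up approaches on COCO validation (90.2%) and test-dev (90.3%).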

ABSTRACT

We introduce YOLO-pose, a novel heatmap-free approach for joint detection, and 2D multi-person pose estimation in an image based on the popular YOLO object detection framework. Existing heatmap based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e. Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass and thus bringing in the best of both top-down and bottom-up approaches. Proposed approach doesn't require the postprocessing of bottom-up approaches to group detected keypoints into a skeleton as each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are done away with since all persons are localized along with their pose in a single inference. YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test time augmentation. All experiments and results reported in this paper are without any test time augmentation, unlike traditional approaches that use flip-test and multi-scale testing to boost performance. Our training codes will be made publicly available at https://github.com/TexasInstruments/edgeai-yolov5 and https://github.com/TexasInstruments/edgeai-yolox

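Research motivation and goals

Propose a heatmap-free, end-to-end trainable alternative to heatmap-based two-stage pose estimation.
Combine bounding-box detection and 2D pose estimation for multiple people in a single forward pass.
Optimize Object Keypoint Similarity (OKS) directly instead of a surrogate loss.
Remove the post-processing grouping step that bottom-up methods need, and avoid the multiple forward passes that top-down methods need.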

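Proposed method

Uses the YOLO framework as the shared base for person detection and pose estimation.
Adopts an Object Keypoint Similarity (OKS) loss so that training optimizes the evaluation metric directly.
Outputs, for each detected person, a bounding box with its associated 2D pose in a single forward pass.
Avoids heatmaps, post-processing grouping, and test-time augmentation while still reaching competitive accuracy.
Trains end-to-end; all reported results are obtained without flip test or multi-scale testing.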
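
The OKS loss mentioned above can be illustrated with a minimal NumPy sketch. It assumes the COCO definition of OKS (per-keypoint constants from the COCO evaluation code, object scale given as the ground-truth area) and takes the loss as one minus OKS over labeled keypoints. The function name `oks_loss`, its arguments, and the choice to return zero when no keypoint is labeled are illustrative assumptions, not the authors' training code.

```python
import numpy as np

# Per-keypoint sigmas used by the COCO keypoint evaluation (17 keypoints).
COCO_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                        .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0


def oks_loss(pred_xy, gt_xy, visibility, area, sigmas=COCO_SIGMAS, eps=1e-9):
    """Hypothetical helper: 1 - OKS for one person.

    pred_xy, gt_xy: (K, 2) arrays of keypoint coordinates in pixels.
    visibility:     (K,) array, > 0 where the keypoint is labeled.
    area:           ground-truth object area in pixels^2 (the scale s^2).
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    gt_xy = np.asarray(gt_xy, dtype=float)
    labeled = np.asarray(visibility) > 0
    if not labeled.any():
        # Assumption: nothing to penalize when no keypoint is annotated.
        return 0.0
    d2 = np.sum((pred_xy - gt_xy) ** 2, axis=-1)   # squared distances
    k2 = (2.0 * sigmas) ** 2                       # COCO uses k = 2 * sigma
    similarity = np.exp(-d2 / (2.0 * area * k2 + eps))
    return float(1.0 - similarity[labeled].mean())


# Usage: a perfect prediction gives zero loss, a shifted one a positive loss.
gt = np.tile(np.array([50.0, 60.0]), (17, 1))
vis = np.ones(17)
loss_perfect = oks_loss(gt, gt, vis, area=100.0 * 100.0)
loss_shifted = oks_loss(gt + np.array([5.0, 0.0]), gt, vis, area=100.0 * 100.0)
```

Because the similarity is normalized by object area and by a per-keypoint constant, the same pixel error costs more on a small person or on a tightly localized keypoint such as an eye than on a hip.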

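Experimental results

Research questions

RQ1: Can a heatmap-free, end-to-end trainable model jointly detect people and estimate their poses while using OKS as the optimization target?
RQ2: Does integrating pose estimation into a YOLO-based detector improve efficiency by avoiding post-processing and multiple forward passes?
RQ3: How does YOLO-Pose perform in AP50 on COCO validation and test-dev without test-time augmentation?
RQ4: Can pose estimation in a single forward pass outperform bottom-up methods?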

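Key results

Achieves state-of-the-art results on COCO validation (90.2% AP50) and test-dev (90.3% AP50) without test-time augmentation.
Surpasses existing bottom-up approaches in a single forward pass, without flip test, multi-scale testing, or any other test-time augmentation.
Provides end-to-end training by optimizing OKS directly, avoiding the surrogate L1 loss.
Each bounding box carries its own pose, so no post-processing is needed to group keypoints into skeletons.
Avoids the multiple forward passes required by top-down approaches by merging detection and pose estimation into a single inference step.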
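
The inherent grouping described above can be sketched as follows. The row layout used here (4 box values, 1 confidence score, then 17 keypoints as x, y, confidence) and the names `split_detections` and `score_thresh` are assumptions made for illustration; the actual output format of the released models may differ.

```python
import numpy as np

NUM_KPTS = 17                    # COCO keypoints per person
ROW_LEN = 4 + 1 + 3 * NUM_KPTS   # assumed layout: box, score, keypoints


def split_detections(rows, score_thresh=0.5):
    """Hypothetical decoder: one output row becomes one person.

    Each kept row already holds a box and its own keypoints, so no
    separate step is needed to assign keypoints to people.
    """
    rows = np.asarray(rows, dtype=float).reshape(-1, ROW_LEN)
    people = []
    for row in rows:
        if row[4] < score_thresh:
            continue  # drop low-confidence detections
        people.append({
            "box": row[:4],
            "score": float(row[4]),
            "keypoints": row[5:].reshape(NUM_KPTS, 3),  # (x, y, conf)
        })
    return people


# Usage with two dummy rows: only the confident one is kept.
dummy = np.zeros((2, ROW_LEN))
dummy[0, 4] = 0.9
dummy[1, 4] = 0.1
people = split_detections(dummy)
```

A bottom-up pipeline would instead have to match a pool of detected keypoints to individuals after inference, which is the step this layout makes unnecessary.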


This review was generated by AI and checked by a human editor.