QUICK REVIEW

[Paper Review] Simple and Lightweight Human Pose Estimation

Zhe Zhang, Jie Tang | arXiv (Cornell University) | 2019-11-23

Human Pose and Action Recognition | References: 41 | Citations: 37

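One-line summary

Presents the Lightweight Pose Network (LPN), which combines a redesigned bottleneck block, an iterative training strategy, and Beta-Soft-Argmax post-processing to reach competitive COCO pose results with a smaller model and faster CPU inference.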

ABSTRACT

Recent research on human pose estimation has achieved significant improvement. However, most existing methods tend to pursue higher scores using complex architecture or computationally expensive models on benchmark datasets, ignoring the deployment costs in practice. In this paper, we investigate the problem of simple and lightweight human pose estimation. We first redesign a lightweight bottleneck block with two non-novel concepts: depthwise convolution and attention mechanism. And then, based on the lightweight block, we present a Lightweight Pose Network (LPN) following the architecture design principles of SimpleBaseline. The model size (#Params) of our small network LPN-50 is only 9% of SimpleBaseline(ResNet50), and the computational complexity (FLOPs) is only 11%. To give full play to the potential of our LPN and get more accurate predicted results, we also propose an iterative training strategy and a model-agnostic post-processing function Beta-Soft-Argmax. We empirically demonstrate the effectiveness and efficiency of our methods on the benchmark dataset: the COCO keypoint detection dataset. Besides, we show the speed superiority of our lightweight network at inference time on a non-GPU platform. Specifically, our LPN-50 can achieve 68.7 in AP score on the COCO test-dev set, with only 2.7M parameters and 1.0 GFLOPs, while the inference speed is 17 FPS on an Intel i7-8700K CPU machine.

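Motivation and objectives

Argues for simple, lightweight human pose estimation (HPE) models suited to resource-constrained deployment.
Introduces a lightweight bottleneck block and a full architecture (LPN) that cut parameters and FLOPs while preserving accuracy.
Proposes training and post-processing strategies that maximize performance without heavy pre-training or complex pipelines.
Demonstrates LPN's efficiency and accuracy on the COCO dataset, including CPU inference performance.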

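Proposed method

Redesigns a lightweight bottleneck block using depthwise convolution and a Global Context (GC) attention block.
Builds LPN by replacing the standard bottleneck in a SimpleBaseline-style backbone and simplifying the upsampling stage.
Introduces an iterative training strategy that restarts training with a reset learning rate to better optimize small networks.
Proposes Beta-Soft-Argmax, a model-agnostic post-processing function that extracts continuous, more accurate keypoint coordinates from heatmaps.
Evaluates parameters, FLOPs, AP metrics, and CPU inference speed on COCO against the baseline architecture.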
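
To make the parameter saving from depthwise convolution concrete, here is a minimal sketch that counts convolution weights in a bottleneck block. It assumes the standard ResNet layout (1x1 reduce, 3x3, 1x1 expand, with a 4x channel reduction) and ignores biases, normalization layers, and the GC block, so it illustrates the direction of the saving rather than the paper's exact totals. The helper names `conv_params` and `bottleneck_params` are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution without bias."""
    return (c_in // groups) * c_out * k * k


def bottleneck_params(channels, depthwise):
    """Conv weights in a 1x1 -> 3x3 -> 1x1 bottleneck (assumed ResNet-style layout)."""
    mid = channels // 4  # assumption: 4x channel reduction, as in ResNet bottlenecks
    reduce = conv_params(channels, mid, 1)
    # A depthwise 3x3 conv uses one filter per channel, i.e. groups == channels.
    spatial = conv_params(mid, mid, 3, groups=mid if depthwise else 1)
    expand = conv_params(mid, channels, 1)
    return reduce + spatial + expand


print(bottleneck_params(256, depthwise=False))  # 69632
print(bottleneck_params(256, depthwise=True))   # 33344
```

In this toy count the 3x3 layer drops from 36,864 weights to 576, which leaves the two 1x1 convolutions as the dominant cost.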

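Experimental results

Research questions

RQ1: Can a lightweight bottleneck block with depthwise convolution and GC attention substantially reduce model size and computation while maintaining pose estimation performance?
RQ2: Does the iterative training strategy improve small-network performance more than conventional pre-training on a large dataset?
RQ3: Can Beta-Soft-Argmax improve keypoint localization accuracy across different backbones without changing the training procedure?
RQ4: How does LPN compare with state-of-the-art methods on COCO in accuracy and CPU inference speed?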

Key results

Method	Backbone	Input size	#Params	FLOPs	AP	AP50	AP75	APM	APL	AR
LPN (Ours)	ResNet-50	256×192	2.9 M	1.0 G	69.1	88.1	76.6	65.9	75.7	74.9
LPN (Ours)	ResNet-101	256×192	5.3 M	1.4 G	70.4	88.6	78.1	67.2	77.2	76.2
LPN (Ours)	ResNet-152	256×192	7.4 M	1.8 G	71.0	89.2	78.6	67.8	77.7	76.8
SimpleBaseline	ResNet-50	256×192	34.0 M	8.9 G	70.4	88.1	77.6	66.8	75.8	75.6
SimpleBaseline	ResNet-101	256×192	53.0 M	12.4 G	71.4	89.3	79.3	68.1	78.1	77.1
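
The relative cost of LPN can be read directly off the table. The short sketch below recomputes the parameter ratio, FLOPs ratio, and AP gap from the tabulated values; the dictionary layout and the `compare` helper are illustrative assumptions, and the numbers are copied from the rows above.

```python
# (#Params in millions, GFLOPs, AP), copied from the table rows above
rows = {
    "LPN-50": (2.9, 1.0, 69.1),
    "LPN-101": (5.3, 1.4, 70.4),
    "SimpleBaseline-50": (34.0, 8.9, 70.4),
    "SimpleBaseline-101": (53.0, 12.4, 71.4),
}


def compare(small, large):
    """Cost of `small` as a percentage of `large`, plus the AP gap between them."""
    p_s, f_s, ap_s = rows[small]
    p_l, f_l, ap_l = rows[large]
    return {
        "params_pct": round(100 * p_s / p_l),
        "flops_pct": round(100 * f_s / f_l),
        "ap_gap": round(ap_l - ap_s, 1),
    }


print(compare("LPN-50", "SimpleBaseline-50"))
# {'params_pct': 9, 'flops_pct': 11, 'ap_gap': 1.3}
```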

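LPN-50 reaches 68.7–69.1 AP across the validation and test settings with 2.7–2.9M parameters, about 1.0 GFLOPs, and 17 FPS on a CPU.
Compared with SimpleBaseline-50, LPN-50 uses 9% of the parameters and 11% of the FLOPs while trailing by only about 1.3 AP (69.1 vs. 70.4).
Adding the GC block yields a meaningful gain on small networks (e.g., up to +2.5 AP on LPN-50).
The iterative training strategy consistently improves AP, with a cumulative gain across stages of up to about 2.0 AP on LPN-50.
Beta-Soft-Argmax provides a model-agnostic improvement (up to about 0.3 AP) and remains effective across backbones with beta set to about 160.
Beta-Soft-Argmax outperforms plain Argmax on multiple architectures, with the gain growing as backbone complexity increases.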
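
The review does not reproduce the Beta-Soft-Argmax formula, so the following is only a sketch under a stated assumption: that it follows the common soft-argmax form in which heatmap values are multiplied by beta, passed through a softmax, and used as weights for the mean pixel coordinate. The function names and the toy heatmap are hypothetical. Because the toy heatmap peaks near 1, the example uses beta = 10; a suitable beta depends on the value scale of the heatmap, and the value of about 160 reported above refers to the paper's setup.

```python
import math


def hard_argmax(heatmap):
    """Integer (x, y) of the largest heatmap value."""
    best_x, best_y, best_v = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best_v:
                best_x, best_y, best_v = x, y, v
    return best_x, best_y


def beta_soft_argmax(heatmap, beta):
    """Softmax-weighted mean of pixel coordinates; larger beta sharpens the softmax."""
    peak = max(max(row) for row in heatmap)  # subtracted for numerical stability
    total = x_sum = y_sum = 0.0
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            w = math.exp(beta * (v - peak))
            total += w
            x_sum += w * x
            y_sum += w * y
    return x_sum / total, y_sum / total


# Toy 6x5 heatmap: a Gaussian bump whose true center (2.3, 1.6) lies between pixels.
cx, cy = 2.3, 1.6
heatmap = [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / 2.0) for x in range(6)]
           for y in range(5)]

print(hard_argmax(heatmap))                  # (2, 2): snapped to the pixel grid
print(beta_soft_argmax(heatmap, beta=10.0))  # continuous (x, y) estimate
```

The hard argmax can only return grid positions, whereas the weighted mean returns a sub-pixel estimate pulled toward the side of the peak where the heatmap mass is larger.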


This review was generated by AI and reviewed by a human editor.