[Paper Review] Deep Learning-Based Human Pose Estimation: A Survey
This survey reviews recent deep learning approaches for 2D and 3D human pose estimation (HPE), summarizing methods, datasets, metrics, applications, and future directions. It covers more than 250 papers, compares their performance, and discusses challenges such as occlusion and data scarcity.
Human pose estimation aims to locate human body parts and build a human body representation (e.g., a body skeleton) from input data such as images and videos. It has drawn increasing attention during the past decade and has been used in a wide range of applications, including human-computer interaction, motion analysis, augmented reality, and virtual reality. Although recently developed deep learning-based solutions have achieved high performance in human pose estimation, challenges remain due to insufficient training data, depth ambiguities, and occlusion. The goal of this survey is to provide a comprehensive review of recent deep learning-based solutions for both 2D and 3D pose estimation via a systematic analysis and comparison of these solutions based on their input data and inference procedures. More than 250 research papers since 2014 are covered in this survey. Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included. Quantitative performance comparisons of the reviewed methods on popular datasets are summarized and discussed. Finally, the paper concludes with the challenges involved, applications, and future research directions. A regularly updated project page is provided: https://github.com/zczcwh/DL-HPE
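The body-skeleton representation mentioned in the abstract is commonly stored as an array of joint coordinates plus a fixed list of bones connecting pairs of joints. The sketch below is a minimal illustration only: the nine joint names and the bone list are hypothetical and simplified, not the layout of any particular dataset such as COCO or Human3.6M.

```python
import numpy as np

# Hypothetical, simplified 9-joint upper-body skeleton (not a dataset standard).
JOINTS = ["pelvis", "neck", "head",
          "l_shoulder", "l_elbow", "l_wrist",
          "r_shoulder", "r_elbow", "r_wrist"]
# Bones are (parent, child) pairs over the joint names above.
BONES = [("pelvis", "neck"), ("neck", "head"),
         ("neck", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_wrist"),
         ("neck", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_wrist")]
INDEX = {name: i for i, name in enumerate(JOINTS)}


def bone_lengths(pose: np.ndarray) -> dict:
    """Return the length of every bone for a (num_joints, D) pose, D = 2 or 3."""
    if pose.shape[0] != len(JOINTS):
        raise ValueError("pose must have one row per joint")
    return {f"{a}-{b}": float(np.linalg.norm(pose[INDEX[a]] - pose[INDEX[b]]))
            for a, b in BONES}
```

The same joint array works for 2D keypoints (D = 2) and 3D joints (D = 3); some 3D methods use bone-length consistency of this kind as a structural prior.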
Motivation & Objective
- Provide a systematic review of recent deep learning-based 2D and 3D HPE methods.
- Categorize methods by 2D vs 3D, single-view vs multi-view, and input sources.
- Summarize datasets and evaluation metrics used in 2D/3D HPE.
- Compare state-of-the-art approaches and discuss their strengths and limitations.
- Highlight applications and outline future research directions.
Proposed method
- Classify HPE methods into 2D and 3D, then further into single-person vs multi-person (for 2D) and monocular vs sensor-based inputs (for 3D).
- Contrast regression-based and heatmap-based approaches for 2D single-person pose estimation.
- Describe top-down and bottom-up pipelines for 2D multi-person pose estimation.
- Summarize 3D HPE from monocular RGB (single-view and multi-view, skeleton-only vs mesh recovery) and from other sensors.
- Provide dataset and evaluation metric summaries and perform qualitative/quantitative method comparisons.
- Discuss applications and future directions in HPE.
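To make the regression-versus-heatmap contrast concrete: heatmap-based methods train the network to output one confidence map per keypoint, typically a Gaussian centered on the ground-truth location, and recover coordinates by locating the peak, whereas regression-based methods output the (x, y) coordinates directly. The NumPy sketch below renders a Gaussian target, decodes it with a hard argmax, and also shows a soft-argmax (the differentiable decoding used by integral-regression approaches), which lets a heatmap representation be trained with a coordinate loss. The grid size, sigma, and the beta sharpening factor are illustrative assumptions.

```python
import numpy as np


def gaussian_heatmap(h: int, w: int, center: tuple, sigma: float = 2.0) -> np.ndarray:
    """Render an h x w Gaussian target whose peak (value 1) is at center = (x, y)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def decode_argmax(heatmaps: np.ndarray) -> np.ndarray:
    """(K, H, W) heatmaps -> (K, 2) integer (x, y) peak locations."""
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)


def soft_argmax(heatmaps: np.ndarray, beta: float = 50.0) -> np.ndarray:
    """Differentiable decoding: expected (x, y) under a softmax over each map."""
    k, h, w = heatmaps.shape
    logits = beta * heatmaps.reshape(k, -1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(k, h, w)
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([(probs * xs).sum(axis=(1, 2)),
                     (probs * ys).sum(axis=(1, 2))], axis=1)


heatmaps = np.stack([gaussian_heatmap(24, 32, (10, 7)),
                     gaussian_heatmap(24, 32, (25, 3))])
print(decode_argmax(heatmaps))
print(soft_argmax(heatmaps))
```

The hard argmax is quantized to the heatmap grid (many implementations add a sub-pixel refinement step for this reason), while soft-argmax returns continuous coordinates and can be trained end to end with a regression loss.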
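The two multi-person pipelines differ mainly in when people are separated. Top-down methods first detect person boxes and run a single-person estimator on each crop, so cost grows with the number of people; bottom-up methods detect all keypoints in the image once and then must group them into individuals. The sketch below illustrates only that grouping step, in the spirit of associative embedding, where each detected keypoint carries a scalar "tag" and keypoints with similar tags are assigned to the same person. The input format, the greedy matching (reference implementations use Hungarian matching instead), and the `tag_threshold` value are simplifying assumptions.

```python
import numpy as np


def group_by_tags(detections: dict, num_joints: int, tag_threshold: float = 1.0) -> list:
    """Greedily group bottom-up keypoint detections into people by embedding tag.

    detections: hypothetical format, joint_id -> list of (x, y, tag) tuples.
    Returns one (num_joints, 2) array per person, with NaN for missing joints.
    """
    people = []  # each entry: {"pose": (num_joints, 2) array, "tags": list of tags}
    for j in range(num_joints):
        for x, y, tag in detections.get(j, []):
            best, best_dist = None, tag_threshold
            for person in people:
                if not np.isnan(person["pose"][j, 0]):
                    continue  # this person already has a detection for joint j
                dist = abs(tag - float(np.mean(person["tags"])))
                if dist < best_dist:
                    best, best_dist = person, dist
            if best is None:  # no compatible person: start a new one
                best = {"pose": np.full((num_joints, 2), np.nan), "tags": []}
                people.append(best)
            best["pose"][j] = (x, y)
            best["tags"].append(tag)
    return [p["pose"] for p in people]


# Two joint types, two people whose tags cluster near 0 and near 5.
dets = {0: [(10, 10, 0.1), (50, 12, 5.0)], 1: [(52, 30, 5.2), (11, 31, 0.0)]}
for person in group_by_tags(dets, num_joints=2):
    print(person)
```

A top-down pipeline avoids this grouping problem entirely but pays for it with one forward pass per detected person.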
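Multi-view setups reduce depth ambiguity because a joint seen by two or more calibrated cameras constrains its 3D position. The classical geometric baseline is linear (DLT) triangulation, sketched below for a single joint. It assumes known 3x4 projection matrices and 2D detections of the same joint in each view; the camera intrinsics and baseline in the example are made up. Learned multi-view methods often build on or replace this step, so treat it only as an illustration of the underlying geometry.

```python
import numpy as np


def triangulate_dlt(proj_mats: list, points_2d: np.ndarray) -> np.ndarray:
    """Linear (DLT) triangulation of one joint from V calibrated views.

    proj_mats: list of V (3, 4) camera projection matrices.
    points_2d: (V, 2) pixel coordinates of the same joint in each view.
    Returns the (3,) point minimizing the algebraic error.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view gives two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]  # right singular vector of the smallest singular value
    return X[:3] / X[3]


def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project a 3D point with a (3, 4) projection matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical stereo rig: shared intrinsics, second camera shifted 0.5 m along x.
K = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
joint = np.array([0.2, -0.1, 3.0])
observed = np.array([project(P1, joint), project(P2, joint)])
print(triangulate_dlt([P1, P2], observed))
```

With noisy detections the result is a least-squares compromise; adding more views tightens it.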
Experimental results
Research questions
- RQ1: What are the main deep learning approaches for 2D and 3D human pose estimation, and how are they organized?
- RQ2: How do 2D HPE methods compare across single-person vs multi-person, and top-down vs bottom-up frameworks?
- RQ3: What are the data sources, datasets, and evaluation metrics used for 2D and 3D HPE, and how do methods perform on them?
- RQ4: What challenges (e.g., occlusion, data scarcity, depth ambiguity) limit current HPE methods, and what directions may address them?
- RQ5: What are the prominent applications of DL-based HPE, and what future research directions are identified?
Key findings
- Deep learning has substantially improved 2D HPE over classical methods, with heatmap-based and regression-based approaches shaping the field.
- HRNet and its variants, along with transformer-based models, have become widely adopted for accurate keypoint estimation.
- Occlusion, truncation, and computational efficiency remain central challenges in multi-person 2D HPE.
- 3D HPE from monocular RGB is ill-posed and data-hungry, with generalization across datasets being a notable issue; multi-view and sensor fusion can mitigate depth ambiguities.
- A broad range of datasets and metrics is available for evaluating 2D/3D HPE, enabling extensive comparative analyses of methods.
- The survey covers applications in AR/VR, surveillance, healthcare, and more, and provides directions for future research.
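The ill-posedness of monocular 3D HPE can be shown in a few lines: under a pinhole camera, scaling the whole skeleton about the camera center (a larger person farther away), or sliding any single joint along its own viewing ray, leaves the 2D keypoints unchanged. The sketch below checks this numerically; the intrinsics, the random 17-joint pose, and the scale factors are hypothetical values chosen for illustration.

```python
import numpy as np


def pinhole_project(points_3d: np.ndarray, f: float = 1000.0,
                    cx: float = 320.0, cy: float = 240.0) -> np.ndarray:
    """Project (N, 3) camera-frame points (z > 0) to (N, 2) pixel coordinates."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=1)


rng = np.random.default_rng(0)
# Hypothetical 17-joint pose roughly 4 m in front of the camera.
pose = rng.normal(scale=0.3, size=(17, 3)) + np.array([0.0, 0.0, 4.0])

# (a) A 1.5x larger person 1.5x farther away: every joint scaled along its ray.
scaled = 1.5 * pose
# (b) A single joint (index 5) pushed 20% deeper along its own ray.
moved = pose.copy()
moved[5] *= 1.2

for candidate in (scaled, moved):
    diff = np.abs(pinhole_project(candidate) - pinhole_project(pose)).max()
    print(f"max 2D difference: {diff:.2e} px")
```

Both alternatives are different 3D poses with the same image evidence, which is why monocular methods depend on learned priors and why extra views or depth sensors help.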
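Two of the standard 2D metrics summarized in the survey can be written down directly. PCK counts a predicted keypoint as correct when it lies within a fraction alpha of a per-person reference length (for PCKh, the head segment length). COCO's Object Keypoint Similarity (OKS) averages exp(-d_i^2 / (2 s^2 k_i^2)) over labeled keypoints, where d_i is the prediction error, s^2 is the object area, and k_i is a per-keypoint constant. The sketch below assumes one person per call; the `k` values in the example are placeholders rather than COCO's published constants, and the prediction-to-person matching needed for COCO AP is omitted.

```python
import numpy as np


def pck(pred: np.ndarray, gt: np.ndarray, ref_length: float,
        alpha: float = 0.5, visible=None) -> float:
    """Fraction of keypoints within alpha * ref_length of the ground truth.

    pred, gt: (K, 2) arrays; visible: optional (K,) boolean mask of labeled joints.
    """
    dist = np.linalg.norm(pred - gt, axis=1)
    mask = np.ones(len(gt), dtype=bool) if visible is None else np.asarray(visible, dtype=bool)
    return float((dist[mask] <= alpha * ref_length).mean())


def oks(pred: np.ndarray, gt: np.ndarray, visible, area: float, k: np.ndarray) -> float:
    """COCO-style Object Keypoint Similarity for a single person.

    visible: (K,) labels where > 0 means the keypoint is annotated.
    area: object area (s^2); k: (K,) per-keypoint falloff constants.
    """
    mask = np.asarray(visible) > 0
    if not mask.any():
        raise ValueError("OKS is undefined when no keypoints are labeled")
    d2 = np.sum((pred - gt) ** 2, axis=1)
    e = d2 / (2.0 * area * k ** 2)
    return float(np.exp(-e[mask]).mean())


gt = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
pred = gt + np.array([[1.0, 0.0], [0.0, 6.0], [0.0, 0.0]])
print(pck(pred, gt, ref_length=10.0))                                  # errors 1, 6, 0 vs threshold 5
print(oks(pred, gt, visible=[2, 2, 0], area=100.0, k=np.full(3, 0.1)))  # third keypoint ignored
```

PCK is a hard per-keypoint threshold, whereas OKS decays smoothly with error and is scale-normalized by the object area, which is what makes it usable as a matching score for COCO-style average precision.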
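For 3D HPE, the most common metric is MPJPE, the mean Euclidean distance between predicted and ground-truth joints (usually reported in millimeters, often after aligning the root joint). PA-MPJPE first aligns the prediction to the ground truth with the best similarity transform (scale, rotation, translation) found by Procrustes analysis, so it ignores global scale and orientation errors. The sketch below is a straightforward NumPy version under those definitions; exact evaluation protocols (root alignment, joint subsets, frame sampling) differ between datasets and papers.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error between (J, 3) arrays."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align pred (J, 3) to gt with the optimal scale, rotation, and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = np.sign(np.linalg.det(u @ vt))
    D = np.diag([1.0, 1.0, d])
    R = u @ D @ vt  # rotation applied to row vectors as p @ R
    scale = (s * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R + mu_g


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes (similarity) alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)


rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
# A prediction that is only rescaled, rotated, and shifted relative to gt.
pred = 2.0 * gt @ Rz + np.array([0.1, -0.2, 0.3])
print(mpjpe(pred, gt), pa_mpjpe(pred, gt))
```

In the example the prediction has a large MPJPE but a PA-MPJPE near zero, illustrating why papers usually report both.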
This review was created by AI and reviewed by human editors.