QUICK REVIEW

[Paper Digest] Survey on Hand Gesture Recognition from Visual Input

Manousos Linardakis, Iraklis Varlamis | arXiv.org | Jan 21, 2025

Hand Gesture Recognition Systems | Cited by 4

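One-Sentence Summary

This paper surveys recent research (2018–2024) on hand gesture recognition from visual input, covering RGB, depth, and video data, along with datasets, methods, and real-world challenges.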

ABSTRACT

Hand gesture recognition has become an important research area, driven by the growing demand for human-computer interaction in fields such as sign language recognition, virtual and augmented reality, and robotics. Despite the rapid growth of the field, there are few surveys that comprehensively cover recent research developments, available solutions, and benchmark datasets. This survey addresses this gap by examining the latest advancements in hand gesture and 3D hand pose recognition from various types of camera input data including RGB images, depth images, and videos from monocular or multiview cameras, examining the differing methodological requirements of each approach. Furthermore, an overview of widely used datasets is provided, detailing their main characteristics and application domains. Finally, open challenges such as achieving robust recognition in real-world environments, handling occlusions, ensuring generalization across diverse users, and addressing computational efficiency for real-time applications are highlighted to guide future research directions. By synthesizing the objectives, methodologies, and applications of recent studies, this survey offers valuable insights into current trends, challenges, and opportunities for future research in human hand gesture recognition.

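Motivation and Objectives

Organize hand gesture recognition research by input data type, capture setup, and recognition task.
Review datasets and applications, highlighting their characteristics and limitations.
Identify current trends, challenges, and future research opportunities in hand gesture recognition from visual input.
Distinguish classification from estimation tasks, and monocular from multiview setups.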

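Proposed Approach

Categorize methods by input data type (RGB, RGB-D, video) and camera setup (monocular vs. multiview).
Distinguish gesture classification from gesture estimation, and note hybrid methods that combine the two.
Review hand capture representations (skeleton vs. bounding-box/filter) and their respective advantages.
Examine recognition techniques (neural networks, non-conventional methods, and hybrid methods) and how prevalent each is.
Use topic modeling with non-negative matrix factorization (NNMF) to identify the main research themes, such as real-time recognition and multimodal fusion.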
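
To make the NNMF topic-modeling step more concrete, here is a minimal sketch, not the survey authors' pipeline. It assumes a document-term count matrix built from a hypothetical four-document corpus (`DOCS`) and factors it with the standard multiplicative-update rules for the Frobenius objective. The names `term_matrix`, `init_factors`, `nmf`, and `top_terms` are illustrative only; the survey does not specify its preprocessing, number of topics, or solver.

```python
import random
from collections import Counter

# Hypothetical mini-corpus standing in for paper abstracts (assumption).
DOCS = [
    "sign language recognition video sign language",
    "video sign language translation recognition",
    "hand pose estimation depth pose",
    "depth hand pose estimation joints",
]

def term_matrix(docs):
    """Return (vocabulary, document-term count matrix as lists of floats)."""
    vocab = sorted({w for d in docs for w in d.split()})
    rows = []
    for d in docs:
        counts = Counter(d.split())
        rows.append([float(counts[w]) for w in vocab])
    return vocab, rows

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, WH) for v, r in zip(vr, rr)) ** 0.5

def init_factors(n, m, k, seed=0):
    """Strictly positive random starting factors W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    return W, H

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) as W H with non-negative W and H."""
    W, H = init_factors(len(V), len(V[0]), k, seed)
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def top_terms(H, vocab, n_top=3):
    """For each topic (row of H), list the highest-weighted vocabulary terms."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n_top]]
            for row in H]

vocab, V = term_matrix(DOCS)
W, H = nmf(V, k=2)        # W: document-topic weights, H: topic-term weights
topics = top_terms(H, vocab)
```

Each row of `H` can be read as a topic described by its heaviest terms, and each row of `W` shows how strongly a document loads on each topic. For a real corpus, a tested library implementation with TF-IDF weighting would likely be a better choice than this hand-rolled loop.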

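Experimental Results

Research Questions

RQ1: What are the dominant input data types and capture setups for hand gesture recognition from visual input?
RQ2: How are gestures represented, classified, or estimated in current methods?
RQ3: Which recognition techniques are prevalent, and what performance trends do they show in recent work?
RQ4: Which datasets and applications drive recent HGR research, and what challenges remain for real-world deployment?
RQ5: Which themes have become central in recent hand gesture recognition research, and what future directions do they point to?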

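Key Findings

Video-based methods account for the largest share of the research (about half).
Monocular cameras dominate because of their practicality; multiview setups are less common.
Hybrid neural network approaches (e.g., CNNs combined with LSTMs or Transformers) are widespread and effective.
Skeleton-based representations describe joints precisely but are more computationally expensive; bounding-box/filter-based methods are simpler and more common in classification tasks.
NNMF-based topic modeling reveals the core themes: gesture classification, estimation, sign language recognition, hand/body reconstruction, multimodal fusion, and real-time recognition.
The number of publications rose markedly in 2024, indicating growing interest and progress.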
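
The skeleton versus bounding-box trade-off noted in the findings above can be illustrated with a small sketch. It assumes a hypothetical 21-joint 2D hand layout (one wrist plus four joints per finger, a common convention but not one stated in this digest) in normalized image coordinates. The names `toy_hand`, `skeleton_to_box`, and `feature_size` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in normalized image coordinates

@dataclass(frozen=True)
class BoundingBox:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

def toy_hand() -> List[Point]:
    """Hypothetical 21-joint skeleton: a wrist plus 5 fingers x 4 joints."""
    joints = [(0.5, 0.9)]  # wrist
    for finger in range(5):
        for k in range(1, 5):
            joints.append((0.3 + 0.1 * finger, 0.9 - 0.15 * k))
    return joints

def skeleton_to_box(joints: List[Point], margin: float = 0.0) -> BoundingBox:
    """Collapse a skeleton into the axis-aligned box that encloses it."""
    if not joints:
        raise ValueError("skeleton must contain at least one joint")
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    return BoundingBox(min(xs) - margin, min(ys) - margin,
                       max(xs) + margin, max(ys) + margin)

def feature_size(joints: List[Point]) -> int:
    """Number of scalars in the skeleton representation (x and y per joint)."""
    return 2 * len(joints)

hand = toy_hand()
box = skeleton_to_box(hand)
```

The box needs only four numbers and can always be derived from the skeleton, while the reverse is not possible: per-joint positions are lost. That asymmetry is one way to read why box-style representations suit coarse classification and skeletons suit pose estimation.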

This digest was generated by AI and reviewed by human editors.