QUICK REVIEW

[Paper Review] Survey on Hand Gesture Recognition from Visual Input

Manousos Linardakis, Iraklis Varlamis|ArXiv.org|Jan 21, 2025

Hand Gesture Recognition Systems | Citations: 4

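One-Sentence Summary

This paper surveys recent research (2018–2024) on hand gesture recognition from visual input, covering RGB, depth, and video data as well as datasets, methods, and real-world challenges.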

ABSTRACT

Hand gesture recognition has become an important research area, driven by the growing demand for human-computer interaction in fields such as sign language recognition, virtual and augmented reality, and robotics. Despite the rapid growth of the field, there are few surveys that comprehensively cover recent research developments, available solutions, and benchmark datasets. This survey addresses this gap by examining the latest advancements in hand gesture and 3D hand pose recognition from various types of camera input data including RGB images, depth images, and videos from monocular or multiview cameras, examining the differing methodological requirements of each approach. Furthermore, an overview of widely used datasets is provided, detailing their main characteristics and application domains. Finally, open challenges such as achieving robust recognition in real-world environments, handling occlusions, ensuring generalization across diverse users, and addressing computational efficiency for real-time applications are highlighted to guide future research directions. By synthesizing the objectives, methodologies, and applications of recent studies, this survey offers valuable insights into current trends, challenges, and opportunities for future research in human hand gesture recognition.

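Motivation and Objectives

Organize hand gesture recognition research by input data type, acquisition environment, and recognition task.
Survey datasets and applications, highlighting their characteristics and limitations.
Identify current trends, challenges, and future opportunities in hand gesture recognition from visual input.
Distinguish classification from estimation tasks, and monocular from multiview setups.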

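Proposed Method

Categorize methods by input data type (RGB, RGB-D, video) and camera setup (monocular vs. multiview).
Distinguish hand gesture classification from estimation, and note hybrid methods that combine both.
Examine captured hand representations, such as skeleton-based and box/filter-based representations, and the advantages of each.
Examine how prevalent each family of recognition techniques is (neural networks, non-neural methods, and hybrids).
Use topic modeling with non-negative matrix factorization (NNMF) to identify major research themes such as real-time recognition and multimodal fusion.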
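To make the NNMF topic-modeling step concrete, here is a minimal sketch of the general technique, not the survey authors' pipeline. It factorizes a document-term count matrix V into non-negative factors W (document-topic weights) and H (topic-term weights) using the standard multiplicative update rules for the Frobenius objective. The survey summary does not say which solver, vocabulary, or number of topics was used, so the toy corpus, the function names (`nmf`, `frobenius_error`), and the choice of two topics are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H (lower means a better reconstruction)."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (docs x terms) into non-negative W (docs x topics), H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


# Hypothetical mini-corpus of paper titles (invented, not taken from the survey).
docs = [
    "real time gesture recognition video",
    "real time video gesture classification",
    "sign language recognition video",
    "multimodal fusion depth rgb",
    "rgb depth fusion pose estimation",
    "hand pose estimation depth",
]
vocab = sorted({term for doc in docs for term in doc.split()})
v = [[float(doc.split().count(term)) for term in vocab] for doc in docs]

w, h = nmf(v, n_topics=2)
for k, row in enumerate(h):
    top = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)[:3]
    print(f"topic {k}:", [vocab[j] for j in top])
```

Each row of H ranks the vocabulary for one topic, and the highest-weighted terms serve as that topic's label. In practice a library implementation with TF-IDF weighting would likely replace this hand-rolled loop; the pure-Python version is used here only to keep the sketch dependency-free.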

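Results

Research Questions

RQ1: What are the dominant input data types and acquisition environments in hand gesture recognition from visual input?
RQ2: How are hand gestures represented, and how are they classified or estimated, in current approaches?
RQ3: What are the main recognition methods, and what performance trends appear in recent studies?
RQ4: Which datasets and applications drive recent HGR research, and what challenges does real-world deployment pose?
RQ5: Which topics emerge as core themes and future directions in recent hand gesture recognition research?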

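Key Findings

Video-based methods account for the largest share of the surveyed studies (about half).
Monocular cameras dominate because of their practicality; multiview setups are less common.
Hybrid neural network methods (e.g., CNNs combined with LSTMs or transformers) are widespread and effective.
Skeleton-based representations capture joints accurately but are computationally expensive; box/filter-based representations are simpler and more common in classification tasks.
NNMF-based topic modeling identifies the core themes: hand gesture classification, estimation, sign language recognition, hand/body reconstruction, multimodal fusion, and real-time recognition.
The surge of papers in 2024 suggests growing interest and progress.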
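The contrast between skeleton-based and box-based hand representations noted in the findings above can be illustrated with a small data-structure sketch. It is an illustration under stated assumptions, not a method from the survey: reading "box" as a 2D bounding box, the 21-joint hand layout, and the names `HandSkeleton`, `HandBox`, and `box_from_skeleton` are all hypothetical choices. The sketch only compares how many numbers each representation carries; it does not measure the computational cost of estimating them.

```python
from dataclasses import dataclass
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) position of one joint


@dataclass
class HandSkeleton:
    """Skeleton representation: one 3D point per hand joint."""
    joints: List[Joint]

    def feature_size(self) -> int:
        return 3 * len(self.joints)


@dataclass
class HandBox:
    """Box representation: an axis-aligned 2D bounding box around the hand."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def feature_size(self) -> int:
        return 4


def box_from_skeleton(skeleton: HandSkeleton) -> HandBox:
    """Collapse a skeleton into its 2D bounding box (joint detail is lost)."""
    xs = [joint[0] for joint in skeleton.joints]
    ys = [joint[1] for joint in skeleton.joints]
    return HandBox(min(xs), min(ys), max(xs), max(ys))


# Assumed 21-joint layout with synthetic coordinates (not real hand data).
skeleton = HandSkeleton([(float(i % 5), float(i // 5), 0.5) for i in range(21)])
box = box_from_skeleton(skeleton)
print(skeleton.feature_size(), box.feature_size(), box)
```

A box can always be derived from a skeleton but not the other way around, which is one way to see why the skeleton carries more pose information while the box is the cheaper summary for coarse classification.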


This review was generated by AI and checked by a human editor.