QUICK REVIEW

[Paper Review] A Real-time Hand Gesture Recognition and Human-Computer Interaction System

Pei Xu|arXiv (Cornell University)|2017. 04. 24.

Hand Gesture Recognition Systems|References: 22|Citations: 62

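One-line summary

A real-time gesture-based HCI system that uses a single CNN (a modified LeNet-5) to recognize 16 static gestures from monocular camera input with high accuracy, and that provides Kalman-filtered mouse control and a simple probabilistic response scheme. It also demonstrates a ROS-based HRI extension.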

ABSTRACT

In this project, we design a real-time human-computer interaction system based on hand gesture. The whole system consists of three components: hand detection, gesture recognition and human-computer interaction (HCI) based on recognition; and realizes the robust control of mouse and keyboard events with a higher accuracy of gesture recognition. Specifically, we use the convolutional neural network (CNN) to recognize gestures and makes it attainable to identify relatively complex gestures using only one cheap monocular camera. We introduce the Kalman filter to estimate the hand position based on which the mouse cursor control is realized in a stable and smooth way. During the HCI stage, we develop a simple strategy to avoid the false recognition caused by noises - mostly transient, false gestures, and thus to improve the reliability of interaction. The developed system is highly extendable and can be used in human-robotic or other human-machine interaction scenarios with more complex command formats rather than just mouse and keyboard events.

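Research motivation and goals

Motivate real-time, low-cost gesture-based HCI using a single monocular camera.
Develop a CNN-based gesture recognizer that learns features directly from image data.
Enable stable mouse cursor control through a tracked hand point, without additional markers.
Increase interaction reliability by rejecting transient, false gestures through a simple probabilistic scheme.
Extend the system to HRI using ROS messages.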

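Proposed method

A CNN classifier (a modified LeNet-5) processes preprocessed binary hand images to recognize 16 static gestures with high accuracy.
Hand detection comprises background subtraction, color filtering, Gaussian blur, thresholding, morphological operations, contour extraction, and hand-region segmentation; a distance transform identifies the center of the hand.
Polygon approximation (Ramer-Douglas-Peucker) improves convexity-defect detection for estimating fingertip positions.
The topmost point of the palm gesture, typically the tip of the middle finger, is tracked to drive the mouse cursor; a Kalman filter smooths the cursor motion.
A simple probabilistic discrimination function applied over the response period prevents the system from reacting to transient or erroneous gestures and stabilizes held actions such as drag.
The CNN with a 64x64 input size performed best with a learning rate of 0.0001 and a momentum of 0.9; the model achieves over 99.8% accuracy on the test set.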
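The Kalman-filtered cursor control described above can be illustrated with a minimal sketch. It assumes a constant-velocity model applied independently to each screen axis; the class name AxisKalman, the helper smooth_track, and the noise values q and r are hypothetical choices for illustration, not parameters reported in the paper.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one cursor axis.

    State is (position, velocity). All default values are assumptions.
    """
    q: float = 1e-2    # process noise added to each diagonal term (assumed)
    r: float = 4.0     # measurement noise variance in px^2 (assumed)
    x: float = 0.0     # position estimate
    v: float = 0.0     # velocity estimate
    p00: float = 1e3   # covariance of position (large = uninformed prior)
    p01: float = 0.0   # covariance between position and velocity
    p11: float = 1e3   # covariance of velocity

    def update(self, z: float, dt: float = 1.0) -> float:
        """Feed one measured position z; return the smoothed position."""
        # Predict step: x' = x + v*dt, P' = F P F^T + Q with F = [[1, dt], [0, 1]]
        x = self.x + self.v * dt
        v = self.v
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q

        # Update step: only the position is observed (H = [1, 0])
        s = p00 + self.r                # innovation variance
        k0, k1 = p00 / s, p01 / s       # Kalman gain
        resid = z - x                   # innovation
        self.x = x + k0 * resid
        self.v = v + k1 * resid
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p11 = p11 - k1 * p01
        return self.x


def smooth_track(points):
    """Smooth a sequence of (x, y) fingertip positions, one filter per axis."""
    fx, fy = AxisKalman(), AxisKalman()
    return [(fx.update(px), fy.update(py)) for px, py in points]
```

In use, each frame's tracked fingertip position would be passed through smooth_track (or through update frame by frame), and the returned coordinates would drive the cursor instead of the raw detections. Raising r relative to q gives a smoother but laggier cursor.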

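Experimental results

Research questions

RQ1: Can a CNN trained on noisy monocular camera input achieve high-accuracy static gesture recognition with minimal preprocessing?
RQ2: Can a Kalman-filtered tracking point provide stable, smooth mouse cursor control without markers?
RQ3: Can a simple probabilistic decision scheme reliably suppress transient false gestures while preserving held commands?
RQ4: Can the gesture-based HCI framework be extended to ROS-based human-robot interaction scenarios?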

Key results

The gesture set consists of 16 static gestures, with 19,852 samples collected from five people.
CNN-based recognition with 64x64 input size achieves over 99.8% accuracy on the test set.
The Kalman filter improves mouse cursor smoothness during horizontal/vertical motion.
A simple probabilistic response model effectively rejects transient/false gestures and preserves held actions like drag.
System demonstrations include keyboard/mouse event triggering and ROS-based control of a simulated robot (turtle) via ROS topics.
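The rejection of transient gestures can be sketched as follows. The summary does not spell out the paper's probabilistic response model, so this example substitutes a sliding-window vote: a gesture becomes active only once it accounts for a sufficient fraction of recent frames. The class name GestureDebouncer and the window and threshold values are hypothetical.

```python
from collections import Counter, deque


class GestureDebouncer:
    """Sliding-window vote over per-frame gesture labels.

    An assumed stand-in for the paper's response scheme, not its actual model.
    """

    def __init__(self, window=10, threshold=0.7):
        self.frames = deque(maxlen=window)  # most recent per-frame labels
        self.threshold = threshold          # minimum share of the window (assumed)
        self.active = None                  # gesture currently being acted on

    def push(self, label):
        """Add one frame's predicted label; return the active gesture."""
        self.frames.append(label)
        if len(self.frames) < self.frames.maxlen:
            return self.active              # not enough evidence yet
        top, count = Counter(self.frames).most_common(1)[0]
        if count / len(self.frames) >= self.threshold:
            self.active = top               # dominant label takes over
        # Otherwise keep the previous gesture, so held actions survive noise.
        return self.active


def run(labels, window=10, threshold=0.7):
    """Return the active gesture after each frame in labels."""
    deb = GestureDebouncer(window, threshold)
    return [deb.push(label) for label in labels]
```

With these settings, two stray "click" frames in the middle of a long "drag" leave the active gesture unchanged, while a sustained change of gesture takes over once it fills enough of the window. A longer window rejects more noise at the cost of slower response.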


This review was generated by AI and reviewed by a human editor.