QUICK REVIEW

[Paper Review] LSA64: An Argentinian Sign Language Dataset

Franco Ronchetti, Facundo Quiroga | arXiv (Cornell University) | October 26, 2023

Hand Gesture Recognition Systems | References: 9 | Citations: 55

One-Line Summary

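This paper introduces LSA64, a research dataset of Argentinian Sign Language consisting of 3,200 videos of 64 signs performed by 10 subjects, and also provides a preprocessed version of the data and baseline recognition results.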

ABSTRACT

Automatic sign language recognition is a research area that encompasses human-computer interaction, computer vision and machine learning. Robust automatic recognition of sign language could assist in the translation process and the integration of hearing-impaired people, as well as the teaching of sign language to the hearing population. Sign languages differ significantly in different countries and even regions, and their syntax and semantics are different as well from those of written languages. While the techniques for automatic sign language recognition are mostly the same for different languages, training a recognition system for a new language requires having an entire dataset for that language. This paper presents a dataset of 64 signs from the Argentinian Sign Language (LSA). The dataset, called LSA64, contains 3200 videos of 64 different LSA signs recorded by 10 subjects, and is a first step towards building a comprehensive research-level dataset of Argentinian signs, specifically tailored to sign language recognition or other machine learning tasks. The subjects that performed the signs wore colored gloves to ease the hand tracking and segmentation steps, allowing experiments on the dataset to focus specifically on the recognition of signs. We also present a pre-processed version of the dataset, from which we computed statistics of movement, position and handshape of the signs.

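Research Motivation and Goals

Provide a research-level Argentinian Sign Language (LSA) dataset that supports recognition and machine learning tasks.
Promote reproducibility by releasing a publicly available resource that contains both the raw and the preprocessed data.
Characterize the dataset with statistics on handshape, position, and trajectory to guide model development.
Present baseline experiments that establish reference performance for signer-dependent recognition on LSA64.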

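Proposed Method

Record 3,200 videos of 64 signs performed by 10 subjects, who wear colored gloves to make hand tracking easier.
Provide a preprocessed version containing hand and head positions, segmented hand images, and normalized coordinates.
Describe a baseline sign recognition model that fuses hand position, movement, and handshape information by training per-hand classifiers and multiplying their probabilities.
Report accuracy with signer-dependent cross-validation (80/20 split, 30 runs).
Compare the movement, position, and handshape modalities using Gaussian Mixture Models and Hidden Markov Models trained with Expectation-Maximization (EM).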
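The fusion rule described above (multiplying the class probabilities produced by per-hand, per-modality classifiers) can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes each sub-classifier (for example right-hand position, right-hand movement, right-hand handshape) already outputs a probability for every candidate sign, that the sub-classifiers are treated as independent, and the function name `fuse_by_product` is hypothetical. Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow when many small values are combined.

```python
import math
from typing import List, Sequence, Tuple


def fuse_by_product(prob_tables: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Fuse class probabilities from independent sub-classifiers with the product rule.

    prob_tables: one probability vector per (hand, modality) sub-classifier,
    all indexed by the same sign classes (hypothetical interface).
    Returns the normalized fused posterior and the index of the winning class.
    """
    n_classes = len(prob_tables[0])
    eps = 1e-12  # keeps log() finite if a sub-classifier outputs exactly 0
    log_scores = [0.0] * n_classes
    for table in prob_tables:
        if len(table) != n_classes:
            raise ValueError("all sub-classifiers must score the same classes")
        # A product of probabilities is a sum of log-probabilities
        for c, p in enumerate(table):
            log_scores[c] += math.log(p + eps)
    # Subtract the max before exponentiating (numerically stable normalization)
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)
    fused = [u / total for u in unnorm]
    best = max(range(n_classes), key=lambda c: fused[c])
    return fused, best


# Example: hypothetical position, movement and handshape scores for 3 signs
fused, best = fuse_by_product([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.4, 0.4, 0.2]])
print(best, [round(p, 3) for p in fused])  # prints: 1 [0.333, 0.6, 0.067]
```

Here the raw products are 0.04, 0.072 and 0.008, so the second sign wins with a normalized posterior of 0.6. One consequence of the product rule is that a single sub-classifier assigning a near-zero probability to a sign effectively vetoes that sign, regardless of how confident the other cues are.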

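Experimental Results

Research Questions

RQ1: What are the composition and realism of the LSA64 dataset (sign types, handshapes, movements, subjects)?
RQ2: Can a baseline signer-dependent model using position, movement, and handshape cues achieve high accuracy on LSA64?
RQ3: How do the preprocessed features (hand and head positions, segmented hand images) help recognition compared with raw video?
RQ4: Which statistics, such as movement overlap, initial and final positions, and handshapes, capture the characteristics of the signs and can inform model design?
RQ5: Is this dataset suitable for developing sign language recognition systems for Argentinian Sign Language (LSA)?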

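Key Findings

LSA64 contains 3,200 videos of 64 signs performed by 10 subjects and includes both one-handed and two-handed signs.
The preprocessed data provides hand and head positions and segmented hand images, enabling normalized feature extraction.
The signer-dependent baseline achieved 95.95% accuracy on the test set (n = 30 runs, 80/20 split).
The baseline uses a per-hand classifier for each of position, movement, and handshape, and multiplies the probabilities across hands to obtain the final class likelihood.
Movement, position, and handshape cues are modeled with HMM-GMMs and Gaussian distributions in a multi-stream, per-hand framework.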
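The evaluation protocol reported above (signer-dependent, 80/20 split, 30 runs) can be sketched as a repeated random holdout. This is an illustrative sketch under stated assumptions, not the paper's code: the split is assumed to be stratified by sign class, all subjects' videos are pooled before splitting (so every signer can appear in training, which is what makes the protocol signer-dependent), and `repeated_holdout` and the `fit_predict` callback are hypothetical names standing in for any classifier. With 50 videos per sign (3,200 / 64), a 20% test fraction puts 10 videos per sign in each test split.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def repeated_holdout(
    samples: Sequence,
    labels: Sequence[int],
    fit_predict: Callable[[list, list, list], list],
    runs: int = 30,
    test_frac: float = 0.2,
    seed: int = 0,
) -> Tuple[float, float]:
    """Repeated stratified random holdout; returns (mean, population std) of accuracy.

    fit_predict(train_X, train_y, test_X) is a hypothetical callback that trains
    a model on the training split and returns one predicted label per test sample.
    """
    rng = random.Random(seed)
    # Group sample indices by sign class so each class keeps ~test_frac in the test set
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    accuracies = []
    for _ in range(runs):
        train_idx: List[int] = []
        test_idx: List[int] = []
        for idx in by_class.values():
            idx = idx[:]  # copy so the shuffle does not touch the grouping
            rng.shuffle(idx)
            n_test = max(1, round(len(idx) * test_frac))
            test_idx += idx[:n_test]
            train_idx += idx[n_test:]
        preds = fit_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


# Sanity check: an "always predict sign 0" classifier on 4 signs x 5 videos
toy_labels = [c for c in range(4) for _ in range(5)]
print(repeated_holdout(toy_labels, toy_labels, lambda xtr, ytr, xte: [0] * len(xte)))
```

A signer-independent protocol would instead hold out all videos of one or more subjects in each run, which measures generalization to signers never seen during training.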

This review was generated by AI and reviewed by a human editor.