QUICK REVIEW

[Paper Review] LSA64: An Argentinian Sign Language Dataset

Franco Ronchetti, Facundo Quiroga|arXiv (Cornell University)|Oct 26, 2023

Hand Gesture Recognition Systems|References: 9|Citations: 55

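One-line summary

This paper introduces LSA64, a research-oriented dataset for Argentinian Sign Language comprising 3200 videos of 64 signs performed by 10 subjects, together with a pre-processed version and baseline recognition results.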

ABSTRACT

Automatic sign language recognition is a research area that encompasses human-computer interaction, computer vision and machine learning. Robust automatic recognition of sign language could assist in the translation process and the integration of hearing-impaired people, as well as the teaching of sign language to the hearing population. Sign languages differ significantly in different countries and even regions, and their syntax and semantics are different as well from those of written languages. While the techniques for automatic sign language recognition are mostly the same for different languages, training a recognition system for a new language requires having an entire dataset for that language. This paper presents a dataset of 64 signs from the Argentinian Sign Language (LSA). The dataset, called LSA64, contains 3200 videos of 64 different LSA signs recorded by 10 subjects, and is a first step towards building a comprehensive research-level dataset of Argentinian signs, specifically tailored to sign language recognition or other machine learning tasks. The subjects that performed the signs wore colored gloves to ease the hand tracking and segmentation steps, allowing experiments on the dataset to focus specifically on the recognition of signs. We also present a pre-processed version of the dataset, from which we computed statistics of movement, position and handshape of the signs.

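Motivation and objectives

Provide a research dataset for Argentinian Sign Language (LSA) to support recognition and other machine learning tasks.
Release both the raw and the pre-processed data as public resources to promote reproducibility.
Characterize the dataset with statistics on handshape, position and trajectory to guide model development.
Present baseline experiments that establish reference performance for signer-dependent recognition on LSA64.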

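Proposed method

Ten subjects wearing colored gloves recorded 3200 videos of 64 signs; the gloves simplify hand tracking.
Provide a pre-processed version containing hand and head positions, segmented hand images and normalized coordinates.
Describe a baseline sign recognition model that integrates hand position, movement and handshape information through hand-specific classifiers and a product of probabilities.
Report accuracy under signer-dependent cross-validation (80-20 split, 30 runs).
Compare the movement, position and handshape modalities using Gaussian Mixture Models and Hidden Markov Models within an EM training framework.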
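
The product-of-probabilities fusion mentioned above can be illustrated with a minimal sketch. It assumes each stream (position, movement, handshape) already outputs one probability per sign class; the function name `fuse_by_product`, the input format and the toy numbers are hypothetical and are not taken from the paper, which may weight or normalise the streams differently.

```python
import numpy as np

def fuse_by_product(stream_probs):
    """Combine per-stream class probabilities with the product rule.

    stream_probs: list of 1-D arrays, each holding one classifier's
    probability for every sign class (hypothetical input format).
    """
    # Sum log-probabilities instead of multiplying, to avoid underflow.
    log_scores = np.zeros_like(np.asarray(stream_probs[0], dtype=float))
    for probs in stream_probs:
        clipped = np.clip(np.asarray(probs, dtype=float), 1e-12, None)
        log_scores += np.log(clipped)
    # Shift by the maximum for numerical stability, then renormalise
    # so the fused scores sum to one.
    log_scores -= log_scores.max()
    fused = np.exp(log_scores)
    return fused / fused.sum()

# Toy example (made-up numbers): 3 sign classes, three streams for one hand.
position = np.array([0.6, 0.3, 0.1])
movement = np.array([0.5, 0.4, 0.1])
handshape = np.array([0.2, 0.7, 0.1])

fused = fuse_by_product([position, movement, handshape])
predicted_class = int(np.argmax(fused))
```

In this toy case the handshape stream outweighs the slight preference the other two streams have for class 0, so the fused prediction is class 1. For a two-handed sign, the per-hand results could be combined by passing both hands' streams to the same function, again as an assumption about how the streams are joined.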

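Experimental results

Research questions

RQ1 What is the composition of LSA64, and how realistic is it (sign types, handshapes, movements, subjects)?
RQ2 Can a signer-dependent baseline model using position, movement and handshape cues achieve high accuracy?
RQ3 How much do the pre-processed features (hand/head positions, segmented hand images) help recognition compared with raw video?
RQ4 Can statistics such as movement overlap, initial/final positions and handshapes be captured to guide model design?
RQ5 Is this dataset suitable for developing recognition systems for Argentinian Sign Language (LSA)?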

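Main findings

LSA64 contains 3200 videos of 64 signs performed by 10 subjects, including both one-handed and two-handed signs.
The pre-processed data provides hand/head positions and segmented hand images, enabling normalized feature extraction.
The signer-dependent baseline reaches 95.95% accuracy on the test set (n=30 runs, 80-20 split).
The baseline uses separate position, movement and handshape classifiers for each hand and multiplies their probabilities per hand to compute the final class likelihood.
Movement, position and handshape are modeled in a multi-stream, hand-specific framework using HMM-GMMs and Gaussian distributions.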
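
The evaluation protocol behind the reported accuracy (30 runs of a random 80-20 split) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the nearest-centroid classifier is a stand-in rather than the HMM-GMM model, and pooling all samples before splitting is one reading of "signer-dependent" (the same signers can appear in both training and test sets). The names `repeated_holdout` and `nearest_centroid_predict` are hypothetical.

```python
import numpy as np

def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT the paper's model): pick the nearest class mean."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Distance from every test sample to every class centroid.
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def repeated_holdout(x, y, n_runs=30, train_fraction=0.8, seed=0):
    """Accuracy over n_runs random train/test splits of the pooled samples."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * len(y)))
    accuracies = []
    for _ in range(n_runs):
        order = rng.permutation(len(y))
        train_idx, test_idx = order[:n_train], order[n_train:]
        pred = nearest_centroid_predict(x[train_idx], y[train_idx], x[test_idx])
        accuracies.append(float(np.mean(pred == y[test_idx])))
    return np.array(accuracies)

# Synthetic stand-in data (not LSA64): 4 classes, 50 samples each, 2-D features.
rng = np.random.default_rng(42)
labels = np.repeat(np.arange(4), 50)
features = rng.normal(size=(200, 2)) + 5.0 * labels[:, None]

accuracies = repeated_holdout(features, labels)
mean_accuracy, std_accuracy = accuracies.mean(), accuracies.std()
```

Reporting the mean together with the spread over the 30 runs gives a sense of how sensitive the figure is to the particular split. A signer-independent variant would instead hold out whole subjects, which this sketch does not do.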


This review was generated by AI and checked by a human editor.