QUICK REVIEW

[Paper Review] LSA64: An Argentinian Sign Language Dataset

Franco Ronchetti, Facundo Quiroga|arXiv (Cornell University)|Oct 26, 2023

Hand Gesture Recognition Systems | References: 9 | Cited by: 55

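ONE-SENTENCE SUMMARY

This paper introduces LSA64, a research-oriented Argentinian Sign Language dataset containing 3200 videos of 64 signs performed by 10 subjects, together with a pre-processed version and baseline recognition results.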

ABSTRACT

Automatic sign language recognition is a research area that encompasses human-computer interaction, computer vision and machine learning. Robust automatic recognition of sign language could assist in the translation process and the integration of hearing-impaired people, as well as the teaching of sign language to the hearing population. Sign languages differ significantly in different countries and even regions, and their syntax and semantics are different as well from those of written languages. While the techniques for automatic sign language recognition are mostly the same for different languages, training a recognition system for a new language requires having an entire dataset for that language. This paper presents a dataset of 64 signs from the Argentinian Sign Language (LSA). The dataset, called LSA64, contains 3200 videos of 64 different LSA signs recorded by 10 subjects, and is a first step towards building a comprehensive research-level dataset of Argentinian signs, specifically tailored to sign language recognition or other machine learning tasks. The subjects that performed the signs wore colored gloves to ease the hand tracking and segmentation steps, allowing experiments on the dataset to focus specifically on the recognition of signs. We also present a pre-processed version of the dataset, from which we computed statistics of movement, position and handshape of the signs.
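The abstract notes that colored gloves ease hand tracking and segmentation. A minimal sketch of how a color threshold can locate a gloved hand in one frame is given below. The frame is synthetic, the helper names `glove_mask` and `centroid` are hypothetical, and the red RGB range is an assumption made for illustration; the source does not state the glove colors or the segmentation method actually used.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def glove_mask(image: List[List[Pixel]], lo: Pixel, hi: Pixel) -> List[List[bool]]:
    """Mark pixels whose three channels all fall inside the range [lo, hi]."""
    return [
        [all(l <= c <= h for c, l, h in zip(px, lo, hi)) for px in row]
        for row in image
    ]


def centroid(mask: List[List[bool]]) -> Optional[Tuple[float, float]]:
    """Return the (row, col) centroid of the marked pixels, or None if there are none."""
    points = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


# Synthetic 6x6 frame: dark background with a 2x2 red patch standing in for a glove.
frame = [[(20, 20, 20) for _ in range(6)] for _ in range(6)]
for r in (2, 3):
    for c in (3, 4):
        frame[r][c] = (230, 30, 40)

# Assumed color range for a red glove (illustrative values only).
RED_LO, RED_HI = (180, 0, 0), (255, 80, 80)

mask = glove_mask(frame, RED_LO, RED_HI)
hand_position = centroid(mask)
```

Running the same two steps on every frame would yield a hand trajectory, which is the kind of position signal a recognizer can consume without solving general hand detection first.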

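MOTIVATION AND OBJECTIVES

Provide a research-oriented Argentinian Sign Language (LSA) dataset to support recognition and other machine learning tasks.
Offer a publicly available resource that includes both raw and pre-processed data, to promote reproducibility.
Characterize the dataset with statistics on handshape, position, and trajectory, to guide model development.
Present baseline experiments that establish reference performance for signer-dependent recognition on LSA64.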

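PROPOSED METHOD

Record 3200 videos of 64 signs performed by 10 subjects, who wear colored gloves to ease hand tracking.
Provide a pre-processed version with hand and head positions, segmented hand images, and normalized coordinates.
Describe a baseline sign recognition model that fuses hand position, movement, and handshape information through per-hand classifiers and multiplies their probabilities to obtain the final class probability.
Report accuracy using signer-dependent cross-validation (80-20 split, 30 runs).
Compare the movement, position, and handshape modalities using Gaussian mixture models and hidden Markov models trained with EM.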
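The probability-product fusion described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each stream (one per hand and per cue) is assumed to output a normalized probability vector over the same classes, the function name `fuse_by_product` is hypothetical, the probability values are invented, and working in log space is a numerical-stability choice made here.

```python
import math
from typing import Dict, List


def fuse_by_product(streams: List[List[float]]) -> List[float]:
    """Multiply per-class probabilities across streams, then renormalize.

    The product is accumulated as a sum of logs so that many small
    probabilities do not underflow to zero.
    """
    n_classes = len(streams[0])
    if any(len(s) != n_classes for s in streams):
        raise ValueError("all streams must cover the same classes")
    eps = 1e-12  # floor that avoids log(0); an assumption of this sketch
    log_scores = [
        sum(math.log(max(s[k], eps)) for s in streams) for k in range(n_classes)
    ]
    peak = max(log_scores)
    unnormalized = [math.exp(v - peak) for v in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Invented outputs of six classifiers over three candidate signs.
stream_outputs: Dict[str, List[float]] = {
    "right/position": [0.5, 0.3, 0.2],
    "right/movement": [0.6, 0.2, 0.2],
    "right/handshape": [0.2, 0.7, 0.1],
    "left/position": [0.4, 0.4, 0.2],
    "left/movement": [0.5, 0.25, 0.25],
    "left/handshape": [0.5, 0.3, 0.2],
}

fused = fuse_by_product(list(stream_outputs.values()))
predicted = max(range(len(fused)), key=fused.__getitem__)
```

A product rule treats the streams as conditionally independent given the class, so a single stream that assigns a near-zero probability to a class can veto it; that is why the sketch floors probabilities at a small `eps`.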

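EXPERIMENTAL RESULTS

Research Questions

RQ1: What are the composition and realism of the LSA64 dataset (sign types, handshapes, movements, subjects)?
RQ2: Can a signer-dependent baseline model reach high accuracy on LSA64 using position, movement, and handshape cues?
RQ3: Compared with the raw videos, how do the pre-processed features (hand and head positions, segmented hand images) help recognition?
RQ4: Which statistics describe the signs (overlap of movements, initial and final positions, handshapes) and can guide model design?
RQ5: Is the dataset suitable for developing recognition systems for Argentinian Sign Language (LSA)?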

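Key Findings

LSA64 contains 3200 videos of 64 signs performed by 10 subjects, covering both one-handed and two-handed signs.
The pre-processed data provides hand and head positions and segmented hand images, which simplifies normalized feature extraction.
The signer-dependent baseline reaches 95.95% accuracy on the test set (n=30 runs, 80-20 split).
The baseline uses separate per-hand classifiers for position, movement, and handshape, and multiplies the probabilities of the two hands to obtain the final class likelihood.
Movement, position, and handshape cues are modeled with HMM-GMMs and Gaussian distributions in a multi-stream, per-hand framework.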
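The evaluation protocol behind the reported accuracy (30 runs, 80-20 split, signer-dependent) can be sketched as repeated random hold-out. The sketch below assumes a plain random split over all samples, so the same signer can appear on both sides of the split; whether the paper stratified its splits is not stated in this summary. The nearest-mean classifier, the 1-D toy data, and the names `repeated_holdout`, `fit_nearest_mean`, and `predict_nearest_mean` are hypothetical stand-ins, not the paper's model or data.

```python
import random
from statistics import mean
from typing import Callable, Dict, List


def repeated_holdout(
    xs: List[float],
    ys: List[int],
    fit: Callable[[List[float], List[int]], Dict[int, float]],
    predict: Callable[[Dict[int, float], float], int],
    runs: int = 30,
    test_frac: float = 0.2,
    seed: int = 0,
) -> List[float]:
    """Return the test accuracy of each run over fresh random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = int(round(len(xs) * test_frac))
    accuracies = []
    for _ in range(runs):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(predict(model, xs[i]) == ys[i] for i in test)
        accuracies.append(hits / n_test)
    return accuracies


def fit_nearest_mean(xs: List[float], ys: List[int]) -> Dict[int, float]:
    """Toy classifier: store the mean feature value of each class."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def predict_nearest_mean(model: Dict[int, float], x: float) -> int:
    """Assign the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


# Toy 1-D data: two well-separated classes with 20 samples each.
features = [i * 0.1 for i in range(20)] + [10 + i * 0.1 for i in range(20)]
labels = [0] * 20 + [1] * 20

accuracies = repeated_holdout(features, labels, fit_nearest_mean, predict_nearest_mean)
mean_accuracy = mean(accuracies)
```

Averaging over many random splits reduces the dependence of the reported figure on any single partition. A signer-independent protocol would instead hold out all videos of some signers, which this sketch does not do.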

This review was generated by AI and checked by a human editor.