QUICK REVIEW

[Paper Review] Time2Vec Transformer for Robust Gesture Recognition from Low-Density sEMG

Blagoj Hristov, Hristijan Gjoreski | arXiv (Cornell University) | Feb 2, 2026
Muscle activation and electromyography studies | Cited by 0
One-Sentence Summary

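This paper proposes a data-efficient Time2Vec Transformer framework for robust gesture recognition from low-density, two-channel sEMG; it achieves a state-of-the-art multi-subject F1-score and can be rapidly calibrated to unseen subjects.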

ABSTRACT

Accurate and responsive myoelectric prosthesis control typically relies on complex, dense multi-sensor arrays, which limits consumer accessibility. This paper presents a novel, data-efficient deep learning framework designed to achieve precise and accurate control using minimal sensor hardware. Leveraging an external dataset of 8 subjects, our approach implements a hybrid Transformer optimized for sparse, two-channel surface electromyography (sEMG). Unlike standard architectures that use fixed positional encodings, we integrate Time2Vec learnable temporal embeddings to capture the stochastic temporal warping inherent in biological signals. Furthermore, we employ a normalized additive fusion strategy that aligns the latent distributions of spatial and temporal features, preventing the destructive interference common in standard implementations. A two-stage curriculum learning protocol is utilized to ensure robust feature extraction despite data scarcity. The proposed architecture achieves a state-of-the-art multi-subject F1-score of 95.7% ± 0.20% for a 10-class movement set, statistically outperforming both a standard Transformer with fixed encodings and a recurrent CNN-LSTM model. Architectural optimization reveals that a balanced allocation of model capacity between spatial and temporal dimensions yields the highest stability. Furthermore, while direct transfer to a new unseen subject led to poor accuracy due to domain shifts, a rapid calibration protocol utilizing only two trials per gesture recovered performance from 21.0% ± 2.98% to 96.9% ± 0.52%. By validating that high-fidelity temporal embeddings can compensate for low spatial resolution, this work challenges the necessity of high-density sensing. The proposed framework offers a robust, cost-effective blueprint for next-generation prosthetic interfaces capable of rapid personalization.
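The Time2Vec embedding mentioned in the abstract follows Kazemi et al. (2019): for a scalar time index τ, component 0 is linear, ω₀τ + φ₀, and components 1..k are periodic, sin(ωᵢτ + φᵢ). The sketch below is a minimal, dependency-free illustration of that formula, not the authors' implementation. The choice of k = 7, the random initialization, and the 200-sample window are hypothetical; in a real model ω and φ would be learned jointly with the network by backpropagation.

```python
import math
import random
from typing import List


class Time2Vec:
    """Time2Vec: one linear term plus k periodic (sine) terms.

    Weights are randomly initialised here for illustration only; in a trained
    model they are learnable parameters.
    """

    def __init__(self, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # k + 1 frequencies (omega) and phases (phi); index 0 is the linear term.
        self.omega = [rng.uniform(-1.0, 1.0) for _ in range(k + 1)]
        self.phi = [rng.uniform(-1.0, 1.0) for _ in range(k + 1)]

    def __call__(self, tau: float) -> List[float]:
        out = [self.omega[0] * tau + self.phi[0]]  # linear (non-periodic) component
        for w, p in zip(self.omega[1:], self.phi[1:]):
            out.append(math.sin(w * tau + p))      # periodic components
        return out

    def embed_sequence(self, length: int) -> List[List[float]]:
        """Embed time steps 0..length-1, e.g. the sample indices of one sEMG window."""
        return [self(float(t)) for t in range(length)]


t2v = Time2Vec(k=7)
emb = t2v.embed_sequence(200)  # hypothetical 200-sample window
print(len(emb), len(emb[0]))   # 200 8
```

Unlike a fixed sinusoidal positional encoding, the frequencies and phases here are parameters, which is what allows the embedding to adapt to irregular (warped) timing in the signal.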

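Motivation and Objectives

  • Advance accessible myoelectric prosthesis control using as little sensor hardware as possible.
  • Develop a data-efficient deep learning model suited to sparse sEMG data.
  • Integrate Time2Vec temporal embeddings to capture the stochastic temporal warping in biological signals.
  • Propose normalized additive fusion to align the distributions of spatial and temporal features.
  • Evaluate cross-subject robustness and demonstrate rapid calibration.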

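Proposed Method

  • Uses a hybrid Transformer architecture designed for sparse, two-channel sEMG.
  • Introduces Time2Vec learnable temporal embeddings to model temporal warping.
  • Applies normalized additive fusion to align the latent distributions of spatial and temporal features.
  • Adopts a two-stage curriculum learning protocol to mitigate data scarcity.
  • Balances model capacity between the spatial and temporal dimensions for stability.
  • Compares against a standard Transformer with fixed encodings and a CNN-LSTM baseline.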
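This review does not say exactly how the normalized additive fusion is implemented. One plausible reading, assumed here, is that at each time step the spatial feature vector and the temporal embedding are each rescaled to zero mean and unit variance (a parameter-free layer normalization) before being summed, so that the larger-scale stream cannot swamp the other. The function names and toy values below are hypothetical.

```python
import math
from typing import List, Sequence


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Rescale a feature vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def normalized_additive_fusion(spatial: Sequence[Sequence[float]],
                               temporal: Sequence[Sequence[float]]) -> List[List[float]]:
    """Fuse per-time-step spatial features with temporal embeddings.

    Assumption: both streams are layer-normalised before element-wise addition,
    so neither dominates the sum purely because of its scale.
    """
    if len(spatial) != len(temporal):
        raise ValueError("streams must have the same number of time steps")
    fused = []
    for s, t in zip(spatial, temporal):
        if len(s) != len(t):
            raise ValueError("streams must have the same feature width")
        fused.append([a + b for a, b in zip(layer_norm(s), layer_norm(t))])
    return fused


# Toy example: spatial features on a much larger scale than the temporal embedding.
spatial = [[120.0, -80.0, 40.0, 10.0], [90.0, -60.0, 30.0, 5.0]]
temporal = [[0.1, 0.2, -0.1, 0.0], [0.05, -0.05, 0.1, 0.0]]
fused = normalized_additive_fusion(spatial, temporal)
```

Without the normalization step, the spatial values in this toy example (magnitudes up to 120) would dominate the sum and the temporal embedding (magnitudes of at most 0.2) would be nearly invisible, which is one way to interpret the "destructive interference" the abstract refers to.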

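Experimental Results

Research Questions

  • RQ1: Do Time2Vec temporal embeddings improve the robustness of gesture recognition from low-density sEMG?
  • RQ2: Does normalized additive fusion mitigate interference between spatial and temporal features in a sparse-sensing setup?
  • RQ3: How does curriculum learning affect feature extraction when labeled data are limited?
  • RQ4: How does the allocation of model capacity between the spatial and temporal dimensions affect stability and performance?
  • RQ5: Can rapid calibration to an unseen subject be achieved with only a few trials per gesture?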

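Key Findings

  • Achieves a state-of-the-art multi-subject F1-score of 95.7% ± 0.20% on a 10-class movement set.
  • Outperforms both a standard Transformer with fixed encodings and a CNN-LSTM baseline.
  • Direct transfer to an unseen subject yields low accuracy because of domain shift, but rapid calibration using only two trials per gesture raises performance from 21.0% ± 2.98% to 96.9% ± 0.52%.
  • High-fidelity temporal embeddings can compensate for low spatial resolution, challenging the assumption that high-density sensing is necessary.
  • A balanced allocation of model capacity between the spatial and temporal dimensions yields the highest stability.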
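The calibration result above relies on only two trials per gesture from the new subject. A minimal sketch of the corresponding data split is shown below, assuming each recording is tagged with a gesture label and a trial index and that the earliest trials are the ones used for calibration; the 10 × 6 recording layout and the function name `calibration_split` are hypothetical. The calibration subset would presumably be used to adapt (e.g., fine-tune) the pretrained model, with the remaining trials held out for evaluation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A recording is identified by (gesture label, trial index); signal payloads are omitted.
Recording = Tuple[str, int]


def calibration_split(recordings: List[Recording], trials_per_gesture: int = 2
                      ) -> Tuple[List[Recording], List[Recording]]:
    """Take the first `trials_per_gesture` trials of every gesture for calibration;
    all remaining trials of the new subject are held out for evaluation."""
    by_gesture: Dict[str, List[Recording]] = defaultdict(list)
    for rec in recordings:
        by_gesture[rec[0]].append(rec)
    calib: List[Recording] = []
    test: List[Recording] = []
    for gesture in sorted(by_gesture):
        trials = sorted(by_gesture[gesture], key=lambda r: r[1])
        if len(trials) <= trials_per_gesture:
            raise ValueError(f"gesture {gesture!r} has no trials left for evaluation")
        calib.extend(trials[:trials_per_gesture])
        test.extend(trials[trials_per_gesture:])
    return calib, test


# Hypothetical new subject: 10 gestures x 6 trials each.
recordings = [(f"g{g}", t) for g in range(10) for t in range(6)]
calib, test = calibration_split(recordings)
print(len(calib), len(test))  # 20 40
```

Splitting per gesture, rather than sampling trials at random from the pooled recordings, guarantees that every class is represented in the calibration set, which matters when only two examples per class are available.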

This review was generated by AI and reviewed by human editors.