QUICK REVIEW

[Paper Review] Transformer-based Models to Deal with Heterogeneous Environments in Human Activity Recognition

Sannara EK, François Portet|arXiv (Cornell University)|Sep 22, 2022

Context-Aware Activity Recognition Systems | Cited by 20

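One-Sentence Summary

This paper proposes HART, a lightweight sensor-wise Transformer for IMU-based HAR that improves accuracy while reducing FLOPS and parameter count, and shows robustness to device and position heterogeneity.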

ABSTRACT

Human Activity Recognition (HAR) on mobile devices has been demonstrated to be possible using neural models trained on data collected from the device's inertial measurement units. These models have used Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTMs), Transformers or a combination of these to achieve state-of-the-art results with real-time performance. However, these approaches have not been extensively evaluated in real-world situations where the input data may be different from the training data. This paper highlights the issue of data heterogeneity in machine learning applications and how it can hinder their deployment in pervasive settings. To address this problem, we propose and publicly release the code of two sensor-wise Transformer architectures called HART and MobileHART for Human Activity Recognition Transformer. Our experiments on several publicly available datasets show that these HART architectures outperform previous architectures with fewer floating point operations and parameters than conventional Transformers. The results also show they are more robust to changes in mobile position or device brand and hence better suited for the heterogeneous environments encountered in real-life settings. Finally, the source code has been made publicly available.

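Motivation and Objectives

Address the heterogeneity of client data in HAR that arises from different devices and on-body positions.
Develop a lightweight Transformer architecture suited to IMU-based HAR on mobile devices.
Compare HART and MobileHART against CNN, CNN-LSTM, and ViT variants on multiple HAR datasets in terms of efficiency and accuracy.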

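Proposed Method

Propose the HART and MobileHART architectures, adapted from ViT and MobileViT for sensor-wise IMU input.
Use sensor-wise multi-head self-attention (MSA) with a smaller embedding size per sensor to reduce complexity.
Implement a shared MSA (OneMSA) to further reduce parameters and computation.
Apply Global Average Pooling instead of a class token to reduce computation.
Train on windowed IMU data sampled at 50 Hz, with a 70/10/20 train/validation/test split per dataset.
Evaluate on five HAR datasets: UCI, MotionSense, HHAR, RealWorld, and SHL.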
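
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the window length, stride, patch size, and embedding width are assumed values, the weights are random stand-ins for learned parameters, attention is single-head, and the LightConv branch and classifier head are omitted. All function names are hypothetical.

```python
import numpy as np

def make_windows(recording, window, stride):
    """Slice a (time, channels) recording into (n_windows, window, channels)."""
    starts = range(0, recording.shape[0] - window + 1, stride)
    return np.stack([recording[s:s + window] for s in starts])

def embed_patches(sensor_window, patch, w_embed):
    """Group consecutive samples into patches and project each patch to a token."""
    n_tokens = sensor_window.shape[0] // patch
    patches = sensor_window[:n_tokens * patch].reshape(n_tokens, -1)
    return patches @ w_embed

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over (n_tokens, dim)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v

rng = np.random.default_rng(0)
# Assumed, illustrative settings (not taken from the paper).
WINDOW, STRIDE, PATCH, DIM = 128, 64, 16, 32

# Synthetic 20 s recording at 50 Hz: columns 0-2 accelerometer, 3-5 gyroscope.
recording = rng.normal(size=(1000, 6))
windows = make_windows(recording, WINDOW, STRIDE)

def random_weights():
    """Random stand-ins for one sensor's learned embedding and Q/K/V matrices."""
    w_embed = rng.normal(scale=0.1, size=(PATCH * 3, DIM))
    qkv = [rng.normal(scale=0.1, size=(DIM, DIM)) for _ in range(3)]
    return w_embed, qkv

acc_embed, acc_qkv = random_weights()
gyro_embed, gyro_qkv = random_weights()

def sensor_wise_features(window):
    acc_tokens = embed_patches(window[:, :3], PATCH, acc_embed)
    gyro_tokens = embed_patches(window[:, 3:], PATCH, gyro_embed)
    # Each sensor attends only over its own tokens, at the reduced width DIM.
    acc_out = self_attention(acc_tokens, *acc_qkv)
    gyro_out = self_attention(gyro_tokens, *gyro_qkv)
    fused = np.concatenate([acc_out, gyro_out], axis=-1)  # (n_tokens, 2 * DIM)
    # Global average pooling over tokens takes the place of a class token.
    return fused.mean(axis=0)

features = np.stack([sensor_wise_features(w) for w in windows])
print(windows.shape, features.shape)
```

With these assumed settings the synthetic recording yields 14 windows, and each window is reduced to a 64-dimensional feature vector that a classifier head could consume. A shared-MSA variant in the spirit of OneMSA would pass the same Q/K/V matrices to both sensors instead of two separate sets.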

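Experimental Results

Research Questions

RQ1: How do Transformer-based HAR models perform under heterogeneous sensing devices and on-body positions?
RQ2: Can sensor-wise attention and lightweight blocks reduce computation while maintaining accuracy in mobile HAR?
RQ3: How do shared MSA and sensor-wise fusion affect robustness and efficiency?
RQ4: How do the HART variants compare with CNN, CNN-LSTM, and ViT baselines in real-device settings?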

Key Findings

Architecture	F-Score (↑)	Parameters (↓)	FLOPS (↓)
CNN	94.53	6,448,714	17,725,476
CNN-LSTM	92.79	559,558	1,350,180
ViT	93.66	3,783,238	17,069,949
HART	94.49	1,445,918	15,212,636
HART OneMSA	94.37	1,277,150	15,176,924
MobileViT (XS)	96.89	2,734,622	22,983,180
MobileHART (XS)	97.20	2,542,942	19,809,292
MobileViT (XXS)	96.55	1,352,054	8,995,612
MobileHART (XXS)	97.67	1,275,702	8,213,276
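
To make the trade-offs in the table easier to read, the following sketch computes the F-score gain and the relative change in parameters and FLOPS between a model and a baseline. The numbers are copied from the table; the helper names `percent_change` and `compare` are hypothetical.

```python
# (F-score, parameters, FLOPS) copied from the table above.
RESULTS = {
    "CNN": (94.53, 6_448_714, 17_725_476),
    "CNN-LSTM": (92.79, 559_558, 1_350_180),
    "ViT": (93.66, 3_783_238, 17_069_949),
    "HART": (94.49, 1_445_918, 15_212_636),
    "HART OneMSA": (94.37, 1_277_150, 15_176_924),
    "MobileViT (XS)": (96.89, 2_734_622, 22_983_180),
    "MobileHART (XS)": (97.20, 2_542_942, 19_809_292),
    "MobileViT (XXS)": (96.55, 1_352_054, 8_995_612),
    "MobileHART (XXS)": (97.67, 1_275_702, 8_213_276),
}

def percent_change(new, old):
    """Signed relative change of new with respect to old, in percent."""
    return 100.0 * (new - old) / old

def compare(model, baseline):
    """F-score gain (points) and relative parameter/FLOPS change (percent)."""
    f_new, p_new, fl_new = RESULTS[model]
    f_old, p_old, fl_old = RESULTS[baseline]
    return {
        "f_score_gain": round(f_new - f_old, 2),
        "params_change_pct": round(percent_change(p_new, p_old), 1),
        "flops_change_pct": round(percent_change(fl_new, fl_old), 1),
    }

pairs = [
    ("HART", "ViT"),
    ("HART", "CNN"),
    ("MobileHART (XS)", "MobileViT (XS)"),
    ("MobileHART (XXS)", "MobileViT (XXS)"),
]
for model, baseline in pairs:
    print(f"{model} vs {baseline}: {compare(model, baseline)}")
```

For example, HART has roughly 62% fewer parameters and about 11% fewer FLOPS than ViT while scoring 0.83 points higher. CNN-LSTM remains the smallest model in the table by a wide margin, at the cost of the lowest F-score.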

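HART achieves an F-score competitive with CNN (94.49 vs. 94.53) and higher than ViT (93.66) with fewer parameters and FLOPS than both; the MobileHART variants reach the highest F-scores (97.20 and 97.67) with fewer parameters and FLOPS than the corresponding MobileViT variants.
The HART variants, which use sensor-wise MSA and LightConv, reduce computation and improve efficiency by distributing attention across sensors.
MobileHART (XXS) delivers the highest F-score in the table (97.67) with 1,275,702 parameters and 8,213,276 FLOPS, far below CNN and ViT; MobileHART (XS) also has fewer parameters than both but more FLOPS.
The models show robustness to domain shifts such as unseen devices and on-body positions across multiple HAR datasets.
The authors report inference time and memory footprint measured on smartphones, validating effectiveness on real devices.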
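
One way to see why sensor-wise attention saves parameters, as the findings above state, is that the Q, K, V, and output projections of an MSA layer grow quadratically with the embedding width. The sketch below counts only those projection weights under assumed sizes; the width of 192, the per-sensor slice of one quarter, and the two-sensor setup are hypothetical values for illustration, not the paper's configuration.

```python
def msa_projection_params(dim, bias=True):
    """Weights in the Q, K, V and output projections of one MSA layer of width dim."""
    per_projection = dim * dim + (dim if bias else 0)
    return 4 * per_projection

FULL_DIM = 192                # hypothetical full embedding width
SENSOR_DIM = FULL_DIM // 4    # hypothetical per-sensor slice of the embedding
N_SENSORS = 2                 # accelerometer and gyroscope

# One attention layer over the full embedding.
full = msa_projection_params(FULL_DIM)
# One narrower attention layer per sensor.
sensor_wise = N_SENSORS * msa_projection_params(SENSOR_DIM)
# A single narrow attention layer reused by every sensor (shared MSA).
shared = msa_projection_params(SENSOR_DIM)

print(f"full-width MSA:  {full:,}")
print(f"sensor-wise MSA: {sensor_wise:,}")
print(f"shared MSA:      {shared:,}")
```

Under these assumptions the full-width layer has 148,224 projection weights, two per-sensor layers have 18,816 in total, and a single shared layer has 9,408. The whole-model counts in the table include embeddings, feed-forward layers, and convolutions as well, so the overall savings are smaller than this per-layer ratio.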


This review was generated by AI and checked by a human editor.