QUICK REVIEW

[논문 리뷰] Human Activity Recognition from Wearable Sensor Data Using Self-Attention

Saif Mahmud, M Tanjid Hasan Tonmoy|arXiv (Cornell University)|2020. 03. 17.

Context-Aware Activity Recognition Systems참고 문헌 27인용 수 44

한 줄 요약

This paper proposes a self-attention based non-recurrent neural network for HAR that uses sensor modality attention, self-attention blocks, and global temporal attention to improve activity recognition across four public datasets. It achieves state-of-the-art performance on benchmark and LOSO evaluations and provides interpretable sensor attention maps.

ABSTRACT

Human Activity Recognition from body-worn sensor data poses an inherent challenge in capturing spatial and temporal dependencies of time-series signals. In this regard, the existing recurrent or convolutional or their hybrid models for activity recognition struggle to capture spatio-temporal context from the feature space of sensor reading sequence. To address this complex problem, we propose a self-attention based neural network model that foregoes recurrent architectures and utilizes different types of attention mechanisms to generate higher dimensional feature representation used for classification. We performed extensive experiments on four popular publicly available HAR datasets: PAMAP2, Opportunity, Skoda and USC-HAD. Our model achieve significant performance improvement over recent state-of-the-art models in both benchmark test subjects and Leave-one-subject-out evaluation. We also observe that the sensor attention maps produced by our model is able capture the importance of the modality and placement of the sensors in predicting the different activity classes.

연구 동기 및 목표

Motivate and address the challenge of capturing spatio-temporal dependencies in time-series wearable sensor data for HAR.
Propose a non-recurrent, self-attention based architecture to learn robust feature representations from multi-sensor inputs.
Incorporate sensor modality attention and global temporal attention to capture spatial and temporal context.
Evaluate on four public HAR datasets and compare against state-of-the-art recurrent and attention-based models.
Demonstrate interpretability through sensor attention maps and analyze window-size effects.

제안 방법

Apply sensor modality attention to weight inputs from different sensors.
Use 1-D convolution to convert weighted sensor inputs into d-dimensional vectors with positional encoding.
Incorporate multi-head self-attention blocks to model intra-window temporal relationships.
Add a global temporal attention layer to compute a weighted temporal summary for classification.
Train with an end-to-end approach and use a final fully connected layer with softmax for activity labels.
Evaluate under sample-wise and window-wise setups, and perform Leave-One-Subject-Out (LOSO) validation.

실험 결과

연구 질문

RQ1Can a self-attention based architecture capture spatio-temporal dependencies in HAR without recurrent layers?
RQ2Does sensor modality attention improve the usefulness of multi-sensor inputs for HAR across diverse datasets?
RQ3How does the model perform under benchmark test subjects and LOSO-CV relative to state-of-the-art methods?
RQ4What is the impact of window size on recognition performance and model robustness?
RQ5Are the attention maps interpretable with respect to sensor placement and activity type?

주요 결과

Dataset	Sample-wise Proposed	Sample-wise DeepConvLSTM	Sample-wise ConvAE	Window-wise Proposed	Window-wise DeepConvLSTM	Window-wise ConvAE	Leave-One-Subject-Out Proposed	Leave-One-Subject-Out DeepConvLSTM	Leave-One-Subject-Out ConvAE
PAMAP2	0.95	0.96	0.71	0.70	0.52	0.80	0.88	0.90	0.89
Opportunity	0.61	0.67	0.66	0.58	0.60	0.60	0.71	-	-
USC-HAD	0.50	0.55	0.42	0.38	0.42	0.46	-	-	-
Skoda	0.93	0.97	0.96	0.88	0.82	0.79	0.91	0.94	0.93

The proposed model achieves higher window-wise macro F1 scores than DeepConvLSTM and ConvAE on PAMAP2, Opportunity, USC-HAD, and SKODA for benchmark tests.
The model shows strong LOSO-CV performance, outperforming DeepConvLSTM and ConvAE across datasets.
On PAMAP2, the model yields 0.95/0.96 (sample-wise) and 0.71/0.70/0.52/0.80/0.88/0.90/0.89 for various baselines, indicating competitive gains for attention-augmented variants.
For Opportunity, the window-wise macro F1 improves to 0.67 vs 0.58 (DeepConvLSTM) and 0.60 (ConvAE).
For USC-HAD and SKODA, the proposed model consistently matches or exceeds competing attention-based methods in window-wise metrics and surpasses in several sample-wise metrics.
The sensor modality attention maps align with intuitive importance of sensor placements for specific activities (e.g., ironing relies more on hand sensors).

더 나은 연구,지금 바로 시작하세요

연구 설계부터 논문 작성까지, 연구 시간을 획기적으로 줄여보세요.

카드 등록 없음 · 무료 플랜 제공

이 리뷰는 AI가 만들고, 인간 에디터가 검토했습니다.