QUICK REVIEW

[Paper Review] RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention

Bochao Zou, Zizheng Guo | arXiv (Cornell University) | Feb 20, 2024

Blind Source Separation Techniques | 8 citations

TL;DR

RhythmFormer introduces a fully end-to-end transformer that exploits rPPG quasi-periodicity with hierarchical temporal periodic transformers and a plug-and-play fusion stem to improve rPPG extraction and robustness across datasets.

ABSTRACT

Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals based on facial videos, holding high potential in various applications. Due to the periodic nature of rPPG signals, the long-range dependency capturing capacity of the transformer was assumed to be advantageous for such signals. However, existing methods have not conclusively demonstrated the superior performance of transformers over traditional convolutional neural networks. This may be attributed to the quadratic scaling exhibited by transformers with sequence length, resulting in coarse-grained feature extraction, which in turn affects robustness and generalization. To address that, this paper proposes a periodic sparse attention mechanism based on temporal attention sparsity induced by periodicity. A pre-attention stage is introduced before the conventional attention mechanism. This stage learns periodic patterns to filter out a large number of irrelevant attention computations, thus enabling fine-grained feature extraction. Moreover, to address the issue of fine-grained features being more susceptible to noise interference, a fusion stem is proposed to effectively guide self-attention towards rPPG features. It can be easily integrated into existing methods to enhance their performance. Extensive experiments show that the proposed method achieves state-of-the-art performance in both intra-dataset and cross-dataset evaluations. The code is available at https://github.com/zizheng-guo/RhythmFormer.
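The cost argument in the abstract can be illustrated with a rough count of query-key score computations. The sketch below is an assumption-laden illustration, not taken from the paper: it supposes the tokens are split into equal-sized temporal regions, that the pre-attention stage scores regions against regions, and that each token then attends only to the tokens of its `top_k` selected regions. The function names and the example sizes are hypothetical.

```python
def dense_pairs(n_tokens: int) -> int:
    """Query-key scores computed by full self-attention over n_tokens."""
    return n_tokens * n_tokens


def two_stage_pairs(n_tokens: int, n_regions: int, top_k: int) -> int:
    """Scores computed by a region-level pre-attention plus a refined stage.

    Assumes equal-sized regions; this layout is an illustrative assumption.
    """
    if n_tokens % n_regions != 0:
        raise ValueError("n_tokens must be divisible by n_regions")
    region_size = n_tokens // n_regions
    pre_attention = n_regions * n_regions        # region-to-region scores
    refined = n_tokens * top_k * region_size     # each token vs. top_k regions
    return pre_attention + refined


# Illustrative sizes only (not the paper's configuration).
dense = dense_pairs(1024)
sparse = two_stage_pairs(1024, n_regions=32, top_k=4)
print(dense, sparse, round(dense / sparse, 2))
```

Under these assumed sizes the two-stage scheme computes far fewer scores than dense attention, which is the headroom the abstract argues can be spent on finer-grained tokens.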

Motivation & Objective

Leverage the quasi-periodic nature of rPPG signals for more accurate rPPG extraction.
Introduce a hierarchical temporal periodic transformer to capture multi-scale periodic features.
Propose a fusion stem to guide self-attention toward rPPG-relevant features and enable easy transfer to other methods.
Achieve state-of-the-art performance with reduced model size and computation across multiple datasets.

Proposed method

Propose RhythmFormer, a fully end-to-end transformer-based framework consisting of a fusion stem, patch embedding, a hierarchical Temporal Periodic Transformer (TPT), and an rPPG predictor head.
Use a fusion stem that combines difference frames with raw frames to guide frame-level rPPG awareness.
Implement the hierarchical temporal periodic transformer as three stages of TPT blocks with multi-scale temporal downsampling, using top-k guided pre-attention to focus on high-correlation regions.
Apply temporal periodic sparse attention with a pre-attention stage (large receptive field) and a refined attention stage (top-k regions) plus an LCE module to enhance local positional cues.
Incorporate an HR Hybrid Loss combining temporal correlation, frequency guidance, and a learned heart-rate distribution through KL divergence to better align training with HR metrics.
Provide a plug-and-play fusion stem that improves other methods without changing their backbones.
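The two-stage attention described above (a coarse pre-attention followed by refined attention over the top-k regions) can be sketched in NumPy as follows. This is a minimal single-head illustration under stated assumptions, not the authors' implementation: it assumes tokens are grouped into equal-sized temporal regions, that region descriptors are obtained by mean pooling, and it omits the LCE module, learned projections, and the multi-scale hierarchy. The names `periodic_sparse_attention`, `n_regions`, and `top_k` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def periodic_sparse_attention(q, k, v, n_regions, top_k):
    """Two-stage sparse attention over temporal tokens.

    q, k, v: arrays of shape (T, D); T must be divisible by n_regions.
    Returns the attended tokens (T, D) and the selected region indices.
    """
    T, D = q.shape
    if T % n_regions != 0:
        raise ValueError("T must be divisible by n_regions")
    S = T // n_regions  # tokens per region

    qr = q.reshape(n_regions, S, D)
    kr = k.reshape(n_regions, S, D)
    vr = v.reshape(n_regions, S, D)

    # Stage 1 (pre-attention): coarse region-to-region affinity using
    # mean-pooled descriptors (pooling choice is an assumption).
    affinity = qr.mean(axis=1) @ kr.mean(axis=1).T          # (R, R)
    top_idx = np.argsort(-affinity, axis=1)[:, :top_k]      # (R, top_k)

    # Stage 2 (refined attention): each query region attends only to the
    # tokens of its top_k most related key regions.
    k_sel = kr[top_idx].reshape(n_regions, top_k * S, D)
    v_sel = vr[top_idx].reshape(n_regions, top_k * S, D)
    scores = qr @ k_sel.transpose(0, 2, 1) / np.sqrt(D)     # (R, S, top_k*S)
    out = softmax(scores, axis=-1) @ v_sel                  # (R, S, D)
    return out.reshape(T, D), top_idx


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 4)) for _ in range(3))
out, selected = periodic_sparse_attention(q, k, v, n_regions=4, top_k=2)
print(out.shape, selected.shape)
```

A useful sanity check on this sketch: when `top_k` equals `n_regions`, every query sees every key, so the output matches ordinary dense softmax attention.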

Experimental results

Research questions

RQ1: Can a transformer that explicitly models periodicity in rPPG signals outperform CNN-based and other transformer-based approaches?
RQ2: Does multi-scale temporal processing with periodic sparse attention improve robustness to noise and complexity across datasets?
RQ3: Is the fusion stem a transferable component that consistently enhances rPPG performance when integrated with other methods?
RQ4: How does the HR-based hybrid loss influence learning and final heart-rate related metrics?

Key findings

RhythmFormer achieves state-of-the-art intra-dataset performance on PURE, with MAE 0.27, RMSE 0.47, and ρ 0.99; and on UBFC with MAE 0.50, RMSE 0.78, and ρ 0.99.
On the challenging MMPD dataset, RhythmFormer achieves MAE 3.07, RMSE 6.81, MAPE 3.24, ρ 0.86, and SNR 5.46, surpassing prior methods.
Cross-dataset evaluation shows strong generalization and domain-invariant rPPG feature learning, outperforming existing end-to-end methods.
Ablation studies demonstrate the effectiveness of the fusion stem, pre-attention, and multi-scale design for improving rPPG extraction and robustness.
RhythmFormer uses fewer parameters (3.251M) and fewer MACs (38.494G) than several baselines, indicating efficiency suitable for mobile deployment.
The fusion stem consistently improves performance when added to other methods, validating its transferability and impact on SNR and accuracy.
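For readers unfamiliar with the metrics quoted in the findings above, the following sketch shows how MAE, RMSE, MAPE, and the Pearson correlation ρ are commonly computed from per-video heart-rate estimates. It uses the standard textbook definitions and assumes heart rates in beats per minute; the function name `hr_metrics` and the sample values are hypothetical, and the exact evaluation protocol of the paper (windowing, HR estimation from the rPPG waveform, SNR) is not reproduced here.

```python
import numpy as np


def hr_metrics(hr_pred, hr_true):
    """Standard heart-rate error metrics between predictions and references."""
    hr_pred = np.asarray(hr_pred, dtype=float)
    hr_true = np.asarray(hr_true, dtype=float)
    err = hr_pred - hr_true
    return {
        "MAE": float(np.mean(np.abs(err))),                    # mean absolute error
        "RMSE": float(np.sqrt(np.mean(err ** 2))),             # root mean squared error
        "MAPE": float(np.mean(np.abs(err) / hr_true) * 100),   # percent
        "rho": float(np.corrcoef(hr_pred, hr_true)[0, 1]),     # Pearson correlation
    }


# Made-up example values, for illustration only.
print(hr_metrics([60.0, 72.0, 80.0], [62.0, 70.0, 80.0]))
```

Lower MAE, RMSE, and MAPE indicate better accuracy, while ρ closer to 1 indicates that predicted and reference heart rates vary together.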

This review was created by AI and reviewed by human editors.