QUICK REVIEW

[Paper Review] RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement

Bochao Zou, Zizheng Guo | arXiv (Cornell University) | 2024-04-09

Non-Invasive Vital Sign Monitoring | Citations: 6

ONE-LINE SUMMARY

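RhythmMamba is an end-to-end Mamba-based model that combines multi-temporal Mamba with frequency-domain interaction to efficiently extract quasi-periodic signals from videos of varying length, achieving state-of-the-art performance at lower complexity.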

ABSTRACT

Remote photoplethysmography (rPPG) is a method for non-contact measurement of physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: understanding the periodic pattern of rPPG among long contexts and addressing large spatiotemporal redundancy in video segments. These represent a trade-off between computational complexity and the ability to capture long-range dependencies. In this paper, we introduce RhythmMamba, a state space model-based method that captures long-range dependencies while maintaining linear complexity. By viewing rPPG as a time series task through the proposed frame stem, the periodic variations in pulse waves are modeled as state transitions. Additionally, we design multi-temporal constraint and frequency domain feed-forward, both aligned with the characteristics of rPPG time series, to improve the learning capacity of Mamba for rPPG signals. Extensive experiments show that RhythmMamba achieves state-of-the-art performance with 319% throughput and 23% peak GPU memory. The codes are available at https://github.com/zizheng-guo/RhythmMamba.
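The abstract describes pulse-wave variation as state transitions computed with linear complexity. The sketch below illustrates that idea with a simplified, time-invariant diagonal state space model; it assumes fixed transition and projection parameters, whereas Mamba makes them input-dependent (selective) and uses a parallel hardware-aware scan. The function name and parameter shapes are hypothetical and not taken from the RhythmMamba code.

```python
import numpy as np


def diagonal_ssm_scan(u, a, b, c):
    """Run a discretized, time-invariant diagonal linear SSM over a sequence.

    u: (T, D) input sequence, e.g. one D-dim token per video frame
    a: (D, N) diagonal state transition per channel (|a| < 1 keeps it stable)
    b: (D, N) input projection
    c: (D, N) output projection
    Returns y: (T, D). The single loop over T makes the cost O(T * D * N).
    """
    T, D = u.shape
    h = np.zeros(a.shape)              # one N-dim hidden state per channel
    y = np.empty((T, D))
    for t in range(T):
        h = a * h + b * u[t][:, None]  # state transition: h_t = A h_{t-1} + B u_t
        y[t] = (c * h).sum(axis=1)     # readout: y_t = C h_t
    return y


# Usage: 160 frames, 4 channels, 8 state dimensions per channel.
rng = np.random.default_rng(0)
T, D, N = 160, 4, 8
u = rng.standard_normal((T, D))
a = np.full((D, N), 0.9)
b = rng.standard_normal((D, N))
c = rng.standard_normal((D, N))
y = diagonal_ssm_scan(u, a, b, c)
print(y.shape)  # (160, 4)
```

Doubling the number of frames doubles the loop length, which is the linear scaling contrasted with the quadratic cost of self-attention over all frame pairs.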

MOTIVATION AND OBJECTIVES

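Motivated by non-contact estimation of physiological signals (rPPG) from facial videos for healthcare, affective computing, and anti-spoofing.
Addresses the trade-off in rPPG modeling between computational efficiency and the ability to capture long-range temporal dependencies.
Proposes RhythmMamba, an end-to-end framework that processes videos of arbitrary length without performance degradation.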

PROPOSED METHOD

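Introduces a frame stem that aggregates spatial information into token channels through diff-fusion, large-kernel convolution, and self-attention.
Develops multi-temporal Mamba, which lets a single Mamba block process sequences of different lengths, constraining periodicity over long sequences and trends over short ones.
Introduces a frequency-domain feed-forward that enables cross-channel interaction in the frequency domain, emphasizing the quasi-periodic rPPG pattern.
Because the frame stem (frame-level channel aggregation) folds spatial information into channels, it does not interfere with the subsequent temporal modeling.
Trains with a loss that combines a temporal correlation term (negative Pearson) with a frequency-domain constraint based on the heart-rate PSD.
Computational cost grows linearly with video length, enabling inputs of arbitrary length.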
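The frequency-domain feed-forward can be sketched as follows, assuming a Fourier-MLP design: FFT over the time axis, complex channel mixing applied identically at every frequency bin, an inverse FFT, and a residual connection. This layout is borrowed from generic Fourier-domain MLPs and is an assumption; the paper's exact layer may differ, and `frequency_domain_ffn`, `complex_relu`, `w1`, and `w2` are hypothetical names.

```python
import numpy as np


def complex_relu(z):
    # ReLU applied separately to the real and imaginary parts.
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)


def frequency_domain_ffn(x, w1, w2):
    """Hypothetical frequency-domain feed-forward over a token sequence.

    x:  (T, C) real features, one token per frame
    w1: (C, H) complex weights, w2: (H, C) complex weights
    The same channel-mixing weights are applied at every frequency bin.
    """
    T = x.shape[0]
    xf = np.fft.rfft(x, axis=0)        # (T // 2 + 1, C) spectrum of each channel
    hf = complex_relu(xf @ w1)         # mix channels within each frequency bin
    yf = hf @ w2                       # project back to C channels
    y = np.fft.irfft(yf, n=T, axis=0)  # back to the time domain: real, length T
    return x + y                       # residual connection


rng = np.random.default_rng(0)
T, C, H = 160, 16, 32
x = rng.standard_normal((T, C))
w1 = (rng.standard_normal((C, H)) + 1j * rng.standard_normal((C, H))) / np.sqrt(C)
w2 = (rng.standard_normal((H, C)) + 1j * rng.standard_normal((H, C))) / np.sqrt(H)
out = frequency_domain_ffn(x, w1, w2)
print(out.shape)  # (160, 16)
```

Operating on the spectrum means one weight matrix acts on each periodic component of every channel at once, which matches the intuition of emphasizing quasi-periodic structure rather than individual time steps.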
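The training objective can be sketched as a negative-Pearson term on the waveform plus a cross-entropy term that treats the normalized PSD of the prediction, restricted to a heart-rate band, as a distribution over heart-rate bins. The 40 to 180 bpm band, the zero-padded FFT length, normalizing by the band sum (instead of a softmax), and the weight `alpha` are all illustrative assumptions, not values confirmed by the paper.

```python
import numpy as np


def neg_pearson_loss(pred, target, eps=1e-8):
    # 1 - Pearson correlation between predicted and reference waveforms.
    p = pred - pred.mean()
    t = target - target.mean()
    r = (p * t).sum() / (np.sqrt((p ** 2).sum() * (t ** 2).sum()) + eps)
    return 1.0 - r


def psd_ce_loss(pred, hr_bpm, fs, low_bpm=40.0, high_bpm=180.0, n_fft=2048, eps=1e-8):
    # Treat the normalized in-band PSD of the prediction as a distribution over
    # heart-rate bins and take cross-entropy against the bin nearest the true HR.
    freqs_bpm = np.fft.rfftfreq(n_fft, d=1.0 / fs) * 60.0
    psd = np.abs(np.fft.rfft(pred - pred.mean(), n=n_fft)) ** 2  # zero-padded FFT
    band = (freqs_bpm >= low_bpm) & (freqs_bpm <= high_bpm)
    probs = psd[band] / (psd[band].sum() + eps)
    target_idx = np.argmin(np.abs(freqs_bpm[band] - hr_bpm))
    return -np.log(probs[target_idx] + eps)


def rppg_loss(pred, target, hr_bpm, fs, alpha=1.0):
    # Temporal correlation term plus weighted frequency-domain term.
    return neg_pearson_loss(pred, target) + alpha * psd_ce_loss(pred, hr_bpm, fs)


fs = 30.0
t = np.arange(300) / fs               # 10 s at 30 fps
pulse = np.sin(2 * np.pi * 1.2 * t)   # 1.2 Hz = 72 bpm
print(rppg_loss(pulse, pulse, hr_bpm=72.0, fs=fs))
```

The Pearson term rewards matching waveform shape regardless of scale, while the PSD term pushes spectral energy toward the correct heart-rate bin; zero-padding only refines the frequency grid and does not add true resolution.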

EXPERIMENTAL RESULTS

Research Questions

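RQ1: Can RhythmMamba accurately estimate rPPG from videos of arbitrary length without performance degradation?
RQ2: Does multi-temporal Mamba effectively capture both the long-range periodic patterns and the short-term trends of rPPG signals?
RQ3: Does frequency-domain channel interaction improve the discriminability of heartbeat-related periodic components in rPPG?
RQ4: Does aggregating spatial information into channels via the frame stem benefit Mamba-based temporal learning for rPPG?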

Key Findings

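RhythmMamba achieves state-of-the-art performance in intra-dataset evaluation (PURE, UBFC) and on the challenging MMPD dataset while reducing parameter count and MACs.
On MMPD, RhythmMamba outperforms multiple baselines (e.g., MAE 3.16, RMSE 7.27, MAPE 3.37, ρ 0.84, SNR 4.74).
In cross-dataset evaluation, RhythmMamba generalizes well when trained on PURE or UBFC and tested on PURE, UBFC, and MMPD.
Ablation studies confirm the importance of the diff-fusion frame stem, the large kernel, multi-temporal Mamba, and the frequency-domain FFN, each contributing to the performance gains.
RhythmMamba's inference cost scales linearly with video length, and it can handle ultra-long videos by splitting them into segments and concatenating the results.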
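To read the reported numbers, it helps to see how such metrics are typically computed. The sketch below assumes a common rPPG convention: heart rate is the frequency of the largest PSD peak inside a plausible band, and MAE, RMSE, MAPE, and Pearson ρ are computed over per-video heart-rate estimates. The band, FFT length, and function names are assumptions; SNR is omitted because its exact definition varies across papers.

```python
import numpy as np


def estimate_hr_bpm(signal, fs, low_bpm=40.0, high_bpm=180.0, n_fft=4096):
    # Heart rate = frequency of the largest PSD peak inside a plausible HR band.
    freqs_bpm = np.fft.rfftfreq(n_fft, d=1.0 / fs) * 60.0
    psd = np.abs(np.fft.rfft(signal - np.mean(signal), n=n_fft)) ** 2
    band = (freqs_bpm >= low_bpm) & (freqs_bpm <= high_bpm)
    return float(freqs_bpm[band][np.argmax(psd[band])])


def hr_metrics(pred_bpm, true_bpm):
    # Error metrics over per-video (or per-segment) HR estimates.
    pred = np.asarray(pred_bpm, dtype=float)
    true = np.asarray(true_bpm, dtype=float)
    err = pred - true
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE": float(np.mean(np.abs(err) / true) * 100.0),
        "rho": float(np.corrcoef(pred, true)[0, 1]),
    }


fs = 30.0
t = np.arange(600) / fs                       # 20 s at 30 fps
wave = np.sin(2 * np.pi * (75.0 / 60.0) * t)  # synthetic 75 bpm pulse
print(round(estimate_hr_bpm(wave, fs), 1))
print(hr_metrics([70.0, 80.0, 91.0], [72.0, 78.0, 90.0]))
```

RMSE penalizes occasional large misses more than MAE does, so a gap between the two (as in the MMPD figures above) suggests a minority of videos with larger errors.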

This review was generated by AI and reviewed by a human editor.