QUICK REVIEW

[Paper Review] FTDMamba: Frequency-Assisted Temporal Dilation Mamba for Unmanned Aerial Vehicle Video Anomaly Detection

Cheng-Zhuang Liu, Si-Bao Chen|arXiv (Cornell University)|2026. 01. 16.

Anomaly Detection Techniques and Applications | Citations: 0

One-line summary

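FTDMamba introduces a Frequency Decoupled Spatiotemporal Correlation Module and a Temporal Dilation Mamba Module to handle multi-source motion, achieving state-of-the-art anomaly detection performance on two public static-background benchmarks and the new MUVAD dataset.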

ABSTRACT

Recent advances in video anomaly detection (VAD) mainly focus on ground-based surveillance or unmanned aerial vehicle (UAV) videos with static backgrounds, whereas research on UAV videos with dynamic backgrounds remains limited. Unlike static scenarios, dynamically captured UAV videos exhibit multi-source motion coupling, where the motion of objects and UAV-induced global motion are intricately intertwined. Consequently, existing methods may misclassify normal UAV movements as anomalies or fail to capture true anomalies concealed within dynamic backgrounds. Moreover, many approaches do not adequately address the joint modeling of inter-frame continuity and local spatial correlations across diverse temporal scales. To overcome these limitations, we propose the Frequency-Assisted Temporal Dilation Mamba (FTDMamba) network for UAV VAD, including two core components: (1) a Frequency Decoupled Spatiotemporal Correlation Module, which disentangles coupled motion patterns and models global spatiotemporal dependencies through frequency analysis; and (2) a Temporal Dilation Mamba Module, which leverages Mamba's sequence modeling capability to jointly learn fine-grained temporal dynamics and local spatial structures across multiple temporal receptive fields. Additionally, unlike existing UAV VAD datasets which focus on static backgrounds, we construct a large-scale Moving UAV VAD dataset (MUVAD), comprising 222,736 frames with 240 anomaly events across 12 anomaly types. Extensive experiments demonstrate that FTDMamba achieves state-of-the-art (SOTA) performance on two public static benchmarks and the new MUVAD dataset. The code and MUVAD dataset will be available at: https://github.com/uavano/FTDMamba.

Motivation and objectives

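Address multi-source motion coupling in dynamic UAV videos, where UAV ego-motion and object motion are intertwined.
Develop a frequency-domain decoupled spatiotemporal modeling approach that separates global background motion from local foreground motion.
Model temporal continuity and local spatial correlation across multiple temporal scales with a Mamba-based architecture.
Build a large-scale Moving UAV VAD dataset (MUVAD) that reflects realistic deployment scenarios.
Demonstrate state-of-the-art (SOTA) performance on public UAV VAD benchmarks and on the MUVAD dataset.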

Proposed method

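Proposes the Frequency Decoupled Spatiotemporal Correlation Module (FDSCM), which uses a temporal FFT to disentangle motion sources and a 2D FFT to model global spatiotemporal dependencies.
Applies the Wiener-Khinchin theorem to compute a spatiotemporal autocorrelation matrix, which serves as an attention map for feature enhancement.
Introduces the Temporal Dilation Mamba Module (TDMM), built around STMamba, to jointly learn fine-grained temporal dynamics and local spatial structures across multiple temporal scales.
Implements multi-scale temporal modeling by dilating the sequence at different temporal scales (Φ_η) and fusing the resulting STMamba outputs.
Frames VAD as a future-frame prediction task, using a four-stage encoder paired with an independent decoder; the training loss combines intensity, gradient, and SSIM terms.
At inference, anomalies are identified using a PSNR-based anomaly score and a normalized frame quality score.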
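The temporal-FFT decoupling described for FDSCM can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration, that slowly varying global (UAV-induced) motion concentrates in low temporal frequencies and fast local motion in higher ones, and it splits a feature sequence with a hard cutoff `k_cut`. The function name `split_temporal_bands` and the hard mask are hypothetical; the review does not describe how FDSCM actually partitions the spectrum.

```python
import numpy as np


def split_temporal_bands(x: np.ndarray, k_cut: int) -> tuple[np.ndarray, np.ndarray]:
    """Split a (T, H, W) feature sequence into low- and high-frequency temporal parts.

    Hypothetical illustration: a hard mask on temporal FFT bins stands in for
    whatever (possibly learned) decoupling FDSCM actually performs.
    """
    T = x.shape[0]
    spec = np.fft.rfft(x, axis=0)                 # temporal FFT: (T // 2 + 1, H, W)
    low_mask = np.arange(spec.shape[0]) <= k_cut  # bins 0..k_cut form the "low" band
    low = np.fft.irfft(spec * low_mask[:, None, None], n=T, axis=0)
    high = np.fft.irfft(spec * ~low_mask[:, None, None], n=T, axis=0)
    return low, high                              # low + high reconstructs x


# Usage: a constant "global" component plus a fast per-frame flicker.
t = np.arange(16)
slow = np.full((16, 4, 4), 3.0)
fast = np.cos(np.pi * t)[:, None, None] * np.ones((16, 4, 4))
low, high = split_temporal_bands(slow + fast, k_cut=2)
```

Real UAV footage will not separate this cleanly; the sketch only shows that a temporal FFT gives a linear, exactly invertible split in which each component is assigned by its temporal frequency rather than its pixel location.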
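The Wiener-Khinchin theorem states that the autocorrelation of a signal is the inverse Fourier transform of its power spectrum, so all circular autocorrelation lags can be computed in O(N log N) instead of O(N^2). The sketch below checks that identity on a 1D signal; the paper applies the idea to spatiotemporal features. The `lag_attention` helper, which turns the strongest non-zero lags into softmax weights, is a hypothetical illustration of how such a map could act as attention, not FTDMamba's exact formulation.

```python
import numpy as np


def autocorr_fft(x: np.ndarray) -> np.ndarray:
    """Circular autocorrelation r[k] = sum_t x[t] * x[(t + k) % N], via Wiener-Khinchin."""
    spec = np.fft.fft(x)
    power = (spec * np.conj(spec)).real        # power spectrum |X(f)|^2
    return np.fft.ifft(power).real             # inverse FFT of power = autocorrelation


def autocorr_direct(x: np.ndarray) -> np.ndarray:
    """The same quantity in O(N^2), used only to check the FFT version."""
    return np.array([np.sum(x * np.roll(x, -k)) for k in range(len(x))])


def lag_attention(x: np.ndarray, top_k: int = 3) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical attention: softmax over the top-k strongest non-zero lags."""
    r = autocorr_fft(x - x.mean())
    lags = np.argsort(r[1:])[::-1][:top_k] + 1   # skip lag 0 (trivial self-match)
    w = np.exp(r[lags] - r[lags].max())          # numerically stable softmax
    return lags, w / w.sum()
```

For a signal that repeats every 4 frames, the strongest non-zero lags are multiples of 4, so the weights concentrate on frames that recur with the signal's period.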
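STMamba builds on Mamba's selective state-space scan. As a rough illustration of that recurrence only (not the STMamba block itself, whose spatial-temporal scan order and projections the review does not describe), the sketch below runs a single-channel selective scan in which the step size delta_t and the projections B_t and C_t depend on the input, using the simplified discretization exp(delta * A) for the state transition and delta * B for the input. The weight names `w_delta`, `w_b`, `w_c` and the state size are illustrative assumptions.

```python
import numpy as np


def selective_scan(x: np.ndarray, A: np.ndarray, w_delta: float,
                   w_b: np.ndarray, w_c: np.ndarray) -> np.ndarray:
    """Causal single-channel selective SSM over a length-T sequence x.

    A: (N,) negative diagonal state matrix; w_b, w_c: (N,) projection weights.
    Recurrence: h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t,  y_t = C_t . h_t
    """
    h = np.zeros_like(A, dtype=float)
    y = np.empty(len(x), dtype=float)
    for t, xt in enumerate(x):
        delta = np.logaddexp(0.0, w_delta * xt)  # softplus: input-dependent step > 0
        B = w_b * xt                              # input-dependent input projection
        C = w_c * xt                              # input-dependent output projection
        h = np.exp(delta * A) * h + delta * B * xt
        y[t] = C @ h
    return y
```

Because each output depends only on the current and earlier inputs, changing future frames leaves earlier outputs untouched; the input-dependent delta lets the model decide per step how much past state to keep.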
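The multi-scale temporal dilation step can be sketched as follows, assuming that a dilation rate d splits the T frames into d interleaved subsequences x[i::d], that each subsequence is modeled independently, and that the per-rate results are re-interleaved and averaged. The list `rates` plays the role of Φ_η here, the averaging fusion is an assumption, and `ema_scan` is a hypothetical stand-in for STMamba so the example stays self-contained.

```python
import numpy as np


def ema_scan(x: np.ndarray, a: float = 0.5) -> np.ndarray:
    """Stand-in causal sequence model: h_t = a * h_{t-1} + (1 - a) * x_t."""
    h, out = 0.0, np.empty_like(x)
    for t, xt in enumerate(x):
        h = a * h + (1.0 - a) * xt
        out[t] = h
    return out


def dilated_multiscale(x: np.ndarray, rates: list[int], model=ema_scan) -> np.ndarray:
    """Run `model` on interleaved subsequences for each dilation rate, then average."""
    outputs = []
    for d in rates:
        y = np.empty_like(x)
        for i in range(d):
            y[i::d] = model(x[i::d])   # subsequence with temporal stride d
        outputs.append(y)
    return np.mean(outputs, axis=0)    # simple average fusion (an assumption)
```

With d = 2, each subsequence steps over every other frame, so the same number of recurrent steps spans twice as much time; combining several rates mixes short-range and long-range context in one output.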
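The training objective combines intensity, gradient, and SSIM terms. The NumPy sketch below assumes an L2 intensity term, an L1 difference between absolute horizontal and vertical image gradients, a simplified global (single-window) SSIM instead of the usual sliding-window SSIM, and hypothetical weights `lam_int`, `lam_grad`, `lam_ssim`. None of these specific choices are confirmed by the review.

```python
import numpy as np


def intensity_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean squared pixel error between predicted and ground-truth frames."""
    return float(np.mean((pred - gt) ** 2))


def gradient_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """L1 difference between absolute spatial gradients of prediction and target."""
    def grads(img):
        return np.abs(np.diff(img, axis=-1)), np.abs(np.diff(img, axis=-2))
    (pgx, pgy), (ggx, ggy) = grads(pred), grads(gt)
    return float(np.mean(np.abs(pgx - ggx)) + np.mean(np.abs(pgy - ggy)))


def global_ssim(pred: np.ndarray, gt: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole frame (simplification of windowed SSIM)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = pred.mean(), gt.mean()
    vx, vy = pred.var(), gt.var()
    cov = np.mean((pred - mx) * (gt - my))
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx**2 + my**2 + c1) * (vx + vy + c2)))


def prediction_loss(pred, gt, lam_int=1.0, lam_grad=1.0, lam_ssim=1.0) -> float:
    """Weighted sum of intensity, gradient, and (1 - SSIM) terms."""
    return (lam_int * intensity_loss(pred, gt)
            + lam_grad * gradient_loss(pred, gt)
            + lam_ssim * (1.0 - global_ssim(pred, gt)))
```

A perfect prediction gives zero for all three terms; the gradient term penalizes blurry predictions that the intensity term alone tends to tolerate.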
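Prediction-based VAD commonly computes PSNR between predicted and actual frames, min-max normalizes it within each test video into a regularity score, and uses one minus that score as the anomaly score. The sketch below follows that common recipe under the assumption that FTDMamba's normalized frame quality score works the same way; the function names are hypothetical.

```python
import numpy as np


def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; assumes the frames differ (MSE > 0)."""
    mse = np.mean((pred - gt) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))


def anomaly_scores(psnrs: np.ndarray) -> np.ndarray:
    """Min-max normalize one video's PSNRs to [0, 1] (high PSNR = regular frame),
    then invert so that poorly predicted frames get high anomaly scores.
    Assumes the PSNR values are not all equal."""
    regularity = (psnrs - psnrs.min()) / (psnrs.max() - psnrs.min())
    return 1.0 - regularity
```

For per-frame PSNRs of 20, 30, and 25 dB, the anomaly scores are 1.0, 0.0, and 0.5: the worst-predicted frame is the most anomalous.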

Experimental results

Research questions

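RQ1: Can frequency-domain decoupling separate UAV ego-motion from object motion against dynamic backgrounds?
RQ2: Do multi-scale temporal receptive fields improve the modeling of global (long-term) and local (short-term) motion in UAV VAD?
RQ3: Does the FTDMamba framework achieve state-of-the-art performance on both static-background benchmarks and the moving-background dataset?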

Key findings

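FTDMamba achieves SOTA performance on two public static UAV VAD benchmarks and on the new MUVAD dataset.
MUVAD is a large-scale moving-UAV VAD dataset comprising 222,736 frames and 240 anomaly events spanning 12 anomaly types.
The model integrates spatiotemporal frequency decomposition, spatiotemporal autocorrelation attention, and multi-scale STMamba-based sequence modeling.
Experiments demonstrate robustness to dynamic backgrounds and effective modeling of multi-source motion.
MUVAD highlights realistic moving-UAV conditions that static-background datasets do not adequately capture.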

This review was generated by AI and checked by a human editor.