QUICK REVIEW

[Paper Review] A Survey on Deep Learning Techniques for Video Anomaly Detection

Jessie James P. Suarez, Prospero C. Naval | arXiv (Cornell University) | Sep 29, 2020

Anomaly Detection Techniques and Applications | References: 31 | Cited by: 25

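One-Sentence Summary

This survey gives a comprehensive overview of deep learning techniques for video anomaly detection, categorizing methods by how they identify anomalies: reconstruction, future-frame prediction, classification, or scoring. It traces the shift from handcrafted features to end-to-end learning, stresses the importance of spatiotemporal modeling, and calls for more robust evaluation metrics and weakly supervised learning methods to meet the challenges of real-world deployment.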

ABSTRACT

Anomaly detection in videos is a problem that has been studied for more than a decade. This area has piqued the interest of researchers due to its wide applicability. Because of this, there has been a wide array of approaches that have been proposed throughout the years and these approaches range from statistical-based approaches to machine learning-based approaches. Numerous surveys have already been conducted on this area but this paper focuses on providing an overview on the recent advances in the field of anomaly detection using Deep Learning. Deep Learning has been applied successfully in many fields of artificial intelligence such as computer vision, natural language processing and more. This survey, however, focuses on how Deep Learning has improved and provided more insights to the area of video anomaly detection. This paper provides a categorization of the different Deep Learning approaches with respect to their objectives. Additionally, it also discusses the commonly used datasets along with the common evaluation metrics. Afterwards, a discussion synthesizing all of the recent approaches is made to provide direction and possible areas for future research.

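Motivation and Objectives

Systematically categorize recent deep learning-based video anomaly detection methods by the mechanism they ultimately use to identify anomalies.
Analyze the commonly used datasets and evaluation metrics, point out the limitations of current benchmarks, and stress the need for more realistic, large-scale data.
Identify gaps in current research, in particular the lack of context-aware modeling and the need for weakly supervised or unsupervised learning to reduce the annotation burden.
Guide future research by summarizing trends and proposing directions such as end-to-end architectures, attention mechanisms, and improved evaluation standards.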

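Proposed Approach

Divides deep learning-based anomaly detection methods into four categories: reconstruction-based, future-frame-prediction-based, classification-based, and scoring-based.
Reviews the use of spatiotemporal features (such as optical flow, motion patterns, and appearance representations) as inputs to deep networks.
Analyzes how attention mechanisms and Transformers have been explored to improve context modeling in anomaly detection.
Assesses the role of autoencoders, variational autoencoders, and generative adversarial networks in learning normal video patterns through reconstruction or generative modeling.
Discusses integrating low-level features (such as optical flow and gradient histograms) into deep networks to guide learning without full supervision.
Argues for unified, end-to-end deep learning frameworks that jointly learn features and detect anomalies, improving deployability and robustness.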
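
To make the reconstruction-based and future-frame-prediction-based categories more concrete, the sketch below shows one common way such models turn an output frame into an anomaly score. It is an illustration only, not code from the survey: the function names (`mse`, `psnr`, `anomaly_scores`), the flat-list frame representation, and the per-video min-max normalization are all assumptions made for this example, and the "reconstructions" are hand-written toy values rather than the output of a trained autoencoder or GAN.

```python
import math


def mse(frame, output):
    """Mean squared error between two equally sized flat lists of pixel values."""
    if not frame or len(frame) != len(output):
        raise ValueError("frames must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(frame, output)) / len(frame)


def psnr(frame, predicted, max_val=1.0):
    """Peak signal-to-noise ratio; a LOW value suggests a poorly predicted frame."""
    err = mse(frame, predicted)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / err)


def anomaly_scores(errors):
    """Min-max normalize per-frame errors within one video to the range [0, 1]."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames are equally well reconstructed: nothing stands out.
        return [0.0] * len(errors)
    return [(e - lo) / (hi - lo) for e in errors]


# Toy data (hypothetical): three 3-pixel "frames" and their reconstructions.
frames = [[0.0, 0.5, 1.0], [0.1, 0.5, 0.9], [1.0, 0.0, 1.0]]
recons = [[0.0, 0.5, 1.0], [0.1, 0.4, 0.9], [0.2, 0.5, 0.9]]

errors = [mse(f, r) for f, r in zip(frames, recons)]
scores = anomaly_scores(errors)
```

In this toy run the third frame is reconstructed worst and therefore receives the highest score. A model trained only on normal video is expected to behave similarly on genuinely anomalous frames, although that expectation is exactly what the evaluation metrics discussed in this review are meant to verify.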

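Experimental Results

Research Questions

RQ1: How do different deep learning architectures (such as autoencoders, generative adversarial networks, and Transformers) compare in detecting anomalies across diverse video datasets?
RQ2: To what extent do current evaluation metrics (frame-level and pixel-level) accurately reflect model performance on real-world anomaly detection tasks?
RQ3: To what extent do existing datasets fail to reflect real-world surveillance scenarios, and how can they be improved?
RQ4: To what extent can weakly supervised or unsupervised learning reduce the reliance on costly manual annotation in video anomaly detection?
RQ5: To what extent can attention mechanisms and context modeling make deep learning models more robust at detecting subtle or rare anomalies?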
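
The frame-level metric mentioned in RQ2 is usually reported as the area under the ROC curve (AUC) over per-frame anomaly scores. The sketch below computes it with the pairwise-ranking definition of AUC, using only the standard library. The function name `frame_level_auc` and the toy labels and scores are assumptions for illustration; real evaluations would typically use an optimized library routine rather than this quadratic-time loop.

```python
def frame_level_auc(labels, scores):
    """Frame-level ROC AUC.

    Equals the probability that a randomly chosen anomalous frame (label 1)
    receives a higher score than a randomly chosen normal frame (label 0),
    with ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal frame")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): four frames, the last two anomalous.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = frame_level_auc(labels, scores)
```

Because this metric only checks whether anomalous frames rank above normal ones, a model can score well even when it responds to the wrong region of the frame. That blind spot is part of the motivation for the pixel-level criterion and for the more robust metrics this review calls for.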

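Key Findings

Deep learning methods substantially improve video anomaly detection by automatically learning discriminative spatiotemporal features, outperforming traditional methods built on handcrafted features.
Reconstruction-based and future-frame-prediction-based methods dominate, with autoencoders and generative adversarial networks performing well on benchmark datasets such as UCSD and UCF-Crime.
Integrating optical flow and appearance features into deep networks improves detection accuracy, especially for motion-based anomalies.
Despite this progress, current evaluation metrics still cannot adequately assess how well anomalies are localized spatially; more robust, context-aware metrics are needed.
Large-scale datasets, such as those of Sultani et al. (2018) and Liu et al. (2018), are enabling better model training, but annotation remains the main bottleneck.
Future research should prioritize end-to-end deep learning frameworks, context-aware modeling, and weakly supervised learning to improve real-world applicability and reduce reliance on annotation.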
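
As a rough illustration of how weak supervision can reduce annotation cost, the sketch below implements a multiple-instance ranking loss in the spirit of the objective associated with Sultani et al. (2018): only video-level labels are needed, and the highest-scoring segment of an anomalous video is pushed above the highest-scoring segment of a normal one. This is a sketch under stated assumptions, not the authors' implementation. The function name `mil_ranking_loss`, the plain-list inputs, and the default weights are chosen for this example.

```python
def mil_ranking_loss(anomalous_bag, normal_bag, margin=1.0,
                     smooth_weight=8e-5, sparse_weight=8e-5):
    """Multiple-instance ranking loss over per-segment anomaly scores.

    anomalous_bag: scores for segments of a video labeled anomalous
    normal_bag:    scores for segments of a video labeled normal
    The weights and margin are illustrative defaults (assumptions).
    """
    if not anomalous_bag or not normal_bag:
        raise ValueError("both bags must contain at least one segment score")

    # Hinge term: top anomalous segment should beat top normal segment by `margin`.
    hinge = max(0.0, margin - max(anomalous_bag) + max(normal_bag))

    # Temporal smoothness: adjacent segment scores should not jump abruptly.
    smooth = sum((a - b) ** 2 for a, b in zip(anomalous_bag, anomalous_bag[1:]))

    # Sparsity: anomalies are assumed to occupy only a few segments.
    sparse = sum(anomalous_bag)

    return hinge + smooth_weight * smooth + sparse_weight * sparse


# Toy example (hypothetical scores in [0, 1]).
poorly_separated = mil_ranking_loss([0.1, 0.9, 0.1], [0.1, 0.2],
                                    smooth_weight=0.0, sparse_weight=0.0)
well_separated = mil_ranking_loss([0.0, 1.0, 0.0], [0.0, 0.0],
                                  smooth_weight=0.0, sparse_weight=0.0)
```

The design choice worth noting is that no frame-level or segment-level labels appear anywhere in the loss. The model only learns from whether a whole video contains an anomaly, which is the kind of supervision that is comparatively cheap to collect.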

This review was generated by AI and reviewed by human editors.