QUICK REVIEW

[Paper Review] Real-Time Anomaly Detection and Localization in Crowded Scenes

Mohammad Sabokrou, Mahmood Fathy | arXiv (Cornell University) | Nov 21, 2015

Anomaly Detection Techniques and Applications | References: 18 | Cited by: 48

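One-Sentence Summary

This paper proposes a real-time anomaly detection and localization method built on two views of each patch (a global and a local descriptor), with the features learned by a sparse auto-encoder. By modeling normal patch patterns with Gaussian distributions and scoring patches by Mahalanobis distance, the method achieves high accuracy at both the frame level and the pixel level. Its pixel-level performance exceeds the state of the art, and it runs at 25 fps (up to 200 fps when a slight loss of accuracy is tolerable).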

ABSTRACT

In this paper, we propose a method for real-time anomaly detection and localization in crowded scenes. Each video is defined as a set of non-overlapping cubic patches, and is described using two local and global descriptors. These descriptors capture the video properties from different aspects. By incorporating simple and cost-effective Gaussian classifiers, we can distinguish normal activities and anomalies in videos. The local and global features are based on structure similarity between adjacent patches and the features learned in an unsupervised way, using a sparse auto-encoder. Experimental results show that our algorithm is comparable to a state-of-the-art procedure on UCSD ped2 and UMN benchmarks, but even more time-efficient. The experiments confirm that our system can reliably detect and localize anomalies as soon as they happen in a video.

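Motivation and Objectives

Address the challenge of real-time anomaly detection and localization in crowded video scenes, where existing methods often fail because of high computational cost or poor localization.
Overcome the limitations of trajectory-based and low-level-feature methods by introducing a two-view descriptor approach that captures both local and global spatio-temporal patterns.
Develop a computationally efficient framework that processes video in real time (25 fps) while keeping detection accuracy high, particularly for pixel-level localization.
Improve on earlier methods that lack real-time capability or cannot localize anomalies precisely, by combining global and local feature representations with Gaussian classifiers.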

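Proposed Method

Represent each video as a set of non-overlapping cubic spatio-temporal patches, so that motion and structure can be analyzed locally.
Learn discriminative global and local features with a sparse auto-encoder trained, without supervision, on patches from normal video.
Compute a structural similarity measure between adjacent patches to detect abrupt spatio-temporal changes that indicate an anomaly.
Model normal patches with a Gaussian distribution; at inference, classify a patch as anomalous by its Mahalanobis distance to that distribution.
Fuse the predictions of the global and local views with a weighted decision strategy to improve detection and localization accuracy.
Evaluate localization accuracy with a dual pixel-level metric parameterized by β, which allows fine-grained assessment of the detected anomalous regions.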
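
The Gaussian modeling and Mahalanobis-distance step above can be sketched roughly as follows. This is a minimal illustration using NumPy, not the authors' implementation: the function names, the `eps` regularization term, and the fixed `threshold` are all assumptions introduced here, and the feature vectors stand in for whatever the global or local descriptor produces.

```python
import numpy as np

def fit_gaussian(features, eps=1e-6):
    """Fit a Gaussian to normal-patch features of shape (n_samples, n_dims).

    Returns the mean and the inverse covariance. `eps` adds a small ridge
    to the covariance so it stays invertible (an assumption of this sketch).
    """
    mean = features.mean(axis=0)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    cov = cov + eps * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis(features, mean, cov_inv):
    """Mahalanobis distance of each row of `features` to the fitted Gaussian."""
    diff = features - mean
    # Row-wise quadratic form: diff_i^T * cov_inv * diff_i
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(squared, 0.0))

def is_anomalous(features, mean, cov_inv, threshold):
    """Flag patches whose distance exceeds a hypothetical threshold."""
    return mahalanobis(features, mean, cov_inv) > threshold
```

In use, `fit_gaussian` would be called once on features from normal training patches, and `is_anomalous` on the features of each incoming patch. The threshold trades missed anomalies against false alarms and would need to be tuned on held-out data.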

Figure 1: The scheme of our algorithm ( left to right ): Input frames, two views of patches (global and local), modeling the data using Gaussian distributions, and making the final decision

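Experimental Results

Research Questions

RQ1: Compared with single-view or low-level-feature methods, does a two-view feature representation (global and local) improve real-time anomaly detection and localization in crowded scenes?
RQ2: To what extent does feature learning with a sparse auto-encoder strengthen the discriminative power of the normal-patch model, and thereby improve anomaly detection?
RQ3: How does the method compare with state-of-the-art methods in pixel-level localization accuracy and computational efficiency?
RQ4: Does fusing the global and local descriptors yield more reliable anomaly detection on real-time video streams while keeping the false-alarm rate low?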

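Key Findings

On the UCSD ped2 dataset, the method achieves a pixel-level equal error rate (EER) of 24%, better than the next-best method (Li et al., EER of 29.9%).
On the UCSD ped2 dataset, the method achieves a frame-level EER of 19%, second only to Li et al. (18.5%), a gap of 0.5 percentage points.
On standard hardware (3.5 GHz CPU, 8 GB RAM), the method processes video at 25 fps, and up to 200 fps when slight errors are tolerated, significantly faster than competing methods.
On the UMN dataset, the method achieves an EER of 2.5% and an AUC of 99.6%, improving on the best previous result (EER: 2.8%) and showing state-of-the-art frame-level detection.
The dual pixel-level evaluation shows that the method keeps high localization accuracy even at β = 0.05 and β = 0.10, and that pixel-level performance is closely consistent with the frame-level results.
Fusing the global and local views markedly improves detection reliability. Each classifier performs well on its own, and the global model performs better on the UMN dataset in particular.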
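
For readers unfamiliar with the EER figures quoted above, the following sketch shows one simple way to approximate an equal error rate from anomaly scores and ground-truth labels. It is an illustration only: the function name and the threshold sweep are assumptions of this sketch, and it does not reproduce the paper's frame-level or dual pixel-level evaluation protocol.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER for anomaly scores.

    scores: higher means more anomalous.
    labels: 1 for anomalous, 0 for normal.
    Sweeps every observed score as a threshold and returns the mean of the
    false-positive and false-negative rates where the two are closest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both normal and anomalous samples")

    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        false_pos = sum(p and y == 0 for p, y in zip(predicted, labels))
        false_neg = sum((not p) and y == 1 for p, y in zip(predicted, labels))
        fpr = false_pos / negatives
        fnr = false_neg / positives
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return eer
```

A lower EER is better: it is the operating point at which the rate of normal samples flagged as anomalous equals the rate of anomalies that are missed.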

Figure 2: Video representation: Each video is represented through a number of non-overlapping cubic patches, covering the whole space-time in the video.
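
The representation in Figure 2 can be sketched as a reshape of the video volume into non-overlapping cubes. This is a rough illustration under stated assumptions: the video is taken to be a grayscale NumPy array of shape (frames, height, width), the patch sizes are placeholders since the summary does not give them, and dropping leftover frames, rows, and columns at the borders is a choice made for this sketch.

```python
import numpy as np

def extract_cubic_patches(video, patch_h, patch_w, patch_t):
    """Split a (T, H, W) video into non-overlapping cubic patches.

    Returns an array of shape (n_patches, patch_t, patch_h, patch_w).
    Trailing frames, rows, or columns that do not fill a whole patch
    are dropped (an assumption of this sketch).
    """
    t, h, w = video.shape
    nt, nh, nw = t // patch_t, h // patch_h, w // patch_w
    cropped = video[: nt * patch_t, : nh * patch_h, : nw * patch_w]
    # Split each axis into (block index, offset within block) ...
    blocks = cropped.reshape(nt, patch_t, nh, patch_h, nw, patch_w)
    # ... then bring the three block indices to the front.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(nt * nh * nw, patch_t, patch_h, patch_w)
```

Patches are ordered by time block first, then by row block, then by column block, so the grid position of a patch can be recovered from its index when an anomaly needs to be localized.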


This review was generated by AI and checked by human editors.