QUICK REVIEW

[Paper Review] Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

Dan Xu, Elisa Ricci|arXiv (Cornell University)|Oct 6, 2015

Anomaly Detection Techniques and Applications|References: 36|Cited by: 64

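One-Sentence Summary

This paper proposes the Appearance and Motion DeepNet (AMDN), an unsupervised deep learning framework that uses stacked denoising autoencoders to learn appearance and motion representations together with a joint representation, and fuses them with a double fusion strategy that combines the benefits of early and late fusion. Evaluated on the UCSD and Train datasets, the method reaches performance competitive with the state of the art, with AUC scores of 0.952 for frame-level and 0.938 for pixel-level anomaly detection on the UCSD pedestrian benchmarks (Ped1 and Ped2).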

ABSTRACT

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works merely use hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN) which utilizes deep neural networks to automatically learn feature representations. To exploit the complementary information of both appearance and motion patterns, we introduce a novel double fusion framework, combining both the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn both appearance and motion features as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models are used to predict the anomaly scores of each input, which are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state of the art approaches.

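Research Motivation and Objectives

Address the challenge of detecting anomalous events in complex, crowded video surveillance scenes, where hand-crafted features are limited by the prior assumptions built into them.
Learn rich, discriminative representations of appearance and motion patterns without supervision, using deep autoencoders.
Fuse the appearance, motion, and joint representations through a novel double fusion strategy that combines the benefits of early and late fusion, in order to improve anomaly detection performance.
Outperform existing state-of-the-art methods in both anomaly detection and anomaly localization.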

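Proposed Method

The framework uses stacked denoising autoencoders (SDAEs) to learn deep representations of appearance and motion features separately from video clips.
A joint representation is learned by concatenating the appearance and motion features and feeding them into a third, smaller SDAE, which provides early fusion of the modality-specific features.
One-class SVMs are trained independently on the appearance, motion, and joint representations, each producing its own anomaly score.
A late fusion strategy combines the three anomaly scores with learned weights (αA, αM, αJ) to produce the final detection output.
The networks are pre-trained with stochastic gradient descent (SGD) with momentum, using Gaussian noise corruption (variance 0.0003) and fixed hyperparameters (λ=0.01, λF=0.0001, Nb=256).
The fusion weights are tuned by cross-validation and set to [0.2, 0.5, 0.3] for Ped1 and [0.2, 0.4, 0.4] for Ped2.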
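
The late fusion step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the fusion is a plain weighted sum of the three one-class SVM scores and that a higher score means "more anomalous". The names `late_fusion` and `detect`, the example scores, and the threshold are hypothetical; only the weight triples come from this review.

```python
from typing import List, Sequence, Tuple

# Fusion weights in the order (alpha_A, alpha_M, alpha_J), as listed in this review.
FUSION_WEIGHTS = {
    "ped1": (0.2, 0.5, 0.3),
    "ped2": (0.2, 0.4, 0.4),
}


def late_fusion(
    score_a: Sequence[float],
    score_m: Sequence[float],
    score_j: Sequence[float],
    weights: Tuple[float, float, float],
) -> List[float]:
    """Combine appearance, motion and joint anomaly scores by a weighted sum.

    Assumption: linear fusion; the review only says the scores are
    combined with weights (alpha_A, alpha_M, alpha_J).
    """
    alpha_a, alpha_m, alpha_j = weights
    if abs(alpha_a + alpha_m + alpha_j - 1.0) > 1e-9:
        raise ValueError("fusion weights are expected to sum to 1")
    if not (len(score_a) == len(score_m) == len(score_j)):
        raise ValueError("the three score sequences must have equal length")
    return [
        alpha_a * a + alpha_m * m + alpha_j * j
        for a, m, j in zip(score_a, score_m, score_j)
    ]


def detect(fused_scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag inputs whose fused score exceeds a (hypothetical) threshold."""
    return [s > threshold for s in fused_scores]


if __name__ == "__main__":
    # Made-up scores for three inputs, one list per representation.
    appearance = [0.10, 0.20, 0.90]
    motion = [0.05, 0.80, 0.95]
    joint = [0.10, 0.60, 0.85]
    fused = late_fusion(appearance, motion, joint, FUSION_WEIGHTS["ped1"])
    print(fused, detect(fused, threshold=0.5))
```

Because the weights sum to 1, the fused score stays in the same range as the individual scores, so a single threshold can be swept over it to trace a ROC curve.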

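Experimental Results

Research Questions

RQ1: Can deep autoencoders effectively learn discriminative appearance and motion representations from video for unsupervised anomaly detection?
RQ2: Does a hybrid fusion strategy that combines the benefits of early and late fusion outperform early or late fusion used alone?
RQ3: Does a joint appearance-motion representation improve detection performance compared with using modality-specific features only?
RQ4: How does the proposed AMDN framework compare with state-of-the-art methods in frame-level and pixel-level anomaly detection accuracy?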

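Key Findings

On the UCSD Ped1 dataset, AMDN achieves an AUC of 0.952 and an EER of 0.126 in frame-level evaluation, outperforming most existing methods.
For pixel-level anomaly localization, AMDN achieves an AUC of 0.938 and an EER of 0.152 on Ped1, surpassing all compared methods.
The double fusion strategy significantly improves performance: AMDN outperforms both an early fusion baseline that uses only the joint representation and a late fusion baseline that uses only the appearance and motion representations.
On the Train dataset, the precision-recall (PR) curves show AMDN outperforming all baselines, including dominant behavior learning and Gaussian mixture models.
The fusion weights for Ped1 and Ped2 are [0.2, 0.5, 0.3] and [0.2, 0.4, 0.4] respectively, which suggests that anomaly detection relies more heavily on motion features than on appearance features.
The method performs well on both the UCSD and Train datasets, indicating that it generalizes across datasets and is robust in complex, heterogeneous surveillance scenes.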
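
The AUC and EER figures quoted above are standard ROC-based metrics. The sketch below shows one common way to compute them from per-frame anomaly scores and binary ground-truth labels. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: the function names and the toy data are hypothetical, higher scores are assumed to mean "more anomalous", and the EER is approximated at the threshold where the false positive and false negative rates are closest.

```python
from typing import List, Sequence, Tuple


def _split(labels: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Separate scores of anomalous (label 1) and normal (label 0) frames."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal frame")
    return pos, neg


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random anomalous frame scores higher
    than a random normal frame, with ties counted as one half.
    """
    pos, neg = _split(labels, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def equal_error_rate(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate EER: error rate where FPR and FNR are closest."""
    pos, neg = _split(labels, scores)
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # A frame is flagged as anomalous when its score is >= t.
        fpr = sum(n >= t for n in neg) / len(neg)
        fnr = sum(p < t for p in pos) / len(pos)
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2.0
    return eer


if __name__ == "__main__":
    # Toy data: 1 marks an anomalous frame, 0 a normal one.
    frame_labels = [0, 0, 1, 1]
    frame_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(frame_labels, frame_scores))
    print(equal_error_rate(frame_labels, frame_scores))
```

The pairwise AUC is quadratic in the number of frames, which is fine for a sketch; a rank-based formulation or a library routine would be the usual choice for full-length videos.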

This review was generated by AI and reviewed by human editors.