QUICK REVIEW

[Paper Review] RWF-2000: An Open Large Scale Video Database for Violence Detection

Ming Shien Cheng, Kunjing Cai|arXiv (Cornell University)|Nov 14, 2019

Human Pose and Action Recognition|References: 43|Cited by: 40

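One-Sentence Summary

This paper introduces RWF-2000, a large-scale violence detection dataset of 2,000 clips taken from real-world surveillance footage, and the Flow Gated Network, which fuses RGB and optical flow through a self-learned temporal pooling mechanism and reaches 87.25% test accuracy on RWF-2000.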

ABSTRACT

In recent years, surveillance cameras are widely deployed in public places, and the general crime rate has been reduced significantly due to these ubiquitous devices. Usually, these cameras provide cues and evidence after crimes are conducted, while they are rarely used to prevent or stop criminal activities in time. It is both time and labor consuming to manually monitor a large amount of video data from surveillance cameras. Therefore, automatically recognizing violent behaviors from video signals becomes essential. This paper summarizes several existing video datasets for violence detection and proposes the RWF-2000 database with 2,000 videos captured by surveillance cameras in real-world scenes. Also, we present a new method that utilizes both the merits of 3D-CNNs and optical flow, namely Flow Gated Network. The proposed approach obtains an accuracy of 87.25% on the test set of our proposed database. The database and source codes are currently open to access.

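Motivation and Objectives

Advance automatic violence detection in real-world surveillance to reduce the burden of manual monitoring.
Provide a realistic, relatively large dataset (RWF-2000) built from real surveillance footage, with balanced violent and non-violent clips.
Propose a novel model that uses appearance (RGB) and motion (optical flow) together with self-learned pooling to improve temporal feature aggregation.
Evaluate the proposed method against existing violence detection datasets and baselines to demonstrate its practicality and robustness.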

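Proposed Method

Introduce the Flow Gated Network, which has two input streams (RGB and optical flow) built on similar 3D CNN backbones.
Implement depthwise separable 3D convolutions to reduce the parameter count while maintaining performance.
Use a self-learned pooling mechanism in which an optical flow gate scales the RGB features before temporal max pooling.
Fuse the RGB and optical flow outputs through a merging block, followed by a final fully connected classifier.
Use 64-frame clips at 224x224 resolution with a 5-channel input (three RGB channels plus two optical flow components) and data augmentation; train with SGD with momentum (0.9) and learning rate decay.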
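The gating step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the optical flow branch is squashed to a (0, 1) gate with a sigmoid, the RGB branch passes through a ReLU, and pooling is a global max over time. The function name `flow_gated_pool` and the toy feature values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a value into (0, 1) so it can act as a soft gate."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    return max(0.0, x)


def flow_gated_pool(rgb_feats, flow_feats):
    """Gate RGB features with optical flow features, then max-pool over time.

    Both inputs are lists of T time steps, each a list of C channel values.
    Assumption: sigmoid gate on the flow branch, ReLU on the RGB branch.
    """
    if len(rgb_feats) != len(flow_feats):
        raise ValueError("RGB and flow features must cover the same time steps")
    # Element-wise gating: each RGB activation is scaled by its flow gate.
    gated = [
        [relu(r) * sigmoid(f) for r, f in zip(rgb_t, flow_t)]
        for rgb_t, flow_t in zip(rgb_feats, flow_feats)
    ]
    # Temporal max pooling: keep the strongest gated response per channel.
    return [max(channel) for channel in zip(*gated)]


# Toy example: 2 time steps, 2 channels (hypothetical values).
rgb = [[1.0, 2.0], [3.0, -1.0]]
flow = [[0.0, 0.0], [0.0, 10.0]]
pooled = flow_gated_pool(rgb, flow)
print(pooled)  # [1.5, 1.0]
```

In this toy case the second channel at the second time step has a strong flow gate but a negative RGB activation, so the ReLU zeroes it and the pooled value comes from the first time step instead.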

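Experimental Results

Research Questions

RQ1: Can a large-scale dataset of real-world surveillance video improve the robustness and generalization of violence detection?
RQ2: Does combining RGB appearance with optical-flow-driven gating outperform conventional pooling schemes for temporal feature pooling?
RQ3: What is the trade-off between depthwise separable 3D convolutions and standard 3D convolutions on this task?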

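Key Findings

RWF-2000 contains 2,000 clips, split 80% for training (1,600 clips) and 20% for testing (400 clips), with a balanced mix of violent and non-violent samples.
The Flow Gated Network with fusion (P3D) achieves 87.25% test accuracy on RWF-2000, outperforming several baselines.
The RGB-only and optical-flow-only (OPT-only) variants both perform worse than the fusion model, highlighting the benefit of multimodal fusion.
Depthwise separable 3D convolutions substantially reduce the parameter count while performing comparably to, or slightly better than, standard 3D convolutions.
On RWF-2000, the best-performing model (fusion P3D) uses 272,690 parameters and reaches 87.25% test accuracy.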
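The parameter savings reported above can be illustrated with a back-of-the-envelope count for a single layer. The sketch below compares a standard 3D convolution with two textbook factorizations: a depthwise separable one (per-channel 3D kernel plus a 1x1x1 pointwise convolution) and a spatial-then-temporal one of the kind the "P3D" label suggests. Both factorizations are assumptions rather than the paper's exact layers, the layer sizes (16 input channels, 32 output channels, 3x3x3 kernel) are hypothetical, biases are ignored, and the counts do not reproduce the 272,690 figure.

```python
def conv3d_params(c_in, c_out, kernel):
    """Weights in a standard 3D convolution (bias ignored)."""
    kt, kh, kw = kernel
    return c_in * c_out * kt * kh * kw


def depthwise_separable_params(c_in, c_out, kernel):
    """Depthwise 3D kernel per input channel, then a 1x1x1 pointwise conv."""
    kt, kh, kw = kernel
    depthwise = c_in * kt * kh * kw
    pointwise = c_in * c_out
    return depthwise + pointwise


def factorized_params(c_in, c_out, kernel):
    """Spatial (1 x kh x kw) conv followed by a temporal (kt x 1 x 1) conv.

    Assumption: the intermediate feature map has c_out channels.
    """
    kt, kh, kw = kernel
    spatial = c_in * c_out * kh * kw
    temporal = c_out * c_out * kt
    return spatial + temporal


# Hypothetical layer: 16 -> 32 channels with a 3x3x3 kernel.
kernel = (3, 3, 3)
standard = conv3d_params(16, 32, kernel)
separable = depthwise_separable_params(16, 32, kernel)
factorized = factorized_params(16, 32, kernel)

print(standard)    # 13824
print(separable)   # 944
print(factorized)  # 7680
```

For this hypothetical layer, either factorization needs far fewer weights than the standard convolution, which is consistent with the direction of the reported finding; the exact ratio depends on channel widths and kernel sizes.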


This review was generated by AI and reviewed by a human editor.