QUICK REVIEW

[Paper Review] Anomaly Locality in Video Surveillance

Federico Landi, Cees G. M. Snoek|arXiv (Cornell University)|Jan 29, 2019

Anomaly Detection Techniques and Applications | References: 16 | Citations: 36

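One-Sentence Summary

This paper explores the role of locality in anomaly detection by using spatiotemporal action tubes instead of whole-frame video, introduces UCFCrime2Local, a dataset with bounding box annotations, and shows that locality improves detection and enables weakly supervised proposals.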

ABSTRACT

This paper strives for the detection of real-world anomalies such as burglaries and assaults in surveillance videos. Although anomalies are generally local, as they happen in a limited portion of the frame, none of the previous works on the subject has ever studied the contribution of locality. In this work, we explore the impact of considering spatiotemporal tubes instead of whole-frame video segments. For this purpose, we enrich existing surveillance videos with spatial and temporal annotations: it is the first dataset for anomaly detection with bounding box supervision in both its train and test set. Our experiments show that a network trained with spatiotemporal tubes performs better than its analogous model trained with whole-frame videos. In addition, we discover that the locality is robust to different kinds of errors in the tube extraction phase at test time. Finally, we demonstrate that our network can provide spatiotemporal proposals for unseen surveillance videos leveraging only video-level labels. By doing, we enlarge our spatiotemporal anomaly dataset without the need for further human labeling.

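Motivation and Objectives

Motivates the use of spatiotemporal tubes to exploit the locality of anomalies in surveillance video.
Proposes a trainable model consisting of a tube extraction module, a 3D CNN video encoder, and a regression head for anomaly scoring.
Creates and releases UCFCrime2Local, an anomaly dataset with bounding box annotations for training and testing.
Demonstrates robustness to localization errors and the feasibility of weakly supervised spatiotemporal proposals.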

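Proposed Method

The tube extraction module crops frames and resizes them to form spatiotemporal tubes.
I3D (inflated 3D convnet) serves as the encoder, using two-stream features from RGB and optical flow.
A 1x1 convolution reduces the feature dimensionality, followed by fully connected layers (1024, 256, 64, 1) with ReLU and 50% dropout that regress an anomaly score A(X) in [0, 1].
Training uses a mean squared error loss with SGD (lr=0.001, Nesterov momentum 0.9) on 16-frame segments, with 5 segments per mini-batch, for 10 epochs.
Bounding boxes are added to UCFCrime in both the train and test splits to create UCFCrime2Local, enabling tube-based evaluation against a full-frame baseline.
Robustness to localization errors is evaluated experimentally, and weakly supervised tube proposals are validated on unseen videos.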
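
The tube extraction step described above (crop each frame to a box, then resize to a fixed size) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: frames are plain nested lists of grayscale values, boxes are given as `(x1, y1, x2, y2)` with exclusive upper bounds, and nearest-neighbour resizing is an assumed choice. The names `crop`, `resize_nearest`, and `extract_tube` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2 and y2 exclusive


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the box region out of one frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def resize_nearest(patch: Frame, out_h: int, out_w: int) -> Frame:
    """Resize a patch with nearest-neighbour sampling (an assumed choice)."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def extract_tube(frames: List[Frame], boxes: List[Box], size: int) -> List[Frame]:
    """Crop every frame to its box and resize, giving a fixed-size tube."""
    if len(frames) != len(boxes):
        raise ValueError("need exactly one box per frame")
    return [resize_nearest(crop(f, b), size, size) for f, b in zip(frames, boxes)]


# Toy example: a 16-frame segment of 8x8 frames with the same 4x4 box in each.
frames = [[[t * 100 + y * 8 + x for x in range(8)] for y in range(8)] for t in range(16)]
boxes = [(2, 2, 6, 6)] * 16
tube = extract_tube(frames, boxes, size=2)
```

In the toy example the resulting tube has 16 frames of 2x2 pixels. In the pipeline the review describes, a tube like this would then be passed to the video encoder.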

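Experimental Results

Research Questions

RQ1: Does focusing on spatiotemporal tubes improve anomaly detection compared with whole-frame video segments?
RQ2: How robust is tube-based anomaly detection to localization errors in tube extraction?
RQ3: Do tube-based proposals enable weakly supervised learning on unseen videos?
RQ4: What effect does locality have on anomaly detection performance in real-world surveillance data?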

Key Findings

Setting	AUC (%)
Video segments	56.12
Oracle tubes	74.73
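
For readers less familiar with the metric, the following is a minimal sketch of how an AUC value can be computed and how the gap between the two settings in the table works out. It assumes AUC means the area under the ROC curve, computed from per-item anomaly scores and binary labels. The review does not state the evaluation granularity, so the toy scores and labels are hypothetical, and `roc_auc` is an illustrative helper, not code from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random anomalous item
    outscores a random normal one (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores, for illustration only (not data from the paper).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)

# Gap between the two reported settings, in AUC percentage points.
gain = round(74.73 - 56.12, 2)
```

The pairwise formulation is quadratic in the number of items and is meant for clarity only. A rank-based implementation would be preferable for long videos.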

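Tube-based anomaly detection with oracle tubes substantially outperforms the full-frame video segment baseline (an AUC improvement of 18.61 percentage points in their setting).
The approach is robust to localization errors: performance stays stable when the box size ranges from roughly 75% to 400% of the ground-truth box.
Fusing scores across multiple tubes yields strong weakly supervised performance when proposals from unseen data are used, in some cases even surpassing the fully supervised approach.
The UCFCrime2Local dataset provides bounding box supervision in both the training and test sets, enabling spatiotemporal anomaly analysis and broader weakly supervised opportunities.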
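
To make the robustness finding concrete, the sketch below shows one way to perturb a ground-truth box by a scale factor, as a localization-error experiment might do. It rests on assumptions: the review does not say whether the 75% to 400% range refers to side length or area, so scaling each side about the box centre is an assumed reading, and the frame size, box coordinates, and the name `scale_box` are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def scale_box(box: Box, factor: float, frame_w: float, frame_h: float) -> Box:
    """Scale a box about its centre by `factor` per side, clipped to the frame."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * factor / 2.0
    half_h = (y2 - y1) * factor / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(frame_w, cx + half_w),
        min(frame_h, cy + half_h),
    )


# Hypothetical 40x40 ground-truth box inside a hypothetical 320x240 frame.
gt_box = (100.0, 60.0, 140.0, 100.0)
shrunk = scale_box(gt_box, 0.75, 320.0, 240.0)   # 75% of the original side
enlarged = scale_box(gt_box, 4.0, 320.0, 240.0)  # 400% of the original side
```

Clipping to the frame means that very large factors gradually approach the whole-frame setting, which is one plausible intuition for why performance would eventually fall back toward the video segment baseline.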


This review was generated by AI and checked by a human editor.