QUICK REVIEW

[Paper Review] Time Series Anomaly Detection; Detection of anomalous drops with limited features and sparse examples in noisy highly periodic data

Dominique T. Shipmon, Jason M. Gurevitch|arXiv (Cornell University)|Aug 11, 2017

Anomaly Detection Techniques and Applications|Cited by 63

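One-Line Summary

This paper proposes a two-stage approach for detecting anomalous drops in noisy, highly periodic time series: regression-based forecasting with TensorFlow models, followed by rule-based anomaly detection. It targets sustained anomalies and is designed to work with limited labeled data.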

ABSTRACT

Google uses continuous streams of data from industry partners in order to deliver accurate results to users. Unexpected drops in traffic can be an indication of an underlying issue and may be an early warning that remedial action may be necessary. Detecting such drops is non-trivial because streams are variable and noisy, with roughly regular spikes (in many different shapes) in traffic data. We investigated the question of whether or not we can predict anomalies in these data streams. Our goal is to utilize Machine Learning and statistical approaches to classify anomalous drops in periodic, but noisy, traffic patterns. Since we do not have a large body of labeled examples to directly apply supervised learning for anomaly classification, we approached the problem in two parts. First we used TensorFlow to train our various models including DNNs, RNNs, and LSTMs to perform regression and predict the expected value in the time series. Secondly we created anomaly detection rules that compared the actual values to predicted values. Since the problem requires finding sustained anomalies, rather than just short delays or momentary inactivity in the data, our two detection methods focused on continuous sections of activity rather than just single points. We tried multiple combinations of our models and rules and found that using the intersection of our two anomaly detection methods proved to be an effective method of detecting anomalies on almost all of our models. In the process we also found that not all data fell within our experimental assumptions, as one data stream had no periodicity, and therefore no time based model could predict it.

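Motivation and Objectives

Motivate the detection of sustained drops in noisy, highly periodic time series streams.
Develop a two-part detection framework that works with limited labeled data.
Use machine learning (DNNs, RNNs, LSTMs) for forecasting and statistical, rule-based methods for identifying anomalies.
Evaluate whether intersecting the two anomaly detection methods, applied to the regression-based predictions, improves detection performance.
Highlight the limitations that arise when data lacks clear periodicity or time-based predictability.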

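Proposed Method

Train TensorFlow models (DNNs, RNNs, LSTMs) to perform regression and predict the expected value of the time series.
Develop anomaly detection rules that compare actual values with predicted values.
Focus on detecting sustained segments of anomalous activity rather than isolated points.
Try multiple combinations of models and rules, and use the intersection of the two anomaly detection methods as the final detector.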
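
The second stage can be illustrated with a minimal Python sketch. It assumes a forecast is already available as an array, and it does not reproduce the paper's actual rules, which this review does not specify. The two rules (`rule_relative_drop`, `rule_rolling_gap`), their thresholds, and the helper `keep_sustained` are hypothetical stand-ins chosen only to show how sustained-segment filtering and intersection fit together.

```python
import numpy as np


def keep_sustained(mask, min_len):
    """Keep only runs of True that last at least min_len consecutive points."""
    mask = np.asarray(mask, dtype=bool)
    kept = np.zeros_like(mask)
    start = None
    for i, flagged in enumerate(mask):
        if flagged and start is None:
            start = i                      # a run begins
        elif not flagged and start is not None:
            if i - start >= min_len:       # run ended; keep it if long enough
                kept[start:i] = True
            start = None
    if start is not None and len(mask) - start >= min_len:
        kept[start:] = True                # run reaches the end of the series
    return kept


def rule_relative_drop(actual, predicted, drop_frac=0.5, min_len=3):
    """Hypothetical rule A: actual stays below (1 - drop_frac) * predicted."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return keep_sustained(actual < (1.0 - drop_frac) * predicted, min_len)


def rule_rolling_gap(actual, predicted, window=3, min_gap=50.0):
    """Hypothetical rule B: trailing mean of (actual - predicted) is below -min_gap."""
    resid = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    kernel = np.ones(window) / window
    # Trailing mean; the first window - 1 points are averaged against zeros.
    trailing = np.convolve(resid, kernel, mode="full")[: len(resid)]
    return trailing < -min_gap


def detect_drops(actual, predicted):
    """Flag a point only when both rules agree (intersection)."""
    return rule_relative_drop(actual, predicted) & rule_rolling_gap(actual, predicted)


predicted = np.full(20, 100.0)   # stand-in for a model forecast
actual = predicted.copy()
actual[2] = 20.0                 # momentary dip, should be ignored
actual[8:14] = 20.0              # sustained drop
print(np.flatnonzero(detect_drops(actual, predicted)))
```

On this synthetic data the single-point dip at index 2 is discarded by both rules, and the intersection flags indices 9 through 13 inside the sustained drop. Intersecting the rules trades some recall at the edges of a drop for fewer false alarms, which matches the review's emphasis on sustained anomalies over momentary ones.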

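Experimental Results

Research Questions

RQ1: Can anomalies be detected in periodic but noisy time series with only a limited number of labeled examples?
RQ2: Can forecast-based and rule-based anomaly detection complement each other to improve detection?
RQ3: Does requiring sustained anomalous segments outperform single-point anomaly detection?
RQ4: Are there data streams on which time series models fail because periodicity or predictability is lacking?
RQ5: How does combining models and rules affect detection performance?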

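Key Findings

The intersection of the two anomaly detection methods proved effective on almost all of the models tested.
DNNs, RNNs, and LSTMs were examined as regression models for predicting expected values.
Anomalies were defined as sustained deviations rather than momentary events.
One data stream showed no periodicity, so no time-based model could predict it.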
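
The last finding suggests screening a stream for periodicity before fitting a time-based model. The review does not say how the authors identified the non-periodic stream, so the following is only one common way to do such a check: a sample autocorrelation at an assumed period. The function names, the period of 24 samples, and the 0.5 cutoff are all illustrative assumptions.

```python
import numpy as np


def autocorr_at_lag(series, lag):
    """Sample autocorrelation of a series at one lag (0.0 if undefined)."""
    x = np.asarray(series, dtype=float)
    if lag <= 0 or lag >= len(x):
        return 0.0
    x = x - x.mean()
    denom = float(np.dot(x, x))
    if denom == 0.0:
        return 0.0
    return float(np.dot(x[:-lag], x[lag:]) / denom)


def looks_periodic(series, period, min_corr=0.5):
    """Hypothetical screen: strong positive autocorrelation at the assumed period."""
    return autocorr_at_lag(series, period) >= min_corr


t = np.arange(240)
periodic = np.sin(2 * np.pi * t / 24)               # synthetic cycle of 24 samples
noise = np.random.default_rng(0).normal(size=240)   # synthetic stream with no cycle
print(looks_periodic(periodic, 24), looks_periodic(noise, 24))
```

The synthetic sine wave has an autocorrelation of about 0.9 at lag 24 and passes the screen, while the seeded white noise stays close to zero and does not. A stream that fails such a check is a candidate for the kind of data that fell outside the paper's experimental assumptions.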

This review was generated by AI and checked by a human editor.