QUICK REVIEW

[Paper Review] Time Series Anomaly Detection; Detection of anomalous drops with limited features and sparse examples in noisy highly periodic data

Dominique T. Shipmon, Jason M. Gurevitch|arXiv (Cornell University)|2017. 08. 11.

Anomaly Detection Techniques and Applications | Citations: 63

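One-Line Summary

This paper presents a two-part approach for detecting anomalous drops in noisy, highly periodic time series data: regression-based prediction with TensorFlow models, followed by rule-based anomaly detection. The approach emphasizes sustained anomalies and is designed for settings with limited labeled data.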

ABSTRACT

Google uses continuous streams of data from industry partners in order to deliver accurate results to users. Unexpected drops in traffic can be an indication of an underlying issue and may be an early warning that remedial action may be necessary. Detecting such drops is non-trivial because streams are variable and noisy, with roughly regular spikes (in many different shapes) in traffic data. We investigated the question of whether or not we can predict anomalies in these data streams. Our goal is to utilize Machine Learning and statistical approaches to classify anomalous drops in periodic, but noisy, traffic patterns. Since we do not have a large body of labeled examples to directly apply supervised learning for anomaly classification, we approached the problem in two parts. First we used TensorFlow to train our various models including DNNs, RNNs, and LSTMs to perform regression and predict the expected value in the time series. Secondly we created anomaly detection rules that compared the actual values to predicted values. Since the problem requires finding sustained anomalies, rather than just short delays or momentary inactivity in the data, our two detection methods focused on continuous sections of activity rather than just single points. We tried multiple combinations of our models and rules and found that using the intersection of our two anomaly detection methods proved to be an effective method of detecting anomalies on almost all of our models. In the process we also found that not all data fell within our experimental assumptions, as one data stream had no periodicity, and therefore no time based model could predict it.

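Motivation and Objectives

Establish the need to detect sustained drops in noisy, highly periodic time series streams.
Develop a two-part detection framework that works with limited labeled data.
Use machine learning (DNN, RNN, LSTM) for prediction and statistical/rule-based methods for anomaly identification.
Evaluate combinations of regression models and anomaly detection rules, including the intersection of the two detection methods, to improve detection performance.
Highlight the limitations that arise when data lacks clear periodicity or time-based predictability.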

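Proposed Method

Train TensorFlow models (DNN, RNN, LSTM) to perform regression and predict the expected time series values.
Develop anomaly detection rules that flag deviations between actual and predicted values.
Focus on detecting continuous sections of anomalous activity rather than isolated points.
Try multiple combinations of models and rules, and use the intersection of the two detection methods as the final detector.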
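The overall flow (predict, compare, require sustained deviation, intersect) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the review does not spell out the paper's two rules, so `rule_run` and `rule_window` are hypothetical stand-ins, and `seasonal_forecast` is a simple same-phase median that takes the place of the trained TensorFlow DNN/RNN/LSTM regressors. The thresholds, `PERIOD`, and the synthetic series are likewise invented.

```python
import math
import random
import statistics

PERIOD = 24  # hypothetical: number of samples per cycle


def seasonal_forecast(series, period=PERIOD, cycles=3):
    """Stand-in for the regression model (NOT the paper's DNN/RNN/LSTM):
    predict each point as the median of the values at the same phase
    in up to `cycles` previous periods. None where no history exists."""
    preds = []
    for t in range(len(series)):
        past = [series[t - k * period]
                for k in range(1, cycles + 1) if t - k * period >= 0]
        preds.append(statistics.median(past) if past else None)
    return preds


def rule_run(actual, pred, tol=0.3, min_run=4):
    """Hypothetical rule A: flag every point inside a run of at least
    `min_run` consecutive points that are more than `tol` below prediction."""
    n = len(actual)
    low = [p is not None and a < (1 - tol) * p for a, p in zip(actual, pred)]
    flags = [False] * n
    start = None
    for t in range(n + 1):
        if t < n and low[t]:
            if start is None:
                start = t
        else:
            # A run just ended (or the series ended); keep it only if long enough.
            if start is not None and t - start >= min_run:
                for i in range(start, t):
                    flags[i] = True
            start = None
    return flags


def rule_window(actual, pred, window=6, frac=0.3):
    """Hypothetical rule B: flag t when the actual total over the trailing
    `window` points is more than `frac` below the predicted total."""
    flags = [False] * len(actual)
    for t in range(window - 1, len(actual)):
        p = pred[t - window + 1:t + 1]
        if any(v is None for v in p):
            continue  # not enough history to judge this window
        if sum(actual[t - window + 1:t + 1]) < (1 - frac) * sum(p):
            flags[t] = True
    return flags


def detect(actual, pred):
    """Final detector: intersection of the two rules."""
    return [a and b for a, b in zip(rule_run(actual, pred),
                                    rule_window(actual, pred))]


def make_demo_series(n=144, seed=0):
    """Synthetic periodic, noisy traffic with one momentary dip (index 72)
    and one sustained drop (indices 100 to 111)."""
    rng = random.Random(seed)
    s = [100 + 50 * math.sin(2 * math.pi * t / PERIOD) + rng.uniform(-5, 5)
         for t in range(n)]
    s[72] *= 0.3
    for t in range(100, 112):
        s[t] *= 0.3
    return s


series = make_demo_series()
pred = seasonal_forecast(series)
detected = [t for t, flagged in enumerate(detect(series, pred)) if flagged]
print(detected)
```

On this synthetic series the run rule ignores the momentary dip at index 72, while the trailing-window rule keeps firing briefly after the drop ends because its window still contains dropped points. Taking the intersection keeps only indices that both rules agree on, which is the motivation for combining them.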

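Experimental Results

Research Questions

RQ1: Can anomalies in periodic but noisy time series be detected with only a limited number of labeled examples?
RQ2: Do the two anomaly detection methods, both of which compare actual values to predicted values, complement each other and improve detection when combined?
RQ3: Does requiring sustained anomalous sections outperform single-point detection?
RQ4: Are there data streams on which time-based models fail because of a lack of periodicity or predictability?
RQ5: How does combining models and rules affect detection performance?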

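Key Findings

The intersection of the two anomaly detection methods proved effective on almost all of the tested models.
DNNs, RNNs, and LSTMs were explored as regression models for predicting expected values.
Anomalies were defined as sustained deviations rather than momentary events.
One data stream showed no periodicity, so no time-based model could predict it.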
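The last finding suggests checking for periodicity before fitting any time-based model. The sketch below is one possible way to do that and is not taken from the paper: it measures the autocorrelation of a series at the assumed cycle length and treats a low value as a sign that a time-based forecast has little to work with. The function names, `PERIOD`, the 0.5 threshold, and both synthetic series are assumptions.

```python
import math
import random

PERIOD = 24  # hypothetical: number of samples per cycle


def autocorrelation(series, lag):
    """Sample autocorrelation at `lag` (biased estimator).
    Returns 0.0 for a constant series or a lag that is too large."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


def looks_periodic(series, period, threshold=0.5):
    """Heuristic check: strong correlation with the series one period earlier."""
    return autocorrelation(series, period) >= threshold


rng = random.Random(1)
# Periodic traffic with moderate noise.
periodic = [100 + 50 * math.sin(2 * math.pi * t / PERIOD) + rng.uniform(-5, 5)
            for t in range(480)]
# A stream with no repeating structure at all.
noise_only = [100 + rng.uniform(-50, 50) for _ in range(480)]

print(looks_periodic(periodic, PERIOD), looks_periodic(noise_only, PERIOD))
```

A stream that fails such a check would be a candidate for exclusion from the forecasting step, in line with the review's observation that no time-based model could predict the non-periodic stream.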


This review was generated by AI and reviewed by a human editor.