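[Paper Summary] Time Series Anomaly Detection; Detection of anomalous drops with limited features and sparse examples in noisy highly periodic data
This paper explores how to detect anomalous drops in noisy, highly periodic time series using a two-part approach: regression-based forecasting (with TensorFlow models) and rule-based anomaly detection, with an emphasis on sustained anomalies and limited labeled data.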
Google uses continuous streams of data from industry partners in order to deliver accurate results to users. Unexpected drops in traffic can be an indication of an underlying issue and may be an early warning that remedial action may be necessary. Detecting such drops is non-trivial because streams are variable and noisy, with roughly regular spikes (in many different shapes) in traffic data. We investigated the question of whether or not we can predict anomalies in these data streams. Our goal is to utilize Machine Learning and statistical approaches to classify anomalous drops in periodic, but noisy, traffic patterns. Since we do not have a large body of labeled examples to directly apply supervised learning for anomaly classification, we approached the problem in two parts. First we used TensorFlow to train our various models including DNNs, RNNs, and LSTMs to perform regression and predict the expected value in the time series. Secondly we created anomaly detection rules that compared the actual values to predicted values. Since the problem requires finding sustained anomalies, rather than just short delays or momentary inactivity in the data, our two detection methods focused on continuous sections of activity rather than just single points. We tried multiple combinations of our models and rules and found that using the intersection of our two anomaly detection methods proved to be an effective method of detecting anomalies on almost all of our models. In the process we also found that not all data fell within our experimental assumptions, as one data stream had no periodicity, and therefore no time based model could predict it.
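Research motivation and objectives
- Motivate the detection of sustained drops in noisy, highly periodic time-series streams.
- Develop a two-part detection framework that works with limited labeled data.
- Use machine learning (DNNs, RNNs, LSTMs) for forecasting, and statistical or rule-based methods to identify anomalies.
- Evaluate whether taking the intersection of the two anomaly detection rules, applied to the regression forecasts, improves detection.
- Highlight the limitations that arise when data lacks clear periodicity or temporal predictability.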
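Proposed method
- Train TensorFlow models (DNNs, RNNs, LSTMs) to perform regression and predict the expected value of the time series.
- Develop anomaly detection rules that compare actual values with predicted values and flag deviations.
- Focus on detecting sustained stretches of anomalous activity rather than isolated points.
- Try multiple combinations of models and rules, and use the intersection of the two anomaly detection methods as the final detector.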
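The rule-based half of the method can be sketched as follows. This is a minimal illustration, not the paper's implementation: the summary does not specify the two rules, so `relative_drop_rule`, `window_sum_rule`, and every threshold (`drop_frac`, `min_len`, `window`, `ratio`) are hypothetical. The `predicted` values stand in for the output of any of the regression models.

```python
from typing import List, Sequence


def sustained_runs(flags: List[bool], min_len: int) -> List[bool]:
    """Keep only flags that belong to a run of at least min_len consecutive True values."""
    out = [False] * len(flags)
    start = None
    for i, flag in enumerate(flags + [False]):  # trailing sentinel closes an open run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                for j in range(start, i):
                    out[j] = True
            start = None
    return out


def relative_drop_rule(actual: Sequence[float], predicted: Sequence[float],
                       drop_frac: float = 0.5, min_len: int = 3) -> List[bool]:
    """Hypothetical rule A: actual stays below (1 - drop_frac) * predicted
    for at least min_len consecutive points."""
    flags = [a < (1.0 - drop_frac) * p for a, p in zip(actual, predicted)]
    return sustained_runs(flags, min_len)


def window_sum_rule(actual: Sequence[float], predicted: Sequence[float],
                    window: int = 3, ratio: float = 0.6) -> List[bool]:
    """Hypothetical rule B: the trailing-window sum of actual values is below
    ratio times the trailing-window sum of predicted values."""
    flags = []
    for i in range(len(actual)):
        lo = max(0, i - window + 1)
        flags.append(sum(actual[lo:i + 1]) < ratio * sum(predicted[lo:i + 1]))
    return flags


def detect(actual: Sequence[float], predicted: Sequence[float]) -> List[bool]:
    """Flag a point only when both rules agree (the intersection)."""
    rule_a = relative_drop_rule(actual, predicted)
    rule_b = window_sum_rule(actual, predicted)
    return [a and b for a, b in zip(rule_a, rule_b)]


# Toy data: a flat forecast, one momentary dip (index 2), one sustained drop (indices 5-8).
predicted = [100.0] * 12
actual = [100, 98, 30, 101, 99, 20, 25, 22, 18, 97, 102, 100]
flags = detect(actual, predicted)
flagged = [i for i, f in enumerate(flags) if f]
print(flagged)  # [6, 7, 8]
```

In this toy run the momentary dip at index 2 is ignored, and only points inside the sustained drop that both rules agree on are flagged. Requiring agreement trades some recall at the edges of a drop for fewer false alarms.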
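Experimental results
Research questions
- RQ1: Can anomalies be detected in periodic but noisy time series when only a few labeled examples are available?
- RQ2: Do the regression forecasts and the rule-based anomaly detection complement each other to improve detection?
- RQ3: Does requiring sustained anomalous segments outperform single-point anomaly detection?
- RQ4: Are there data streams on which time-based models fail because the data lacks periodicity or predictability?
- RQ5: How does combining models and rules affect detection performance?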
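Key findings
- The intersection of the two anomaly detection methods proved effective on almost all of the models tested.
- DNNs, RNNs, and LSTMs were explored as regression models for predicting expected values.
- Anomalies are defined as sustained deviations rather than momentary events.
- One data stream showed no periodicity, so no time-based model could predict it.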
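One way to spot a stream like this before training a time-based model is to check its autocorrelation at the expected period. The sketch below is an assumption for illustration, not a step described in the paper: the helper names, the candidate lag of 24, and the 0.5 threshold are all hypothetical.

```python
import math
import random
from typing import Iterable, Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of the series at the given lag (biased estimator)."""
    n = len(series)
    if lag >= n:
        return 0.0
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def looks_periodic(series: Sequence[float], candidate_lags: Iterable[int],
                   threshold: float = 0.5) -> bool:
    """Hypothetical screen: treat the stream as periodic if any candidate lag
    shows autocorrelation at or above the threshold."""
    return any(autocorrelation(series, lag) >= threshold for lag in candidate_lags)


# Toy streams: a clean 24-step cycle and seeded Gaussian noise with no cycle.
periodic = [100 + 50 * math.sin(2 * math.pi * t / 24) for t in range(240)]
rng = random.Random(0)
noise = [rng.gauss(100, 10) for _ in range(240)]

print(looks_periodic(periodic, [24]))
print(looks_periodic(noise, [24]))
```

A stream that fails such a check would be a candidate for a different kind of detector, since a forecast built on time-of-cycle structure has nothing to learn from it.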
This summary was generated by AI and reviewed by a human editor.