QUICK REVIEW

[Paper Review] MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks

Dan Li, Dacheng Chen|arXiv (Cornell University)|Jan 15, 2019

Anomaly Detection Techniques and Applications|References: 25|Cited by: 138

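One-Sentence Summary

MAD-GAN models multivariate time series with an LSTM-based GAN and detects anomalies by combining a discrimination score with a reconstruction score; it is evaluated on the SWaT and WADI cyber-attack datasets.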

ABSTRACT

The prevalence of networked sensors and actuators in many real-world systems such as smart buildings, factories, power plants, and data centers generate substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexities of these systems, while supervised machine learning methods are unable to exploit the large amounts of data due to the lack of labeled data. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies amongst the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies by discrimination and reconstruction. We have tested our proposed MAD-GAN using two recent datasets collected from real-world CPS: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results showed that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions compared in these complex real-world systems.

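Motivation and Objectives

Advance anomaly detection on multivariate time series from cyber-physical systems (CPS), where labeled anomalies are scarce.
Propose MAD-GAN, which uses an LSTM-based GAN to capture temporal and cross-variable dependencies.
Develop the DR-score, which combines discrimination loss and reconstruction loss for anomaly detection.
Evaluate MAD-GAN's intrusion detection performance on the real-world CPS datasets SWaT and WADI.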

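Proposed Method

Build a GAN whose generator and discriminator are both LSTM-RNNs, so that normal multivariate time series are modeled as sequences.
Segment the multivariate time series into overlapping sub-sequences with a sliding window to capture temporal dynamics.
Train G and D under the standard GAN minimax framework to learn the distribution of normal data.
Use the trained generator for reconstruction-based anomaly scoring and the discriminator for discrimination-based scoring.
Combine the reconstruction loss and the discrimination loss into the DR-score, which is computed per sub-sequence and mapped back to the original time series to detect anomalies.
Evaluate anomaly detection with precision, recall, and F1 across different window sizes, and compare MAD-GAN against PCA, KNN, Feature Bagging, AE, and EGAN.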
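
The windowing and score-combination steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the trade-off weight `lam`, and the choice to average over all windows covering a time step are assumptions made for illustration, and the per-step losses are placeholders standing in for outputs of a trained generator and discriminator.

```python
def sliding_windows(series, window, stride=1):
    """Split a list of per-step sensor vectors into overlapping sub-sequences.

    Returns the windows and the start index of each one.
    """
    starts = list(range(0, len(series) - window + 1, stride))
    return [series[s:s + window] for s in starts], starts


def dr_score(residuals, disc_losses, starts, n_steps, lam=0.5):
    """Blend per-step reconstruction and discrimination losses, then average
    over every window that covers each original time step.

    residuals, disc_losses: one list of per-step losses per window.
    lam: hypothetical trade-off weight in [0, 1] (an assumption of this sketch).
    """
    total = [0.0] * n_steps
    count = [0] * n_steps
    for res_row, dis_row, start in zip(residuals, disc_losses, starts):
        for offset, (res, dis) in enumerate(zip(res_row, dis_row)):
            total[start + offset] += lam * res + (1.0 - lam) * dis
            count[start + offset] += 1
    # Steps that no window covers (possible when stride > 1) keep a score of 0.
    return [t / c if c else 0.0 for t, c in zip(total, count)]


# Toy data: 6 time steps, 2 sensors; the losses are placeholders for model outputs.
series = [[float(t), float(2 * t)] for t in range(6)]
windows, starts = sliding_windows(series, window=3)
residuals = [[1.0] * 3 for _ in windows]
disc_losses = [[0.0] * 3 for _ in windows]
scores = dr_score(residuals, disc_losses, starts, n_steps=len(series))
```

Averaging over overlapping windows is one simple way to turn sub-sequence losses into a single score per time step; a time step would then be flagged as anomalous when its score exceeds a chosen threshold.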

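Experimental Results

Research Questions

RQ1: Can MAD-GAN effectively model multivariate time series dependencies to enable unsupervised anomaly detection on CPS data?
RQ2: Does using both the GAN's discriminator and generator (through the DR-score) outperform methods that rely on only one of them for anomaly detection?
RQ3: How does MAD-GAN compare with other unsupervised methods under the cyber-attack scenarios in the real-world CPS datasets SWaT and WADI?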

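Key Findings

On SWaT, in its best-F1 configuration, MAD-GAN achieves close to 100% precision with relatively high recall, outperforming several baselines.
On WADI, MAD-GAN achieves high recall (up to 99.99% in some configurations) while its precision fluctuates, suggesting it is effective for intrusion detection with a manageable false-alarm rate.
Across datasets, MAD-GAN generally outperforms PCA, KNN, FB, AE, and EGAN on at least one evaluation metric; its best F1 reaches 0.70 on SWaT and 0.90 on KDDCUP99.
The study shows the benefit of multivariate modeling: compared with univariate training, multivariate GAN training produces realistic samples faster, as measured by MMD convergence.
The DR-score effectively fuses reconstruction residuals with discriminator outputs to detect anomalies at both the sub-sequence and time-step level.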
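
To make the reported metrics concrete, the sketch below shows how precision, recall, and F1 could be computed from per-time-step anomaly scores and ground-truth attack labels. The function name, the `threshold` parameter, and the toy scores and labels are all hypothetical; none of them are taken from the paper.

```python
def precision_recall_f1(scores, labels, threshold):
    """Flag time steps whose score exceeds `threshold` and compare with labels.

    labels: 1 for an attack step, 0 for a normal step.
    threshold: hypothetical tuning parameter, not a value from the paper.
    """
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up scores and labels, purely for illustration.
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
labels = [0, 1, 1, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(scores, labels, threshold=0.5)
```

Lowering the threshold flags more time steps, which tends to raise recall at the cost of precision. That trade-off is one plausible reading of why a configuration can show very high recall alongside fluctuating precision.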

This review was generated by AI and reviewed by human editors.