QUICK REVIEW

[Paper Review] RLAD: Time Series Anomaly Detection through Reinforcement Learning and Active Learning

Tong Wu, Jorge Ortiz | arXiv (Cornell University) | Mar 31, 2021

Anomaly Detection Techniques and Applications | References: 34 | Cited by: 23

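TL;DR

RLAD is a novel semi-supervised framework for time series anomaly detection that combines deep reinforcement learning (DRL) with active learning and reaches state-of-the-art performance with very little labeled data. It adapts dynamically to non-stationary data and outperforms every unsupervised and semi-supervised method it is compared against, improving the F1 score by up to about 4.4x over the best unsupervised baseline while using only 0.1% of the labels.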

ABSTRACT

We introduce a new semi-supervised, time series anomaly detection algorithm that uses deep reinforcement learning (DRL) and active learning to efficiently learn and adapt to anomalies in real-world time series data. Our model - called RLAD - makes no assumption about the underlying mechanism that produces the observation sequence and continuously adapts the detection model based on experience with anomalous patterns. In addition, it requires no manual tuning of parameters and outperforms all state-of-art methods we compare with, both unsupervised and semi-supervised, across several figures of merit. More specifically, we outperform the best unsupervised approach by a factor of 1.58 on the F1 score, with only 1% of labels and up to around 4.4x on another real-world dataset with only 0.1% of labels. We compare RLAD with seven deep-learning based algorithms across two common anomaly detection datasets with up to around 3M data points and between 0.28% to 2.65% anomalies. We outperform all of them across several important performance metrics.

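Motivation and Goals

Address the shortage of labeled data in time series anomaly detection, especially in non-stationary real-world environments.
Reduce the reliance on manual hyperparameter tuning and on strong prior assumptions about the data distribution.
Develop a dynamically adaptive model that keeps improving through interaction with the data and selective labeling.
Exceed the anomaly detection accuracy of existing unsupervised and semi-supervised deep learning methods while keeping the amount of labeled data to a minimum.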

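Proposed Method

In a streaming time series setting, RLAD uses a deep Q-network (DQN) agent to select the most informative samples for labeling.
It applies an active learning mechanism that queries only the most uncertain or most informative samples, which minimizes the labeling effort.
The model incorporates label propagation, so that labeled and unlabeled data are used together to refine the predictions.
Its reward function is designed around the F1 score and steers the DRL agent toward optimal labeling decisions.
The framework is trained end to end by combining the reconstruction loss of a variational autoencoder with mutual information maximization in representation learning.
The agent keeps adjusting its policy in response to feedback from the environment, which gives it long-term adaptation to concept drift.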
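The query step and the F1-based feedback described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the agent emits two Q-values per window (one for "normal", one for "anomaly") and that uncertainty is measured by the gap between them, which is a common margin-based criterion rather than something this review confirms for RLAD. The function names `query_most_uncertain` and `f1_score` are hypothetical.

```python
from typing import List, Sequence, Tuple


def query_most_uncertain(
    q_values: Sequence[Tuple[float, float]], budget: int
) -> List[int]:
    """Return indices of the `budget` windows with the smallest Q-value gap.

    q_values[i] is assumed to be (Q(s_i, normal), Q(s_i, anomaly)).
    A small gap means the agent is nearly indifferent between the two
    actions, so a human label is expected to be most informative there.
    """
    margins = [abs(q_normal - q_anomaly) for q_normal, q_anomaly in q_values]
    # Sort window indices from most uncertain (smallest gap) to least.
    ranked = sorted(range(len(margins)), key=lambda i: margins[i])
    return ranked[:budget]


def f1_score(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """F1 over binary labels (1 = anomaly); returns 0.0 when undefined."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With four windows whose Q-value pairs are `(0.9, 0.1)`, `(0.52, 0.48)`, `(0.2, 0.7)` and `(0.45, 0.5)` and a budget of 2, the sketch selects the second and fourth windows, the ones where the two actions score almost the same. The F1 helper only shows how such a score is computed from predictions and labels; how RLAD turns it into a per-step reward is not specified in this review.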

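Experimental Results

Research Questions

RQ1: Can deep reinforcement learning combined with active learning substantially reduce the number of labels needed for effective time series anomaly detection?
RQ2: How does RLAD perform against state-of-the-art unsupervised and semi-supervised anomaly detection methods when labels are scarce?
RQ3: To what extent can RLAD adapt to non-stationary data distributions without being retrained?
RQ4: Does combining DRL with active learning yield faster convergence and better generalization than existing methods?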

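Key Findings

On the A1Benchmark dataset, with only 1% of the data labeled, RLAD's F1 score is 59% higher than that of the best-performing unsupervised method (SPOT).
On the KPI dataset, RLAD reaches an F1 score of 0.778 with only 0.1% of the samples labeled, more than 6 times the score of Deep-SAD (F1 = 0.128).
On the A2Benchmark dataset, with 1% of the data labeled, RLAD's F1 score is 1.58 times that of the best unsupervised method.
On the KPI dataset, RLAD converges in as few as 300 training epochs, using 1,500 labeled samples (0.05%) or 3,000 labeled samples (0.1%).
On the Yahoo dataset, with 10% of the data labeled, RLAD reaches F1 scores of 0.8 on A1Benchmark and 1.0 on A2Benchmark.
Across the experiments, RLAD improves the F1 score by up to 10x over Deep-SAD, the state-of-the-art semi-supervised method, while using only a fraction of the labeled data.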
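The ratios and label budgets quoted in these findings can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this review; the helper names `f1_ratio` and `implied_series_length` are hypothetical and exist purely for illustration.

```python
def f1_ratio(f1_a: float, f1_b: float) -> float:
    """How many times larger f1_a is than f1_b."""
    return f1_a / f1_b


def implied_series_length(labeled_samples: int, label_percent: float) -> int:
    """Total number of points implied by a label count and a label rate in percent."""
    return round(labeled_samples * 100 / label_percent)


# KPI dataset: RLAD (0.778) versus Deep-SAD (0.128), both as quoted above.
kpi_ratio = f1_ratio(0.778, 0.128)

# 1,500 labels at 0.05% and 3,000 labels at 0.1% should imply the same
# series length if the two label budgets refer to the same dataset.
length_from_small_budget = implied_series_length(1500, 0.05)
length_from_large_budget = implied_series_length(3000, 0.1)

print(f"KPI F1 ratio: {kpi_ratio:.2f}x")
print(f"Implied series length: {length_from_small_budget:,} / {length_from_large_budget:,}")
```

The KPI ratio comes out just above 6, in line with the "more than 6 times" wording, and both label budgets imply a series of about 3 million points, which matches the "up to around 3M data points" mentioned in the abstract.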

This review was generated by AI and checked by a human editor.