QUICK REVIEW

[Paper Review] Automatic Anomaly Detection in the Cloud Via Statistical Learning

Jordan Hochenbaum, Owen Vallis|arXiv (Cornell University)|Apr 24, 2017
Anomaly Detection Techniques and Applications | 65 references | 132 citations
TL;DR

The paper introduces two seasonal anomaly detection methods, S-ESD and S-H-ESD, to automatically identify anomalies in cloud metrics by removing seasonality and using robust statistics.

ABSTRACT

Performance and high availability have become increasingly important drivers, amongst other drivers, for user retention in the context of web services such as social networks, and web search. Exogenic and/or endogenic factors often give rise to anomalies, making it very challenging to maintain high availability, while also delivering high performance. Given that service-oriented architectures (SOA) typically have a large number of services, with each service having a large set of metrics, automatic detection of anomalies is non-trivial. Although there exists a large body of prior research in anomaly detection, existing techniques are not applicable in the context of social network data, owing to the inherent seasonal and trend components in the time series data. To this end, we developed two novel statistical techniques for automatically detecting anomalies in cloud infrastructure data. Specifically, the techniques employ statistical learning to detect anomalies in both application, and system metrics. Seasonal decomposition is employed to filter the trend and seasonal components of the time series, followed by the use of robust statistical metrics -- median and median absolute deviation (MAD) -- to accurately detect anomalies, even in the presence of seasonal spikes. We demonstrate the efficacy of the proposed techniques from three different perspectives, viz., capacity planning, user behavior, and supervised learning. In particular, we used production data for evaluation, and we report Precision, Recall, and F-measure in each case.

Motivation & Objective

  • Motivate the need for automatic anomaly detection in cloud/service-oriented architectures with seasonal and multimodal time series.
  • Develop methods that separate seasonal/trend components and apply robust statistics for anomaly detection.
  • Evaluate the proposed techniques on production data across capacity planning, user behavior, and supervised learning scenarios.
  • Assess robustness to high anomaly percentages and provide guidance for production deployment.

Proposed method

  • Apply time series decomposition to obtain seasonal, trend, and residual components (modified STL with LOESS).
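  • Apply the generalized Extreme Studentized Deviate (ESD) test to the residual component to detect anomalies (S-ESD).
  • Introduce Seasonal Hybrid ESD (S-H-ESD), which replaces the mean and standard deviation in the ESD test statistic with the robust median and MAD so that detection holds up when a larger share of the data is anomalous.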
  • Extend STL to mitigate spurious residual anomalies by using the series median as a stable trend representation.
  • Provide an incremental deployment approach and compare S-ESD vs. S-H-ESD under varying anomaly rates.
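
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation, and it rests on several assumptions: the seasonal component is estimated with a per-phase median instead of the modified STL/LOESS the paper uses, the Student-t quantile in the ESD critical value is approximated by a normal quantile (reasonable only for long series), and all function names (`seasonal_by_phase`, `residuals`, `mad`, `hybrid_esd`) are hypothetical.

```python
import math
from statistics import NormalDist, median


def seasonal_by_phase(x, period):
    """Crude seasonal estimate: centered median of each phase.
    ASSUMPTION: stand-in for the STL decomposition used in the paper."""
    phase_medians = [median(x[p::period]) for p in range(period)]
    center = median(phase_medians)
    return [phase_medians[i % period] - center for i in range(len(x))]


def residuals(x, period):
    """Residual = series - seasonal - series median (median as stable trend)."""
    seasonal = seasonal_by_phase(x, period)
    trend = median(x)
    return [xi - si - trend for xi, si in zip(x, seasonal)]


def mad(values):
    """Median absolute deviation, scaled to match the SD for normal data."""
    m = median(values)
    return 1.4826 * median(abs(v - m) for v in values)


def hybrid_esd(resid, max_anoms, alpha=0.05):
    """Generalized ESD with median/MAD in place of mean/SD.
    Returns the sorted indices flagged as anomalous."""
    remaining = list(enumerate(resid))
    removed = []
    n_anoms = 0
    for k in range(1, max_anoms + 1):
        vals = [v for _, v in remaining]
        med, scale = median(vals), mad(vals)
        if scale == 0:
            break  # degenerate residuals; the statistic is undefined
        idx, val = max(remaining, key=lambda iv: abs(iv[1] - med))
        stat = abs(val - med) / scale
        m = len(remaining)  # points still in play at step k
        p = 1 - alpha / (2 * m)
        # ASSUMPTION: normal quantile approximates the t quantile (m - 2 d.o.f.)
        t = NormalDist().inv_cdf(p)
        critical = (m - 1) * t / math.sqrt((m - 2 + t * t) * m)
        remaining.remove((idx, val))
        removed.append(idx)
        if stat > critical:
            n_anoms = k  # largest k whose statistic exceeds its critical value
    return sorted(removed[:n_anoms])
```

A call such as `hybrid_esd(residuals(series, period=24), max_anoms=12)` would return candidate anomaly positions for an hourly series with a daily cycle; `max_anoms` caps the share of points that may be flagged and should stay well below the series length.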

Experimental results

Research questions

  • RQ1: How can seasonality and multimodality in cloud time series be effectively handled for anomaly detection?
  • RQ2: Do robust statistics (median and MAD) improve anomaly detection when a large fraction of data are anomalous?
  • RQ3: What is the comparative performance of S-ESD and S-H-ESD in production settings across capacity planning, user behavior, and supervised anomaly injection?
  • RQ4: What are practical considerations for deploying these methods in real-time cloud monitoring environments?

Key findings

  • S-ESD can detect both global and local anomalies after removing seasonal components.
  • S-H-ESD using median and MAD is more robust when a high percentage of data are anomalous.
  • The methods were evaluated on production data for system metrics and core drivers, reporting Precision, Recall, and F-measure across perspectives.
  • In production deployment, S-H-ESD with alpha=0.05 was selected for detecting anomalies in the monitored metrics.
  • Anomaly detection was demonstrated to support capacity planning, analysis of user behavior changes, and supervised learning experimentation with injected anomalies.
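
The robustness finding can be illustrated with a toy example. The sketch below uses made-up numbers, not data from the paper, and the helper names are hypothetical. It compares the largest deviation score computed with mean/SD against the one computed with median/MAD when roughly one sixth of the points are spikes. The spikes inflate the mean and standard deviation enough to mask themselves, while the median and MAD barely move.

```python
from statistics import mean, median, stdev


def max_z_mean_sd(values):
    """Largest |x - mean| / SD, as in the classical ESD statistic."""
    mu, sd = mean(values), stdev(values)
    return max(abs(v - mu) / sd for v in values)


def max_z_median_mad(values):
    """Largest |x - median| / MAD, as in the hybrid (robust) variant."""
    med = median(values)
    scale = 1.4826 * median(abs(v - med) for v in values)
    return max(abs(v - med) / scale for v in values)


# Illustrative data only: 40 well-behaved points plus 8 identical spikes.
normal = [10.0, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 10.1, 9.9] * 4
spikes = [30.0] * 8
data = normal + spikes

# With mean/SD the spikes score below a typical cutoff of 3 (masking);
# with median/MAD they score far above it.
print(max_z_mean_sd(data), max_z_median_mad(data))
```

The cutoff of 3 is only a rule of thumb for this illustration; the ESD test derives its critical values from the series length and the chosen alpha.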

This review was created by AI and reviewed by human editors.