QUICK REVIEW

[Paper Review] High-Dimensional, Multiscale Online Changepoint Detection

Yudong Chen, Tengyao Wang|arXiv (Cornell University)|Mar 7, 2020
Statistical Methods and Inference|39 references|15 citations
TL;DR

This paper proposes a high-dimensional, multiscale online changepoint detection method for streaming p-variate Gaussian data with unknown mean shifts. It uses likelihood ratio tests across multiple scales and coordinates, enabling online processing with constant storage and computation per observation, while maintaining theoretical guarantees on patience (average run length under the null) and response delay that depend on the sparsity of the mean change vector.

ABSTRACT

We introduce a new method for high-dimensional, online changepoint detection in settings where a p-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates. The algorithm is online in the sense that both its storage requirements and worst-case computational complexity per new observation are independent of the number of previous observations; in practice, it may even be significantly faster than this. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Simulations confirm the practical effectiveness of our proposal, which is implemented in the R package ocd, and we also demonstrate its utility on a seismology data set.

Motivation & Objective

  • Address the challenge of detecting small, sparse mean changes in high-dimensional streaming data where traditional univariate methods lack sensitivity.
  • Develop an online algorithm with constant storage and computational complexity per observation, independent of historical data size.
  • Ensure theoretical control over false positive rates (patience) and response delay under the alternative, especially for sparse mean changes.
  • Enable practical detection in real-time applications such as seismology, where early detection of subtle shifts is critical.
  • Provide a unified framework that aggregates evidence across scales and coordinates to enhance detection power without prior knowledge of change magnitude or location.

Proposed method

  • Perform likelihood ratio tests against simple alternatives of varying scales in each of the p coordinates to detect local deviations.
  • Aggregate test statistics across scales and coordinates using a multiscale scanning approach to enhance sensitivity to sparse, low-magnitude changes.
  • Maintain online operation by storing only compact sufficient statistics, ensuring computational and storage complexity per new observation is independent of past data.
  • Use a thresholding strategy based on the maximum of aggregated test statistics to trigger alarms when a changepoint is detected.
  • Leverage concentration inequalities and Gaussian tail bounds to derive theoretical guarantees on patience and response delay.
  • Incorporate sparsity-aware bounds via unit vector decomposition (Lemma 17) and recursive sequence analysis (Lemma 18) to control error probabilities.
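
The per-coordinate, per-scale testing idea above can be illustrated with a minimal Python sketch. It is not the ocd implementation: it assumes unit-variance Gaussian noise with pre-change mean zero, runs a one-sided CUSUM recursion (the sequential likelihood ratio test of N(0, 1) against N(b, 1)) for every coordinate and every signed scale b, and aggregates with a plain maximum, which is a simplification of the paper's aggregation step. The class name `MultiscaleCusum` and its parameters `scales` and `threshold` are hypothetical names chosen for this illustration.

```python
class MultiscaleCusum:
    """Illustrative sketch: one CUSUM statistic per (coordinate, signed scale)."""

    def __init__(self, p, scales, threshold):
        self.p = p
        # Test both upward and downward shifts of each magnitude.
        self.scales = [s for b in scales for s in (b, -b)]
        self.threshold = threshold
        # State is a p x len(self.scales) table; it never grows with the stream.
        self.stats = [[0.0] * len(self.scales) for _ in range(p)]
        self.n = 0  # number of observations processed so far

    def update(self, x):
        """Process one observation x (length p); return True if an alarm is raised."""
        self.n += 1
        best = 0.0
        for j in range(self.p):
            row = self.stats[j]
            for k, b in enumerate(self.scales):
                # Log-likelihood ratio of N(b, 1) versus N(0, 1) at x[j]
                # equals b * (x[j] - b / 2); resetting at zero gives CUSUM.
                row[k] = max(0.0, row[k] + b * (x[j] - b / 2.0))
                if row[k] > best:
                    best = row[k]
        # Simplified aggregation: maximum over all coordinates and scales.
        return best > self.threshold


def first_alarm(detector, stream):
    """Return the 1-based index of the first alarm, or None if none occurs."""
    for x in stream:
        if detector.update(x):
            return detector.n
    return None
```

Each call to `update` costs on the order of p times the number of scales and touches only the fixed-size table, so storage and per-observation work do not depend on how many observations have already been seen. For instance, `first_alarm(MultiscaleCusum(3, [1.0], 5.0), [[2.0, 0.0, 0.0]] * 10)` feeds a stream whose first coordinate is shifted by 2 and reports the step at which the statistic first exceeds the threshold.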

Experimental results

Research questions

  • RQ1: Can we design an online changepoint detection procedure for high-dimensional data streams that maintains constant computational and storage costs per observation?
  • RQ2: How can we improve detection sensitivity for small, sparse mean changes by aggregating evidence across multiple scales and coordinates?
  • RQ3: What theoretical guarantees can be provided for the patience (average run length under the null) and worst-case response delay under the alternative?
  • RQ4: How does the method’s performance depend on the sparsity of the mean change vector, and can it adapt to unknown change magnitude and location?
  • RQ5: Can the method be practically applied to real-world streaming data, such as seismic signals, with reliable detection and low false alarm rates?

Key findings

  • The proposed method achieves a patience (average run length under the null) that is at least at the nominal level, ensuring controlled false positive rates.
  • The worst-case response delay is bounded and depends on the ℓ2-norm of the mean change vector, with tighter bounds for sparse changes.
  • The average-case response delay is uniformly controlled over the alternative hypothesis space, with explicit dependence on the sparsity of the change vector.
  • Simulations confirm the method’s practical effectiveness in detecting small, sparse changes in high-dimensional streams, outperforming univariate and non-adaptive multiscale methods.
  • The method is implemented in the R package ocd, enabling real-world deployment in applications such as seismology.
  • On a real seismology data set, the method successfully detected subtle seismic shifts earlier and more reliably than baseline approaches, demonstrating its utility in high-stakes monitoring scenarios.
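
For readers unfamiliar with the two performance measures named in the findings above, the block below writes them in a common sequential-detection form. The notation is an assumption made for this illustration and may differ from the paper's: N is the time at which the procedure raises an alarm, γ is the nominal patience level, z is the changepoint, θ is the vector of mean change, and the subscripts on the expectation indicate the distribution of the data stream.

```latex
% Patience: expected alarm time when no change ever occurs,
% required to be at least the nominal level \gamma.
\mathbb{E}_{0}(N) \;\ge\; \gamma

% Response delay: expected number of post-change observations
% before the alarm, taken in the worst case over the changepoint z.
\sup_{z \ge 0} \; \mathbb{E}_{z,\theta}\bigl\{ (N - z) \vee 0 \bigr\}
```

A larger patience means fewer false alarms, while a smaller response delay means quicker detection after a true change, so a threshold is typically chosen to meet the patience constraint and the delay is then what remains to be bounded.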

This review was created by AI and reviewed by human editors.