QUICK REVIEW

[Paper Review] Deterministic parallel analysis

Edgar Dobriban, Art B. Owen|arXiv (Cornell University)|Nov 11, 2017

Random Matrices and Applications | 24 references | 1 citation

TL;DR

This paper introduces Deterministic Parallel Analysis (DPA), a faster and more reproducible alternative to traditional parallel analysis (PA) for selecting the number of factors in factor analysis. DPA replaces PA's random simulations with a deterministic computation and, like PA, selects large factors while not selecting small ones. A deflated variant (DDPA) counters the shadowing effect, in which a strong factor hides smaller ones, and DDPA+ raises DDPA's decision threshold to improve estimation accuracy. On HGDP genomic data, PA and DPA select seemingly too many factors, while DDPA+ selects only a few.

ABSTRACT

Factor analysis is widely used in many application areas. The first step, choosing the number of factors, remains a serious challenge. One of the most popular methods is parallel analysis (PA), which compares the observed factor strengths to simulated ones under a noise-only model. This paper presents a deterministic version of PA (DPA), which is faster and more reproducible than PA. We show that DPA selects large factors and does not select small factors just like [Dobriban, 2017] shows for PA. Both PA and DPA are prone to a shadowing phenomenon in which a strong factor makes it hard to detect smaller but more interesting factors. We develop a deflated version of DPA (DDPA) that counters shadowing. By raising the decision threshold in DDPA, a new method (DDPA+) also improves estimation accuracy. We illustrate our methods on data from the Human Genome Diversity Project (HGDP). There PA and DPA select seemingly too many factors, while DDPA+ selects only a few. A Matlab implementation is available.

Motivation & Objective

To address the computational inefficiency and lack of reproducibility in traditional parallel analysis (PA) due to reliance on random simulations.
To develop a deterministic alternative to PA that maintains statistical validity while improving speed and reproducibility.
To mitigate the shadowing phenomenon, where strong factors obscure the detection of smaller but scientifically meaningful factors.
To enhance factor selection accuracy by raising decision thresholds in a deflated framework, leading to more parsimonious and interpretable results.

Proposed method

Proposes Deterministic Parallel Analysis (DPA), which replaces random simulation in PA with a deterministic algorithm based on the Marchenko-Pastur distribution.
Computes the null threshold for the observed eigenvalues directly from the limiting spectral distribution of a noise-only matrix, so no Monte Carlo sampling is needed.
Introduces Deflated DPA (DDPA), which iteratively removes the influence of selected factors before reapplying DPA to detect weaker factors.
Develops DDPA+, a variant of DDPA that raises the decision threshold to improve estimation accuracy and reduce overfitting.
Employs a deflation mechanism that projects out the contribution of already-selected factors to reduce bias in subsequent eigenvalue comparisons.
Validates the method using real data from the Human Genome Diversity Project (HGDP), comparing results across PA, DPA, DDPA, and DDPA+.
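
The contrast between simulation-based PA and a deterministic threshold can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation. It assumes centered columns, takes the DPA threshold to be the upper edge of a generalized Marchenko-Pastur law whose population spectrum is the empirical distribution of the column variances, and uses a column-permutation variant of PA with a 95% quantile cutoff. The function names (`mp_upper_edge`, `dpa`, `permutation_pa`), the golden-section search, and the simulated data are all illustrative choices.

```python
import numpy as np


def mp_upper_edge(variances, gamma, iters=60):
    """Upper edge of a generalized Marchenko-Pastur law (illustrative helper).

    Population spectrum: the empirical distribution of `variances`.
    Aspect ratio: gamma = p / n. The edge is the minimum over
    v in (-1/max(t), 0) of the convex function
        z(v) = -1/v + gamma * mean(t / (1 + t*v)),
    located here by golden-section search.
    """
    t = np.asarray(variances, dtype=float)

    def z(v):
        return -1.0 / v + gamma * np.mean(t / (1.0 + t * v))

    lo, hi = -1.0 / t.max(), 0.0
    ratio = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if z(a) < z(b):
            hi = b  # minimum lies in [lo, b]
        else:
            lo = a  # minimum lies in [a, hi]
    return z(0.5 * (lo + hi))


def dpa(X):
    """Count eigenvalues of X'X/n above the deterministic edge (assumed rule)."""
    n, p = X.shape
    eigvals = np.linalg.svd(X, compute_uv=False) ** 2 / n
    variances = np.mean(X ** 2, axis=0)  # columns assumed centered
    edge = mp_upper_edge(variances, p / n)
    return int(np.sum(eigvals > edge)), edge


def permutation_pa(X, n_perm=20, quantile=0.95, seed=0):
    """One common PA variant: permute each column independently, then keep
    factors while the k-th observed eigenvalue beats the k-th null quantile."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.linalg.svd(X, compute_uv=False) ** 2 / n
    null = np.empty((n_perm, observed.size))
    for b in range(n_perm):
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        null[b] = np.linalg.svd(Xp, compute_uv=False) ** 2 / n
    cutoff = np.quantile(null, quantile, axis=0)
    k = 0
    while k < observed.size and observed[k] > cutoff[k]:
        k += 1
    return k


# Simulated data (hypothetical): two strong factors plus heteroscedastic noise.
rng = np.random.default_rng(0)
n, p = 500, 50
factors = rng.normal(size=(n, 2))
loadings = np.column_stack([np.ones(p), np.tile([1.0, -1.0], p // 2)])
noise_sd = np.linspace(0.5, 1.5, p)
X = factors @ loadings.T + rng.normal(size=(n, p)) * noise_sd

k_dpa, edge = dpa(X)
print("DPA factors:", k_dpa, "threshold:", edge)
print("PA factors:", permutation_pa(X))
```

Rerunning `dpa` on the same matrix always returns the same count and threshold, whereas `permutation_pa` depends on `seed` and `n_perm` and costs one SVD per permutation.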

Experimental results

Research questions

RQ1: Can a deterministic alternative to parallel analysis be developed that maintains statistical power while eliminating randomness and improving reproducibility?
RQ2: To what extent does DPA preserve the factor selection properties of traditional PA, particularly in detecting large factors and avoiding false positives for small ones?
RQ3: How does the shadowing effect, where strong factors mask weaker but meaningful ones, affect DPA, and can it be mitigated?
RQ4: Can a deflation-based extension of DPA (DDPA) effectively recover smaller, scientifically relevant factors that are obscured in standard DPA?
RQ5: Does increasing the decision threshold in DDPA (yielding DDPA+) lead to improved estimation accuracy and more parsimonious factor selection?

Key findings

DPA achieves comparable factor detection performance to PA but with significantly reduced computation time and full reproducibility due to deterministic computation.
DPA successfully identifies large factors and avoids selecting spurious small factors, confirming its consistency with the theoretical properties of PA as shown in Dobriban (2017).
The shadowing effect remains a challenge in DPA, where dominant factors prevent detection of smaller, potentially meaningful factors.
DDPA effectively mitigates the shadowing effect by iteratively deflating the data, enabling detection of previously obscured smaller factors.
DDPA+ further improves estimation accuracy by raising the decision threshold of DDPA, which yields fewer selected factors.
On the HGDP dataset, PA and DPA selected seemingly too many factors, while DDPA+ selected only a few, supporting its practical utility.
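
The shadowing and deflation findings above can be illustrated with a small simulation. This is a hedged sketch, not the paper's algorithm as published. It assumes the same deterministic threshold as before (the upper edge of a generalized Marchenko-Pastur law built from the column variances) and assumes that deflation means subtracting the leading singular component and recomputing the threshold on the residual. The names `mp_upper_edge`, `dpa_threshold`, `ddpa`, the cap `max_factors`, and the simulated loadings are hypothetical. DDPA+ is not sketched because its raised threshold is not specified in this review.

```python
import numpy as np


def mp_upper_edge(variances, gamma, iters=60):
    # Assumed edge formula: minimize the convex function
    # z(v) = -1/v + gamma * mean(t / (1 + t*v)) over v in (-1/max(t), 0).
    t = np.asarray(variances, dtype=float)

    def z(v):
        return -1.0 / v + gamma * np.mean(t / (1.0 + t * v))

    lo, hi = -1.0 / t.max(), 0.0
    ratio = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if z(a) < z(b):
            hi = b
        else:
            lo = a
    return z(0.5 * (lo + hi))


def dpa_threshold(X):
    """Deterministic threshold from the column variances (columns centered)."""
    n, p = X.shape
    return mp_upper_edge(np.mean(X ** 2, axis=0), p / n)


def dpa_count(X):
    """Number of eigenvalues of X'X/n above a single, undeflated threshold."""
    eigvals = np.linalg.svd(X, compute_uv=False) ** 2 / X.shape[0]
    return int(np.sum(eigvals > dpa_threshold(X)))


def ddpa(X, max_factors=10):
    """Deflated sketch: test the top eigenvalue, remove it, recompute, repeat."""
    n = X.shape[0]
    residual = X.copy()
    k = 0
    while k < max_factors:
        U, s, Vt = np.linalg.svd(residual, full_matrices=False)
        if s[0] ** 2 / n <= dpa_threshold(residual):
            break
        # Subtract the leading rank-one component before the next test.
        residual = residual - s[0] * np.outer(U[:, 0], Vt[0])
        k += 1
    return k


# Simulated data (hypothetical): one strong and one weak factor, unit noise.
rng = np.random.default_rng(1)
n, p = 500, 100
strong = 2.0 * np.ones(p)                      # inflates every column variance
weak = 0.25 * np.tile([1.0, -1.0], p // 2)     # orthogonal to the strong loading
X = (np.outer(rng.normal(size=n), strong)
     + np.outer(rng.normal(size=n), weak)
     + rng.normal(size=(n, p)))

print("DPA count: ", dpa_count(X))
print("DDPA count:", ddpa(X))
```

In this setup the strong factor inflates every column variance, which raises the single DPA threshold above the weak factor's eigenvalue. Once the strong component is subtracted, the recomputed threshold drops and the weak factor can pass. Whether the loop stops exactly at the two planted factors depends on how close the leading noise eigenvalue sits to the edge, so no specific count is claimed here.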

This review was created by AI and reviewed by human editors.