QUICK REVIEW

[Paper Review] Order-independent constraint-based causal structure learning

Diego Colombo, Marloes H. Maathuis|arXiv (Cornell University)|Jan 1, 2014
Bayesian Modeling and Causal Inference|30 references|466 citations
TL;DR

This paper proposes order-independent modifications of constraint-based causal structure learning algorithms such as PC, FCI, and RFCI, whose output can depend on the order in which the variables are given. The dependence is minor in low-dimensional settings but can be very pronounced in high-dimensional ones. The modifications remove part or all of this order-dependence and are consistent in high-dimensional settings under the same conditions as the original algorithms. In simulations and on a yeast gene expression data set, they perform similarly to the originals in low-dimensional settings and better in high-dimensional settings.

ABSTRACT

We consider constraint-based methods for causal structure learning, such as the PC-, FCI-, RFCI- and CCD- algorithms (Spirtes et al., 1993, 2000; Richardson, 1996; Colombo et al., 2012; Claassen et al., 2013). The first step of all these algorithms consists of the adjacency search of the PC-algorithm. The PC-algorithm is known to be order-dependent, in the sense that the output can depend on the order in which the variables are given. This order-dependence is a minor issue in low-dimensional settings. We show, however, that it can be very pronounced in high-dimensional settings, where it can lead to highly variable results. We propose several modifications of the PC-algorithm (and hence also of the other algorithms) that remove part or all of this order-dependence. All proposed modifications are consistent in high-dimensional settings under the same conditions as their original counterparts. We compare the PC-, FCI-, and RFCI-algorithms and their modifications in simulation studies and on a yeast gene expression data set. We show that our modifications yield similar performance in low-dimensional settings and improved performance in high-dimensional settings. All software is implemented in the R-package pcalg.

Motivation & Objective

  • Address the issue of order-dependence in constraint-based causal structure learning algorithms such as PC, FCI, RFCI, and CCD, whose output can change with the order in which the variables are given.
  • Identify that order-dependence becomes particularly severe in high-dimensional settings, leading to highly variable and unreliable causal graphs.
  • Develop modifications to the PC-algorithm and its extensions that reduce or eliminate order-dependence while maintaining statistical consistency under the same conditions as the original methods.
  • Ensure the proposed modifications are applicable to other constraint-based algorithms like FCI, RFCI, and CCD by leveraging shared algorithmic components.
  • Validate the improvements through comprehensive simulation studies and real-world application on a yeast gene expression dataset.

Proposed method

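  • Modify the adjacency search phase of the PC-algorithm (the variant the paper calls PC-stable) so that, at each conditioning-set size, the adjacency sets of all variables are computed and stored before any conditional independence test at that level is run.
  • Draw conditioning sets from these stored adjacency sets, so that an edge deletion no longer affects which tests are performed for other pairs at the same level. Deletions take effect only at the next level, which ensures that the same skeleton is recovered regardless of input order.
  • Preserve the conditional independence testing framework of the original PC-algorithm, but decide v-structures with a conservative or majority rule over the separating sets of each unshielded triple, and resolve conflicting edge orientations in a way that does not depend on the order in which they are processed.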
  • Extend the order-independent approach to FCI and RFCI algorithms by adapting the same principles to their respective skeleton and orientation phases.
  • Ensure all modifications remain statistically consistent in high-dimensional settings under the same assumptions as the original algorithms, including faithfulness and sparse conditional independence structures.
  • Implement all proposed algorithms in the R package pcalg for reproducibility and broad accessibility to the research community.
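
The adjacency-search idea in this list can be illustrated with a minimal Python sketch. It is not the pcalg implementation: the function name `skeleton`, the oracle `toy_test`, and the list of judged independences are all hypothetical, chosen only so that the original search gives different skeletons for two orderings. The only difference between the two modes is whether the adjacency sets used to pick conditioning sets are frozen at the start of each level.

```python
from itertools import combinations, permutations


def skeleton(order, is_indep, stable=True):
    """PC-style adjacency search over the variables listed in `order`.

    is_indep(x, y, S) is a caller-supplied test (hypothetical interface).
    stable=True freezes adjacency sets at the start of each level;
    stable=False lets every deletion take effect immediately.
    """
    order = list(order)
    adj = {v: set(order) - {v} for v in order}  # start from the complete graph
    level = 0  # current size of the conditioning sets
    # Continue while some adjacent pair has enough other neighbours to condition on.
    while any(len(adj[x]) - 1 >= level for x in order if adj[x]):
        # Snapshot for the stable variant; alias to the live sets otherwise.
        frozen = {v: set(adj[v]) for v in order} if stable else adj
        for x in order:
            for y in order:
                if y not in adj[x]:
                    continue  # not (or no longer) adjacent
                candidates = sorted(frozen[x] - {y}, key=order.index)
                for S in combinations(candidates, level):
                    if is_indep(x, y, frozenset(S)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        level += 1
    return {frozenset((x, y)) for x in order for y in adj[x]}


# Hypothetical test outcomes on five variables: (pair, conditioning set).
JUDGED_INDEPENDENT = {
    (frozenset({1, 2}), frozenset()),
    (frozenset({2, 4}), frozenset({1, 3})),
    (frozenset({3, 4}), frozenset({1, 5})),
}


def toy_test(x, y, S):
    """Oracle that answers from the hypothetical list above."""
    return (frozenset({x, y}), S) in JUDGED_INDEPENDENT


# Original search: the edge 2-4 survives or not depending on the ordering.
a = skeleton([1, 2, 3, 4, 5], toy_test, stable=False)
b = skeleton([1, 2, 4, 3, 5], toy_test, stable=False)
print(a == b)  # False

# Frozen adjacency sets: one skeleton across all 120 orderings.
stable_results = {frozenset(skeleton(p, toy_test)) for p in permutations(range(1, 6))}
print(len(stable_results))  # 1
```

In the unfrozen run with ordering 1, 2, 3, 4, 5, the edge 3-4 is deleted first, so variable 3 is no longer available as a conditioning variable when the pair 2-4 is tested, and that edge stays. With frozen sets, both deletions are found whatever the ordering.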

Experimental results

Research questions

  • RQ1: To what extent does variable order affect the output of constraint-based causal structure learning algorithms like PC, FCI, and RFCI in high-dimensional settings?
  • RQ2: Can modifications be designed to eliminate order-dependence in the PC-algorithm while preserving statistical consistency in high-dimensional data?
  • RQ3: How do the order-independent variants of PC, FCI, and RFCI compare to their original counterparts in terms of accuracy and stability across simulation studies?
  • RQ4: Do the proposed modifications improve performance on real-world high-dimensional data, such as gene expression datasets?
  • RQ5: Can the same order-independent principles be generalized across multiple constraint-based algorithms, including FCI and RFCI?

Key findings

  • The proposed order-independent modifications significantly reduce result variability in high-dimensional settings, where original PC-algorithm outputs were highly sensitive to input variable order.
  • In low-dimensional settings, the modified algorithms achieve performance comparable to the original PC, FCI, and RFCI algorithms, indicating no loss in accuracy.
  • On a yeast gene expression dataset, the order-independent variants gave output that was more stable across variable orderings and performed better than the original algorithms.
  • The modified algorithms are consistent in high-dimensional settings under the same conditions as the original methods. This is a theoretical property of the modifications; the simulation studies show the accompanying improvement in performance in high-dimensional regimes.
  • The modifications were successfully extended to FCI and RFCI, demonstrating that order-independence can be achieved across multiple constraint-based causal discovery frameworks.
  • All implementations are available in the R package pcalg, enabling widespread adoption and reproducibility in empirical research.
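
Result variability across orderings, as referred to in these findings, can be quantified in several ways. The Python sketch below shows one simple option, assuming each run is summarized as a set of undirected edges. The function names and the three example runs are hypothetical and are not taken from the paper or from pcalg.

```python
from collections import Counter


def edge_frequencies(skeletons):
    """Fraction of runs in which each undirected edge appears."""
    counts = Counter(edge for skel in skeletons for edge in skel)
    n_runs = len(skeletons)
    return {edge: count / n_runs for edge, count in counts.items()}


def unstable_edges(skeletons):
    """Edges that appear in at least one run but not in every run."""
    return {e for e, f in edge_frequencies(skeletons).items() if f < 1.0}


# Hypothetical skeletons estimated under three different variable orderings.
runs = [
    {frozenset({"A", "B"}), frozenset({"B", "C"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "B"}), frozenset({"B", "C"})},
]

print(unstable_edges(runs) == {frozenset({"B", "C"})})  # True
```

An order-independent procedure would return an empty set of unstable edges for any collection of orderings of the same data.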


This review was created by AI and reviewed by human editors.