[Paper Review] Order-independent constraint-based causal structure learning
This paper proposes order-independent modifications to constraint-based causal structure learning algorithms such as PC, FCI, and RFCI, whose output can depend on the order in which the variables are supplied. The effect is minor in low-dimensional settings but can be pronounced in high-dimensional ones. The proposed modifications remove part or all of this order-dependence while remaining consistent in high-dimensional settings, and the authors report improved stability and performance in simulations and on a yeast gene expression data set.
We consider constraint-based methods for causal structure learning, such as the PC-, FCI-, RFCI- and CCD- algorithms (Spirtes et al., 1993, 2000; Richardson, 1996; Colombo et al., 2012; Claassen et al., 2013). The first step of all these algorithms consists of the adjacency search of the PC-algorithm. The PC-algorithm is known to be order-dependent, in the sense that the output can depend on the order in which the variables are given. This order-dependence is a minor issue in low-dimensional settings. We show, however, that it can be very pronounced in high-dimensional settings, where it can lead to highly variable results. We propose several modifications of the PC-algorithm (and hence also of the other algorithms) that remove part or all of this order-dependence. All proposed modifications are consistent in high-dimensional settings under the same conditions as their original counterparts. We compare the PC-, FCI-, and RFCI-algorithms and their modifications in simulation studies and on a yeast gene expression data set. We show that our modifications yield similar performance in low-dimensional settings and improved performance in high-dimensional settings. All software is implemented in the R-package pcalg.
Motivation & Objective
- Address the issue of order-dependence in constraint-based causal structure learning algorithms such as PC, FCI, RFCI, and CCD, whose output can change with the order in which the variables are given.
- Identify that order-dependence becomes particularly severe in high-dimensional settings, leading to highly variable and unreliable causal graphs.
- Develop modifications to the PC-algorithm and its extensions that reduce or eliminate order-dependence while maintaining statistical consistency under the same conditions as the original methods.
- Ensure the proposed modifications are applicable to other constraint-based algorithms like FCI, RFCI, and CCD by leveraging shared algorithmic components.
- Validate the improvements through comprehensive simulation studies and real-world application on a yeast gene expression dataset.
Proposed method
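- Modify the adjacency search phase of the PC-algorithm so that, at each level (that is, for each conditioning-set size), the adjacency sets of all variables are computed once and held fixed while that level's conditional independence tests are run. An edge deletion therefore cannot change which tests are performed for other pairs at the same level. The resulting variant is called PC-stable.
- Because the conditioning sets that get tested no longer depend on the order in which variable pairs are visited, the same skeleton is recovered regardless of input order.
- Preserve the conditional independence testing framework of the original PC-algorithm, but decide v-structures from all relevant separating sets (using a conservative or majority rule) rather than from the first separating set found, and resolve conflicting orientations in a way that does not depend on processing order.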
- Extend the order-independent approach to FCI and RFCI algorithms by adapting the same principles to their respective skeleton and orientation phases.
- Ensure all modifications remain statistically consistent in high-dimensional settings under the same assumptions as the original algorithms, including faithfulness and sparse conditional independence structures.
- Implement all proposed algorithms in the R package pcalg for reproducibility and broad accessibility to the research community.
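The difference between the original adjacency search and an order-independent one can be illustrated with a minimal Python sketch. This is not the pcalg implementation: the function `skeleton`, the oracle `toy_indep`, and the set `CLAIMS` are hypothetical names introduced here, and the "independence claims" are a hand-made toy that mimics erroneous sample-level test decisions. The only difference between the two variants is whether neighbour sets are frozen at the start of each level.

```python
from itertools import combinations

def skeleton(order, indep, stable):
    """Adjacency search sketch: original PC (stable=False) or a
    level-wise frozen variant in the spirit of PC-stable (stable=True).

    order  -- list of variable names, in the order they are visited
    indep  -- callable (x, y, frozenset S) -> bool, the CI decision
    """
    adj = {v: set(order) - {v} for v in order}  # start from the complete graph
    level = 0                                   # size of the conditioning sets
    while True:
        # Frozen copy for the whole level (stable) or the live sets (original).
        ref = {v: set(adj[v]) for v in order} if stable else adj
        tested_any = False
        for x in order:
            for y in order:
                if y not in adj[x]:
                    continue                    # edge already removed (or x == y)
                candidates = sorted(ref[x] - {y})
                if len(candidates) < level:
                    continue                    # not enough neighbours to condition on
                tested_any = True
                for s in combinations(candidates, level):
                    if indep(x, y, frozenset(s)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        if not tested_any:
            break
        level += 1
    return {frozenset((x, y)) for x in order for y in adj[x]}

# Hypothetical toy oracle: W and Y are marginally independent, and the
# tests also (wrongly or not) declare X indep. of Y given W and X indep. of W given Y.
CLAIMS = {
    (frozenset("WY"), frozenset()),
    (frozenset("XY"), frozenset("W")),
    (frozenset("XW"), frozenset("Y")),
}

def toy_indep(x, y, s):
    return (frozenset((x, y)), frozenset(s)) in CLAIMS

if __name__ == "__main__":
    for order in (["X", "Y", "W"], ["X", "W", "Y"]):
        for stable in (False, True):
            edges = skeleton(order, toy_indep, stable)
            print(order, "stable" if stable else "original",
                  sorted("-".join(sorted(e)) for e in edges))
```

Under this toy oracle, the original search keeps the edge X-W for the order X, Y, W and the edge X-Y for the order X, W, Y, because deleting one edge empties the neighbour set needed to test the other. With frozen neighbour sets both tests are always run, so both orders give the same (empty) skeleton.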
Experimental results
Research questions
- RQ1: To what extent does variable order affect the output of constraint-based causal structure learning algorithms like PC, FCI, and RFCI in high-dimensional settings?
- RQ2: Can modifications be designed to eliminate order-dependence in the PC-algorithm while preserving statistical consistency in high-dimensional data?
- RQ3: How do the order-independent variants of PC, FCI, and RFCI compare to their original counterparts in terms of accuracy and stability across simulation studies?
- RQ4: Do the proposed modifications improve performance on real-world high-dimensional data, such as gene expression datasets?
- RQ5: Can the same order-independent principles be generalized across multiple constraint-based algorithms, including FCI and RFCI?
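One simple way to quantify the order sensitivity that RQ1 and RQ3 ask about is to rerun a learner on random permutations of the variables and record how often each edge appears. The sketch below is an illustration under stated assumptions, not the paper's evaluation code: `edge_frequencies`, `unstable_edges`, and `toy_learner` are hypothetical names, and `toy_learner` is a stand-in that is order-dependent by construction.

```python
import random
from collections import Counter

def edge_frequencies(learn, variables, n_orders=20, seed=0):
    """Fraction of random variable orderings in which each edge is returned.

    learn -- callable taking a list of variable names (one ordering) and
             returning a set of frozenset({u, v}) edges
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_orders):
        order = list(variables)
        rng.shuffle(order)
        counts.update(learn(order))
    return {edge: c / n_orders for edge, c in counts.items()}

def unstable_edges(freqs):
    """Edges that appear for some orderings but not for all of them."""
    return {edge for edge, f in freqs.items() if f < 1.0}

def toy_learner(order):
    # Hypothetical order-dependent learner: always keeps A-B, and keeps
    # X-W when Y is visited before W, X-Y otherwise.
    edges = {frozenset(("A", "B"))}
    other = "W" if order.index("Y") < order.index("W") else "Y"
    edges.add(frozenset(("X", other)))
    return edges

if __name__ == "__main__":
    freqs = edge_frequencies(toy_learner, ["A", "B", "X", "Y", "W"], n_orders=50)
    for edge, f in sorted(freqs.items(), key=lambda kv: sorted(kv[0])):
        print("-".join(sorted(edge)), f)
```

An order-independent learner would give every returned edge a frequency of exactly 1, so `unstable_edges` would be empty; any edge with a frequency strictly between 0 and 1 is an artefact of the ordering.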
Key findings
- The proposed order-independent modifications significantly reduce result variability in high-dimensional settings, where original PC-algorithm outputs were highly sensitive to input variable order.
- In low-dimensional settings, the modified algorithms achieve performance comparable to the original PC, FCI, and RFCI algorithms, indicating no loss in accuracy.
- On a yeast gene expression dataset, the order-independent variants produced more stable and biologically plausible causal structures than the original algorithms.
- All proposed modifications are consistent in high-dimensional settings under the same conditions as the original methods; the simulation studies complement this result by showing improved performance in high-dimensional regimes.
- The modifications were successfully extended to FCI and RFCI, demonstrating that order-independence can be achieved across multiple constraint-based causal discovery frameworks.
- All implementations are available in the R package pcalg, enabling widespread adoption and reproducibility in empirical research.
This review was created by AI and reviewed by human editors.