Which causal discovery algorithms can scale to complex AI models?

Causal discovery for large, complex AI models requires algorithms that trade off statistical identifiability, computational cost, and integration with representation learning. In practice, approaches that relax combinatorial search into continuous optimization or that exploit invariance across environments have proven most scalable to high-dimensional, nonlinear settings.

Algorithmic families that scale

Continuous-optimization methods such as NOTEARS recast the discrete search over directed acyclic graphs as a differentiable objective that can be optimized with gradients; the original formulation is due to Xun Zheng and colleagues, including Eric P. Xing, at Carnegie Mellon University. Variants extend NOTEARS to nonlinear relationships and neural-network parameterizations, so structure learning can be embedded in deep learning pipelines and run on modern GPUs with stochastic optimizers. Graph-neural-network and variational formulations likewise improve scalability by learning latent representations jointly with graph structure, which makes them applicable to large feature sets and complex models.
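The core trick can be shown in a few lines. Below is a minimal sketch (assuming numpy and scipy are available; function names are ours) of the acyclicity penalty h(W) = tr(exp(W ∘ W)) − d used by NOTEARS-style methods, paired with plain gradient descent on a linear least-squares fit. The original method uses an augmented Lagrangian rather than a fixed penalty weight, so this is a didactic simplification, not the published algorithm.

```python
import numpy as np
from scipy.linalg import expm

def notears_penalty(W):
    # h(W) = tr(exp(W ∘ W)) − d: zero exactly when the weighted
    # adjacency matrix W describes an acyclic graph
    E = expm(W * W)
    h = np.trace(E) - W.shape[0]
    grad_h = E.T * 2 * W  # gradient of h with respect to W
    return h, grad_h

def fit_linear_sem(X, lam=10.0, lr=1e-2, steps=2000):
    # Plain gradient descent on least squares + acyclicity penalty.
    # (A fixed-lambda stand-in for the paper's augmented Lagrangian.)
    n, d = X.shape
    W = np.zeros((d, d))
    for _ in range(steps):
        g_loss = -X.T @ (X - X @ W) / n  # gradient of 0.5/n * ||X - XW||^2
        _, g_h = notears_penalty(W)
        W -= lr * (g_loss + lam * g_h)
        np.fill_diagonal(W, 0.0)         # forbid self-loops
    return W

# Toy data from the chain x0 -> x1 -> x2
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + 0.1 * rng.normal(size=500)
x2 = -1.5 * x1 + 0.1 * rng.normal(size=500)
X = np.column_stack([x0, x1, x2])
W_est = fit_linear_sem(X)
```

Because the whole objective is differentiable, it can be minimized with the same stochastic optimizers and accelerators used for ordinary deep learning, which is precisely why this family scales.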

Methods based on invariance and interventions scale in a different way: rather than searching graph space exhaustively, they test for predictive relationships that remain stable across environments. The theoretical framing of causality and interventions by Judea Pearl at the University of California, Los Angeles underpins the rationale for using experimental or naturally heterogeneous data to identify causal directions. Practical invariant-prediction techniques often need fewer structural assumptions and can be efficient when multiple environments or randomized perturbations are available.
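A toy sketch of the invariance idea, assuming numpy (function names and the crude agreement check are ours; real invariant causal prediction uses a proper hypothesis test, not a coefficient-spread threshold): a candidate parent set is kept only if regressing the target on it gives the same coefficients in every environment.

```python
import itertools
import numpy as np

def ols(X, y):
    # least-squares coefficients with an intercept column
    Xb = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def invariant_sets(X_envs, y_envs, tol=0.2):
    # Accept a candidate parent set S when its per-environment
    # regression coefficients agree (spread below tol). This crude
    # check stands in for ICP's formal invariance test.
    d = X_envs[0].shape[1]
    accepted = []
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            if S:
                betas = [ols(X[:, S], y) for X, y in zip(X_envs, y_envs)]
            else:
                betas = [np.array([y.mean()]) for y in y_envs]
            spread = max(np.abs(b - betas[0]).max() for b in betas)
            if spread < tol:
                accepted.append(set(S))
    return accepted

# Two environments: x1 -> y is invariant (coefficient 1.5), while x2
# is a child of y in env 1 but pure noise in env 2.
rng = np.random.default_rng(1)
def make_env(shift, x2_is_child):
    n = 1000
    x1 = shift + rng.normal(size=n)
    y = 1.5 * x1 + 0.1 * rng.normal(size=n)
    x2 = y + 0.1 * rng.normal(size=n) if x2_is_child else rng.normal(size=n)
    return np.column_stack([x1, x2]), y

(X1, y1), (X2, y2) = make_env(0.0, True), make_env(2.0, False)
stable = invariant_sets([X1, X2], [y1, y2])
```

Only the true parent set {x1} survives here: the non-causal predictor x2 fits well in one environment but its coefficient collapses in the other. Note that the loop is still exponential in the number of candidate parents; practical scalability comes from restricting candidates, not from the test itself.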

Traditional constraint-based algorithms (for example the PC family) and score-based searches (for example greedy equivalence search) remain important, but without strong sparsity assumptions or dimensionality reduction their worst-case complexity limits them to moderately sized feature spaces. Work by researchers such as Bernhard Schölkopf at the Max Planck Institute for Intelligent Systems emphasizes combining domain knowledge with invariance principles to shrink the effective search space.
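To see where the cost comes from, here is an illustrative sketch (assuming numpy; names and the fixed ~5% threshold are ours, and real PC implementations restrict conditioning sets to adjacent nodes) of the skeleton phase of a PC-style search with Fisher-z partial-correlation tests. The nested loop over conditioning subsets is the source of the worst-case combinatorial blow-up.

```python
import itertools
import numpy as np

def partial_corr(C, i, j, S):
    # partial correlation of variables i and j given the set S,
    # read off the inverse of the relevant correlation submatrix
    idx = [i, j] + list(S)
    P = np.linalg.inv(C[np.ix_(idx, idx)])
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

def pc_skeleton(X, max_cond=2):
    # Skeleton phase of a PC-style search: drop an edge i-j as soon
    # as some conditioning set makes the pair look independent.
    n, d = X.shape
    C = np.corrcoef(X, rowvar=False)
    edges = {(i, j) for i in range(d) for j in range(i + 1, d)}
    for size in range(max_cond + 1):
        for (i, j) in sorted(edges):
            others = [k for k in range(d) if k not in (i, j)]
            for S in itertools.combinations(others, size):
                r = partial_corr(C, i, j, S)
                z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - size - 3)
                if abs(z) < 1.96:  # cannot reject independence at ~5%
                    edges.discard((i, j))
                    break
    return edges

# Chain x0 -> x1 -> x2: the x0-x2 edge should drop once we condition on x1
rng = np.random.default_rng(2)
x0 = rng.normal(size=2000)
x1 = 2.0 * x0 + 0.1 * rng.normal(size=2000)
x2 = -1.5 * x1 + 0.1 * rng.normal(size=2000)
skeleton = pc_skeleton(np.column_stack([x0, x1, x2]))
```

With d variables, the subsets of up to k conditioning variables number O(d^k) per edge, which is why sparsity assumptions (small k, few neighbors) are what keep constraint-based methods tractable.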

Practical constraints, causes, and consequences

Scalability comes from a combination of algorithmic design (continuous relaxations, gradient-based learning), hardware (GPU-parallel training), and data conditions (sufficient perturbations, interventions, or strong conditional-independence signals). The consequences cut both ways: estimating causal structure for complex predictors improves interpretability, but learned causal graphs can also propagate sampling biases, reflect cultural or regional measurement differences, and mislead policy when the implied interventions are infeasible. In practice, scalable algorithms are therefore coupled with expert knowledge, targeted experiments, and careful validation on out-of-distribution environments.