How can data assimilation combine machine learning with PDE-constrained models?

Data assimilation fuses observations and models to infer the state of a physical system. Hybrid approaches pair data-driven approximations with the governing equations of a PDE-constrained model to reduce bias, accelerate computation, and quantify uncertainty. This hybridization is increasingly important in weather, oceanography, hydrology, and environmental monitoring, where observations are sparse and models are imperfect but physically grounded.

Methods that bridge ML and PDEs

A common strategy treats the PDE model as a strong prior and uses ML to represent unknown components such as subgrid closures or model error. The Bayesian formulation of data assimilation emphasized by Andrew Stuart (Caltech) frames state estimation and parameter learning as inverse problems, making uncertainty explicit. Ensemble approaches developed by Geir Evensen (University of Bergen), notably the Ensemble Kalman Filter, provide scalable ways to assimilate observations while learning corrections from data. Physics-informed neural networks, pioneered by George Karniadakis (Brown University), embed differential operators into the ML loss so that learning respects PDE constraints. In practice, workflows alternate between (i) running a PDE-constrained forecast, (ii) comparing forecasts to observations, (iii) updating states and parameters via variational (adjoint-based) or ensemble updates, and (iv) training ML surrogates or correction models to reduce future errors.
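The four-step cycle above can be sketched on a toy problem. The following is a minimal illustration, not a reference implementation: it assumes a 1D periodic diffusion equation as the PDE model, a stochastic (perturbed-observation) Ensemble Kalman Filter for the update, and a single scalar coefficient fitted by least squares as a stand-in for a trained correction model. All function names, grid sizes, noise levels, and the damping term in the synthetic truth are hypothetical choices made for the example.

```python
import numpy as np

def diffusion_step(u, nu=0.1, dx=1.0, dt=1.0):
    """One explicit finite-difference step of u_t = nu * u_xx on a periodic grid."""
    lap = (np.roll(u, -1, axis=-1) - 2.0 * u + np.roll(u, 1, axis=-1)) / dx**2
    return u + dt * nu * lap

def hybrid_forecast(ensemble, w):
    """(i) PDE step plus a learned correction (assumed here: a scalar w times the state)."""
    return diffusion_step(ensemble) + w * ensemble

def enkf_analysis(ensemble, y_obs, H, obs_var, rng):
    """(ii)-(iii) Stochastic EnKF update with perturbed observations.

    ensemble: (n_members, n_state), H: (n_obs, n_state), y_obs: (n_obs,)
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    P = anomalies.T @ anomalies / (n_members - 1)        # sample covariance
    R = obs_var * np.eye(H.shape[0])                     # observation-error covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)         # Kalman gain, (n_state, n_obs)
    perturbed = y_obs + rng.normal(0.0, np.sqrt(obs_var), size=(n_members, H.shape[0]))
    innovations = perturbed - ensemble @ H.T             # forecast-observation mismatch
    return ensemble + innovations @ K.T

def fit_correction(prev_states, increments):
    """(iv) Least-squares fit of a scalar w such that increments ~ w * prev_states."""
    return float(np.sum(prev_states * increments) / np.sum(prev_states * prev_states))

# One assimilation cycle on synthetic data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_state, n_members = 16, 50
x = np.linspace(0.0, 2.0 * np.pi, n_state, endpoint=False)
truth = np.sin(x)
H = np.eye(n_state)[::4]                                 # observe every 4th grid point
ensemble = truth + rng.normal(0.0, 1.0, size=(n_members, n_state))

w = 0.0                                                  # no correction learned yet
forecast = hybrid_forecast(ensemble, w)
truth_next = diffusion_step(truth) - 0.05 * truth        # "real" system has extra damping
y_obs = H @ truth_next + rng.normal(0.0, 0.1, size=H.shape[0])
analysis = enkf_analysis(forecast, y_obs, H, obs_var=0.01, rng=rng)
w = fit_correction(ensemble, analysis - forecast)        # update the correction model
```

In a realistic system the scalar fit would be replaced by training a neural network or other regressor on many cycles of analysis increments, and the sample covariance would typically need localization and inflation, which this sketch omits.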

Relevance, causes, and consequences

The impetus for hybrid assimilation arises from growing data volumes and the computational cost of high-fidelity PDE simulations. Machine-learning components can serve as surrogate models, accelerate adjoint calculations, or supply adaptive priors, improving forecast timeliness. Consequences include faster turnaround for operational forecasts and better uncertainty quantification when ML components are trained within the assimilation loop. However, improperly constrained ML models can violate conservation laws or amplify biases, producing overconfident or regionally skewed results that disproportionately affect vulnerable communities and ecosystems where data coverage is limited.
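One simple guard against the conservation problem described above is to project an ML correction onto the set of corrections that add no net mass. The sketch below assumes a uniform periodic grid, on which total mass is proportional to the sum of the state, and uses random numbers as a stand-in for an unconstrained model output; the function names are hypothetical.

```python
import numpy as np

def total_mass(u, dx=1.0):
    """Discrete total mass on a uniform grid."""
    return float(np.sum(u) * dx)

def project_to_conserve_mass(correction):
    """Subtract the mean so the correction sums to zero and adds no net mass."""
    return correction - correction.mean(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
u = rng.random(32)                                  # current state
raw_correction = 0.1 * rng.normal(size=32)          # stand-in for an unconstrained ML output
safe_correction = project_to_conserve_mass(raw_correction)

mass_drift_raw = total_mass(u + raw_correction) - total_mass(u)
mass_drift_safe = total_mass(u + safe_correction) - total_mass(u)
```

Here `mass_drift_safe` vanishes up to floating-point rounding, while `mass_drift_raw` generally does not. Other invariants, such as energy or positivity, need their own constraints; this projection handles only a single linear conservation law.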

Practical considerations

Robust hybrid assimilation requires explicit uncertainty propagation, physical regularization, and transparent validation against independent observations. Domain expertise is essential: modelers must choose which PDE structure to retain, which terms to learn, and how to preserve interpretability. When done carefully, combining ML with PDE-constrained data assimilation improves predictive skill while maintaining scientific accountability and operational reliability.
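Physical regularization and independent validation can both be expressed in a few lines. The sketch below is one possible formulation under stated assumptions: the retained PDE structure is 1D periodic diffusion, the training loss adds a penalty on the discrete PDE residual to the data misfit, and validation uses grid points withheld from training. The names `pde_residual`, `regularized_loss`, and `holdout_rmse`, and the weight `lam`, are hypothetical.

```python
import numpy as np

def pde_residual(u_now, u_next, nu=0.1, dx=1.0, dt=1.0):
    """Residual of the discretized equation (u_next - u_now)/dt = nu * u_xx."""
    lap = (np.roll(u_now, -1) - 2.0 * u_now + np.roll(u_now, 1)) / dx**2
    return (u_next - u_now) / dt - nu * lap

def regularized_loss(u_now, u_next_pred, y_obs, obs_idx, lam=1.0):
    """Data misfit at observed points plus lam times the mean squared PDE residual."""
    data_misfit = np.mean((u_next_pred[obs_idx] - y_obs) ** 2)
    physics_penalty = np.mean(pde_residual(u_now, u_next_pred) ** 2)
    return float(data_misfit + lam * physics_penalty)

def holdout_rmse(u_pred, y_holdout, holdout_idx):
    """Root-mean-square error against observations withheld from training."""
    return float(np.sqrt(np.mean((u_pred[holdout_idx] - y_holdout) ** 2)))

# Illustrative comparison of a physically consistent and an inconsistent prediction.
x = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
u_now = np.sin(x)
u_exact = u_now - pde_residual(u_now, u_now)        # exact discrete diffusion step (dt = 1)
obs_idx = np.arange(0, 32, 4)                       # points used in the loss
holdout_idx = np.arange(2, 32, 4)                   # independent points for validation

u_bad = u_exact + 0.2 * np.cos(3.0 * x)             # stand-in for an unconstrained surrogate
loss_exact = regularized_loss(u_now, u_exact, u_exact[obs_idx], obs_idx)
loss_bad = regularized_loss(u_now, u_bad, u_exact[obs_idx], obs_idx)
rmse_bad = holdout_rmse(u_bad, u_exact[holdout_idx], holdout_idx)
```

The choice of `lam` trades fidelity to observations against fidelity to the equations, which is one concrete form of the modeling decision about which PDE structure to retain and which terms to learn.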