Reading time: 37 minute
...

📝 Original Info

  • Title:
  • ArXiv ID: 2512.22421
  • Date:
  • Authors: Unknown

📝 Abstract

We present a latent diffusion-based differentiable inversion method (LD-DIM) for PDEconstrained inverse problems involving high-dimensional spatially distributed coefficients. LD-DIM couples a pretrained latent diffusion prior with an end-to-end differentiable numerical solver to reconstruct unknown heterogeneous parameter fields in a low-dimensional nonlinear manifold, improving numerical conditioning and enabling stable gradient-based optimization under sparse observations. The proposed framework integrates a latent diffusion model (LDM), trained in a compact latent space, with a differentiable finite-volume discretization of the forward PDE. Sensitivities are propagated through the discretization using adjoint-based gradients combined with reverse-mode automatic differentiation. Inversion is performed directly in latent space, which implicitly suppresses ill-conditioned degrees of freedom while preserving dominant structural modes, including sharp material interfaces. The effectiveness of LD-DIM is demonstrated using a representative inverse problem for flow in porous media, where heterogeneous conductivity fields are reconstructed from spatially sparse hydraulic head measurements. Numerical experiments assess convergence behavior and reconstruction quality for both Gaussian random fields and bimaterial coefficient distributions. The results show that LD-DIM achieves consistently improved numerical stability and reconstruction accuracy of both parameter fields and corresponding PDE solutions compared with physicsinformed neural networks (PINNs) and physics-embedded variational autoencoder (VAE) baselines, while maintaining sharp discontinuities and reducing sensitivity to initialization.

📄 Full Content

Inverse problems constrained by partial differential equations (PDEs) arise ubiquitously in physics and engineering, where unknown spatially distributed parameters need to be inferred from sparse and indirect observations. Such problems are typically ill-posed and high-dimensional, exhibiting severe non-uniqueness and strong sensitivity to data availability, parameterization, and numerical conditioning, particularly when the governing physics involves heterogeneous media and sharp spatial contrasts.

A canonical example of such challenges arises in subsurface flow modeling, where hydraulic properties such as conductivity exert a dominant influence on flow and transport processes in groundwater systems [7]. These properties cannot be measured exhaustively due to the prohibitive cost and spatial sparsity of borehole investigations, and must instead be inferred indirectly from hydraulic or tracer observations [21,27,8]. As a result, inverse modeling has become a central computational tool for subsurface characterization [61,45,38]. However, subsurface inverse problems are notoriously ill-posed [66,62], owing to the combination of high-dimensional parameter spaces [40], limited and unevenly informative observations [41], and spatially variable parameter sensitivities. These difficulties are further exacerbated by strong geological heterogeneity, which introduces sharp spatial contrasts in subsurface properties and amplifies numerical ill-conditioning [24].

A widely adopted strategy for mitigating ill-posedness is to reduce the effective dimensionality of the parameter space [40,11]. Numerous low-dimensional parameterization methods have been proposed, including transform-based approaches such as the discrete cosine transform (DCT) [32], statistical techniques such as principal component analysis (PCA) and its variants (K-PCA, O-PCA) [48,55,37], and stochastic representations such as the Karhunen-Loève expansion (KLE) [53,63]. While effective in certain settings, these methods are typically built on linear or Gaussian assumptions and rely primarily on second-order statistics. As a consequence, they struggle to represent nonlinear, non-Gaussian geological features, often producing overly smooth parameter fields that fail to preserve connectivity and sharp interfaces essential for realistic flow and transport behavior [9,38].

Recent advances in deep generative modeling have opened new avenues for nonlinear dimensionality reduction in inverse problems. Autoencoders and variational autoencoders (VAEs) learn compact latent representations that capture complex spatial patterns beyond the reach of linear techniques [28,39]. In subsurface applications, Laloy et al. [35] demonstrated that VAE-based parameterizations can achieve orders-of-magnitude dimensionality reduction while substantially accelerating Markov chain Monte Carlo (MCMC) inversion. Subsequent studies have integrated such latent representations with differentiable forward solvers to enable gradient-based inversion [58]. Nevertheless, VAE-based approaches often yield overly smooth reconstructions and exhibit limited ability to preserve sharp geological contrasts when observational data are sparse. To preserve geological realism and multiple-point statistics, Laloy et al. [34] proposed spatial generative adversarial networks (GANs) to learn lowdimensional representations. However, GAN-based approaches usually suffer from training instability and strong sensitivity to hyperparameter choices.

More recently, generative diffusion models have emerged as a powerful and stable paradigm for high-fidelity image synthesis [30,46]. By progressively transforming noise into structured samples, diffusion models have demonstrated superior expressiveness and robustness compared to VAEs and GANs, and have shown strong potential as priors for inverse problems [46,56]. Latent diffusion models (LDMs) further enhance computational efficiency by performing diffusion in a compact latent space while retaining strong generative capacity [46]. In the subsurface context, recent studies have shown that LDMs can learn geologically informed latent manifolds that encode multiscale heterogeneity and complex structural patterns [56,64,36,15,14]. For example, Zhan et al. [64] demonstrated that LDMs can learn crossscale representations of subsurface heterogeneity and establish consistent mappings among geological structures, flow responses, and concentration fields within a unified framework. Related work has also explored conditional LDMs for subsurface modeling, including the generation of reservoir facies distributions and associated responses constrained by prescribed geological patterns [36]. These studies suggest that LDMs are capable of learning compact, geologically informed latent manifolds and well suited for inverse modeling.

Building on this direction, Di Federico and Durlofsky [15] proposed a physics-guided LDM for low-dimensional parameterization of facies-based geomodels and demonstrated successful history matching via ensemble smoother with multiple data assimilation (ESMDA) in two-dimensional channelized systems. However, most existing deep generative model-based inversion approaches [14,15,10,12] usually rely on ensemble-based calibration, such as ESMDA [19,15], or Bayesian posterior sampling, which require numerous computationally prohibitive forward simulations. This computational burden limits scalability and motivates inversion frameworks that directly exploit differentiability and latent-space optimization for large-scale subsurface flow problems.

In parallel, advances in differentiable numerical simulation [31] have enabled discretized PDE solvers to be embedded directly within optimization and inverse modeling pipelines via automatic differentiation [2]. This paradigm has received increasing attention in computational geosciences [49,23,58], solid mechanics [60,16,17], and fluid mechanics [3]. By propagating gradients through the numerical discretization, differentiable solvers provide a flexible alternative to classical adjoint implementations and enable end-to-end gradient-based inversion. It is worth noting that automatic differentiation of PDEs also forms a core component in the development of physics-informed neural networks (PINNs) [44,33]. However, the key distinction between these two paradigms lies in how the governing equations are incorporated into the learning process: PINNs embed the continuous PDE formulation directly into the loss function, whereas differentiable numerical solvers differentiate through a fully discretized forward model within the optimization loop.

Motivated by these developments, this work proposes the latent Diffusion-based differentiable inversion method (LD-DIM), a computational framework that couples a pretrained latent diffusion model with a fully differentiable numerical solver to reconstruct PDEconstrained inverse problems in a low-dimensional latent space. In LD-DIM, the LDM provides a nonlinear mapping from low-dimensional latent variables to spatially distributed parameter fields, acting as an implicit prior that suppresses ill-conditioned modes while preserving complex geological structures. The forward physics is solved using a differentiable finite-volume discretization of Darcy flow, and sensitivities with respect to latent variables are obtained via end-to-end reverse-mode automatic differentiation combined with adjoint-based gradient evaluation [18,43,59]. This formulation enables direct gradient-based optimization in latent space, substantially improving numerical stability and computational efficiency relative to inversion in the original high-dimensional parameter space.

The inverse problem for flow in heterogeneous porous media (Darcy flow) is employed as a representative testbed to examine the computational behavior of the proposed framework, including convergence characteristics, sensitivity to observation sparsity, and the ability to recover both smooth Gaussian fields and sharp-interface bimaterial structures. Numerical experiments demonstrate that LD-DIM outperforms existing approaches, including PINNs [44] applied to subsurface flow [25] and physics-embedded VAEs [58], achieving up to a fivefold reduction in reconstruction error while preserving geological discontinuities. To support efficient forward and backward computations, the finite-volume solver is implemented in JAX [22,5], enabling efficient reverse-mode automatic differentiation for gradient-based optimization.

The remainder of this paper is organized as follows. Section 2 presents the proposed methodology, including latent diffusion model training (Section 2.2) and the LD-DIM framework (Section 2.4). Section 3 provides comprehensive numerical validation through two case studies: Gaussian conductivity fields with varying correlation lengths (Section 3.2) and complex bimaterial distributions with sharp interfaces (Section 3.3), along with performance comparisons against established methods. Finally, Section 4 concludes the paper with a discussion of key findings and future research directions.

This section introduces the latent diffusion differentiable inversion model (LD-DIM), a physics-constrained inversion framework that integrates a pretrained latent diffusion model with a differentiable discretization-based solver for groundwater flow. The core idea is to encode the high-dimensional conductivity field in a compact latent space learned by the LDM, enabling efficient inversion over a significantly reduced set of unknowns using adjoint-based gradients while strictly enforcing the governing flow physics.

We consider steady-state groundwater flow in a confined, isotropic aquifer as a representative example. For single-phase flow in heterogeneous porous media, the governing equations consist of Darcy’s law:

where u, K, and h denote the Darcy velocity, hydraulic conductivity, and pressure head, respectively, and the continuity equation:

which applies in the absence of volumetric sources or sinks. The spatial heterogeneity of the conductivity field is represented through a continuum description K(x).

The forward problem can be expressed compactly as the mapping M : K → h, where M denotes the solution operator that computes the hydraulic head field corresponding to a given conductivity distribution.

In this study, we develop a differentiable finite-volume solver to compute the steady-state solution of Eq. (3), as detailed in Section 2.3.

As discussed in the Introduction, estimating a heterogeneous conductivity field from sparse observations leads to a severely ill-posed inverse problem. The high dimensionality of the unknown field, together with the limited and spatially biased nature of available data, results in strong non-uniqueness and uncertainty. To alleviate this ill-posedness, in this study we employ the generative latent diffusion model (LDM) that learns a compact latent representation of conductivity fields. This low-dimensional latent space reduces the effective number of unknowns while embedding prior geological structure, yielding a more tractable and physically consistent inversion framework.

LDMs offer a powerful parameterization framework by encoding high-dimensional hydrological fields into low-dimensional latent representations [15,64,65]. As illustrated in Fig. 1, an LDM consists of two major components: a variational autoencoder (VAE) that compresses input fields into latent variables, and a U-Net denoiser [47] that performs the reverse diffusion process within this latent space. In the remainder of this subsection, we review the basic components of LDMs for completeness. We first introduce classical diffusion models, followed by the VAE architecture used for latent compression. We then describe the training procedure and diffusion process of the full LDM within the latent space. For additional details, we refer readers to [30,46].

Diffusion models are a class of generative models that transform data into noise through a forward diffusion process and recover it through a learned reverse denoising process. In the standard formulation, denoising diffusion probabilistic models (DDPMs) [30] add Gaussian noise to data via a Markov chain governed by a noise schedule {β t } T t=1 and train a neural network to predict the injected noise at each timestep. Denoising diffusion implicit models (DDIMs) [52] extend DDPMs by introducing a deterministic reverse process that shares the same forward procedure and training objective. This deterministic formulation in DDIMs enables sampling with reduced reverse steps, which results in significantly faster inference, without requiring retraining, and is therefore adopted in this study.

I. Forward process. The forward (diffusion) process follows a Markov chain that gradually adds Gaussian noise to the data according to a noise schedule {β t } T t=1 , where T denotes the total number of diffusion steps. Define α t := 1 -β t and ᾱt := t s=1 α s . Each transition step is a Gaussian distribution:

Accordingly, the marginal distribution at any timestep admits a closed-form expression:

which can equivalently be sampled using the reparameterization

II. Reverse process. The reverse process removes noise by predicting the noise component at each timestep. The learnable denoiser ε θ (z t , t), which is typically implemented with a U-Net architecture and parameterized by θ (see Fig. 1b), is used together with the noise schedule {α t } T t=1 . The DDIM deterministic update step is given by

for t = T, . . . , 1. This deterministic denoising process constitutes the core generative mechanism of the DDIM-based diffusion model. III Loss function. The neural network model ε θ in Eq. ( 7) is trained using the simplified noise-matching objective:

where ε ∼ N (0, I) is a standard Gaussian noise sample, which is also the ground truth noise to be predicted by the network. This objective trains the U-Net denoiser to predict the noise injected at each forward diffusion timestep. The same trained model supports both DDPM stochastic sampling and DDIM deterministic generation. In DDIM sampling, one initializes z T ∼ N (0, I) and iteratively applies the update in Eq. ( 7) for t = T, . . . , 1 to obtain z 0 .

To enable diffusion processes to operate in a compact domain, a VAE is employed to learn a latent representation of conductivity fields. As shown in Fig. 1a, the VAE consists of an encoder E ϕ and a decoder D ψ .

Given a conductivity field K, the encoder produces a pair (z µ , z σ ) = E ϕ (K), which parameterize an approximate Gaussian posterior over latent variables conditioned on the data:

where z µ , z σ ∈ R d are the predicted mean and standard deviation vectors, and d is the latent dimension. To enable backpropagation through the stochastic sampling process, a latent variable is drawn using the reparameterization trick:

where z 0 ∈ R d , and then mapped back to the conductivity space using the decoder,

where K represents the reconstructed approximation of the original input K.

The VAE is trained to reconstruct input fields while enforcing a well-structured latent space. Assume a standard normal prior p(z 0 ) = N (0, I), the model is trained by minimizing the following loss:

where ϕ and ψ are the trainable parameters, respectively, D KL (•∥•) denotes the Kullback-Leibler divergence, and λ KL is a weighting factor that balances reconstruction fidelity and latent regularization. The first term encourages accurate recovery of the input data, while the KL divergence regularizes the latent posterior to remain close to the prior, making the learned space smooth, compact, and suitable for sampling in the subsequent diffusion model. Specifically, the ℓ 1 -norm is used for reconstruction loss to promote sparsity and preserve sharp contrasts in the reconstructed conductivity fields, which is particularly important for capturing geological features such as sharp interfaces and channelized structures.

After training, the encoder defines the latent manifold in which the diffusion process (formulated in Eqs. ( 4) and ( 7)) is applied.

The training procedures to construct a LDM surrogate are carried out in two sequential stages, as shown in Fig. 1b. First, given a dataset of conductivity field samples, we train a VAE using the loss function defined in (10) to learn a low-dimensional latent representation (Fig. 1a). The encoder maps each conductivity field K to a latent code z 0 = E ϕ (K) following Eq. ( 9), while the decoder reconstructs the field as K = D ψ (z 0 ). Second, once the VAE (i.e., E ϕ and D ψ ) is trained and fixed, a diffusion model is trained in the learned latent space (Fig. 1b). As described in Section 2.2.1, a U-Net denoiser ε θ (z t , t) is trained to model the reverse diffusion process by minimizing the noise-prediction loss defined in Eq. (8).

During sampling (i.e., the generation phase), the model initializes with a random latent vector z T ∼ N (0, I) and iteratively denoises it using the DDIM update (Eq. ( 7)) to obtain a clean latent code z 0 . This latent code is then decoded by D ψ (z 0 ) to generate a conductivity realization K.

To solve the steady-state Darcy flow equation in Eq. ( 3) given a reconstructed conductivity field, we employ a finite volume method (FVM) [29,20]. The core idea of FVM is to integrate the governing equation over a finite control volume V l , corresponding a grid cell:

Applying the divergence theorem, the volume integral of the divergence term is converted into a surface integral over the cell boundary ∂V l :

where n is the outward unit normal vector. Note that the face fluxes are approximated using a two-point flux approximation (TPFA), which models each flux as the product of a face transmissibility and the head jump. This cell-centered TPFA scheme preserves local conservation and achieves first-order accuracy on Cartesian grids [20,54]. Discretizing the domain using a uniform Cartesian grid with N degrees of freedom, the FVM yields a sparse linear system:

where K, ĥ ∈ R N are the discrete conductivity and pressure head fields, A ∈ R N ×N is the stiffness matrix, and b ∈ R N encodes the boundary conditions. Dirichlet boundary conditions are imposed by replacing the corresponding matrix rows with identity rows and assigning the prescribed head values to the right-hand side [42,54], while homogeneous Neumann (no-flux) boundaries are handled by zeroing out the normal transmissibility on boundary faces [20]. A detailed derivation of the 2D FVM discretization for heterogeneous isotropic media is provided in Appendix A.

To support gradient-based inversion, the FVM solver is implemented in JAX [22,5], enabling efficient computation of reverse-mode gradients via automatic differentiation. This makes the discrete forward operator M : K → ĥ fully differentiable, allowing for exact adjoint-based sensitivity analysis. Compared to conventional finite-difference sensitivity analysis, this differentiable solver ensures seamless integration with the LDM framework and supports scalable optimization of latent variables within the coupled PDE-constrained system.

The proposed LD-DIM framework integrates a pretrained LDM (Section 2.2) with the differentiable finite-volume solver (Section 2.3) to reconstruct spatially varying parameter fields from sparse observations (Fig. 2). By treating the LDM as a learned prior, the resulting inversion is formulated as a deterministic optimization problem in a low-dimensional latent space. This formulation constrains the inverse solution to a physically admissible latent manifold while rigorously enforcing the governing equations through the embedded physicsbased solver. Furthermore, an efficient optimization procedure (Algorithm 1) is developed that operates entirely in the low-dimensional latent space, combining automatic differentiation through the generative model with adjoint-based gradient computation for fully end-to-end optimization.

Given an initial latent guess ẑ ∼ N (0, I), the LDM generates a conductivity field K = G( ẑ; θ, ψ), where G denotes the generative sampling operator involving the learned reverse diffusion process ε θ and the trained decoder D ψ . The details of the generative process are provided in Section 2.2.3.

With the pretrained LDM surrogate K = G( ẑ), the LD-DIM framework is formulated as the following PDE-constrained optimization problem:

Here, the constraint is enforced using the differentiable finite-volume solver introduced earlier, and the objective function J is detailed in the following subsection.

As outlined in Section 2.2, the LDM is pretrained on synthetic conductivity fields. The VAE is first trained using Eq. ( 10) to learn the encoder E ϕ and decoder D ψ , followed by training the diffusion model in the latent space using Eq. ( 8).

During inversion, the pretrained LDM defines a generative mapping G(•) that transforms latent variables ẑ into corresponding conductivity fields K. Therefore, the optimization is carried out with respect to the latent space, ensuring that the inferred parameters remain confined to the geologically consistent manifold learned from the synthetic training data.

The inversion minimizes the objective function:

where ℓ measures the data misfit and R is a regularization term. The misfit term is defined as:

with h * i and ĥi denoting the measured and predicted hydraulic heads at observation locations {x i } i∈I obs , respectively. The regularization term

encourages the latent variables to remain close to the standard normal prior, with β controlling the regularization strength.

Leveraging the end-to-end differentiable framework, we employ efficient hybrid gradientbased optimization to solve for the optimal latent variables in Eq. ( 14). The sensitivity of the objective function J ( ẑ) in Eq. ( 15) with respect to the latent variable ẑ is computed via the chain rule:

where ∂ K/∂ ẑ is computed through automatic differentiation with the pretrained LDM. Since the simulated pressure field ĥ depends on the conductivity field K through the PDE constraint, the gradient ∇ K ℓ is given by

To compute this efficiently, we adopt the discrete adjoint method [18,43] to compute this sensitivity efficiently. Letting r = ∇ ĥℓ, the adjoint equation is

where λ ∈ R N is the discrete adjoint state. Once λ is obtained, the conductivity gradient is computed as

where

represents the sensitivity of the stiffness matrix with respect to the conductivity field, which can be constructed analytically using the local dependence of K on each grid cell. Implementation details can be referred to Appendix B. We remark that the matrix A is reused in both the forward solve and adjoint computation, ensuring consistency between the primal and adjoint problems while improving computational efficiency. This adjoint-based approach avoids the explicit construction or storage of the dense Jacobian ∂ ĥ ∂ K , thereby substantially reducing memory requirements and computational overhead. The overall computational workflow of the proposed LD-DIM framework for estimating unknown conductivity fields is summarized in Algorithm 1.

While the present study focuses on a steady-state elliptic PDE, the proposed hybrid adjoint-automatic differentiation strategy, combining discrete adjoint methods for the numerical solver with automatic differentiation through the latent generative model, is not specific to Darcy flow. The same computational paradigm applies to a broad class of PDE-constrained inverse problems with differentiable discretizations. Extensions to more complex forward models are conceptually straightforward but require problem-specific adjoint formulations, such as those reflected in Eqs. ( 20) and ( 21). The pretrained latent diffusion model maps a low-dimensional latent vector ẑ into a conductivity field via reverse denoising and decoding. The generated conductivity field is input to a differentiable finite-volume solver that computes the hydraulic head ĥ, which is compared with observations h * in the objective function. Gradients with respect to latent variable ẑ are efficiently computed by combining adjoint-based gradients through the finite-volume solver and automatic differentiation through the pre-trained latent diffusion model, enabling end-to-end optimization in the compact latent space.

1: Initialize latent vector ẑ(0) ∼ N (0, I) 2: for ν = 0 to maxIter do 3:

Generate conductivity field: K ← G( ẑ(ν) )

4:

Solve PDE system (13) for ĥ

Compute loss J via (15) 6:

Compute gradient ∇ ẑ ℓ via chain rule and adjoint (Eqs. ( 18)-( 21))

7:

Compute total gradient:

Update latent vector: ẑ(ν+1) ← ẑ(ν) -η ∇ ẑ J 9: end for 10: return Final conductivity estimate: K ← G( ẑ * )

We evaluate the effectiveness of the proposed LD-DIM framework through a series of numerical experiments on PDE-constrained inverse problems for flow in porous media. The objective is to reconstruct spatially heterogeneous conductivity fields from sparse hydraulic head observations using two representative test cases: (1) Gaussian conductivity fields with varying correlation lengths (Section 3.2), and (2) complex bimaterial distributions with sharp interfaces (Section 3.3).

For each case, the LDM is pretrained on the corresponding synthetic datasets generated using the procedures described below. This setup enables a systematic evaluation of reconstruction performance under controlled conditions.

where ∂/∂n denotes the outward normal derivative on the boundary. The domain is discretized using a uniform 100 × 100 Cartesian grid (i.e., N = 10, 000), with grid spacing ∆x = ∆y = 1 m. Synthetic head observations are sampled at interior points according to uniform spatial patterns of varying density to assess the robustness of LD-DIM under different levels of data sparsity.

To assess the accuracy of reconstructed conductivity and pressure head fields, we employ 3 complementary metrics that quantify both global error magnitude and structural fidelity:

• Relative L 2 error:

where the symbols (•) and (•) * denote the reconstructed and reference (ground-truth) quantities, respectively, and ∥•∥ 2 denotes the L 2 norm over the discretized grid.

• Mean-corrected relative L 2 error (for Bimaterial fields): For two-phase bimaterial cases characterized by discontinuous interfaces, we adopt a mean-adjusted error to better capture spatial pattern mismatch:

where K * is the spatial mean of K * . This error metric is used exclusively for bimaterial cases in Section 3.3.

• Structural Similarity Index (SSIM): Originally developed for image quality assessment [57], the SSIM metric measures the structural similarity between two images by considering luminance, contrast, and structure. For arbitrary two random fields, p and q, the SSIM metric is defined as:

where µ p and µ q are local means, σ 2 p and σ 2 q are local variances, and σ pq is the local covariance between p and q. In our context, p and q correspond to the ground-truth and reconstructed conductivity fields, respectively. The stabilizing constants are

with L denoting the dynamic range and default values k 1 = 0.01, k 2 = 0.03. SSIM values range from -1 to 1, with higher scores indicating greater structural similarity. This metric is especially useful for evaluating the spatial coherence of reconstructed conductivity fields [6].

Unless otherwise specified, all error metrics are evaluated over the entire grid.

We begin with a benchmark scenario involving Gaussian random fields to represent heterogeneous conductivity distributions. Section 3.2.1 outlines the synthetic data generation process and the training setup for the latent diffusion model. Next, we present quantitative and visual inversion results (Section 3.2.2) and compare LD-DIM with a physics-informed neural network (PINN) baseline that directly enforces Darcy’s law during training (Section 3.2.3). Through these analyses, we assess the reconstruction accuracy, structural consistency, and robustness of LD-DIM across different correlation lengths and observation densities.

In this example, the spatially correlated conductivity fields are generated using a spectral synthesis procedure [50]. The log-conductivity field Y (x) is constructed through applying an inverse Fourier transformation to spectrally filtered white noise:

Here, ξ ∼ N (0, 1) is complex white noise, F and F -1 denote discrete Fourier and inverse Fourier transforms, k is the wavenumber vector, and λ controls the spatial correlation length. Increasing λ leads to smoother and more spatially coherent fields by suppressing high-frequency components. To ensure numerical stability, the log-field Y in Eq. ( 26) is standardized to zero mean and unit variance, then clipped to the interval [-3, 3] before applying the exponential transformation, the resulting conductivity field K is subsequently normalized to the range [0, 1]. A total of 2,000 conductivity field samples are generated using Eq. ( 26), evenly distributed across seven correlation lengths λ ∈ [0.1, 0.7] to capture a broad spectrum of spatial variability. Of these, 1,400 samples are randomly selected to train the LDM, 400 for LDM validation, and the remaining 200 are reserved for inverse modeling tests. Figure 3 shows representative samples, highlighting the transition from highly heterogeneous fields to smoother spatial structures as λ increases. The LDM prior is trained on the generated fields using the approach described in Section 2.2.3. A variational autoencoder compresses each conductivity field into a 12 × 12 latent representation, followed by diffusion training in the latent space. This enables efficient sampling and supports gradient-based inversion within a geologically informed manifold.

For the following inversion tests, reference hydraulic head solutions are computed using the FVM solver introduced in Section 2.3, based on the corresponding conductivity fields. Some ground-truth conductivity-head pairs used in inverse tests for λ = 0.1, 0.2, 0.3, and 0.4 are displayed in Fig. 4.

We evaluate the ability of LD-DIM to recover various testing conductivity fields from sparse hydraulic head observations using the reference solutions illustrated in Fig. 4, where 256 hydraulic head observations arranged in a 16 × 16 uniform grid are used. We note that these observations are generated from conductivity fields not seen during training, thereby assessing out-of-distribution inversion capability. Fig. 5 presents representative inversion results across correlation lengths from λ = 0.1 to 0.4, with each row showing the reconstructed conductivity field, and the corresponding absolute error in K and hydraulic head h. With a fixed number of h observations, the quality of inversion is strongly influenced by the spatial correlation length. Nevertheless, LD-DIM successfully captures the overall structure of the underlying conductivity fields across all cases. The conductivity reconstruction error ϵ K decreases significantly from 4.87 × 10 -1 at λ = 0.1 to 3.19 × 10 -2 at λ = 0.4, while SSIM scores increase from 0.709 to 0.987. These trends highlight the influence of spatial correlation: larger λ leads to more coherent patterns that are easier to infer, whereas small λ presents significant challenges due to limited data support for fine-scale heterogeneity.

In addition to conductivity, LD-DIM achieves high-quality predictions of the full hydraulic head field. As shown in Fig. 5, the pointwise error in h remains below 10% across all cases, regardless of correlation length. The largest discrepancies in h tend to occur in regions with highly localized head variations, especially in the λ = 0.1 and λ = 0.3 cases (see Fig. 4). Despite this, the overall predictive performance remains robust, confirming the effectiveness of LD-DIM in recovering both field structure and flow response from limited head data alone.

The ability of LD-DIM to infer physically plausible patterns under such sparse supervision underscores the critical role of the pretrained LDM as a learned geological prior. This prior regularizes the solution space, enabling stable and meaningful inversion even in severely ill-posed settings. Unlike conventional PDE-constrained or PINN-based methods (e.g., Fig. 7 and Fig. 8), which often in the absence of explicit conductivity data, LD-DIM avoids numerical instability or over-smoothing by constraining the solution within a geologically plausible latent manifold.

Moreover, Fig. 6 shows the loss trajectory for the representative case with λ = 0.4, illustrating the efficient and stable convergence of gradient-based optimization within the latent space under the LD-DIM framework.

In Section 3.2.4, we further extend the framework to a hybrid setting that incorporates sparse direct conductivity observations. This extension enhances reconstruction accuracy, particularly for cases with small correlation lengths, where limited supervision would otherwise hinder performance.

Here, we introduce a PINN-based approach for comparison, which enforces Darcy’s law as a soft constraint in the loss function [44,25]. In this method, both the conductivity field and the hydraulic head are modeled using fully connected neural networks, each comprising three hidden layers with 60 neurons and hyperbolic tangent activation functions. The networks are trained by minimizing a composite loss function that combines data misfit with the residuals of the governing PDE and boundary conditions:

where Following [25], we set w b = w f = 1 and w reg = 10 -6 . Importantly, unlike the original formulation in [25], Eq. ( 27) does not include direct measurements of the conductivity field K, aligning the experiment setup with the LD-DIM inversion in Section 3.2.2. Figure 7 and Figure 8 show representative inverse modeling results for the two extremes of spatial correlation length: λ = 0.1 (high heterogeneity) and λ = 0.4 (smooth fields). In both cases, LD-DIM significantly outperforms the PINN-based method. At λ = 0.1, LD-DIM achieves a relative conductivity error of ϵ K = 4.87 × 10 -1 and an SSIM of 0.709, while PINN yields a much larger error ϵ K = 1.064 and fails to reconstruct meaningful structure (SSIM These results demonstrate that LD-DIM consistently achieves higher inversion accuracy by effectively capturing essential spatial heterogeneity, while avoiding the over-smoothed reconstructions commonly produced by PINN. The PINN approach, lacking explicit supervision on the conductivity field, struggles with high-dimensional optimization and does not possess inductive bias to reliably constrain the solution.

The superior performance of LD-DIM stems from its core design: it constrains the inverse problem within a low-dimensional, geologically informed latent space learned by the pretrained diffusion model. This constraint inherently steers the generative process toward physically plausible conductivity patterns, even under sparse supervision. Additionally, the differentiable forward model enables efficient gradient-based updates via reverse-mode automatic differentiation. In contrast, the PINN method relies solely on soft enforcement of PDE residuals and generic network expressiveness, which prove inadequate in severely ill-posed inverse problem.

In addition to the 256 hydraulic head observations, 25 conductivity measurements arranged in a 5 × 5 grid are incorporated into the inverse formulation. Figs. 9 and 10 show that including this sparse conductivity information significantly improves the inversion accuracy for both approaches, especially for PINN. Compared with cases without conductivity supervision (Figs. 7 and8), the relative error in PINN decreases from 1.064 to 0.539 for λ = 0.1 and from 0.979 to 0.188 for λ = 0.4, while the corresponding SSIM improves from -0.054 to 0.466 and 0.208 to 0.799, respectively. These results indicate that even sparse conductivity data serve as informative priors that effectively enhance PINN optimization.

Consistent with the previous observations, when conductivity measurements are included, LD-DIM continues to outperform PINN, as evidenced by its higher estimation accuracy and structural fidelity: ϵ K = 0.460 (SSIM = 0.767) and ϵ K = 0.026 (SSIM = 0.993) for λ = 0.1 and λ = 0.4, respectively.

These controlled comparisons in Sections 3.2.3 and 3.2.4 demonstrate that LD-DIM’s advantage arises not merely from access to additional data, but from its ability to integrate prior knowledge through a learned latent space, which regularizes the inverse problem more effectively. By encoding geological plausibility through the learned latent manifold, LD-DIM more effectively regularizes the inversion while preserving spatial heterogeneity, making it particularly valuable in practical settings where observations are limited.

Moreover, LD-DIM enforces physical constraints through a differentiable solver that satisfies the governing equations and boundary conditions exactly. This hard physical enforcement offers a substantially stronger constraint than the PINN-based methods, where physics is only weakly imposed through loss terms with empirically tuned weights. As a result, LD-DIM avoids the need for manual hyperparameter tuning and delivers more robust performance across diverse spatial configurations.

This section extends the application of LD-DIM to bimaterial conductivity fields characterized by sharp material interfaces. Such patterns commonly arise in fractured aquifers, karst systems, and layered geological units, where strong hydraulic contrasts invalidate the smooth-conductivity assumptions typically used in hydrological modeling. Unlike the smoothly varying Gaussian fields in Section 3.2, these discontinuous two-phase structures pose a significantly more challenging reconstruction problem, requiring accurate recovery of abrupt transitions and preservation of interface topology.

In this example, we first describe the generation of the bimaterial dataset (Section 3.3.1) and then assess the generative capacity of the latent diffusion model (Section 3.3.2). Subsequent sections analyze inversion results under varying initialization seeds and observation sparsity (Section 3.3.3) and compare LD-DIM against baseline methods (Section 3.3.3). Overall, these experiments evaluate LD-DIM’s ability to maintain geometric fidelity, recover interface morphology, and remain stable in heterogeneous two-phase systems.

In this example, the bimaterial conductivity fields representing geologically complex scenarios with sharp permeability contrasts are constructed following the indicator-based stochastic modeling practices established in geostatistical reservoir characterization [13]. Each sample is formed by first constructing two log-normal fields representing the high-and low-conductivity materials,

where Φ i are Gaussian processes with a Matérn covariance kernel (ν = 1, λ 1 = λ 2 = 50).

Then, a correlated splitting field S(x) with correlation length λ split = 200 is used to assign the two components across the domain. The partition is determined by a quantile threshold, α = 0.25 + 0.5 p, p ∼ U(0, 1), so that each material occupies between 25% and 75% of the domain. The spatial correlation in S(x) leads to coherent two-phase patterns with sharp interfaces between two distinct conductivity regions.

Following the bimaterial data construction described above, a total of 2,000 conductivityfield samples are generated to capture a wide range of two-phase configurations and interface geometries. Among the 2,000 samples, 1,400 are used for training the generative prior, 400 for validation, and 200 for inverse modeling tests.

To assess the capability of the LDM generative prior, we compare it against a baseline variational autoencoder (VAE). Figure 11 presents representative ground-truth bimaterial conductivity fields alongside conductivity samples generated by the LDM and the VAE. The results show that the LDM faithfully preserves sharp interfaces and the global two-phase structure, whereas the VAE produces overly smoothed transitions and blurred boundaries.

To quantify these differences, we apply two standard distributional metrics: the Fréchet Inception Distance (FID) [26] and the Kernel Inception Distance (KID) [4], defined in Eqs. ( 28) and ( 29), respectively. Both metrics operate on feature embeddings extracted from the edge representations of the fields. For each field I, the Sobel-Feldman edge detector S(•) is used to compute the gradient magnitudes [51], which indicate steep material jumps and suppresses smooth interior regions. These edge maps are normalized and encoded through a fixed feature extractor ϕ(•), implemented as the pretrained Inception-based feature mapping used in standard FID/KID evaluations. The extractor is not trained within our pipeline; it provides a consistent embedding ψ(I) = ϕ(S(I)) that reflects the geometry, contrast, and spatial arrangement of interfaces.

Fréchet Inception Distance (FID). Let (µ, Σ) and (µ ′ , Σ ′ ) denote the mean and covariance of the embeddings {ψ(x i )} and {ψ(y j )} extracted from the real and generated fields. The distance is

which measures discrepancies in the global statistics of the embedded edge features. Lower FID values indicate closer agreement in interface geometry and contrast. Kernel Inception Distance (KID). Given feature sets {ψ(x i )} N i=1 and {ψ(y j )} M j=1 from real and generated fields, the unbiased estimator is

where k(u, v) = 1 d u ⊤ v + 1 3 . Because KID directly compares pairwise similarities in the embedded edge features, lower values indicate a better match to the reference distribution in terms of interface sharpness and fine-scale structural patterns.

The numerical values of these metrics, summarized in Table 1, further reinforce the qualitative observations above: the LDM achieves an FID more than five times lower than the VAE, and its KID is reduced by over fourfold. These substantially lower scores indicate a closer match to the reference distribution in both local boundary sharpness and overall interface organization, demonstrating that the LDM is well suited for generating physically realistic bimaterial fields to serve as priors in inverse modeling.

We evaluate the inversion capability of LD-DIM on bimaterial conductivity fields characterized by sharp interfaces and strong conductivity contrasts. Two key aspects are investigated: (i) sensitivity to random initialization and (ii) the effect of observation density on reconstruction quality and stability. In addition, we compare LD-DIM with a physics-embedded VAE-based framework [58], demonstrating that LD-DIM produces sharper interface representations and consistently higher reconstruction fidelity.

As in the Gaussian case study presented in Section 3.2, all bimaterial inversion experiments are conducted on sample fields excluded from the training dataset, ensuring an unbiased assessment of generalization performance.

The inherent ill-posedness of the inverse problem introduces variability in reconstruction outcomes, making it important to examine how different random initialization seeds affect the predicted fields. To assess this effect, we consider three representative realizations obtained from different seed initializations for a bimaterial test case using 25 head observations on a 5 × 5 uniform grid.

Fig. 12 shows conductivity reconstructions from three different random initialization seeds. Despite starting from different initial latent vectors, all three realizations successfully recover the sharp two-phase structure and major geometric features. The mean-corrected relative errors εK associated with the 3 different initializations remain within a narrow range (0.275, 0.301, and 0.329), with corresponding SSIM values of 0.818, 0.807, and 0.773, respectively. Only minor variations in interface location are observed across seeds. This robustness across initializations can be attributed to two key mechanisms: first, the latent diffusion prior constrains the solution space to geologically plausible bimaterial realizations, preventing convergence to physically unrealistic field patterns; second, the explicit regularization term in Eq. ( 18) maintains proximity to the learned latent distribution, further stabilizing the optimization process.

The hydraulic head reconstructions are shown in Fig. 13, exhibiting even lower sensitivity to initial seeds. The three realizations produce relative errors of 2.856 × 10 -2 , 3.241 × 10 -2 , and 3.074 × 10 -2 , with SSIM values of 0.992, 0.982, and 0.987. The predicted head fields differ only slightly across seeds and remain very close to the reference solution. This consistency indicates that hydraulic head fields are substantially easier to reconstruct than the high-contrast conductivity field, due to both the direct availability of head measurements and the inherently smoother nature of the governing PDE solution. As a result, even when conductivity reconstructions exhibit geometric variations near material interfaces, the corresponding head predictions show very limited seed-induced variability.

In practical field applications, the number of available head measurements is often limited. To understand how observation density influences reconstruction performance, we systematically evaluate the framework using four uniform observation grids: 3 × 3, 5 × 5, 12 × 12, and 16 × 16 grids. This analysis quantifies the trade-off between data availability and reconstruction fidelity for both conductivity and hydraulic head fields, as shown in Fig. 14 and Fig. 15.

For the conductivity reconstruction (Fig. 14), the 3 × 3 grid results in a median relative error of approximately 0.44 and a median SSIM of 0.71, indicating that 9 head observations provide only limited constraint on the conductivity field reconstruction. Increasing the density to a 5 × 5 grid reduces the median error to about 0.30 and raises the SSIM to roughly 0.81, reflecting improved recovery of the conductivity field. The 12 × 12 and 16 × 16 grids further enhance the consistency across different seeds, with the latter achieving a median error of 0.28 and a median SSIM of 0.82, representing the highest interface fidelity of the tested configurations. The progressive narrowing of the interquartile ranges across the four densities demonstrates reduced sensitivity to initialization and increased robustness of the reconstructed conductivity fields. This robustness stems from the inherent regularization of the LD-DIM framework: by constraining inversion within a geologically informed latent manifold, the diffusion prior compensates for sparse observations and favors structurally plausible conductivity patterns. Consequently, LD-DIM delivers stable and reliable reconstructions even under limited data availability.

The hydraulic head reconstruction results (Fig. 15) further illustrate the influence of the number of head observations on both the accuracy and the stability of the reconstructed head fields. The median relative error decreases sharply from 0.11 to 0.03 as the observation grid increases from a 3 × 3 to a 5 × 5, showing that even a modest increase in observation density substantially constrains the groundwater flow solution. Beyond 5 × 5, additional observations provide diminishing returns in absolute accuracy, e.g., the SSIM is already close to 0.99, but they do meaningfully reduce the variance across different initial seeds.

We compare LD-DIM with the physics-embedded VAE approach of [58], which employs a variational autoencoder trained on bimaterial fields as a generative prior. During inversion, the latent variables are optimized to minimize discrepancy between simulated and observed hydraulic heads while maintaining proximity to the learned latent distribution. The comparison in Figure 16 highlights the advantages of LD-DIM for bimaterial reconstruction. LD-DIM reduces the relative error εK from 0.387 to 0.272 and achieves a higher SSIM, reflecting improved recovery of both global structure and local boundary detail. While The VAE-based method is able to capture the overall bimaterial interface geometry, it produces noticeably smoothed boundaries and fails to preserve the sharp transitions characteristic of high-contrast fields. This limitation stems from the Gaussian prior commonly used in VAEs, which biases reconstructions toward smooth, low-frequency patterns and struggles to represent discontinuities.

In contrast, the latent diffusion model leverages an iterative denoising process that approximates a richer, non-Gaussian latent distribution and progressively refines highfrequency spatial features essential for accurate sharp-interface representations. As a result, LD-DIM provides a more expressive and geologically realistic prior, leading to substantially improved interface fidelity and more accurate reconstruction of flow-controlling structures. Overall, the successful reconstruction of complex sharp-interface structures underscores the potential of LD-DIM for practical applications in fractured aquifers, layered subsurface systems, and other geologically heterogeneous environments.

This study introduced LD-DIM, a latent diffusion-based differentiable inversion framework that integrates a physics-informed generative prior with a differentiable finite-volume solver for subsurface inverse modeling. By embedding inversion within a low-dimensional latent space while enforcing physical consistency through exact numerical discretization, LD-DIM effectively mitigates the ill-posedness inherent in high-dimensional subsurface inverse problems. The effectiveness of the proposed LD-DIM framework arises from three key features. First, latent-space regularization confines optimization to a compact yet geologically plausible manifold, alleviating the curse of dimensionality that limits conventional gradient-based inversion in high-dimensional parameter spaces. Second, the latent diffusion prior preserves both smooth spatial correlations and sharp geological discontinuities, overcoming common limitations of existing approaches, including interface blurring in VAE-based methods, training instability in GAN-based models, and insufficient regularization in PINN-based inversions. Third, the proposed hybrid adjoint-automatic differentiation strategy enables exact gradient computation for PDE-constrained inverse problems, supporting efficient endto-end optimization with transparent sensitivity analysis while maintaining strict physical consistency.

We evaluate the proposed LD-DIM framework using two representative case studies involving Gaussian random fields and bimaterial conductivity fields, and systematically examine its generative capability as well as its sensitivity to measurement availability and initialization. For Gaussian conductivity fields with varying correlation lengths (λ = 0.1, 0.2, 0.3, 0.4), LD-DIM demonstrates consistently strong performance, achieving low reconstruction error and high structural similarity even in the most challenging low-correlation regime. Compared with the PINN-based inversion baseline, LD-DIM reduces reconstruction error by approximately 2.2× at λ = 0.1 (from ϵ K = 1.064 to 0.487) and by more than 30× at λ = 0.4 (from ϵ K = 0.9793 to 0.0319), while consistently preserving fine-scale heterogeneity.

For bimaterial conductivity fields, LD-DIM effectively resolved sharp geological discontinuities. Relative to a baseline VAE, the latent diffusion prior exhibits superior generative expressiveness, and the resulting inversion achieves substantially lower reconstruction error (ϵ K = 0.272 vs. 0.387) while accurately recovering material interfaces critical to fractured and layered subsurface systems. Comprehensive sensitivity analyses across different observation densities and random initialization seeds further demonstrate the robustness of the proposed framework. Statistical analyses of reconstructed conductivity and hydraulic head fields reveal stable medians and decreasing interquartile ranges as observation density increases, confirming the consistency and reliability of LD-DIM.

Looking ahead, the proposed LD-DIM framework is readily extensible to a broad class of physics-constrained inverse problems beyond subsurface flow, including seismic full-waveform inversion for velocity characterization, electromagnetic inversion for mineral exploration, thermal modeling in geothermal systems, and multiphase flow inversion in energy and environmental applications. These results underscore the broader potential of physicsconstrained latent diffusion models as a scalable and computationally efficient paradigm for inverse modeling in complex geophysical and engineering systems.

Reference

This content is AI-processed based on open access ArXiv data.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut