Generative Modeling from Black-box Corruptions via Self-Consistent Stochastic Interpolants

Reading time: 8 minutes

📝 Original Info

  • Title: Generative Modeling from Black-box Corruptions via Self-Consistent Stochastic Interpolants
  • ArXiv ID: 2512.10857
  • Date: 2025-12-11
  • Authors: Chirag Modi, Jiequn Han, Eric Vanden-Eijnden, Joan Bruna

📝 Abstract

Transport-based methods have emerged as a leading paradigm for building generative models from large, clean datasets. However, in many scientific and engineering domains, clean data are often unavailable: instead, we only observe measurements corrupted through a noisy, ill-conditioned channel. A generative model for the original data thus requires solving an inverse problem at the level of distributions. In this work, we introduce a novel approach to this task based on Stochastic Interpolants: we iteratively update a transport map between corrupted and clean data samples using only access to the corrupted dataset as well as black-box access to the corruption channel. Under appropriate conditions, this iterative procedure converges towards a self-consistent transport map that effectively inverts the corruption channel, thus enabling a generative model for the clean data. We refer to the resulting method as the self-consistent stochastic interpolant (SCSI). It (i) is computationally efficient compared to variational alternatives, (ii) is highly flexible, handling arbitrary nonlinear forward models with only black-box access, and (iii) enjoys theoretical guarantees. We demonstrate superior performance on inverse problems in natural image processing and scientific reconstruction, and establish convergence guarantees of the scheme under appropriate assumptions.
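The iterative procedure described above can be caricatured in one dimension. The sketch below is a toy illustration, not the paper's method: the transport map is restricted to an affine map (the 1-D Gaussian optimal transport map is affine), the corruption channel is AWGN, and all names (`corrupt`, the loop structure) are illustrative assumptions. At each iteration, the observed corrupted samples are pushed through the current map to produce provisional clean samples, those are re-corrupted through the black-box channel, and the map is refit from the simulated corrupted distribution back to the provisional clean one.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 0.5

def corrupt(x):
    # Black-box corruption channel F: AWGN here, but the scheme only
    # ever *calls* it; no gradients or inverse are needed.
    return x + sigma * rng.standard_normal(x.shape)

# We only observe corrupted data, never the clean x ~ N(0, 1).
y_obs = corrupt(rng.standard_normal(200_000))

# Self-consistent iteration (1-D Gaussian caricature of SCSI):
#   1. push observed y through the current map T to get provisional x_hat
#   2. re-corrupt x_hat through the black-box channel
#   3. refit T as the affine Gaussian-OT transport from the simulated
#      corrupted samples back to x_hat
a, b = 1.0, 0.0  # start from the identity map T(y) = a*y + b
for _ in range(30):
    x_hat = a * y_obs + b
    y_sim = corrupt(x_hat)
    a = x_hat.std() / y_sim.std()
    b = x_hat.mean() - a * y_sim.mean()

print(f"learned scale a = {a:.3f}")
# At the fixed point of this toy iteration the transported samples have
# unit variance, i.e. they match the clean prior; the analytic
# self-consistent scale is sqrt(1/(1 + sigma^2)) ~ 0.894 for sigma = 0.5.
print(f"std of restored samples = {(a * y_obs + b).std():.3f}")
```

The fixed point here is exactly the self-consistency condition in miniature: corrupting the restored distribution through F reproduces the observed corrupted distribution.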

📄 Full Content

Generative Modeling from Black-box Corruptions via Self-Consistent Stochastic Interpolants

Chirag Modi¹*, Jiequn Han²*, Eric Vanden-Eijnden³,¹, Joan Bruna¹,²
¹New York University, ²Flatiron Institute, ³Machine Learning Lab, Capital Fund Management
*Equal contribution.
arXiv:2512.10857v1 [cs.LG], December 2025

Contents

1 Introduction
2 Preliminaries
  2.1 Stochastic Interpolants
  2.2 Problem Setup
3 Self-Consistent Stochastic Interpolants
  3.1 Iterative Scheme for Self-Consistency
  3.2 Truncated Inner-Loop Optimization for Efficiency
4 Theoretical Analysis
  4.1 Contraction in Wasserstein Metric
  4.2 Contraction in KL Divergence
5 Case Study: AWGN with Gaussian Prior
6 Experiments
  6.1 Low Dimensional Synthetic Models
  6.2 Imaging Tasks
  6.3 Quasar Spectra
7 Conclusion
A Proofs
  A.1 Proof of Proposition 4
  A.2 Proof of Proposition 8
  A.3 Proof of Theorem 9
  A.4 Proof of Proposition 13
B Detailed Algorithm Description
  B.1 Algorithm Pseudocode
  B.2 Conditional Generation via Lifting
C Implementation Details
  C.1 Architecture of Models
  C.2 Training Parameters
D Additional Results
  D.1 Impact of Network Size
  D.2 SDE Results
  D.3 Additional Metrics for Restored Performance Comparison
  D.4 Varying Levels of Corruption
  D.5 Non-Gaussian Noise: Gaussian Blurring with Poisson Noise

1 Introduction

Generative modeling has become a central aspect of high-dimensional learning. Transport-based methods, including diffusion-based models [HJA20, SSDK+21] and flow-based models [AVE23, LCBH+22, LGL23], have emerged as leading frameworks with a wide range of applications, from natural image synthesis [RBL+22] to molecular design [WJB+23]. These methods rely on access to clean samples x ~ π of the target distribution, which are plentiful in many machine learning tasks. However, in many scientific and engineering applications, such clean data are unavailable. Instead, we only observe corrupted measurements y through a forward map y = F(x) that is typically noisy and ill-conditioned. Examples include med
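To make "black-box access to the forward map" concrete, here is a hypothetical corruption channel in the spirit of the paper's imaging experiments (random pixel masking plus additive noise). The function name, parameters, and shapes are illustrative assumptions; the point is only that the method needs to evaluate F on candidate clean samples, never to differentiate or invert it.

```python
import numpy as np

rng = np.random.default_rng(1)

def corruption_channel(x, mask_frac=0.5, noise_std=0.1):
    """Hypothetical black-box forward map y = F(x): random pixel masking
    plus additive Gaussian noise. It may be nonlinear and ill-conditioned;
    the scheme only requires the ability to call it."""
    mask = rng.random(x.shape) > mask_frac  # keep roughly half the pixels
    return mask * x + noise_std * rng.standard_normal(x.shape)

x = rng.standard_normal((4, 32, 32))  # stand-in "clean images"
y = corruption_channel(x)
print(y.shape)  # (4, 32, 32): corrupted measurements, same shape as x
```

Because only forward evaluations are required, the same driver code works unchanged for JPEG compression, motion blur, or any other simulator, which is the flexibility claim (ii) in the abstract.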

…(Full text truncated)…

📸 Image Gallery

(Figures from the paper: CIFAR-10 restorations under Gaussian blur, JPEG compression, random masking, and random motion blur; quasar-spectrum reconstructions at high and low SNR; ODE vs. SDE sampling comparisons; network-size ablations; and Wasserstein-contraction plots.)

Reference

This content is AI-processed based on ArXiv data.
