WFR-MFM: One-Step Inference for Dynamic Unbalanced Optimal Transport
Reconstructing dynamical evolution from limited observations is a fundamental challenge in single-cell biology, where dynamic unbalanced optimal transport provides a principled framework for modeling coupled transport and mass variation. However, existing approaches rely on trajectory simulation at inference time, making inference a key bottleneck for scalable applications. In this work, we propose a mean-flow framework for unbalanced flow matching that summarizes both transport and mass-growth dynamics over arbitrary time intervals using mean velocity and mass-growth fields, enabling fast one-step generation without trajectory simulation. To solve dynamic unbalanced optimal transport under the Wasserstein-Fisher-Rao geometry, we further build on this framework to develop Wasserstein-Fisher-Rao Mean Flow Matching (WFR-MFM). Across synthetic and real single-cell RNA sequencing datasets, WFR-MFM achieves orders-of-magnitude faster inference than a range of existing baselines while maintaining high predictive accuracy, and enables efficient perturbation response prediction on large synthetic datasets with thousands of conditions.
💡 Research Summary
The paper introduces Wasserstein‑Fisher‑Rao Mean Flow Matching (WFR‑MFM), a novel framework for fast, simulation‑free inference of dynamic unbalanced optimal transport (OT) problems, with a focus on single‑cell RNA‑seq (scRNA‑seq) data where cell populations evolve with both transport and mass variation (e.g., proliferation and apoptosis).
Traditional flow‑matching (FM) methods learn an instantaneous velocity field uₜ(x) by regressing it onto the velocities of analytically known conditional paths, which removes the need for ODE solvers during training. At inference time, however, the learned ODE must still be integrated numerically from the initial to the final time. This is computationally expensive, especially for unbalanced dynamics, where a growth‑rate field gₜ(x) must be integrated alongside uₜ(x). Existing unbalanced FM approaches (e.g., WFR‑FM) inherit this bottleneck.
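To make this inference cost concrete, the sketch below integrates a toy unbalanced ODE, dx/dt = u(x) and dm/dt = g(x)·m, with explicit Euler and counts field evaluations (NFEs). The dynamics u(x) = A·x and g(x) = C·x, the constants, and the helper names (`euler_unbalanced`, `exact_solution`) are hypothetical illustrations, not the baselines used in the paper.

```python
import math

# Hypothetical toy dynamics (not from the paper): 1-D linear drift u(x) = A*x
# and a state-dependent growth rate g(x) = C*x.
A, C = 0.5, 0.3

def u(x):
    return A * x

def g(x):
    return C * x

def euler_unbalanced(x0, m0, t0, t1, n_steps):
    """Explicit Euler for dx/dt = u(x), dm/dt = g(x) * m.
    Returns the terminal state, terminal mass, and number of field evaluations."""
    dt = (t1 - t0) / n_steps
    x, m, nfe = x0, m0, 0
    for _ in range(n_steps):
        ux, gx = u(x), g(x)  # one evaluation of each field per step
        nfe += 1
        x, m = x + dt * ux, m * (1.0 + dt * gx)
    return x, m, nfe

def exact_solution(x0, m0, t0, t1):
    """Closed-form solution of this toy model, used only as a reference."""
    d = t1 - t0
    x1 = x0 * math.exp(A * d)
    m1 = m0 * math.exp(C * x0 * (math.exp(A * d) - 1.0) / A)
    return x1, m1

x_ref, m_ref = exact_solution(1.2, 1.0, 0.0, 1.0)
for n in (10, 100, 1000):
    x, m, nfe = euler_unbalanced(1.2, 1.0, 0.0, 1.0, n)
    print(n, nfe, abs(x - x_ref), abs(m - m_ref))
```

The error shrinks only as the number of steps, and hence NFEs, grows; this per-sample cost is what a one-step method avoids.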
WFR‑MFM removes this bottleneck by introducing mean‑flow variables: the average velocity v(x,t,T) and the average growth rate h(x,t,T) over an arbitrary time interval [t,T]. They are defined as time averages of the instantaneous fields along the trajectory x_τ that passes through x at time t (so x_t = x):
v(x,t,T) = (1/(T‑t))∫ₜᵀ u_τ(x_τ) dτ,
h(x,t,T) = (1/(T‑t))∫ₜᵀ g_τ(x_τ) dτ.
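These definitions can be checked numerically on a toy model. The sketch assumes hypothetical instantaneous fields u_τ(x) = A·x and g_τ(x) = C·x, for which the trajectory through x at time t is x·exp(A(τ − t)). It averages u and g along that trajectory with the trapezoidal rule and compares the result with the closed-form mean fields of this toy model. The function names are illustrative only.

```python
import math

# Hypothetical toy dynamics (assumption, not the paper's model):
# u_tau(x) = A*x and g_tau(x) = C*x, so the trajectory through x at time t
# is x_tau = x * exp(A * (tau - t)).
A, C = 0.5, 0.3

def trajectory(x, t, tau):
    return x * math.exp(A * (tau - t))

def mean_fields_quadrature(x, t, T, n=2000):
    """Time-average u and g along the trajectory starting at (x, t),
    using the composite trapezoidal rule on n sub-intervals."""
    dtau = (T - t) / n
    su = sg = 0.0
    for k in range(n + 1):
        xk = trajectory(x, t, t + k * dtau)
        w = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        su += w * A * xk
        sg += w * C * xk
    return su * dtau / (T - t), sg * dtau / (T - t)

def mean_fields_exact(x, t, T):
    """Closed-form v(x,t,T) and h(x,t,T) for the toy model."""
    d = T - t
    v = x * (math.exp(A * d) - 1.0) / d
    h = C * x * (math.exp(A * d) - 1.0) / (A * d)
    return v, h

vq, hq = mean_fields_quadrature(1.2, 0.2, 0.9)
ve, he = mean_fields_exact(1.2, 0.2, 0.9)
print(vq, ve, hq, he)
```

Because (T − t)·v is the exact displacement along the trajectory, x + (T − t)·v reproduces `trajectory(x, t, T)`.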
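Because integrals are additive over adjacent intervals, v and h satisfy additive consistency: for any intermediate time s, (T‑t)v(x_t,t,T) = (s‑t)v(x_t,t,s) + (T‑s)v(x_s,s,T), and analogously for h. Since (T‑t)v(x_t,t,T) equals the displacement x_T − x_t, and since mass evolves as dm/dτ = g_τ(x_τ)·m so that m_T = m_t·exp((T‑t)h(x_t,t,T)), the learned fields enable one‑step inference: on the unit interval, the terminal state and mass are computed directly as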
x₁ = x₀ + v(x₀,0,1),
m₁ = m₀·exp(h(x₀,0,1)).
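Multi‑step updates are also possible by partitioning [0,1] into sub‑intervals and applying the same update on each one. By additive consistency, exact mean‑flow fields give the same result for any partition, and in practice a single step already yields accurate results.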
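The one-step and multi-step updates can be sketched with closed-form mean fields standing in for the trained networks v_θ and h_ϕ. The toy dynamics (u = A·x, g = C·x) and the names `sample` and `additivity_gap` are assumptions for illustration. The sketch also checks the additive consistency identity at an intermediate time s.

```python
import math

# Closed-form mean fields for a hypothetical toy model (u = A*x, g = C*x);
# in WFR-MFM these would be the learned networks v_theta and h_phi.
A, C = 0.5, 0.3

def v(x, t, T):
    return x * (math.exp(A * (T - t)) - 1.0) / (T - t)

def h(x, t, T):
    return C * x * (math.exp(A * (T - t)) - 1.0) / (A * (T - t))

def sample(x, m, t0=0.0, t1=1.0, n_steps=1):
    """Mean-flow sampler: each step jumps a whole sub-interval with
    x <- x + dt * v(x, t, t+dt) and m <- m * exp(dt * h(x, t, t+dt))."""
    dt = (t1 - t0) / n_steps
    for k in range(n_steps):
        t = t0 + k * dt
        # The right-hand side is evaluated with the old x for both updates.
        x, m = x + dt * v(x, t, t + dt), m * math.exp(dt * h(x, t, t + dt))
    return x, m

def additivity_gap(x, t, s, T):
    """|(T-t) v(x,t,T) - [(s-t) v(x,t,s) + (T-s) v(x_s,s,T)]|."""
    xs = x + (s - t) * v(x, t, s)  # state reached at the intermediate time s
    lhs = (T - t) * v(x, t, T)
    rhs = (s - t) * v(x, t, s) + (T - s) * v(xs, s, T)
    return abs(lhs - rhs)

print(sample(1.2, 1.0), sample(1.2, 1.0, n_steps=4))
print(additivity_gap(1.2, 0.0, 0.3, 1.0))
```

With exact mean fields, the one-step and four-step results coincide up to floating-point rounding. With learned fields, any discrepancy between them indicates how far the networks are from additive consistency.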
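Training parameterizes v and h with neural networks (v_θ, h_ϕ) and enforces the differential identities that link them to the instantaneous fields. These identities follow from differentiating (T‑t)v = ∫ₜᵀ u_τ dτ and (T‑t)h = ∫ₜᵀ g_τ dτ with respect to t along the trajectory: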
v = uₜ + (T‑t)·Dₜv, h = gₜ + (T‑t)·Dₜh,
where Dₜ = ∂ₜ + uₜ·∇ₓ denotes the material derivative along the flow, with T held fixed. The loss regresses v_θ and h_ϕ onto stop‑gradient targets given by the right‑hand sides of these identities, with the instantaneous fields supplied by a conditional unbalanced flow‑matching construction. At its minimum, the loss makes the learned mean‑flow fields consistent with the underlying dynamics, and training never requires simulating trajectories.
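A minimal sketch of the target construction follows. It assumes the same hypothetical toy fields u = A·x and g = C·x in place of the conditional unbalanced flow-matching fields used by the method. The material derivative is approximated by a central finite difference in the direction (uₜ(x), 1) of (x, t). An autodiff implementation would more typically use a Jacobian-vector product and apply a stop-gradient to the target. All names here are illustrative.

```python
import math

# Hypothetical toy model: instantaneous fields u(x) = A*x, g(x) = C*x.
A, C = 0.5, 0.3

def u(x, t):
    return A * x

def g(x, t):
    return C * x

def v_exact(x, t, T):
    return x * (math.exp(A * (T - t)) - 1.0) / (T - t)

def h_exact(x, t, T):
    return C * x * (math.exp(A * (T - t)) - 1.0) / (A * (T - t))

def mean_flow_targets(v_fn, h_fn, x, t, T, eps=1e-5):
    """Targets u_t + (T-t) D_t v and g_t + (T-t) D_t h.

    D_t f = d/dt f(x_t, t, T) = df/dt + u_t * df/dx (T held fixed), estimated
    here by a central difference along the direction (u_t(x), 1) in (x, t).
    The targets are treated as constants (stop-gradient) when regressing."""
    ux, gx = u(x, t), g(x, t)
    xp, xm = x + eps * ux, x - eps * ux
    dv = (v_fn(xp, t + eps, T) - v_fn(xm, t - eps, T)) / (2 * eps)
    dh = (h_fn(xp, t + eps, T) - h_fn(xm, t - eps, T)) / (2 * eps)
    return ux + (T - t) * dv, gx + (T - t) * dh

def loss(v_fn, h_fn, x, t, T):
    """Squared regression residual of the mean-flow identities at one point."""
    v_tgt, h_tgt = mean_flow_targets(v_fn, h_fn, x, t, T)
    return (v_fn(x, t, T) - v_tgt) ** 2 + (h_fn(x, t, T) - h_tgt) ** 2

print(loss(v_exact, h_exact, 1.2, 0.2, 0.9))
print(loss(lambda x, t, T: A * x, lambda x, t, T: C * x, 1.2, 0.2, 0.9))
```

With the exact mean fields, the residual is at the level of finite-difference error. Substituting the instantaneous fields for the mean fields leaves a clearly nonzero loss.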
By embedding the mean‑flow formulation in the Wasserstein‑Fisher‑Rao (WFR) geometry, the method targets dynamics that minimize the WFR action. This action combines a transport cost (‖u‖²) and a mass‑variation cost (δ²|g|²), both weighted by the evolving mass distribution, with δ balancing the two terms. The learned mean flow therefore summarizes the geodesic of the dynamic unbalanced OT problem, and WFR‑MFM provides an OT‑consistent, mass‑varying generative model that can be sampled in a single forward pass.
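For a weighted particle system, the WFR action can be estimated with a simple Riemann sum over time. This sketch assumes the convention ‖u‖² + δ²|g|² used in the text; published references differ by constant factors. The function name and the nested-list layout (time steps × particles × dimensions) are hypothetical.

```python
def wfr_action(u, g, m, dt, delta):
    """Riemann-sum estimate of
        sum_k dt * sum_i m_i(t_k) * (|u_i(t_k)|^2 + delta^2 * g_i(t_k)^2).

    u: K x N x d nested lists of velocities,
    g: K x N nested lists of growth rates,
    m: K x N nested lists of particle masses."""
    total = 0.0
    for u_k, g_k, m_k in zip(u, g, m):            # loop over time steps
        for u_i, g_i, m_i in zip(u_k, g_k, m_k):  # loop over particles
            speed2 = sum(c * c for c in u_i)      # squared Euclidean norm of u_i
            total += m_i * (speed2 + delta ** 2 * g_i ** 2)
    return dt * total

# Pure transport: 4 time steps, 2 unit-mass particles moving at unit speed.
u_tr = [[[1.0], [1.0]] for _ in range(4)]
g_zero = [[0.0, 0.0] for _ in range(4)]
m_one = [[1.0, 1.0] for _ in range(4)]
print(wfr_action(u_tr, g_zero, m_one, 0.25, 0.5))
```

Increasing δ makes mass change more expensive relative to transport, which is how the geometry trades off moving cells against growing or removing them.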
Empirical evaluation spans synthetic benchmarks (with controlled proliferation, death, and complex transport) and real scRNA‑seq datasets (developmental trajectories, drug perturbations). Across all experiments, WFR‑MFM achieves orders‑of‑magnitude speed‑ups (10‑100× faster) compared to ODE‑based baselines (Neural ODE, WFR‑FM, scVelo) while maintaining comparable or better performance on metrics such as KL divergence, Wasserstein distance, and downstream biological validation (marker gene recovery, lineage fidelity). A large‑scale synthetic perturbation benchmark containing thousands of conditions demonstrates that WFR‑MFM can predict cellular responses in near‑real‑time, making it practical for high‑throughput perturbation studies.
The contributions are threefold: (1) a theoretical mean‑flow framework that aggregates transport and growth over arbitrary intervals, (2) a concrete algorithm (WFR‑MFM) that couples this framework with the WFR metric, and (3) a demonstration that simulation‑free training and inference dramatically reduce computational overhead without sacrificing OT‑theoretic guarantees. Limitations include the deterministic nature of the mean‑flow approximation and the exponential growth model for mass, which may be restrictive for highly stochastic biological processes. Future work could explore probabilistic mean‑flow, multimodal extensions, and integration with optimal experimental design for perturbation planning.
In summary, WFR‑MFM replaces trajectory simulation with a single evaluation of learned mean velocity and mass‑growth fields. This makes dynamic unbalanced optimal transport practical for scalable, accurate modeling of time‑evolving biological systems in which both cell states and population sizes change.