Population Annealing as a Discrete-Time Schrödinger Bridge

We present a theoretical framework that reinterprets Population Annealing (PA) through the lens of the discrete-time Schrödinger Bridge (SB) problem. We demonstrate that the heuristic reweighting step in PA is derived by analytically solving the Schr…

Authors: Masayuki Ohzeki

Journal of the Physical Society of Japan Population Annealing as a Discrete-Time Schr ¨ odinger Bridge Masayuki Ohzeki Graduate School of Information Sciences, T ohoku University , Sendai 980-8579, Japan Department of Physics, T okyo Institute of T echnolo gy , Me gur o, T okyo 152-8551, Japan Resear ch and Education Institute for Semiconductors and Informatics, K umamoto University , K umamoto 860-8555, J apan Sigma-i Co., Ltd., Minato, T okyo 108-0075, J apan W e present a theoretical framework that reinterprets Population Annealing (P A) through the lens of the discrete-time Schr ¨ odinger Bridge (SB) problem. W e demonstrate that the heuristic reweighting step in P A is deriv ed by analytically solving the Schr ¨ odinger system without iterative computation via instantaneous projection. In addition, we identify the thermodynamic work as the optimal control potential that solv es the global variational problem on path space. This perspectiv e unifies non-equilibrium thermodynamics with the geometric framew ork of optimal transport, interpreting the Jarzynski equality as a consistency condition within the Donsker -V aradhan variational principle, and elucidates the thermodynamic optimality of P A. 1. Introduction Simulating equilibrium states of complex systems char - acterized by rough energy landscapes is a central challenge in statistical physics. Standard Marko v Chain Monte Carlo (MCMC) methods often su ff er from slow con ver gence due to trapping in local minima. T o overcome this ergodicity break- ing, various generalized ensemble methods have been pro- posed, such as Replica Exchange Monte Carlo (Parallel T em- pering) 1) and Simulated T empering. 2) Among these, Popula- tion Annealing (P A) 3, 4) has recently emerged as a particu- larly po werful approach. By evolving a population of repli- cas under a changing temperature schedule, P A enables e ffi - cient parallel sampling and, crucially , provides a direct esti- mator of free energy di ff erences via the measurement of non- equilibrium work based on the Jarzynski equality , which often supports an e ffi cient computing in stochastic dynamics and quantum gate model as well. 5, 6) In a separate dev elopment within applied mathematics and machine learning, the Schr ¨ odinger Bridge (SB) prob- lem 7, 8) has gained renewed attention.Originally posed by Schr ¨ odinger to find the most likely ev olution of a probability density between two fixed marginals, SB is now recognized as a formulation of entropy-regularized optimal transport. 9) It has become a foundational framework for modern generative modeling, particularly in di ff usion models 10–12) and stochastic control, 13) where iterati ve algorithms are typically employed to learn the optimal control force. In this context, algorithmic connections between Sequen- tial Monte Carlo (SMC) methods and SB ha ve also been ex- plored. For instance, Bernton et al. 14) proposed improving SMC proposals using iterati ve SB formulations (Iterativ e Pro- portional Fitting), and Liu et al. 15) dev eloped e ffi cient adjoint- based samplers. These works primarily focus on algorithmic refinements to enhance sampling e ffi ciency through iterati ve optimization and learning. While these studies position SMC methods as algorithmic components within iterati ve SB solvers, Population Anneal- ing, which is one of the SMC methods, is used in physics as a standalone, non-iterative sampler . Howe ver , the standard for- mulation of the SB problem typically requires iterativ e pro- cedures, such as the Sinkhorn algorithm, to strictly satisfy boundary constraints at both the initial and final distributions. It is therefore not immediately obvious how P A, which oper- ates purely sequentially without such global iterations, relates to the rigorous optimal transport frame work. Does P A merely approximate the transport, or does it constitute a valid solu- tion to the SB problem under specific conditions? The phys- ical geometry underlying this e ffi ciency—specifically , ho w P A bypasses the need for iteration via thermodynamic prin- ciples—has not been fully explicated. In this Letter, we reor ganize the theoretical understanding of P A by explicitly mapping it to the discrete-time SB frame- work. Unlike previous works that focus on iterative refine- ment, we demonstrate that the heuristic reweighting step in P A corresponds exactly to the optimal control term required to solv e the SB problem analytically in the instantaneous pro- jection limit. Furthermore, we extend the discussion from a single step to the entire annealing schedule, proving that min- imizing the local transport cost step-by-step is equi valent to solving the global optimal transport problem. This formula- tion unifies the physical concept of the Jarzynski equality with the geometric perspectiv e of optimal transport, suggesting that P A is a natural solver for the global Donsker-V aradhan variational problem on path space. 2. Population Annealing and Free Energy Estimation W e consider a system with state x ∈ X and ener gy function E ( x ). Our goal is to sample from the Boltzmann distrib ution π k ( x ) ∝ e − β k E ( x ) across a sequence of in verse temperatures β 0 < β 1 < · · · < β K . In the P A algorithm, at step k , the temperature is lowered from β k to β k + 1 . P A adjusts the population by calculating a weight w ( i ) k for each replica i : w k ( x ) = e − β k + 1 E ( x ) e − β k E ( x ) = e − ( β k + 1 − β k ) E ( x ) . (1) The population is then resampled proportional to these weights, followed by an MCMC transition (mutation). This process has a direct thermodynamic interpretation where 1 J. Phys. Soc. Jpn. ln w k ( x ) = − ∆ β k E ( x ) corresponds to the thermodynamic work W k . The av erage of these weights estimates the free energy di ff erence via the Jarzynski equality: 16) ⟨ e − W k ⟩ π k = e − ∆ F k . 3. The Discrete-Time Schr ¨ odinger Bridge T o formalize P A within the optimal transport frame work, we consider the Schr ¨ odinger Bridge (SB) problem on the path space X 0: K = ( x 0 , . . . , x K ). Let Q be the probability measure of the reference Markov process (uncontrolled MCMC) with joint distrib ution Q ( x 0: K ) = π 0 ( x 0 ) Q k M k ( x k + 1 | x k ), where π t ( x ) is the Gibbs-Boltzmann distribution as a steady state. On the other hand, we define the modified path probability as P ( x 0: K ) = π 0 ( x 0 ) Q k M ′ k ( x k + 1 | x k ), which is given by a di ff erent kernel M ′ k ( x k + 1 | x k ). The global SB problem seeks a path measure P that mini- mizes the KL div er gence from Q : min P D K L ( P∥Q ) . (2) The nature of the solution depends critically on the constraints imposed on P . The standard formulation of the SB problem imposes con- straints only on the initial and final distributions: P 0 ( x 0 ) = π 0 ( x 0 ) , P K ( x K ) = π K ( x K ) . (3) Here P t ( x t ) is an instantaneous probability distrib ution. In this case, the optimal path measure P ∗ is kno wn to be a Marko v bridge. The KL div er gence is reduced to the following form: D K L ( P∥Q ) = K − 1 X k = 0 X x k + 1 , x k P k ( x k ) M ′ k ( x k + 1 | x k ) log M ′ k ( x k + 1 | x k ) M k ( x k + 1 | x k ) . (4) In general, the KL di vergence between path measures decom- poses into the expectation of the sum of conditional KL di- ver gences giv en the full history x 0: k . Howe ver , in this case, the stochastic dynamics are gov erned by a Markov process. The optimal “joint distribution” P ∗ k ( x k , x k + 1 ) at adjacent time steps k , k + 1 is given by the symmetric modification of the reference measure: P ∗ k ( x k , x k + 1 ) = φ k ( x k ) Q k ( x k , x k + 1 ) ψ k + 1 ( x k + 1 ) , (5) where Q k ( x k , x k + 1 ) = π k ( x k ) M k ( x k + 1 | x k ) is the reference joint distribution. φ k and ψ k are space-time potentials. Notice that P ∗ k ( x k , x k + 1 ) is not necessarily normalized. The following ma- nipulation does not require kernel normalization. In this case, the optimal path measure P ∗ is known to be a Markov bridge gi ven by the multiplicativ e modification of the reference measure. The space-time potentials φ k and ψ k must satisfy the Schr ¨ odinger system, which consists of the forward and backward recurrence relations: φ k + 1 ( y ) = X x φ k ( x ) Q k ( x , y ) , (6) ψ k ( x ) = X y Q k ( x , y ) ψ k + 1 ( y ) . (7) These equations describe the propagation of boundary infor- mation: φ k carries the initial constraint π 0 forward, while ψ k propagates the final constraint π K backward. It implies that the marginal distribution at any time step k is giv en by the pointwise product of the two potentials: P k ( x ) = φ k ( x ) ψ k ( x ) . (8) This symmetric factorization highlights the role of φ k as the forward message from the initial state and ψ k as the backward message from the final constraint. The solution is determined by the boundary conditions: φ 0 ( x ) ψ 0 ( x ) = π 0 ( x ) , φ K ( x ) ψ K ( x ) = π K ( x ) . (9) Since the equations are coupled globally across time 0 ≤ k ≤ K , one typically employs the iterativ e Sinkhorn algorithm to find the potentials that satisfy both boundaries simultane- ously . Howe ver , this iterativ e procedure is computationally ex- pensiv e, especially for high-dimensional systems, as it re- quires storing and updating the entire history of potentials. 4. Bridging P A and SB T o circumvent this computational burden, we introduce a stronger set of constraints to simplify the problem structure. Instead of fixing only the endpoints, we enforce the equilib- rium distrib ution π k as a marginal constraint at e very interme- diate time step: P k ( x k ) = π k ( x k ) for all k = 0 , . . . , K . (10) By pinning the distribution to the equilibrium target π k at each moment, we break the global temporal coupling. This trans- forms the global optimization problem into a sequence of lo- cal, independent transport problems between adjacent steps π k and π k + 1 , which allows for a direct, non-iterati ve solution. Moreov er , this formulation aligns naturally with the physi- cal reality of annealing protocols. In such experiments or sim- ulations, an external operator arbitrarily manipulates control parameters (such as the inv erse temperature β k ) according to a predetermined schedule. This operation explicitly defines a sequence of target equilibrium distrib utions π k that the system is intended to track at each step, rather than merely imposing a constraint on the final state. The potentials φ k and ψ k + 1 are determined by the marginal constraints imposed by the local minimization of the KL div er gence. Mar ginalizing the optimal joint distribution P ∗ k ( x k , x k + 1 ) with respect to x k + 1 and x k must reco ver the fixed distributions π k ( x k ) and π k + 1 ( x k + 1 ), respectively . This yields the following coupled equations: φ k ( x k ) = π k ( x k ) P x k + 1 Q k ( x k , x k + 1 ) ψ k + 1 ( x k + 1 ) , (11) ψ k + 1 ( x k + 1 ) = π k + 1 ( x k + 1 ) P x k φ k ( x k ) Q k ( x k , x k + 1 ) . (12) One can recognize that this system is overdetermined if we attempt to solve both equations analytically for a single step. Howe ver , we can exploit the freedom to fix one potential. If we substitute ψ k + 1 ( x k + 1 ) = 1 into Eq. (11), we obtain the trivial solution φ k ( x k ) = 1 (since P x k + 1 Q k = π k ), which im- plies no distributional change. In contrast, if we postulate φ k ( x k ) = 1 in Eq. (12), we can calculate ψ k + 1 ( x k + 1 ) ana- lytically . This choice allo ws us to strictly satisfy the target marginal condition at each time step. The solution obtained from this latter path corresponds exactly to the Population An- nealing algorithm. 2 J. Phys. Soc. Jpn. The “forward equation” in the Sinkhorn algorithm yields the analytical solution for the potential: ψ k + 1 ( y ) = π k + 1 ( y ) π k ( y ) . (13) This result signifies that since the mutation kernel M k does not alter the equilibrium distribution π k , the entire distributional shift to π k + 1 must be induced solely by the potential ψ k + 1 . In P A, this is implemented by the resampling step. Substituting this into the joint distribution P ∗ : P ∗ ( x k + 1 , x k ) = π k + 1 ( x k + 1 ) π k ( x k + 1 ) M k ( x k + 1 | x k ) π k ( x k ) . (14) Since we established that the global cost function is the sum of local costs, applying this optimal control at e very step en- sures global optimality across the entire schedule 0 → K . Substituting the P A optimal solution into the cost function, the log-likelihood ratio simplifies significantly: D K L ( P∥Q ) = K − 1 X k = 0 D K L ( π k + 1 ∥ π k ) . (15) Substituting the definition of the canonical distributions, we find that this cost function is exactly equal to the thermody- namic dissipated work. Let us define the generalized work performed during the parameter update from β k to β k + 1 as W k ( x ) = ( β k + 1 − β k ) E ( x ) and the dimensionless free energy as ϕ k = − ln Z k . The KL div er gence can be expanded as: D K L ( π k + 1 ∥ π k ) = ∆ ϕ − ⟨ W k ⟩ π k + 1 . This quantity represents the entropy production or the dissi- pated work defined by the second law of thermodynamics: the di ff erence between the free energy change and the average work. Thus, minimizing the transport cost in P A is rigorously equiv alent to minimizing the strictly defined thermodynamic dissipation. Beyond the known connection to the Jarzynski equality , our w ork re veals the fundamental v ariational struc- ture of the algorithm. W e assert that the P A algorithm is ob- tained by minimizing the Schr ¨ odinger Bridge objecti ve sub- ject to strict intermediate constraints. This formulation reveals that the mathematical cost function is strictly equiv alent to the physical entropy production defined in thermodynamics. Furthermore, if we consider the limit of small step sizes ∆ β = β k + 1 − β k → 0, this exact quantity can be approximated by the second-order expansion inv olving the Fisher informa- tion (energy v ariance): D K L ( π k + 1 ∥ π k ) ≈ 1 2 V ar( E ) β k ( ∆ β k ) 2 . (16) Since the magnitude of the resultant cost quantifies the in- herent di ffi culty of the distributional transport, minimizing it corresponds to realizing control with minimal thermodynamic e ff ort. This metric serves as a fundamental criterion for opti- mizing the annealing schedule: step sizes should be reduced in regions where the cost (ener gy v ariance) is high to main- tain “e ff ortless” control, and can be increased where the cost is low . Interestingly , the criterion of constant thermodynamic speed is physically equiv alent to the optimal scheduling con- dition in Replica Exchange Monte Carlo (EMC, also known as Parallel T empering), 1) where the temperature interv als are chosen to keep the acceptance probabilities of replica swaps constant. Both algorithms ultimately demand the same geo- metric optimization: ensuring su ffi cient overlap between adja- cent energy distributions to facilitate e ffi cient transport (swap- ping in EMC, or resampling in P A). This geometric equiv a- lence strongly suggests that the transition rules of EMC can also be formally derived as an optimal solution within the discrete-time Schr ¨ odinger Bridge frame work. Howe v er , a rig- orous mapping of the EMC algorithm onto the SB frame work requires a detailed treatment of the joint distributions and their corresponding potentials. Therefore, we leav e this compre- hensiv e discussion for a separate future publication. W e now address the temporal structure of the control. In standard SB, the solution is often a “dynamical” control (modifying M k during transition). Howe v er , the discrete-time setting allo ws for an alternati ve: instantaneous control. In P A, the parameter switch β k → β k + 1 creates a discrepancy . The likelihood ratio required to bridge this gap is related to the resampling weight w k ( x ) defined in Eq. (1) and the partition functions: π k + 1 ( x ) π k ( x ) = e − β k + 1 E ( x ) / Z k + 1 e − β k E ( x ) / Z k = w k ( x ) Z k Z k + 1 . (17) By resampling with w k ( x ), P A activ ely utilizes this work to “bridge the gap” immediately . It e ff ectiv ely resets the empir - ical measure to π k + 1 before the unmodified kernel M k is ap- plied. In the language of the Schr ¨ odinger system, this implies that the iterati ve Sinkhorn procedure is rendered unnecessary . Be- cause the control is applied as an instantaneous projection (re- sampling) and the reference kernel preserves the source distri- bution, the optimal potential ψ is analytically determined by the Boltzmann factor ratio. Thus, the resampling step in P A is an exact algorithmic im- plementation of the optimal control potentials. The “work” W k corresponds directly to the log-potential required to per- form the optimal control, and the sequence of these operations constitutes the optimal transport path for the entire annealing process. 5. Geometric Interpretation on Path Space The identification of P A as a fully constrained global Schr ¨ odinger Bridge solver o ff ers deeper insight into the math- ematical nature of non-equilibrium work relations. W e con- sider the probability measures on the entire path space X 0: K = ( x 0 , . . . , x K ). Let Q be the reference path measure. The optimal transport measure P ∗ constructed by P A is related to Q via a change of measure. The ratio of their densities is the global Radon- Nikodym deri v ativ e. Using Eq. (17): d P ∗ d Q ( x ) = K − 1 Y k = 0 π k + 1 ( x k + 1 ) π k ( x k + 1 ) ≈ K − 1 Y k = 0 w k ( x k ) Z k Z k + 1 . (18) The global Jarzynski equality arises as the normalization con- dition for this global measure transformation. T aking the e x- pectation ov er the reference path ensemble Q : ⟨ e −W ⟩ Q = Z K Z 0 ⟨ d P ∗ d Q ⟩ Q = Z K Z 0 = e − ∆ F total . (19) Thus, in the global SB framework, the Jarzynski equality serves as the geometric consistenc y condition. In addition, the optimality of P A is grounded in the 3 J. Phys. Soc. Jpn. Donsker -V aradhan v ariational formula. 17) By identifying the cumulativ e work W as the optimal con- trol potential, the free energy relation is recast as a variational principle on the path space: − ∆ F total = ln ⟨ e −W ⟩ Q = sup P { −⟨W⟩ P − D K L ( P∥Q ) } . (20) This objective maximizes energetic e ffi ciency (negati ve work) penalized by the information-theoretic cost of deviating from the reference dynamics, e ff ectiv ely tightening the standard Second Law bound ⟨W ⟩ ≥ ∆ F total . Since the SB problem corresponds to the minimization of the KL div er gence, it rep- resents the mathematical dual to this maximization. P A, by reweighting trajectories proportional to e −W , empirically con- structs the unique optimal path measure P ∗ that attains this supremum, thereby solving the global variational problem for finite-time transport. In this light, the Jarzynski equality serves as the geometric consistency condition, ensuring that the optimal path measure is properly normalized to bridge the initial and final distributions. 6. Conclusion In this Letter , we have established that Population Anneal- ing is not merely a heuristic algorithm but a rigorous solver for the discrete-time Schr ¨ odinger Bridge problem. While re- cent machine learning literature has focused on iterative re- finements of such samplers, our reformulation clarifies the physical mechanism behind P A ’ s e ffi ciency . W e demonstrated that the resampling step constitutes an analytical solution to the Schr ¨ odinger system via static projection. By explicitly contrasting the standard endpoint-constrained SB problem, which necessitates iterati ve Sinkhorn computations, with the sequentially constrained P A formulation, we clarified why P A admits a non-iterative analytical solution. This analysis re- veals that the resampling step acts as an instantaneous static projection, providing a globally optimal transport trajectory on the path space. The significant insight gained from this perspectiv e is the identification of the cumulative thermodynamic work as the exact potential required for optimal transport. This framework o ff ers a dual unification: physically , it reinterprets the Jarzyn- ski equality as a geometric consistency condition within the Donsker -V aradhan v ariational principle. From an algorithmic perspectiv e, it bridges statistical physics and machine learning by characterizing P A as a training-free, non-iterativ e strategy for optimal transport. This insight opens new avenues for ef- ficient sampling in energy-based generati v e modeling. Finally , a promising future direction is to extend this framew ork to dynamics breaking detailed balance. Since non-rev ersible processes are known to accelerate con ver - gence, 18–22) utilizing optimal transport theory to design and reweight such non-equilibrium dynamics could lead to even more e ffi cient sampling algorithms beyond the standard limi- tations of rev ersibility . W e acknowledge that the standard SB problem, which im- poses constraints only at the endpoints, theoretically admits a broader class of solutions, including the P A trajectory . How- ev er , the unique feature of P A is its greedy , sequential nature enabled by the intermediate constraints. While the standard iterativ e SB approach can find the globally optimal path with- out these intermediate restrictions (potentially finding ”tun- neling” paths that standard annealing misses), P A provides a highly e ffi cient ”forward-pass” approximation that is exact when the system follo ws the prescribed temperature schedule. Comparing these two regimes—sequential constrained trans- port versus global iterati ve transport—remains a promising av enue for future research. This study was supported by the JSPS KAKENHI Grant No. 23H01432. W e recei ved financial supports by programs for bridging the g ap between R&D and IDeal society (Society 5.0) and Generating Economic and social v alue (BRIDGE) and Cross-ministerial Strategic Innov ation Promotion Pro- gram (SIP) from the Cabinet O ffi ce (No. 23836436). 1) K. Hukushima and K. Nemoto: Journal of the Physical Society of Japan 65 (1996) 1604. 2) E. Marinari and G. Parisi: Europhysics Letters 19 (1992) 451. 3) K. Hukushima and Y . Iba: AIP Conference Proceedings, V ol. 690, 2003, pp. 200–206. 4) J. Machta: Physical Revie w E 82 (2010) 026704. 5) H. Katsuda and M. Ohzeki: Journal of the Physical Society of Japan 80 (2011) 045003. 6) M. Ohzeki: Physical Revie w Letters 105 (2010) 050401. 7) E. Schr ¨ odinger: Sitzungsberichte der Preussischen Akademie der Wis- senschaften. Physikalisch-mathematische Klasse 144 (1931) 144. 8) C. L ´ eonard: Discrete & Continuous Dynamical Systems-A 34 (2014) 1533. 9) G. Peyr ´ e and M. Cuturi: Foundations and T rends ® in Machine Learn- ing 11 (2019) 355. 10) Y . Song, J. Sohl-Dickstein, D. P . Kingma, A. K umar , S. Ermon, and B. Poole: International Conference on Learning Representations, 2021. 11) V . De Bortoli, J. Thornton, J. Heng, and A. Doucet: Advances in Neural Information Processing Systems 34 (2021) 17695. 12) K. Kaba, R. Shimizu, M. Ohzeki, and Y . Sughiyama: Physical Review Research 7 (2025) 033213. 13) T . Chen, G.-H. Liu, and E. A. Theodorou: arXiv preprint arXiv:2106.01357 (2021). 14) E. Bernton, J. Heng, A. Doucet, and P . E. Jacob: arXiv preprint arXiv:1912.13170 (2019). 15) G.-H. Liu, J. Choi, Y . Chen, B. K. Miller, and R. T . Q. Chen: Proceed- ings of the 13th International Conference on Learning Representations (ICLR), 2025. 16) C. Jarzynski: Physical Revie w Letters 78 (1997) 2690. 17) M. D. Donsker and S. R. S. V aradhan: Communications on Pure and Applied Mathematics 28 (1975) 1. 18) H. Suwa and S. T odo: Physical Revie w Letters 105 (2010) 120603. 19) Y . Sakai and K. Hukushima: Journal of the Physical Society of Japan 82 (2013) 064003. 20) A. Ichiki and M. Ohzeki: Physical Revie w E 88 (2013) 020101. 21) M. Ohzeki and A. Ichiki: Physical Revie w E 92 (2015) 012105. 22) M. Ohzeki and A. Ichiki: Journal of Physics: Conference Series 638 (2015) 012003. 4

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment