p-spin glasses, characterized by frustrated many-body interactions beyond the conventional pairwise case (p > 2), are prototypical disordered systems whose ground-state search is NP-hard and computationally prohibitive for large instances. Solving this problem is not only fundamental for understanding high-order disorder, structural glasses, and topological phases, but also central to a wide spectrum of hard combinatorial optimization tasks. Despite decades of progress, an efficient and scalable solver for generic large-scale p-spin models is still lacking. Here we introduce PLANCK, a physics-inspired deep reinforcement learning framework built on hypergraph neural networks. PLANCK directly optimizes arbitrary high-order interactions and systematically exploits gauge symmetry throughout both training and inference. Trained exclusively on small synthetic instances, PLANCK exhibits strong zero-shot generalization to systems orders of magnitude larger, and consistently outperforms state-of-the-art thermal annealing methods across all tested structural topologies and coupling distributions. Moreover, without any modification, PLANCK achieves near-optimal solutions for a broad class of NP-hard combinatorial problems, including random k-XORSAT, hypergraph max-cut, and conventional max-cut. The presented framework provides a physics-inspired algorithmic paradigm that bridges statistical mechanics and reinforcement learning. The symmetry-aware design not only advances the tractable frontier of high-order disordered systems, but also opens a promising avenue for machine-learning-based solvers to tackle previously intractable combinatorial optimization challenges.
Over the past few decades, the spin glass model has become a cornerstone of statistical physics [1,2]. A key breakthrough came with Parisi's replica symmetry breaking (RSB) theory [3], which provides fundamental insights into the fully connected Sherrington-Kirkpatrick (SK) model [4] and extends naturally to more complex disordered systems [5-7]. Gardner further generalized the SK model by introducing p-spin interactions [8], where the SK model corresponds to the special case p = 2 and the random energy model (REM) arises in the limiting case p → ∞ [9]. Optimizing p-spin glass models [10] with p > 2 is difficult, particularly for the Edwards-Anderson (EA) model, which describes finite-dimensional lattices with nearest-neighbor interactions [11]. In this regime, mean-field approaches break down, sparking ongoing debates between the RSB framework and the competing droplet model [12]. At the heart of these challenges lies the minimization of the following p-spin Hamiltonian:
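Stated here in its standard form (with each interaction group counted once), this reads

    H = −∑_{⟨i_1,…,i_p⟩} J_{i_1,…,i_p} σ_{i_1} σ_{i_2} ⋯ σ_{i_p},

where the sum runs over all interaction groups ⟨i_1,…,i_p⟩ of the underlying lattice or hypergraph, σ_i = ±1 are Ising spins, and J_{i_1,…,i_p} are quenched random couplings.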
In general, this Hamiltonian can be defined on arbitrary graphs. Here we focus on the most heavily studied Edwards-Anderson (EA) model with Ising spins σ_i = ±1 placed on regular two-dimensional lattices: triangular (p = 3), square (p = 4), and hexagonal (p = 6). The couplings J_{i_1,i_2,…,i_p} are generally drawn from bimodal or Gaussian distributions (see Fig. 1 for a hexagonal lattice example).
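For concreteness, the minimal sketch below (illustrative only; the names p_spin_energy, hyperedges, and couplings are ours, not part of PLANCK) evaluates this Hamiltonian for a given spin configuration, interaction groups, and couplings.

```python
import numpy as np

def p_spin_energy(spins, hyperedges, couplings):
    """Energy H = -sum_e J_e * prod_{i in e} sigma_i for Ising spins sigma_i = +/-1.

    spins      : 1D array of +/-1 values, one entry per site
    hyperedges : list of index tuples, each of length p (e.g. the six sites of a hexagon)
    couplings  : sequence of J_e values, one per interaction group (bimodal or Gaussian)
    """
    return -sum(J * np.prod(spins[list(edge)])
                for J, edge in zip(couplings, hyperedges))

# A single p = 3 interaction with J = +1 is satisfied (contributes -1)
# when the product of its three spins equals +1.
spins = np.array([1, -1, -1])
print(p_spin_energy(spins, [(0, 1, 2)], [1.0]))  # -> -1.0
```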
Optimizing the p-spin model with p > 2 is of fundamental importance for three reasons. First, it is the key to understanding structural glasses [13], supercooled liquids [14], and topological quantum error correction [15] governed by high-order disorder. Second, optimizing the native high-order Hamiltonian, rather than a quadratized proxy, eliminates auxiliary variables and retains the original interaction geometry, both of which simplify the problem and improve computational efficiency [16]. Third, the p-spin formulation exactly encodes a vast family of NP-hard combinatorial tasks, from pure-p systems (random k-XORSAT [17], Modern Hopfield model [18]) to mixed-p (hypergraph MaxCut [19], MAXSAT [20]) and even conventional pairwise problems (MaxCut [21], vertex cover [22]). These characteristics position the p-spin model as a universal and physically grounded benchmark for assessing optimization algorithms across diverse regimes.
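As a concrete illustration of one such encoding (the standard reduction, stated here for reference), a random k-XORSAT clause x_{i_1} ⊕ ⋯ ⊕ x_{i_k} = b over Boolean variables x_i ∈ {0,1} becomes a pure p-spin term under the substitution σ_i = (−1)^{x_i}:

    x_{i_1} ⊕ x_{i_2} ⊕ ⋯ ⊕ x_{i_k} = b  ⟺  σ_{i_1} σ_{i_2} ⋯ σ_{i_k} = (−1)^b,

so choosing J_{i_1,…,i_k} = (−1)^b makes every satisfied clause contribute −1 to the Hamiltonian and every violated clause +1; the formula is satisfiable exactly when the ground-state energy reaches −M for M clauses.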
Unlike the well-studied pairwise case (p = 2), instances with p ≥ 3 give rise to a rugged, fractal-like free-energy landscape [8,23] (Fig. 2) and are notoriously resistant
FIG. 1. Illustrative ground-state optimization on a hexagonal lattice. Ground-state search on an L = 8 hexagonal (p = 6) lattice with fixed boundary conditions and Gaussian couplings. Each hexagon represents a six-spin interaction (opacity scales with |J|; blue-shaded areas indicate unsatisfied bonds with −J ∏ σ_i > 0). Snapshots compare the steady-state configurations achieved by Greedy search, Simulated Annealing (SA), Parallel Tempering (PT), and PLANCK. PLANCK is the only method that reaches the exact ground state (verified by Gurobi, a branch-and-bound-based exact solver), whereas the other methods stall in higher-energy local minima.
to both exact and heuristic solvers. Exact algorithms (branch-and-bound [24], branch-and-cut [25]) are limited to tens of spins, while metaheuristics such as simulated annealing (SA) [26] and parallel tempering (PT) [27] suffer from slow mixing and frequent entrapment in local minima, often requiring an impractically large number of sweeps to locate low-energy configurations.
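For reference, a minimal single-spin-flip Metropolis annealing routine for the p-spin Hamiltonian is sketched below. This is a generic textbook implementation rather than the SA baseline used in the benchmarks; the geometric cooling schedule and all parameter values are illustrative.

```python
import numpy as np

def simulated_annealing(n_spins, hyperedges, couplings,
                        n_sweeps=1000, T_start=2.0, T_end=0.05, seed=0):
    """Single-spin-flip Metropolis annealing for H = -sum_e J_e * prod_{i in e} sigma_i."""
    rng = np.random.default_rng(seed)
    spins = rng.choice([-1, 1], size=n_spins)
    # Precompute which interaction groups contain each spin, so flip costs stay local.
    incident = [[] for _ in range(n_spins)]
    for e, edge in enumerate(hyperedges):
        for i in edge:
            incident[i].append(e)
    for T in np.geomspace(T_start, T_end, n_sweeps):   # geometric cooling schedule
        for i in rng.permutation(n_spins):             # one sweep = one pass over all spins
            # Flipping spin i negates each incident term, so
            # dE = 2 * sum_{e containing i} J_e * prod_{j in e} sigma_j (before the flip).
            dE = 2.0 * sum(couplings[e] * np.prod(spins[list(hyperedges[e])])
                           for e in incident[i])
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i] = -spins[i]
    return spins
```

Because a flip only changes the terms containing the chosen spin, the energy change can be computed locally, so the cost of a sweep scales with the number of incident interactions rather than with the full system size.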
Recently, machine learning has emerged as a powerful paradigm for combinatorial optimization [28-31], particularly through learning-based models. In physical and energy-based settings, notable examples include physics-inspired graph neural networks (PIGNN) [32], hypergraph-oriented optimization frameworks such as hypOp [33], and the Free-Energy Machine (FEM) [34]. These approaches achieve strong results on specific problem classes by embedding energy minimization or landscape navigation into gradient-based learning, yet they are tightly coupled to particular task formulations and lack a unified training-inference pipeline, limiting their cross-domain applicability. An alternative line of work frames optimization as sequential decision-making and leverages reinforcement learning (RL) to construct solutions step by step. DIRAC [35], for instance, learns efficient spin-flip policies for Edwards-Anderson spin glasses and achieves state-of-the-art performance for pairwise (p = 2) interactions. However, extending such RL-based solvers to arbitrary higher-order couplings has remained an open challenge. This limitation is not merely incremental: interactions of order p ≥ 3 create many-body correlations that are exponentially harder to model and require fundamentally different state representations and credit-assignment mechanisms.
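To make the sequential-decision framing concrete, the sketch below casts ground-state search as a generic spin-flip Markov decision process (state: current configuration; action: index of the spin to flip; reward: resulting energy decrease). It is schematic only, with illustrative names such as SpinFlipEnv, and is not the state representation or credit-assignment scheme used by DIRAC or PLANCK.

```python
import numpy as np

class SpinFlipEnv:
    """Schematic MDP for p-spin ground-state search: each action flips one spin,
    and the reward is the resulting drop in energy, so the return over an episode
    equals the total energy reduction from the random initial configuration."""

    def __init__(self, n_spins, hyperedges, couplings, horizon=200, seed=0):
        self.n, self.hyperedges, self.couplings = n_spins, hyperedges, couplings
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def energy(self, spins):
        return -sum(J * np.prod(spins[list(e)])
                    for J, e in zip(self.couplings, self.hyperedges))

    def reset(self):
        self.spins = self.rng.choice([-1, 1], size=self.n)
        self.t = 0
        return self.spins.copy()

    def step(self, action):
        e_before = self.energy(self.spins)
        self.spins[action] *= -1                     # flip the chosen spin
        reward = e_before - self.energy(self.spins)  # positive if the flip lowered H
        self.t += 1
        done = self.t >= self.horizon
        return self.spins.copy(), reward, done
```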
In this work, we introduce PLANCK (P-spin-gLAss model optimization leveraging deep reiNforCement