VFScale: Intrinsic Reasoning through Verifier-Free Test-time Scalable Diffusion Model
Inspired by human System 2 thinking, LLMs excel at complex reasoning tasks via extended Chain-of-Thought. However, similar test-time scaling of diffusion models for complex reasoning remains largely unexplored. Existing work reveals two primary challenges in this setting: (i) dependence on an external verifier, which marks a notable gap from the intrinsic reasoning of human intelligence, which requires no external feedback, and (ii) the lack of an efficient search algorithm. In this paper, we introduce the Verifier-free Test-time Scalable Diffusion Model (VFScale), which achieves scalable intrinsic reasoning by equipping number-of-samples test-time scaling with the diffusion model's intrinsic energy function as the verifier. Concretely, VFScale comprises two key innovations that address these challenges. On the training side, VFScale combines a novel MRNCL loss with a KL regularization to improve the energy landscape, ensuring that the learned energy function itself serves as a reliable verifier. On the inference side, VFScale integrates the denoising process with a novel hybrid Monte Carlo Tree Search (hMCTS) to improve search efficiency. On the challenging reasoning tasks of Maze and Sudoku, we demonstrate the effectiveness of VFScale's training objective and scalable inference method. In particular, trained on Maze sizes of up to $6\times6$, VFScale solves 88% of much larger $15\times15$ Maze problems, while standard diffusion models fail completely. The code can be found at https://github.com/AI4Science-WestlakeU/VFScale.
💡 Research Summary
The paper introduces VFScale, a novel framework that enables test‑time scaling of diffusion models for complex reasoning without relying on any external verifier. The authors identify two major obstacles to scaling diffusion‑based reasoning: (1) the dependence on an external verifier to evaluate and select among many generated samples, which diverges from the intrinsic, self‑evaluating nature of human System 2 reasoning; and (2) the lack of an efficient search algorithm that can exploit increased compute budgets without suffering diminishing returns.
To address (1), VFScale treats the diffusion model’s learned energy function as an intrinsic verifier. However, a naïvely trained energy function does not reliably correlate with solution quality. The authors therefore propose a new training objective composed of two components. First, the Monotonic‑Regression Negative Contrastive Learning (MRNCL) loss enforces a monotonic relationship between a sample’s L2 distance to the ground truth and its energy value. For each positive example, two negative examples at different corruption levels are generated; their noisy versions at a given timestep are fed through the model to obtain energies. A linear regression over the three (distance, energy) points yields a slope kₜ and intercept bₜ. The loss penalizes slopes smaller than a preset margin γ and also penalizes deviations of the actual energies from the fitted line. This encourages “performance‑energy consistency”: lower energy should imply a more accurate solution. Second, a KL‑regularization term (L_KL) smooths the energy landscape by minimizing the KL divergence between the model’s implicit distribution and the true data distribution, back‑propagating through the denoising process. Together with the standard denoising (MSE) loss and a contrastive loss, the full objective is L_total = L_MSE + λ₁L_Contrast + λ₂L_MRNCL + λ₃L_KL. Empirical analysis shows that MRNCL dramatically improves the alignment between energy and solution quality, making the energy function a reliable verifier.
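The MRNCL computation described above can be sketched as follows. This is a minimal NumPy illustration, not the authors’ implementation: the function name `mrncl_loss` is hypothetical, the three points stand for one positive and two negatives at a single timestep, and the linear fit is the textbook closed-form least-squares solution.

```python
import numpy as np

def mrncl_loss(distances, energies, gamma=0.1):
    """Hypothetical sketch of the MRNCL loss for one (positive, negative, negative)
    triple: fit energy ≈ k*distance + b, penalize slopes below the margin gamma,
    and penalize deviations of the energies from the fitted line."""
    d = np.asarray(distances, dtype=float)  # L2 distances to the ground truth
    e = np.asarray(energies, dtype=float)   # model energies of the noisy samples
    # Closed-form least-squares fit over the three (distance, energy) points.
    d_mean, e_mean = d.mean(), e.mean()
    k = ((d - d_mean) * (e - e_mean)).sum() / ((d - d_mean) ** 2).sum()
    b = e_mean - k * d_mean
    # Energy should grow with distance: penalize slope k falling below gamma.
    slope_penalty = max(0.0, gamma - k)
    # Penalize scatter of the actual energies around the fitted line.
    fit_penalty = float(((e - (k * d + b)) ** 2).mean())
    return slope_penalty + fit_penalty
```

A perfectly monotone triple (energy rising with distance at a slope above γ) incurs zero loss, while an inverted slope is penalized in proportion to how far it falls below the margin.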
For (2), the authors design a hybrid Monte Carlo Tree Search (hMCTS) that blends Best‑of‑N (BoN) exploration with classic MCTS exploitation. During the early, high‑noise stages of denoising, BoN is used to generate a broad set of candidate solutions in parallel, preserving diversity while quickly discarding obviously poor candidates based on their energy. As the noise level decreases and the search space contracts, the algorithm switches to MCTS, which performs selection, expansion, simulation, and back‑propagation on the remaining candidates. The simulation step leverages variable‑spacing sampling (sub‑sequences of the diffusion schedule) to accelerate rollouts. The UCB scores in MCTS are computed using the intrinsic energy values, thus eliminating any external scoring network. This hybrid approach yields high search efficiency: increasing the number of samples translates into substantial performance gains without a linear increase in computation.
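The two stages of hMCTS can be illustrated with a pair of small helpers. This is a simplified sketch under assumptions, not the paper’s algorithm: `best_of_n_filter` and `ucb_select` are hypothetical names, and the UCB formula shown is the standard UCB1 form with the negated intrinsic energy playing the role of the value term, as the summary describes.

```python
import math
import numpy as np

def best_of_n_filter(energies, keep):
    """BoN stage (high-noise steps): keep the `keep` lowest-energy candidates,
    discarding obviously poor ones while preserving a diverse pool."""
    order = np.argsort(np.asarray(energies, dtype=float))
    return order[:keep]

def ucb_select(values, visits, c=1.4):
    """MCTS selection stage: pick the child maximizing a UCB1-style score.
    `values` are cumulative negated energies, so no external scorer is needed."""
    total = sum(visits)
    scores = [
        v / max(n, 1) + c * math.sqrt(math.log(total + 1) / max(n, 1))
        for v, n in zip(values, visits)
    ]
    return int(np.argmax(scores))
```

In a full loop one would apply `best_of_n_filter` over the early denoising steps, then run selection/expansion/simulation/back-propagation with `ucb_select` on the surviving candidates, using shortened diffusion sub-sequences for the rollouts.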
The framework is evaluated on two challenging reasoning domains. In Maze solving, models are trained on mazes up to 6×6 and tested on much larger 15×15 mazes. VFScale solves 88% of these out‑of‑distribution mazes, whereas a standard diffusion model fails completely. In Sudoku, under out‑of‑distribution conditions with fewer given digits, VFScale achieves a 43% solve rate compared to 30% for the baseline. Ablation studies confirm that removing MRNCL destroys performance‑energy consistency, leading to poor selection during test‑time scaling, while replacing hMCTS with pure BoN yields only marginal improvements despite large sample budgets.
Overall, VFScale makes two key contributions: (1) a training scheme (MRNCL + KL regularization) that shapes the diffusion model’s energy landscape into a trustworthy intrinsic verifier, and (2) a scalable inference algorithm (hMCTS) that efficiently converts additional test‑time compute into higher reasoning accuracy. The work bridges the gap between diffusion‑based generative models and human‑like intrinsic reasoning, opening avenues for applying diffusion models to broader combinatorial optimization, scientific simulation, and automated design tasks where external verification is costly or unavailable.