A Null-Text Approach for Test-Time Alignment
📝 Abstract
Test-time alignment (TTA) aims to adapt models to specific rewards during inference. However, existing methods tend to either under-optimise or over-optimise (reward hack) the target reward function. We propose Null-Text Test-Time Alignment (Null-TTA), which aligns diffusion models by optimising the unconditional embedding in classifier-free guidance, rather than manipulating latent or noise variables. Due to the structured semantic nature of the text embedding space, this ensures alignment occurs on a semantically coherent manifold and prevents reward hacking (exploiting non-semantic noise patterns to improve the reward). Since the unconditional embedding in classifier-free guidance serves as the anchor for the model’s generative distribution, Null-TTA directly steers model’s generative distribution towards the target reward rather than just adjusting the samples, even without updating model parameters. Thanks to these desirable properties, we show that Null-TTA achieves state-of-the-art target test-time alignment while maintaining strong cross-reward generalisation. This establishes semantic-space optimisation as an effective and principled novel paradigm for TTA.
📄 Content
Test-Time Alignment of Text-to-Image Diffusion Models via Null-Text Embedding Optimisation
Taehoon Kim1*, Henry Gouk1, Timothy Hospedales1,2
1School of Informatics, University of Edinburgh; 2Samsung AI Center, Cambridge
Introduction

Diffusion models [12, 25, 28] have demonstrated remarkable ability in modelling complex data distributions across various domains, including images [4, 26] and language [17]. Despite their generative power, these models are trained on large-scale, uncurated web datasets that often contain undesirable or misaligned content. Consequently, aligning pre-trained diffusion models with human values or target objectives, such as aesthetic quality or preference, is essential for the trustworthy deployment of diffusion models in real-world applications.

*Correspondence to: Taehoon Kim, t.kim-16@sms.ed.ac.uk

Existing approaches to alignment can be broadly divided into two categories: fine-tuning-based alignment and test-time alignment (TTA). (1) Fine-tuning methods [1, 3, 5, 7, 16, 21, 31, 36] directly modify model parameters to optimise a reward function, but are computationally expensive and prone to reward over-optimisation, where the model overfits to proxy rewards and loses generalisation across multiple rewards or output diversity. (2) TTA methods optimise latent or noise variables to maximise a differentiable reward function [2, 9, 10, 13, 18, 24, 29, 34]; sample from intractable reward-conditioned posteriors via Sequential Monte Carlo (SMC) [6, 13, 32]; or explore denoising trajectories using discrete search algorithms [18, 24]. However, these approaches suffer from reward under- or over-optimisation, owing to the highly unstructured nature of latent/noise spaces, the inefficiency of SMC, and the vast search space.

To address these limitations, we draw inspiration from recent advances in diffusion model editing that leverage optimisation in the text-conditioning space. In particular, Null-Text Inversion (NTI) [19] demonstrated that optimising the unconditional, or null-text, embedding in Classifier-Free Guidance (CFG) [11] allows fine-grained semantic control while preserving image fidelity.
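To make the CFG mechanism the paper builds on concrete, here is a minimal NumPy sketch of the classifier-free guidance combination. The denoiser `toy_eps_model`, all shapes, and the guidance scale are illustrative stand-ins of ours, not the paper's implementation; the point is only that the unconditional (null-text) prediction is the anchor from which the conditional prediction is extrapolated.

```python
import numpy as np

def cfg_noise_prediction(eps_model, x_t, t, cond_emb, null_emb, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    (null-text) prediction towards the conditional prediction."""
    eps_uncond = eps_model(x_t, t, null_emb)  # anchored by the null-text embedding
    eps_cond = eps_model(x_t, t, cond_emb)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-in for a denoising network, so the arithmetic runs
# without a real diffusion model.
def toy_eps_model(x_t, t, emb):
    return 0.5 * x_t + emb.sum()

x_t = np.ones(4)                  # current noisy sample
cond_emb = np.array([0.2, 0.1])   # prompt embedding (toy)
null_emb = np.zeros(2)            # the embedding Null-TTA would optimise
eps = cfg_noise_prediction(toy_eps_model, x_t, 10, cond_emb, null_emb,
                           guidance_scale=7.5)
```

Note that at `guidance_scale=0` the prediction reduces to the unconditional branch and at `1` to the conditional branch, which is why moving the null-text embedding shifts the whole guided distribution.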
Building on this insight, we extend the principle of null-text optimisation beyond image editing and reformulate it as a general mechanism for reward alignment in diffusion models.

Specifically, we propose Null-Text Test-Time Alignment (Null-TTA), a novel TTA framework for text-to-image diffusion models that effectively aligns pre-trained models without suffering from either under- or over-optimisation. Instead of manipulating latent or noise variables, we optimise the unconditional (null) text embedding within CFG, which serves as the geometric anchor for the conditional generative distribution. By shifting optimisation to this structured semantic space, Null-TTA ensures that optimisation occurs on a semantically coherent manifold. This prevents the reward-hacking phenomenon of over-optimisation, in which the model generates non-semantic noise patterns that improve only the target reward while degrading other reward metrics, sample diversity, and even subjective visual quality. Combined with our principled objective, Null-TTA realigns the model's generative distribution towards the target reward-conditioned distribution, rather than simply correcting the samples in a way that increases the target reward, all without updating model parameters. This ensures effective and efficient alignment to target rewards.

Our main contributions are summarised as follows:
• Null-Text Test-Time Alignment (Null-TTA), a training-free framework that performs alignment in the structured semantic space of text conditioning.
• A principled objective that reorients the model's generative distribution itself via n

(arXiv:2511.20889v1 [cs.CV], 25 Nov 2025)
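The alignment loop described above (gradient ascent on a differentiable reward with respect to the null-text embedding only, model weights fixed) can be sketched with toy stand-ins. Here `generate`, `reward`, and the linear map `A` are hypothetical substitutes of ours for the diffusion sampler and a real reward model, chosen so the gradient is available in closed form; they are not the paper's objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "generator": the image is a linear function of the
# null-text embedding (stand-in for the full CFG sampling chain).
A = rng.normal(size=(8, 4))

def generate(null_emb):
    return A @ null_emb

# Toy differentiable reward: negative squared distance to a target image
# (stand-in for an aesthetic or preference reward model).
target = rng.normal(size=8)

def reward(image):
    return -np.sum((image - target) ** 2)

def reward_grad_wrt_null(null_emb):
    # Closed-form gradient of reward(generate(null_emb)) w.r.t. null_emb;
    # a real implementation would use autodiff through the sampler.
    return -2.0 * A.T @ (A @ null_emb - target)

# Test-time alignment loop: only the null-text embedding is updated;
# the "model" (A) is never touched.
null_emb = np.zeros(4)
lr = 0.01
history = [reward(generate(null_emb))]
for _ in range(100):
    null_emb += lr * reward_grad_wrt_null(null_emb)
    history.append(reward(generate(null_emb)))
```

The design point the sketch illustrates is that the optimisation variable lives in the (here 4-dimensional) embedding space rather than in the pixel or noise space, so every update moves the whole conditional distribution through its unconditional anchor.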
This content is AI-processed based on ArXiv data.