Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment

February 09, 2026

Reading time: 2 minutes

...

📝 Original Info

Title: Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment
ArXiv ID: 2512.07702
Date: 2025-12-08
Authors: Sangha Park, Eunji Kim, Yeongtak Oh, Jooyoung Choi, Sungroh Yoon

📝 Abstract

Despite substantial progress in text-to-image generation, achieving precise text-image alignment remains challenging, particularly for prompts with rich compositional structure or imaginative elements. To address this, we introduce Negative Prompting for Image Correction (NPC), an automated pipeline that improves alignment by identifying and applying negative prompts that suppress unintended content. We begin by analyzing cross-attention patterns to explain why both targeted negatives (those directly tied to the prompt's alignment error) and untargeted negatives (tokens unrelated to the prompt but present in the generated image) can enhance alignment. To discover useful negatives, NPC generates candidate prompts using a verifier-captioner-propo...
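For readers unfamiliar with negative prompting, the sketch below shows how a negative prompt commonly enters classifier-free guidance in diffusion samplers: the noise prediction conditioned on the negative prompt takes the place of the unconditional prediction, so each denoising step is pushed away from the negative content. This is a minimal illustration of that general mechanism, not the NPC pipeline itself; the function name `guided_noise` and its parameters are hypothetical, and plain Python lists stand in for the denoiser's tensor outputs.

```python
from typing import List, Sequence


def guided_noise(
    eps_prompt: Sequence[float],
    eps_negative: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    eps_prompt:   denoiser output conditioned on the user prompt
    eps_negative: denoiser output conditioned on the negative prompt
                  (used in place of the usual unconditional branch)
    Both arguments are hypothetical stand-ins for model outputs.
    """
    if len(eps_prompt) != len(eps_negative):
        raise ValueError("noise predictions must have the same length")
    # Start from the negative branch and move toward the prompt branch;
    # a larger scale moves the result further away from the negative content.
    return [
        n + guidance_scale * (p - n)
        for p, n in zip(eps_prompt, eps_negative)
    ]
```

With a guidance scale of 1 the result equals the prompt-conditioned prediction, and with a scale of 0 it equals the negative-conditioned one; scales above 1 extrapolate past the prompt prediction, away from the negative.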

📄 Full Content

Text-to-image (T2I) generation has surged with the proliferation of large-scale diffusion models [22,48], including Stable Diffusion [11,49,54], DALL-E 3 [3], and FLUX [26], which have revolutionized content creation by enabling diverse, photorealistic images from natural language. Despite these advances, models still frequently fail to satisfy prompt alignment in practice, particularly for prompts with rich compositional structure (multiple objects, attributes, relations, and counts) [18,28] and surreal instructions (e.g., a square soccer ball) [21,45]. To improve alig

…(Content truncated for length.)
