Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment


📝 Original Info

  • Title: Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment
  • ArXiv ID: 2512.07702
  • Date: 2025-12-08
  • Authors: Sangha Park, Eunji Kim, Yeongtak Oh, Jooyoung Choi, Sungroh Yoon

📝 Abstract

Despite substantial progress in text-to-image generation, achieving precise text-image alignment remains challenging, particularly for prompts with rich compositional structure or imaginative elements. To address this, we introduce Negative Prompting for Image Correction (NPC), an automated pipeline that improves alignment by identifying and applying negative prompts that suppress unintended content. We begin by analyzing cross-attention patterns to explain why both targeted negatives (those directly tied to the prompt's alignment error) and untargeted negatives (tokens unrelated to the prompt but present in the generated image) can enhance alignment. To discover useful negatives, NPC generates candidate prompts using a verifier-captioner-propo...
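The excerpt does not spell out how a negative prompt is applied during sampling. As a minimal sketch, assuming the common classifier-free guidance formulation in which the negative prompt's noise prediction takes the place of the unconditional one, the combination step might look like the following. The function name `guided_noise` and the toy vectors are illustrative assumptions, not NPC's implementation.

```python
from typing import List, Sequence


def guided_noise(
    eps_pos: Sequence[float],
    eps_neg: Sequence[float],
    scale: float,
) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    eps_pos: noise predicted when conditioning on the user's prompt.
    eps_neg: noise predicted when conditioning on the negative prompt
             (assumed here to replace the unconditional prediction).
    scale:   guidance scale; values above 1 push the result further
             away from eps_neg and toward eps_pos.
    """
    if len(eps_pos) != len(eps_neg):
        raise ValueError("noise predictions must have the same length")
    # eps = eps_neg + scale * (eps_pos - eps_neg), applied elementwise
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


# Toy 2-element "noise" vectors standing in for real model outputs.
positive = [1.0, 2.0]
negative = [0.0, 1.0]
print(guided_noise(positive, negative, scale=2.0))
```

With a scale of 1 the result equals the positive prediction. Larger scales extrapolate away from whatever the negative prompt describes, which is the sense in which a negative prompt suppresses content in this formulation.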

📄 Full Content

Text-to-image (T2I) generation has surged with the proliferation of large-scale diffusion models [22,48], including Stable Diffusion [11,49,54], DALL-E 3 [3], and FLUX [26], which have revolutionized content creation by enabling diverse, photorealistic images from natural language. Despite these advances, models still frequently fail to follow the prompt in practice, particularly for prompts with rich compositional structure (multiple objects, attributes, relations, and counts) [18,28] and surreal instructions (e.g., a square soccer ball) [21,45]. To improve alig

…(Content truncated for length.)


Reference

This content is AI-processed based on open access ArXiv data.
