Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models

Research on bias in Text-to-Image (T2I) models has primarily focused on demographic representation and stereotypical attributes, overlooking a fundamental question: how does grammatical gender influence visual representation across languages? We introduce a cross-linguistic benchmark examining words where grammatical gender contradicts stereotypical gender associations (e.g., "une sentinelle", grammatically feminine in French but referring to the stereotypically masculine concept "guard"). Our dataset spans five gendered languages (French, Spanish, German, Italian, Russian) and two gender-neutral control languages (English, Chinese), comprising 800 unique prompts that generated 28,800 images across three state-of-the-art T2I models. Our analysis reveals that grammatical gender dramatically influences image generation: masculine grammatical markers increase male representation to 73% on average (compared to 22% with gender-neutral English), while feminine grammatical markers increase female representation to 38% (compared to 28% in English). These effects vary systematically by language resource availability and model architecture, with high-resource languages showing stronger effects. Our findings establish that language structure itself, not just content, shapes AI-generated visual outputs, introducing a new dimension for understanding bias and fairness in multilingual, multimodal systems.
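
To make the size of the two reported effects easier to compare, the short sketch below turns the four percentages quoted above into percentage-point shifts and ratios against the English baseline. Only the four percentages come from the text; the names `RATES` and `shift` are illustrative assumptions, and this is not the authors' analysis code.

```python
# Percentages quoted in the abstract: share of images showing the gender
# that matches the grammatical marker, versus the English baseline.
RATES = {
    "masculine": {"gendered": 73, "english": 22},  # male representation, %
    "feminine": {"gendered": 38, "english": 28},   # female representation, %
}


def shift(marker):
    """Return (percentage-point change, ratio to the English baseline)."""
    rates = RATES[marker]
    points = rates["gendered"] - rates["english"]
    ratio = rates["gendered"] / rates["english"]
    return points, ratio


for marker in RATES:
    points, ratio = shift(marker)
    print(f"{marker}: +{points} percentage points, {ratio:.2f}x the English baseline")
```

On these figures the masculine-marker shift (51 percentage points) is roughly five times the feminine-marker shift (10 percentage points).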

💡 Research Summary

The paper investigates a previously under‑explored source of bias in text‑to‑image (T2I) generation: the grammatical gender of the prompting language. While most prior work on multimodal bias focuses on demographic stereotypes or dataset imbalances, this study asks whether the structural feature of a language—its assignment of masculine or feminine gender to nouns—can shape the visual output of state‑of‑the‑art T2I models.

To answer this, the authors construct a cross‑lingual benchmark called GRAMVIS. They first identify “gender‑divergent” nouns in five gendered languages (French, Spanish, German, Italian, Russian) where the grammatical gender contradicts the stereotypical human gender associated with the concept (e.g., French une sentinelle ‘guard’ is grammatically feminine but the occupation is culturally masculine). For each language they select 40 such nouns, covering five social dimensions—occupations, personality traits, power dynamics, social status, and relationships—resulting in 200 distinct words. Two gender‑neutral languages (English, Chinese) serve as controls.
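
As a rough illustration of the selection criterion and the benchmark size described above, the sketch below models one benchmark entry and checks the counts stated in the text. The `NounEntry` layout and its field names are hypothetical and are not taken from the paper. The per-model image count assumes that every prompt is sent to every model and that images are split evenly, which the text does not state.

```python
from dataclasses import dataclass

# Figures stated in the text.
GENDERED_LANGUAGES = ["French", "Spanish", "German", "Italian", "Russian"]
NOUNS_PER_LANGUAGE = 40
TOTAL_PROMPTS = 800
TOTAL_IMAGES = 28_800
NUM_MODELS = 3


@dataclass(frozen=True)
class NounEntry:
    """Hypothetical record layout; field names are illustrative only."""
    language: str
    word: str
    gloss: str
    grammatical_gender: str    # "feminine" or "masculine"
    stereotypical_gender: str  # "feminine" or "masculine"

    def is_gender_divergent(self) -> bool:
        # A noun qualifies when its grammatical gender contradicts the
        # gender stereotypically associated with the concept.
        return self.grammatical_gender != self.stereotypical_gender


# The example given in the text.
sentinelle = NounEntry("French", "une sentinelle", "guard", "feminine", "masculine")

total_words = len(GENDERED_LANGUAGES) * NOUNS_PER_LANGUAGE
images_per_prompt = TOTAL_IMAGES // TOTAL_PROMPTS
# Assumption: an even split across the three models.
images_per_prompt_per_model = images_per_prompt // NUM_MODELS

print(sentinelle.is_gender_divergent())
print(total_words, images_per_prompt, images_per_prompt_per_model)
```

The word count reproduces the 200 distinct words given above. How those 200 words expand into 800 prompts is not covered by the visible text, so the sketch leaves that step out.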

Each noun is embedded in a gender‑neutral prompt template (“A photo of a
