From Native Memes to Global Moderation: Cross-Cultural Evaluation of Vision-Language Models for Hateful Meme Detection


Cultural context profoundly shapes how people interpret online content, yet vision-language models (VLMs) remain predominantly trained through Western or English-centric lenses. This limits their fairness and cross-cultural robustness in tasks like hateful meme detection. We introduce a systematic evaluation framework designed to diagnose and quantify the cross-cultural robustness of state-of-the-art VLMs across multilingual meme datasets, analyzing three axes: (i) learning strategy (zero-shot vs. one-shot), (ii) prompting language (native vs. English), and (iii) translation effects on meaning and detection. Results show that the common “translate-then-detect” approach degrades performance, while culturally aligned interventions (native-language prompting and one-shot learning) significantly enhance detection. Our findings reveal systematic convergence toward Western safety norms and provide actionable strategies to mitigate such bias, guiding the design of globally robust multimodal moderation systems.


💡 Research Summary

The paper “From Native Memes to Global Moderation: Cross‑Cultural Evaluation of Vision‑Language Models for Hateful Meme Detection” addresses a critical gap in multimodal content moderation: most vision‑language models (VLMs) are trained on predominantly Western or English‑centric data, which limits their fairness and robustness when deployed in culturally diverse environments. To diagnose and quantify this limitation, the authors construct a comprehensive evaluation framework that varies three orthogonal dimensions: (i) learning strategy (zero‑shot versus one‑shot), (ii) prompting language (native‑language prompts versus English prompts), and (iii) the effect of translating meme captions into other languages (the “translate‑then‑detect” pipeline).
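The three axes define a small factorial grid of evaluation conditions. A minimal sketch of that grid (the variable and condition names here are my own shorthand, not identifiers from the authors' code):

```python
from itertools import product

# Illustrative encoding of the paper's three evaluation axes.
LEARNING = ["zero-shot", "one-shot"]
PROMPT_LANG = ["native", "english"]
CAPTION = ["original", "translated"]  # "translated" = the translate-then-detect pipeline

def evaluation_conditions():
    """Return every (learning, prompt_lang, caption) cell of the 2x2x2 grid."""
    return list(product(LEARNING, PROMPT_LANG, CAPTION))

conditions = evaluation_conditions()
print(len(conditions))  # 8 cells per (model, dataset) pair
```

Each of the eight cells is then crossed with every model and dataset, which is what makes the framework systematic rather than ad hoc.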

The study leverages six publicly available meme datasets that are native to distinct linguistic and cultural ecosystems: Arabic (Prop2Hate, 3,061 memes), Bengali (BHM, 6,852 memes), English (HateMeme, 5,029 memes), German (GerMemeHate, 179 memes), Italian (DANKMEMES, 1,000 memes), and Spanish (DIMEMEX, 2,263 memes). Each dataset preserves its original label distribution, reflecting the specific social and political nuances of its community (e.g., Arabic memes contain only 13% hate labels, whereas German memes contain 58%). By keeping the original images untouched and only translating the textual captions with Google Translate, the authors isolate the linguistic component of cross‑cultural transfer while preserving visual cues that often carry cultural meaning.
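The benchmark composition can be tabulated directly from the figures above. The structure below is an illustrative summary, not the authors' data format; hate-label rates are only reported for the Arabic and German sets, so the others are left as `None` rather than guessed:

```python
# Dataset sizes as reported in the paper; hate_rate is the fraction of
# memes labeled hateful where the summary states it, else None.
DATASETS = {
    "Prop2Hate":   {"language": "Arabic",  "memes": 3061, "hate_rate": 0.13},
    "BHM":         {"language": "Bengali", "memes": 6852, "hate_rate": None},
    "HateMeme":    {"language": "English", "memes": 5029, "hate_rate": None},
    "GerMemeHate": {"language": "German",  "memes": 179,  "hate_rate": 0.58},
    "DANKMEMES":   {"language": "Italian", "memes": 1000, "hate_rate": None},
    "DIMEMEX":     {"language": "Spanish", "memes": 2263, "hate_rate": None},
}

total = sum(d["memes"] for d in DATASETS.values())
print(total)  # 18384 memes across the six benchmarks
```

The wide spread in size (179 to 6,852 memes) and label balance (13% to 58% hate) is itself part of the evaluation: models face each community's data as it naturally occurs.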

The model suite includes both general‑purpose VLMs—Gemini‑2.5‑Flash, GPT‑4o‑Mini, CogVLM2, Qwen 2.5‑VL, InstructBLIP, and LLaMA‑4‑Maverick—and task‑specific hateful‑meme detectors—Pro‑Cap and PromptHate—fine‑tuned on each language’s dataset. General models are primarily evaluated in zero‑shot mode to assess intrinsic multimodal reasoning, while task‑specific models are examined under both zero‑shot and one‑shot conditions. One‑shot prompts consist of a single, culturally representative example per label, selected by native speakers, allowing the authors to measure the benefit of minimal in‑context learning.
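The zero-shot vs. one-shot distinction amounts to whether the prompt carries in-context examples. A hypothetical prompt builder illustrating that difference follows; the template wording, field names, and labels are assumptions for the sketch, not the paper's actual prompts:

```python
# Sketch of a hateful-meme classification prompt builder. In the one-shot
# setting, `examples` holds one culturally representative (caption, label)
# pair per label, as selected by native speakers in the paper's setup.
def build_prompt(caption, lang, examples=None):
    """Compose a text prompt for a VLM; the image is supplied separately."""
    # A native-language instruction would be substituted per dataset;
    # only the English variant is spelled out in this sketch.
    instruction = "Is this meme hateful? Answer 'hateful' or 'not hateful'."

    parts = [instruction]
    for ex_caption, ex_label in (examples or []):
        parts.append(f"Example caption: {ex_caption}\nLabel: {ex_label}")
    parts.append(f"Caption: {caption}\nLabel:")
    return "\n\n".join(parts)

zero_shot = build_prompt("some caption", "english")
one_shot = build_prompt(
    "some caption", "english",
    examples=[("ex1", "hateful"), ("ex2", "not hateful")],
)
```

Under this framing, "native-language prompting" swaps both the instruction and the example captions into the meme's original language while leaving the image untouched.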

Key empirical findings are as follows:

  1. Translate‑then‑detect degrades performance – When captions are machine‑translated into a target language while the image remains unchanged, average accuracy drops by 7–15 percentage points across models. The degradation is especially pronounced for languages with complex morphology (Arabic, Bengali) where translation errors obscure idiomatic or sarcastic cues embedded in the text.

  2. Native‑language prompting improves detection – Using prompts written in the meme’s original language yields a consistent 4–9% increase in macro‑F1 over English prompts. The gain is largest for low‑resource languages with rich morphology, suggesting that the VLM’s tokenizer and language model components benefit from direct exposure to native lexical patterns.

  3. One‑shot in‑context learning yields additional gains – Providing a single, well‑chosen example per class improves performance by 5–12% relative to pure zero‑shot. This effect is amplified when the example is culturally resonant, indicating that VLMs can leverage minimal contextual grounding to align visual and textual semantics with local norms.

  4. Model scale correlates with overall performance but not with consistency – Larger models (e.g., Gemini‑2.5‑Flash) achieve higher average scores, yet their performance variance across languages remains substantial. Smaller models (e.g., InstructBLIP‑Vicuna‑7B) exhibit uniformly low scores, confirming a tiered hierarchy where scale matters but does not guarantee cross‑cultural stability.

  5. Systematic Western bias across all models – Regardless of architecture, VLMs tend to over‑penalize content that deviates from Western safety standards (e.g., political satire, religious symbolism) while under‑detecting culturally specific hate expressions. Combining native prompts with one‑shot examples partially mitigates this bias, but residual disparities persist.
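Since several of the findings above are reported in macro-F1, a minimal sketch of that metric for the binary hateful/not-hateful setting may help (this is a textbook definition, not the authors' evaluation code):

```python
# Macro-F1: compute per-class F1 and average with equal class weight,
# so a minority class (e.g., 13% hate labels in the Arabic set) counts
# as much as the majority class.
def macro_f1(y_true, y_pred, labels=("hateful", "not_hateful")):
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)

y_true = ["hateful", "hateful", "not_hateful", "not_hateful"]
y_pred = ["hateful", "not_hateful", "not_hateful", "not_hateful"]
print(macro_f1(y_true, y_pred))
```

Equal per-class weighting is what makes macro-F1 appropriate here: plain accuracy would reward a model that simply predicts the majority label on heavily imbalanced sets.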

The authors conclude that effective global moderation requires more than simply translating content; it demands culturally aligned interaction strategies. They recommend three practical interventions: (a) employ native‑language prompts whenever possible, (b) incorporate at least one culturally representative in‑context example, and (c) avoid reliance on “translate‑then‑detect” pipelines that strip away visual‑textual interplay. The proposed evaluation framework, which systematically varies learning paradigm, prompt language, and content representation, offers a reproducible benchmark for future research on VLM cultural robustness. By exposing the limitations of current models and providing actionable mitigation strategies, the paper advances the field toward fairer, more inclusive multimodal safety systems that can operate reliably across the world’s diverse online cultures.

