Can LLMs Cook Jamaican Couscous? A Study of Cultural Novelty in Recipe Generation

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Large Language Models (LLMs) are increasingly used to generate and shape cultural content, ranging from narrative writing to artistic production. While these models demonstrate impressive fluency and generative capacity, prior work has shown that they also exhibit systematic cultural biases, raising concerns about stereotyping, homogenization, and the erasure of culturally specific forms of expression. Understanding whether LLMs can meaningfully align with diverse cultures beyond the dominant ones remains a critical challenge. In this paper, we study cultural adaptation in LLMs through the lens of cooking recipes, a domain in which culture, tradition, and creativity are tightly intertwined. We build on the GlobalFusion dataset, which pairs human recipes from different countries according to established measures of cultural distance. Using the same country pairs, we generate culturally adapted recipes with multiple LLMs, enabling a direct comparison between human and LLM behavior in cross-cultural content creation. Our analysis shows that LLMs fail to produce culturally representative adaptations. Unlike human recipes, their generated recipes diverge in ways that do not correlate with cultural distance. We further provide explanations for this gap. We show that cultural information is weakly preserved in internal model representations, that models inflate novelty in their production by misunderstanding notions such as creativity and tradition, and that they fail to associate adaptation with its target countries and to ground it in culturally salient elements such as ingredients. These findings highlight fundamental limitations of current LLMs for culturally oriented generation and have important implications for their use in culturally sensitive applications.


💡 Research Summary

The paper investigates whether large language models (LLMs) can meaningfully adapt cooking recipes across cultures, using Jamaican couscous as a test case. Building on the GlobalFusion dataset, which contains 500 dishes and human‑generated variations from 130 countries, the authors create an extended benchmark called LLMFusion. For each dish, they prompt eight different LLMs (Meta‑Llama‑3‑70B‑Instruct, Gemma‑2‑27B‑IT, Falcon‑40B, Orion‑14B‑Chat, Phi‑4‑multimodal‑instruct, Gemma‑3‑27B‑IT, Qwen2.5‑32B‑Instruct, and Qwen3‑30B‑A3B‑Instruct‑2507) to generate a “novel, authentic, traditional” version of the recipe for every target nationality. The prompts are carefully engineered with keyword definitions and brief cultural background notes to reduce prompt sensitivity, and the output format is fixed (title, ingredient list, step‑by‑step instructions) to match the human data.
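The prompting setup described above could be sketched as a template that bundles keyword definitions, a cultural note, and the fixed output format. All names and wording below are illustrative assumptions, not the authors' exact prompt:

```python
# Hypothetical sketch of a recipe-adaptation prompt builder. The keyword
# definitions and phrasing are assumptions for illustration only.
KEYWORD_DEFINITIONS = {
    "novel": "introduces elements not present in the source recipe",
    "authentic": "faithful to the culinary practices of the target culture",
    "traditional": "grounded in long-standing customs of the target culture",
}

def build_adaptation_prompt(source_recipe: str, target_country: str,
                            cultural_note: str) -> str:
    """Assemble a prompt with keyword definitions, a brief cultural
    background note, and a fixed output format (title/ingredients/steps)."""
    definitions = "\n".join(f"- {k}: {v}" for k, v in KEYWORD_DEFINITIONS.items())
    return (
        f"Adapt the following recipe into a novel, authentic, traditional "
        f"version for {target_country}.\n\n"
        f"Keyword definitions:\n{definitions}\n\n"
        f"Cultural background: {cultural_note}\n\n"
        f"Source recipe:\n{source_recipe}\n\n"
        "Answer using exactly this format:\n"
        "Title: <dish name>\n"
        "Ingredients:\n- <ingredient>\n"
        "Steps:\n1. <instruction>\n"
    )

prompt = build_adaptation_prompt("Moroccan couscous with seven vegetables",
                                 "Jamaica",
                                 "Jamaican cuisine features jerk seasoning.")
print("Title:" in prompt)  # True
```

Fixing the output schema in this way is what lets the generated recipes be compared token-for-token against the human variations in GlobalFusion.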

To assess cultural adaptation, the authors adopt five Jensen‑Shannon Divergence‑based metrics originally proposed by Carichon et al. (2025): Cultural Newness (proportion of words that appear or disappear), Cultural Uniqueness (distance from a prototypical view of the target culture), Cultural Difference (overall distance from the cultural knowledge base), Cultural Surprise (violation of expected attribute combinations), and Cultural Divergent Surprise (PMI‑based divergence between expected and observed term co‑occurrences). These metrics capture both lexical novelty and deeper cultural grounding.
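All five metrics share the same lexical core: a Jensen‑Shannon Divergence between word distributions. A minimal sketch of that core, applied to the unigram distributions of two recipes, might look as follows (the actual metrics of Carichon et al. (2025) operate over richer cultural knowledge bases, so this only illustrates the underlying computation):

```python
# Jensen-Shannon divergence (base 2) between the unigram distributions of
# two texts. Identical texts give 0; fully disjoint vocabularies give 1.
from collections import Counter
from math import log2

def js_divergence(text_a: str, text_b: str) -> float:
    """JSD between the word distributions of two whitespace-tokenized texts."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    na, nb = sum(ca.values()), sum(cb.values())
    p = [ca[w] / na for w in vocab]
    q = [cb[w] / nb for w in vocab]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(xi * log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js_divergence("couscous saffron cumin", "couscous saffron cumin"))  # 0.0
```

A human Jamaican adaptation of couscous would share vocabulary with the Moroccan source (lower JSD), whereas the LLM adaptations described later replace it almost wholesale (JSD close to 1).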

The empirical results reveal a stark contrast between humans and LLMs. Human‑generated variations show a strong positive correlation (ρ≈0.6–0.7) between cultural distance (derived from the Inglehart‑Welzel map, linguistic, religious, and geographic distances) and all five divergence metrics. In contrast, LLM‑generated recipes exhibit negligible correlation (ρ≈0.1 or lower). Some models even produce higher “novelty” scores when the target culture is closer to the source culture, indicating a misalignment with the theoretical expectation that cultural distance should drive perceived novelty.
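The correlation test above can be sketched as a Spearman rank correlation between a vector of cultural distances and a vector of divergence scores across country pairs. The numbers below are hypothetical, chosen only to mimic the monotone human pattern the paper reports:

```python
# Spearman's rho between cultural-distance and divergence vectors.
# This toy version assumes no tied values; real analyses would use
# scipy.stats.spearmanr, which handles ties.
def spearman_rho(x, y):
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

cultural_distance = [0.1, 0.4, 0.5, 0.8, 0.9]  # hypothetical pair distances
human_divergence  = [0.2, 0.3, 0.5, 0.7, 0.9]  # rises with distance
print(spearman_rho(cultural_distance, human_divergence))  # 1.0
```

Human variations behave like the toy example (high positive rho), while the LLM divergence vectors are essentially rank-uncorrelated with cultural distance, and for some models even inversely ordered.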

Internal representation analysis—clustering of token embeddings and cosine similarity of culture‑specific vocabularies—shows that the models do not form distinct clusters for different cuisines, suggesting that cultural knowledge is only weakly encoded. Moreover, the presence of “novel”, “creative”, and “unique” in the prompts appears to push models toward surface‑level novelty: they frequently introduce entirely new ingredients while discarding core elements of the source dish. For example, the Jamaican adaptation of Moroccan couscous generated by the LLMs relies almost exclusively on Jamaican staples (jerk seasoning, chicken, Scotch bonnet) and omits the characteristic couscous grains and Moroccan spice blends, whereas human adaptations blend ingredients from both traditions.
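The vocabulary-similarity probe can be illustrated with cosine similarity between averaged embeddings of culture‑specific word sets. The real analysis uses the models' own token embeddings; the tiny hand‑made vectors below are purely illustrative assumptions:

```python
# Toy probe: cosine similarity between the mean embeddings of two
# culture-specific vocabularies. The 3-d vectors are fabricated for
# illustration and stand in for real model token embeddings.
from math import sqrt

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical embeddings for words from two cuisines.
jamaican = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]  # e.g. "jerk", "scotch bonnet"
moroccan = [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]]  # e.g. "ras el hanout", "tagine"

sim = cosine(mean_vector(jamaican), mean_vector(moroccan))
print(round(sim, 2))
```

If a model encoded cuisines distinctly, cross-cuisine similarities would be low and within-cuisine similarities high; the paper's finding is that LLM embeddings do not show this separation.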

The authors attribute these gaps to three main factors: (1) training data bias toward Western culinary language, (2) insufficient explicit cultural grounding during pre‑training, and (3) prompt design that overemphasizes “newness” without clarifying that cultural authenticity should be preserved. They argue that current LLMs lack the internal mechanisms to balance creativity with tradition in a culturally aware way.

To close the gap, the paper proposes several future directions: (i) augment pre‑training corpora with culturally diverse culinary texts and explicit cultural ontologies, (ii) incorporate cultural distance information directly into prompts or model inputs (e.g., as auxiliary embeddings), and (iii) develop human‑in‑the‑loop evaluation pipelines where cultural experts verify the authenticity of generated recipes.

Overall, the study provides a rigorous, multi‑metric evaluation of cultural adaptation in generative language models, demonstrating that despite impressive fluency, today’s LLMs fall short of producing culturally coherent innovations. The findings caution against deploying LLMs in culturally sensitive applications without additional safeguards and highlight the need for research that explicitly models cultural knowledge and its interaction with creativity.

