Do readers prefer AI-generated Italian short stories?
This study investigates whether readers prefer AI-generated short stories in Italian over a story written by a renowned Italian author. In a blind setup, 20 participants read and evaluated three stories, two created with ChatGPT-4o and one by Alberto Moravia, without being informed of their origin. To explore potential influencing factors, data on reading habits and demographics (age, gender, education, and first language) were also collected. The results showed that the AI-written texts received slightly higher average ratings and were more frequently preferred, although the differences were modest. No statistically significant associations were found between text preference and demographic or reading-habit variables. These findings challenge assumptions about reader preference for human-authored fiction and raise questions about the necessity of synthetic-text editing in literary contexts.
💡 Research Summary
The paper investigates whether readers prefer AI‑generated short stories in Italian over a story written by the renowned author Alberto Moravia. In a blind experiment, twenty volunteers from a public library in Mortara, Italy, were asked to read three Italian short stories of comparable length (approximately 1300–1800 words). Two of the texts were produced by ChatGPT‑4o using prompts designed to emulate Moravia’s style, while the third was the original Moravia story “L’incosciente.” The texts were anonymised with geometric symbols (oval, hexagon, star) and presented in random order; participants were unaware of any AI involvement.
Each participant rated how much they liked each story on a 0‑10 scale and provided free‑form comments explaining their scores. Demographic information (age, gender, education, first language) and reading‑habit data (frequency, typical and preferred reading material) were also collected, though only fifteen participants completed this questionnaire. Two participants who failed to rate all three stories were excluded, leaving eighteen valid responses.
Quantitative results show that the AI‑generated stories received slightly higher average scores (hexagon 7.33 ± 2.32, star 7.42 ± 1.93) than the human‑authored story (oval 6.83 ± 1.71). In the "first‑place" rankings, the hexagon story was ranked first nine times, the star story seven times, and the oval story six times; because participants could give their top score to more than one story, these counts sum to more than the eighteen valid responses. Although the AI texts outperformed the human text, the differences are modest (≈0.5–0.6 points) and, given the small sample size, do not reach conventional statistical significance.
Statistical analyses of demographic and reading‑habit variables employed Fisher’s exact test and the Fisher‑Freeman‑Halton exact test. All p‑values exceeded .05, indicating no significant association between age group, gender, education level, first language, reading frequency, typical reading genre, or preferred reading genre and the scores assigned to any of the three stories. The only near‑significant finding was a gender effect for the star story (p = .057), with all female participants rating it highly, but this result is tentative due to limited power.
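Both tests assess association in a contingency table by summing hypergeometric probabilities over all tables with the same margins. As a minimal illustration of the 2×2 case, here is a stdlib-only Python sketch; the function name and the two-sided convention (summing the probabilities of all tables at least as unlikely as the observed one) are assumptions for illustration, since the paper does not specify the software or conventions it used:

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 contingency table.

    table = [[a, b], [c, d]]. Under fixed margins, the top-left cell
    follows a hypergeometric distribution; the two-sided p-value sums
    the probabilities of all tables whose probability does not exceed
    that of the observed table.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def p_table(x):
        # Hypergeometric probability of x successes in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(col1, row1)       # largest feasible top-left count
    eps = 1e-12                # tolerance for floating-point ties
    return sum(p_table(x) for x in range(lo, hi + 1) if p_table(x) <= p_obs + eps)
```

For example, `fisher_exact_2x2([[3, 1], [1, 3]])` returns ≈ 0.486, matching standard implementations. The Fisher‑Freeman‑Halton test generalizes the same idea to r×c tables (as needed for multi-level variables like education), enumerating all tables with the observed margins rather than a single cell's range.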
Qualitative comments reveal nuanced perceptions. The human‑authored story was praised for narrative flow, dialogue, humor, and psychological depth, yet criticized for slow pacing and verbosity. The AI‑generated texts were lauded for fluid prose, emotional resonance, and clear structure, while some readers noted clichés, moralizing tones, predictability, or stylistic repetition. Notably, participants associated the AI stories with Italian literary figures such as Cesare Pavese and Italo Calvino, whereas the human story evoked the English author Nick Hornby, suggesting that the AI imitations may have appeared more “traditionally Italian” to the readers.
The authors acknowledge several limitations. The participant pool was small, geographically confined, and partially familiar with the researcher, potentially introducing bias. Only 75 % of participants completed the demographic questionnaire, reducing the robustness of the covariate analyses. The texts all derived from the same source material, limiting genre and thematic diversity, and the original Moravia story dates from the 1950s, which may affect its perceived relevance. Moreover, the study did not directly assess the impact of synthetic‑text editing (STE); it only inferred that the absence of a clear preference for the human-authored text could call into question the necessity of STE for this type of literature.
In conclusion, the study provides preliminary evidence that AI‑generated Italian short fiction can be received at least as favorably as a classic human‑authored piece, and that reader preferences do not appear to be driven by age, gender, education, native language, or reading habits. However, due to the modest sample and methodological constraints, the findings cannot be generalized without further research. Future work should involve larger, more diverse samples, a broader range of genres and lengths, and a comparative design that includes both professional literary critics and casual readers. Additionally, experiments that directly compare edited versus unedited AI‑generated texts within the same participant group would be valuable for evaluating the practical need for synthetic‑text editing in literary production.