We are not able to identify AI-generated images
📝 Abstract
AI-generated images are now pervasive online, yet many people believe they can easily tell them apart from real photographs. We test this assumption through an interactive web experiment where participants classify 20 images as real or AI-generated. Our dataset contains 120 difficult cases: real images sampled from CC12M, and carefully curated AI-generated counterparts produced with MidJourney. In total, 165 users completed 233 sessions. Their average accuracy was 54%, only slightly above random guessing, with limited improvement across repeated attempts. Response times averaged 7.3 seconds, and some images were consistently more deceptive than others. These results indicate that, even on relatively simple portrait images, humans struggle to reliably detect AI-generated content. As synthetic media continues to improve, human judgment alone is becoming insufficient for distinguishing real from artificial data. These findings highlight the need for greater awareness and ethical guidelines as AI-generated media becomes increasingly indistinguishable from reality.
📄 Content
Adrien Pavão

1 Introduction

AI-generated content is becoming a dominant part of the online ecosystem. Images, text, music, and even videos produced by machine-learning models now circulate alongside authentic human-made content, often without being clearly identified as such. We hypothesize that most people overestimate their ability to reliably spot AI-generated images. This confidence arises from a perception bias: individuals mainly remember the cases where they guessed correctly, while the times they were fooled often pass unnoticed.
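A rough sense of how far an aggregate accuracy like the 54% reported above sits from chance can be obtained with a simple binomial z-test. The sketch below uses only the headline numbers (233 sessions of 20 images each) and assumes every judgment is independent, which repeated users and shared images clearly violate, so it is illustrative only and not a substitute for the paper's own analysis:

```python
import math

def binomial_z_test(correct, total, p0=0.5):
    """Normal-approximation z-test for a binomial proportion against p0."""
    phat = correct / total
    se = math.sqrt(p0 * (1 - p0) / total)   # standard error under H0
    z = (phat - p0) / se
    # two-sided p-value from the standard normal tail
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Headline numbers from the abstract; the per-trial breakdown is assumed.
n = 233 * 20          # 4660 individual judgments
k = round(0.54 * n)   # about 2516 correct answers
z, p = binomial_z_test(k, n)
```

Even under these crude assumptions the gap from 50% is statistically detectable at this sample size, which is why the interesting finding is how small the gap is in absolute terms, not whether it exists.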
To measure how well humans can truly distinguish real images from AI-generated ones, we collected an image dataset and built an interactive web experiment, accessible at the following URL: https://adrienpavao.com/RealOrAI

Participants are shown a sequence of 20 images and must classify each one as either a real photo or AI-generated. Their performance allows us to quantify human accuracy, identify the most deceptive images, and analyze how decision time and prior exposure affect the results. Recent studies were performed with similar protocols, on text data [Fiedler and Döpke, 2025], on images [Roca et al., 2025], and mixing audio, images, and text [Frank et al., 2023]. Their results suggest performance only slightly above random guessing.

2 Data and Methods

Our dataset is intentionally small (120 images) but designed to be difficult. This first version acts as a proof of concept before extending the project to larger image sets, as well as other modalities such as music or video. The dataset is balanced, with 60 images depicting women and 60 depicting men. Examples of real and AI-generated images are shown in Figure 1.

Real images were sampled from the CC12M dataset [Changpinyo et al., 2021]. We selected random entries whose text description started with either “a man” or “a woman”. Obvious celebrities were removed to avoid easy recognition. Images were then filtered for quality and resized to a consistent resolution.

Fake images were generated using MidJourney v7 [MidJourney, 2022]. For each selected real image, we extracted its original CC12M text description and image dimensions and used them as a prompt seed. We ran each prompt several times with similar parameters (for example: --stylize 50, --v 7) and manually selected the most realistic results. This curation step was essential, as many generated candidates still looked artificial or stylized.
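The sampling-and-prompting procedure just described can be sketched in a few lines. Everything here, from the function names to the in-memory (url, caption) representation, is an illustrative assumption rather than the authors' actual code; CC12M itself is distributed as a list of image URLs paired with captions:

```python
# Hypothetical sketch of the dataset-construction pipeline described in the
# text; select_captions and build_prompt are illustrative names, not code
# from the paper.

def select_captions(rows, prefixes=("a man", "a woman")):
    """Keep (url, caption) pairs whose caption starts with one of the prefixes."""
    return [
        (url, caption)
        for url, caption in rows
        if caption.lower().startswith(prefixes)
    ]

def build_prompt(caption, width, height):
    """Turn a real image's caption and dimensions into a MidJourney prompt seed."""
    # --ar mirrors the real image's aspect ratio; a low --stylize value keeps
    # the output photographic rather than stylized (parameters from the paper).
    return f"{caption} --ar {width}:{height} --style raw --v 7 --stylize 50"

# Toy example rows standing in for CC12M entries
rows = [
    ("http://example.com/1.jpg", "A woman stands in front of a restaurant door and smiles."),
    ("http://example.com/2.jpg", "A cat sleeps on a sofa."),
]
kept = select_captions(rows)
prompt = build_prompt(kept[0][1], 3, 4)
```

Note that `str.startswith` accepts a tuple of prefixes, so the two caption patterns are handled in one pass; the celebrity filtering and quality checks described in the text are manual steps not captured here.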
We acknowledge that this introduces a selection bias: the AI-generated images in our dataset are not a random sample, but the most convincing ones among several generations. We consider this bias acceptable, and even desirable, for the purpose of this study. Our goal is to evaluate human performance on the hardest cases, not on obviously synthetic images. Moreover, in real online settings, people who share AI-generated portraits or photographs typically select the most realistic outputs rather than posting the first attempt. Our curation process therefore reflects common user behavior. Here are a few examples of prompts used:
- “A woman stands in front of a restaurant door and smiles. --ar 3:4 --style raw --v 7 --stylize 50”

arXiv:2512.22236v1 [cs.AI] 23 Dec 2025
- “A man walks through an airport with high ceilings and large windows. --ar 3:2 --style raw --v 7 --stylize 50”
- “A woman walks the runway in a striking red and black dress with a wavy pattern, showcasing her profile. --ar 1667:2500 --style raw --v 7 --stylize 50”

Figure 1: Examples from the dataset: three real images (top row) and three AI-generated images (bottom row).

Using actual descriptions from the real data as prompts was done in order to ensure a balanced distribution in scenery, postures, and clothes between real and fake data. We acknowledge that this design introduces a form of bias, where participants expose