The Digital Divide in Generative AI: Evidence from Large Language Model Use in College Admissions Essays

Large language models (LLMs) have become popular writing tools among students and may expand access to high-quality feedback for those who lack traditional writing support. At the same time, LLMs may standardize student voice or invite overreliance. This study examines how adoption of LLM-assisted writing varies across socioeconomic groups and how it relates to outcomes in a high-stakes context: U.S. college admissions. We analyze a de-identified longitudinal dataset of applications to a selective university from 2020 to 2024 (N = 81,663). Estimating LLM use with a distribution-based detector trained on synthetic and historical essays, we track how student writing changed as LLM use proliferated, how adoption differed by socioeconomic status (SES), and whether potential benefits translated equitably into admissions outcomes. Using fee-waiver status as a proxy for SES, we observe post-2023 convergence in surface-level linguistic features, with the largest changes among fee-waived and rejected applicants. Estimated LLM use rose sharply in 2024 across all groups, with disproportionately larger increases among lower-SES applicants, consistent with an access hypothesis in which LLMs substitute for scarce writing support. However, increased estimated LLM use was more strongly associated with declines in predicted admission probability for lower-SES applicants than for higher-SES applicants, even after controlling for academic credentials and stylometric features. These findings raise concerns about equity and the validity of essay-based evaluation in an era of AI-assisted writing, and they provide the first large-scale longitudinal evidence linking LLM adoption, linguistic change, and evaluative outcomes in college admissions.


💡 Research Summary

This paper investigates how the adoption of large language models (LLMs) for writing college admission essays varies across socioeconomic status (SES) and how such adoption influences admission outcomes. Using a de‑identified longitudinal dataset of 81,663 applications to a selective U.S. university from 2020 through 2024, the authors treat the 2020‑2023 cycles as the pre‑ChatGPT era and the 2024 cycle as the post‑ChatGPT era, since ChatGPT was publicly released in late 2022. SES is proxied by fee‑waiver status, a standard indicator of financial need, and a suite of academic and demographic controls (GPA, SAT/ACT, gender, first‑generation status, school type, honors) is included.

Since ground‑truth LLM usage is unavailable, the study builds a distribution‑based detector inspired by prior work on GPT quantification. Human essays from the pre‑ChatGPT cycles form a human reference corpus, while 30,000 synthetic essays generated by GPT‑4o (matched to the observed prompt distribution) form the LLM reference corpus. For each essay, token‑level likelihoods are compared against both references, yielding an essay‑level mixing proportion α̂ ranging from 0 (human‑like) to 1 (LLM‑like). Validation on held‑out synthetic data shows strong calibration, confirming that α̂ captures relative LLM influence. Essays are then categorized as no, low (0 < α̂ ≤ 0.07), medium (0.07 < α̂ ≤ 0.13), or high (α̂ > 0.13) LLM use.
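The paper does not include code, but the essay‑level estimator can be illustrated as a two‑component token mixture fit by maximum likelihood. Below is a minimal sketch, assuming unigram probability tables built from the human and GPT‑4o reference corpora; all function and variable names here are hypothetical, not the authors' implementation:

import numpy as np
from scipy.optimize import minimize_scalar

def estimate_alpha(tokens, p_human, p_llm, floor=1e-8):
    """Estimate an essay-level mixing proportion alpha by maximum
    likelihood under the two-component token mixture
        p(token) = (1 - alpha) * p_human(token) + alpha * p_llm(token),
    where p_human / p_llm are unigram probability tables estimated from
    the human and LLM reference corpora (hypothetical inputs)."""
    ph = np.array([max(p_human.get(t, 0.0), floor) for t in tokens])
    pl = np.array([max(p_llm.get(t, 0.0), floor) for t in tokens])

    def neg_log_lik(alpha):
        return -np.log((1.0 - alpha) * ph + alpha * pl).sum()

    res = minimize_scalar(neg_log_lik, bounds=(0.0, 1.0), method="bounded")
    return res.x

def usage_category(alpha, eps=1e-6):
    """Bucket alpha-hat into the paper's four usage categories."""
    if alpha < eps:          # "no" use: alpha-hat numerically zero
        return "no"
    if alpha <= 0.07:
        return "low"
    if alpha <= 0.13:
        return "medium"
    return "high"

The key design choice this sketch mirrors is that α̂ is a continuous mixing weight rather than a binary AI/human label, which is why calibration on held‑out synthetic mixtures is the natural validation.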

Three research questions guide the analysis. RQ1 examines temporal changes in surface linguistic features (type‑token ratio, Maas TTR, MTLD, HDD, Yule's K, average word length). The authors find convergence across all SES groups after 2023, with the most pronounced shifts among fee‑waived and rejected applicants, indicating that LLM adoption homogenizes stylistic patterns; a sketch of several of these features follows below. RQ2 tracks LLM adoption over time and across SES. In 2024, estimated LLM use rises sharply for all groups, but the mean α̂ for fee‑waived applicants is 0.04–0.06 higher than for non‑waived peers, suggesting that lower‑SES students are using LLMs more intensively, consistent with an "access hypothesis" in which AI substitutes for scarce tutoring resources.
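For concreteness, here is a minimal sketch of a few of the RQ1 surface features (TTR, Maas TTR, Yule's K, average word length); MTLD and HD‑D require longer length‑correction procedures and are omitted, though packages such as lexical-diversity implement them:

import math
from collections import Counter

def lexical_features(tokens):
    """Surface lexical-diversity features of the kind tracked in RQ1."""
    n = len(tokens)
    freqs = Counter(tokens)
    v = len(freqs)

    ttr = v / n  # type-token ratio: types over tokens
    # Maas TTR: a length-corrected variant; lower values = more diverse.
    maas = (math.log(n) - math.log(v)) / math.log(n) ** 2
    # Yule's K: repetition-based measure; higher values = more repetitive.
    m2 = sum(count ** 2 for count in freqs.values())
    yules_k = 1e4 * (m2 - n) / n ** 2
    avg_word_len = sum(len(t) for t in tokens) / n

    return {"ttr": ttr, "maas_ttr": maas,
            "yules_k": yules_k, "avg_word_length": avg_word_len}

print(lexical_features("the quick brown fox jumps over the lazy dog".split()))

Convergence in these features across groups is then read as evidence of stylistic homogenization, since independently written essays would be expected to retain group‑level variation.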

RQ3 explores the relationship between estimated LLM use, essay characteristics, and predicted admission probability. Using difference‑in‑differences and multivariate logistic regressions that control for academic credentials and stylometric variables, the authors find a divergent effect: a 0.1 increase in α̂ reduces the admission probability by roughly 3–4 percentage points for fee‑waived applicants, whereas the same increase yields less than a 1‑percentage‑point change for non‑waived applicants. This interaction persists after accounting for lexical diversity and syntactic complexity, implying that LLM‑influenced text may be penalized more heavily for lower‑SES candidates. The authors argue that this may stem from evaluators' concerns about authenticity, or from the fact that LLM outputs reflect dominant Western linguistic norms embedded in training data, which may clash with the "voice" expected from disadvantaged applicants.
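A hedged sketch of an RQ3‑style specification is shown below, assuming a dataframe with illustrative column names (not the paper's actual variables); the alpha_hat:fee_waiver interaction is what carries the divergent effect:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataframe; column names are illustrative, not the paper's:
# admitted (0/1), alpha_hat (estimated LLM use), fee_waiver (0/1),
# gpa and test_score (academic controls), ttr and avg_word_length
# (stylometric controls, as in the feature sketch above).
df = pd.read_csv("applications.csv")  # placeholder path

# alpha_hat * fee_waiver expands to both main effects plus their
# interaction; a negative interaction coefficient means LLM use is
# penalized more for fee-waived applicants, conditional on controls.
model = smf.logit(
    "admitted ~ alpha_hat * fee_waiver + gpa + test_score"
    " + ttr + avg_word_length",
    data=df,
).fit()
print(model.summary())

# Rough finite-difference marginal effect of a 0.1 increase in alpha_hat,
# computed separately for each SES group.
for group in (0, 1):
    sub = df[df["fee_waiver"] == group].copy()
    base = model.predict(sub).mean()
    sub["alpha_hat"] += 0.1
    print(f"fee_waiver={group}: {model.predict(sub).mean() - base:+.4f}")

The group‑wise marginal effects at the end mimic the paper's headline comparison: the fee‑waived group's change should be several percentage points more negative than the non‑waived group's under the reported findings.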

The paper contributes (1) a large‑scale longitudinal portrait of essay writing before and after the public release of ChatGPT, (2) a validated method for estimating LLM assistance at the document level in the absence of usage logs, and (3) empirical evidence that LLM adoption, while expanding access to writing support, can exacerbate inequities in high‑stakes evaluation. Policy implications include transparent disclosure guidelines for AI‑assisted writing, training for admissions readers to recognize AI‑influenced text, and possibly redesigned essay‑evaluation rubrics that mitigate bias against AI‑assisted submissions. Without such interventions, generative AI may create a new digital divide: equal access to tools but unequal translation of that access into favorable outcomes.

