Déjà Vu? Decoding Repeated Reading from Eye Movements
Be it your favorite novel, a newswire article, a cooking recipe or an academic paper – in many daily situations we read the same text more than once. In this work, we ask whether it is possible to automatically determine whether the reader has previously encountered a text based on their eye movement patterns. We introduce two variants of this task and address them with considerable success using both feature-based and neural models. We further introduce a general strategy for enhancing these models with machine-generated simulations of eye movements from a cognitive model. Finally, we present an analysis of model performance which on the one hand yields insights on the information used by the models, and on the other hand leverages predictive modeling as an analytic tool for better characterization of the role of memory in repeated reading. Our work advances the understanding of the extent and manner in which eye movements in reading capture memory effects from prior text exposure, and paves the way for future applications that involve predictive modeling of repeated reading.
💡 Research Summary
The paper tackles the previously unaddressed problem of automatically detecting whether a reader is encountering a text for the first time or is rereading it, using only eye‑movement recordings. Two related prediction tasks are defined. In the Single‑Trial Task, a single eye‑movement trace for a given passage must be classified as “first reading” or “repeated reading”. In the Paired‑Trial Task, two traces from the same participant on the same passage are presented in unknown order, and the model must decide which trace corresponds to the first encounter.
The authors employ the publicly available OneStop Eye Movements dataset (Berzak et al., 2025), which contains high-resolution eye-tracking data from 180 native English speakers reading Guardian news articles. Each participant reads a batch of ten articles; the last article is reread immediately (consecutive rereading) and one earlier article is reread after a variable number of intervening articles (non-consecutive rereading). This yields 360 reread instances (180 consecutive, 180 non-consecutive) covering 1,944 paragraph trials and more than 105k word tokens.
Two families of models are explored. The feature‑based approach uses 35 handcrafted global features inspired by psycholinguistic literature: eight standard eye‑movement metrics (total fixation duration, first‑fixation duration, gaze duration, fixation count, skip rate, regression rate, etc.), twenty regression coefficients that capture how these metrics vary with word frequency, surprisal, and length, and seven graph‑theoretic measures derived from a directed scan‑path network (centrality, clustering, connectivity). An XGBoost gradient‑boosted tree classifier is trained on these vectors.
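To make the feature construction concrete, here is a minimal pure-Python sketch of how global eye-movement metrics and one regression coefficient of the kind described above could be computed from per-word reading measures. The field names, the toy data, and the `dur_by_length` coefficient are illustrative assumptions, not the paper's actual feature code; a full implementation would produce all 35 features and feed them to an XGBoost classifier.

```python
def global_metrics(words):
    """Compute a few global eye-movement metrics from per-word measures.

    words: list of dicts with keys n_fixations, total_fix_dur (ms), regressed.
    """
    n = len(words)
    fixated = [w for w in words if w["n_fixations"] > 0]
    return {
        "total_fixation_duration": sum(w["total_fix_dur"] for w in words),
        "fixation_count": sum(w["n_fixations"] for w in words),
        "skip_rate": 1 - len(fixated) / n,          # share of words never fixated
        "regression_rate": sum(w["regressed"] for w in words) / n,
    }

def slope(xs, ys):
    """Ordinary least-squares slope of y on x: one such coefficient is
    computed per (metric, word-level predictor) pair in the feature set."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Toy trial: four words, one of them skipped.
words = [
    {"n_fixations": 2, "total_fix_dur": 450, "regressed": 1, "length": 7},
    {"n_fixations": 1, "total_fix_dur": 180, "regressed": 0, "length": 3},
    {"n_fixations": 0, "total_fix_dur": 0,   "regressed": 0, "length": 2},
    {"n_fixations": 1, "total_fix_dur": 230, "regressed": 0, "length": 5},
]

feats = global_metrics(words)
# Example regression coefficient: how fixation duration scales with word length.
feats["dur_by_length"] = slope([w["length"] for w in words],
                               [w["total_fix_dur"] for w in words])
```

In a reread trial one would expect lower fixation counts, higher skip rates, and flatter slopes against frequency and surprisal, which is exactly the signal the classifier exploits.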
The neural approach builds on the RoBERTa‑Eye multimodal language model (Shubi et al., 2024b). Two variants are implemented: RoBERTa‑Eye‑Words, which attaches a 13‑dimensional eye‑movement vector to each word embedding, and RoBERTa‑Eye‑Fixations, which concatenates a 6‑dimensional fixation‑level vector with the word‑level vector for every fixation. Special tokens differentiate textual embeddings from eye‑movement embeddings, allowing the transformer’s attention layers to jointly attend to linguistic and oculomotor information.
A novel contribution is the synthetic scan‑path augmentation. The authors generate “first‑reading” eye‑movement trajectories for every passage using the cognitive model E‑Z Reader (Reichle et al., 1998, 2003, 2009). These synthetic scans serve as a reference for what a typical first encounter looks like. Three representation strategies are tested: (1) concatenating global synthetic features with the difference between synthetic and human features; (2) word‑level concatenation of synthetic and difference vectors; (3) sequence‑level concatenation of human and synthetic fixation streams, with an additional token marking machine‑generated data. The augmented inputs transform the single‑trial problem into a paired‑style problem, enabling the same architecture used for the Paired‑Trial Task.
Results show that both modeling families achieve well-above-chance performance. In the Single-Trial Task, the XGBoost model reaches ~71% accuracy, RoBERTa-Eye-Words ~73%, and RoBERTa-Eye-Fixations ~75%. Adding synthetic scan-paths improves accuracy by 5–7 percentage points for the feature-based and neural models alike. In the Paired-Trial Task, accuracies are higher overall (XGBoost 84%, RoBERTa-Eye variants 86%) because the relative ordering information is easier to learn; synthetic augmentation yields modest but consistent gains (2–3 percentage points).
Analyses of error patterns reveal that inter‑reading interval (k) matters: non‑consecutive rereads (larger k) are harder to distinguish, though performance remains above 60 %. Passage position also influences difficulty; later paragraphs exhibit stronger fixation‑count reductions and higher skip rates, aiding classification. Feature‑importance inspection shows that global fixation count, skip rate, and regression rate are the strongest predictors, while word‑level regression coefficients contribute less but still provide useful signals of reduced lexical processing in rereads.
The study demonstrates that eye‑movement behavior encodes sufficient information about a reader’s memory state to allow reliable decoding at the level of individual reader‑text pairs. By integrating cognitive‑model‑generated synthetic data, the authors bridge the gap between descriptive psycholinguistic findings and predictive machine‑learning applications.
Implications and future work include potential deployment in e‑learning platforms (e.g., detecting whether a learner has already seen a passage to adapt instruction), personalized reading assistance, and deeper cognitive modeling of memory effects. The authors suggest extending the approach to other languages, richer text genres, and multimodal signals such as EEG, as well as comparing alternative cognitive simulators (e.g., SWIFT) to improve synthetic scan‑path realism.
In sum, the paper provides a comprehensive methodological framework, strong empirical evidence, and a clear path toward practical applications for decoding repeated reading from eye movements.