Hierarchical Resource Rationality Explains Human Reading Behavior


Reading is a pervasive and cognitively demanding activity that underpins modern human culture. It is a prime instance of a class of tasks where eye movements are coordinated for the purpose of comprehension. Existing theories explain either eye movements or comprehension during reading, but the critical link between the two remains unclear. Here, we propose resource-rational optimization as a unifying principle governing adaptive reading behavior. Eye movements are selected to maximize expected comprehension while minimizing cognitive and temporal costs, organized hierarchically across nested time scales: fixation decisions support word recognition; sentence-level integration guides skipping and regression; and text-level comprehension goals shape memory construction and rereading. A computational implementation successfully replicates an unprecedented range of findings in human reading, from lexical effects to comprehension outcomes. Together, these results suggest that resource rationality provides a general mechanism for coordinating perception, memory, and action in knowledge-intensive human behaviors, offering a principled account of how complex cognitive skills adapt to limited resources.


💡 Research Summary

The paper presents a unified computational account of human reading that simultaneously explains eye‑movement control and text comprehension. The authors argue that reading is a sequential information‑sampling process in which each fixation provides noisy evidence that updates the reader’s internal belief about word identity, sentence meaning, and overall discourse. Because visual, attentional, memory, and temporal resources are limited, readers behave in a resource‑rational manner: they choose eye‑movement actions that maximize expected comprehension utility while minimizing the costs associated with visual processing and time.

To capture this intuition, the authors develop a hierarchical control architecture consisting of three nested levels—word, sentence, and text—each formalized as a Partially Observable Markov Decision Process (POMDP). At the word level, the agent maintains a Bayesian belief distribution over candidate words. Each fixation samples a subset of letters; the belief is updated via Bayes’ rule, and fixation duration is proportional to the remaining uncertainty (entropy). Longer words, lower frequency, and lower predictability produce flatter priors, higher uncertainty, and thus longer or multiple fixations.
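The word-level dynamics can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the four-word candidate set, the likelihood values, and the use of posterior entropy as a proxy for remaining fixation demand are all assumptions chosen to make the predictability effect visible.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Posterior over candidate words after one noisy letter sample."""
    post = prior * likelihood
    return post / post.sum()

def entropy(p):
    """Shannon entropy in bits; higher means more residual uncertainty."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Hypothetical 4-word candidate set. A highly predictable word gets a
# peaked prior; a low-predictability word gets a flat one.
prior_peaked = np.array([0.70, 0.20, 0.05, 0.05])
prior_flat   = np.array([0.25, 0.25, 0.25, 0.25])

# One fixation whose letter evidence favors candidate 0.
likelihood = np.array([0.6, 0.2, 0.1, 0.1])

post_peaked = bayes_update(prior_peaked, likelihood)
post_flat   = bayes_update(prior_flat, likelihood)

# Flatter prior -> higher posterior entropy -> longer (or extra) fixations.
assert entropy(post_flat) > entropy(post_peaked)
```

The same update is simply repeated, with the posterior becoming the next prior, until entropy falls below a stopping criterion.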

At the sentence level, the controller decides whether to continue reading the current word, skip ahead, or regress to a previous word. This decision balances the expected information gain for short‑term memory integration against the cost of additional eye movements. The model reproduces classic empirical patterns: high‑frequency, short, and predictable words are skipped more often, while difficult words trigger regressions.
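The continue/skip/regress trade-off can be illustrated with a toy decision rule. Everything here is a simplifying assumption: residual entropy of each word's belief stands in for expected information gain, and the movement costs are arbitrary placeholders rather than fitted parameters.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def choose_action(word_beliefs, idx, move_cost=0.3):
    """Pick the action with the highest gain-minus-cost score.
    Regressions are charged double, since they both add a saccade
    and delay forward progress (an illustrative assumption)."""
    scores = {
        "continue": entropy(word_beliefs[idx]),
        "skip":     entropy(word_beliefs[idx + 1]) - move_cost,
        "regress":  entropy(word_beliefs[idx - 1]) - 2 * move_cost,
    }
    return max(scores, key=scores.get)

beliefs = [
    [0.40, 0.30, 0.30],   # earlier word: some lingering uncertainty
    [0.97, 0.02, 0.01],   # current word: nearly resolved
    [0.50, 0.30, 0.20],   # upcoming word: still informative
]
# With the current word resolved, moving on wins despite the saccade cost.
print(choose_action(beliefs, 1))
```

Making the current word highly uncertain, or the upcoming word fully predictable, flips the decision, which is the qualitative pattern the model reproduces for frequency and predictability effects on skipping.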

At the text level, the highest‑order controller monitors long‑term memory coherence, prior knowledge, and overall discourse structure. It can direct the sentence‑level controller to reread a sentence, skip ahead, or continue forward, thereby implementing a strategic repair mechanism when comprehension drops. The model predicts that prior knowledge and discourse coherence improve recall and reduce the need for regressions, whereas a low initial appraisal of a passage leads to targeted regressions.
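The repair mechanism reduces to a threshold rule in its simplest form. The function below is a hedged sketch: the scalar comprehension and coherence-demand estimates, and the 0.5 thresholds, are hypothetical stand-ins for the quantities the trained policy computes.

```python
def text_controller(comprehension, coherence_demand, threshold=0.5):
    """Illustrative repair rule: reread when the estimated comprehension
    of the last sentence is low AND the discourse depends on it heavily;
    otherwise keep moving forward."""
    if comprehension < threshold and coherence_demand > 0.5:
        return "reread"
    return "continue"

assert text_controller(0.3, 0.8) == "reread"    # poorly understood, pivotal
assert text_controller(0.9, 0.8) == "continue"  # well understood
assert text_controller(0.3, 0.2) == "continue"  # poorly understood, but peripheral
```

A trained policy replaces the hard thresholds with a learned value comparison, but the qualitative behavior, targeted rereading only where comprehension matters, is the same.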

Each level is trained independently using deep reinforcement learning with a reward function composed of two terms: (1) a comprehension utility U that reflects successful integration of information into memory, and (2) a resource cost C that penalizes eye‑movement effort and elapsed time. After training, the three policies are integrated, yielding a single agent that can simulate natural reading behavior across multiple time scales.
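The two-term reward can be written down directly. The functional form R = U − C follows the description above; the specific cost weights here are placeholders, since the paper tunes (or learns) them per level.

```python
def reward(utility, n_saccades, elapsed_time,
           saccade_cost=0.1, time_cost=0.01):
    """R = U - C: comprehension utility minus resource costs.
    C penalizes both eye-movement effort and elapsed time;
    the weights below are illustrative, not the paper's values."""
    cost = saccade_cost * n_saccades + time_cost * elapsed_time
    return utility - cost

# A passage understood well (U = 1.0) after 5 saccades and 10 time units:
r = reward(1.0, n_saccades=5, elapsed_time=10)
```

Because each level receives a reward of this shape at its own time scale, the three policies can be trained independently and then composed.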

The authors evaluate the model against a newly collected English reading dataset obtained under varying time‑pressure conditions. They compare simulated and human data on four major phenomena: (a) gaze‑duration effects of word length, frequency, and predictability; (b) skip and regression rates at the sentence level; (c) text‑level comprehension outcomes such as multiple‑choice accuracy and free‑recall performance; and (d) adaptive changes in eye‑movement patterns under increasing time pressure. In all cases, the model reproduces the direction and approximate magnitude of the human effects, with regression lines and confidence intervals closely matching empirical observations.

Key contributions include: (i) a principled, resource‑rational formulation that unifies eye‑movement control and comprehension; (ii) a hierarchical POMDP framework that captures uncertainty at multiple linguistic levels; (iii) a demonstration that a single computational principle can account for a broad set of lexical, syntactic, and discourse‑level findings; and (iv) evidence that readers dynamically reallocate limited resources (visual attention, memory, time) to maximize overall utility.

Limitations are acknowledged. The model is currently validated only on English texts and on a specific experimental paradigm; cross‑linguistic generalization remains to be tested. The reward design abstracts away from subjective satisfaction, long‑term learning, and affective factors, which may influence real‑world reading strategies. Moreover, while the hierarchical architecture mirrors neurocognitive hierarchies, direct neural validation (e.g., fMRI or EEG signatures) is not provided.

In conclusion, the hierarchical resource‑rational model offers a powerful theoretical and computational tool for understanding complex, knowledge‑intensive behaviors such as reading. By explicitly modeling the trade‑offs between information gain and resource expenditure across nested time scales, it explains how humans efficiently sample visual input, update memory, and integrate discourse under realistic constraints. Future work can extend the framework to other languages, incorporate richer neurocognitive data, and explore how longer‑term educational outcomes emerge from these moment‑to‑moment optimization processes.

