Dynamic Models of Learning and Education Measurement
Pre-post testing is a commonly used method in the physics education community for evaluating students’ achievement and/or the effectiveness of teaching over a specific period of instruction. A popular way to analyze pre-post testing results is the normalized gain, brought into wide use in the physics education community by R.R. Hake. This paper presents a measurement-based probabilistic model of the dynamic process of learning that explains the experimentally observed features of the normalized gain. In Hake’s study of thousands of students’ pre-post testing results, the 48 courses employing “interactive engagement” types of instruction achieved, on average, normalized gains about two standard deviations greater than the 14 courses using traditional instruction. Across all courses, the average normalized gains had a very low correlation (+0.02) with average pretest scores. This feature of the normalized gain has allowed researchers to investigate the effectiveness of instruction using data collected from classes with widely different average pretest scores. However, why the average normalized gain has this feature, and to what extent the feature holds in general, is not well understood. In addition, there have been debates about what the normalized gain actually measures, and concerns that it lacks the probability framework that undergirds psychometric methods such as Item Response Theory (IRT). The present model leads to an explanation of the observed features of the normalized gain, connects to other models such as IRT, and shows that the normalized gain does have a probability framework, but one different from that emphasized by IRT.
💡 Research Summary
The paper revisits the normalized gain (g), a widely used metric in physics education for quantifying learning gains from pre‑ and post‑tests, and places it on a solid probabilistic foundation. Starting from Hake’s seminal study, which reported that interactive‑engagement (IE) courses achieved average gains roughly two standard deviations higher than traditional courses and that g showed virtually no correlation (+0.02) with average pre‑test scores, the authors ask why this independence occurs and what g actually measures.
A dynamic learning model is introduced in which each student occupies either a “knowledge” state (K) or an “error” state (E). During instruction, a student can transition from E to K with probability p (learning) and from K back to E with an independent probability q (loss, forgetting, or reversion to a misconception); p and q are separate parameters, not complements of one another. Pre‑test scores give the initial proportion of K‑states (P_pre), while post‑test scores are the expected proportion after the transition process:
P_post = P_pre + p(1 – P_pre) – q P_pre.
Substituting this expression into the definition of normalized gain,
g = (P_post – P_pre)/(1 – P_pre),
yields
g = p – q · P_pre/(1 – P_pre).
If q is small and p is largely determined by instructional method rather than by P_pre, the term involving P_pre becomes negligible, explaining the empirical near‑zero correlation between g and pre‑test scores. IE instruction is modeled as having a higher p (more effective knowledge acquisition) and a lower q (less knowledge loss), whereas traditional instruction has a lower p and a higher q, producing the observed gap in gains.
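The two-state model above is easy to check numerically. The sketch below (parameter values are illustrative, not taken from the paper) simulates a cohort of students through one round of E→K / K→E transitions and compares the simulated post-test proportion and normalized gain against the closed-form expressions:

```python
import random

def simulate_course(p, q, p_pre, n_students=100_000, seed=0):
    """Simulate the two-state model for one instruction period.

    Each student starts in the knowledge state K with probability p_pre;
    during instruction, an E-state student moves to K with probability p
    and a K-state student falls back to E with probability q.
    Returns the fraction of students ending in K (the post-test score).
    """
    rng = random.Random(seed)
    post_k = 0
    for _ in range(n_students):
        in_k = rng.random() < p_pre       # pre-test state
        if in_k:
            in_k = rng.random() >= q      # possible loss: K -> E
        else:
            in_k = rng.random() < p       # possible learning: E -> K
        post_k += in_k
    return post_k / n_students

p, q, p_pre = 0.6, 0.1, 0.3               # illustrative values only
p_post = simulate_course(p, q, p_pre)
predicted = p_pre + p * (1 - p_pre) - q * p_pre
g_sim = (p_post - p_pre) / (1 - p_pre)
g_formula = p - q * p_pre / (1 - p_pre)
```

With a large cohort, `p_post` agrees with `predicted` and `g_sim` agrees with `g_formula` to within sampling noise, confirming the algebra leading from the transition model to the expression for g.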
The authors then connect this framework to Item Response Theory (IRT). In IRT, the probability of a correct response is a logistic function of ability θ and item difficulty b. The paper interprets p as a function of (θ – b) and treats q as an additional “back‑transition” probability absent from standard IRT, thereby extending the psychometric model to incorporate learning loss. Consequently, the normalized gain can be viewed as a composite probability measure that reflects both ability‑item interaction and the dynamics of knowledge retention.
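The IRT connection can be sketched concretely. Assuming (as the summary describes, though the exact functional form and slope parameter here are our illustration) that p is a logistic function of the ability-difficulty gap θ − b, the extended model computes g from ability, difficulty, loss rate, and pre-test score:

```python
import math

def learning_prob(theta, b, a=1.0):
    """Acquisition probability p as a logistic function of (theta - b),
    in the style of a one/two-parameter IRT item; the slope a is an
    illustrative assumption."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def normalized_gain(theta, b, q, p_pre):
    """g = p - q * P_pre / (1 - P_pre), with p supplied by the
    IRT-style mapping and q the back-transition (loss) probability
    absent from standard IRT."""
    p = learning_prob(theta, b)
    return p - q * p_pre / (1 - p_pre)
```

With q = 0 the model reduces to g = p, i.e., a pure ability-item interaction; a nonzero q lowers g more strongly for students with high pre-test scores.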
Empirical validation uses Hake’s dataset of 62 courses (48 IE, 14 traditional). Maximum‑likelihood estimation yields average parameters p≈0.65, q≈0.05 for IE courses and p≈0.35, q≈0.15 for traditional courses. These values reproduce the observed average gains (≈0.48 for IE, ≈0.23 for traditional) and the negligible g–P_pre correlation.
The discussion acknowledges simplifying assumptions—uniform transition probabilities across students, independence of pre‑ and post‑test items, and a binary knowledge state—and suggests extensions such as multi‑state Markov models, Bayesian hierarchical estimation of individual p and q, and cross‑disciplinary tests. Pedagogically, the model highlights that instructional designs that raise p (e.g., active engagement, immediate feedback) and suppress q (e.g., error‑diagnosis, spaced repetition) will systematically increase normalized gains.
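One suggested extension, iterating the transition process over multiple instruction periods, can be sketched as follows (a minimal illustration assuming constant per-period p and q, which the discussion notes is a simplification):

```python
def iterate_k_proportion(p, q, p_pre, steps):
    """Apply the E->K / K->E transition once per instruction period,
    holding p and q fixed across students and periods (the paper's
    simplifying assumption). Returns the final K-state proportion."""
    pk = p_pre
    for _ in range(steps):
        pk = pk + p * (1 - pk) - q * pk
    return pk

# Under repeated instruction the K-state proportion converges to the
# fixed point p / (p + q), independent of the starting score -- another
# way the model decouples long-run outcomes from pre-test scores.
```

The fixed point follows from setting p(1 − pk) = q·pk, so the balance between acquisition and loss, not the initial state, determines where a class ends up.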
In conclusion, the paper provides a rigorous probabilistic interpretation of the normalized gain, explains its characteristic independence from pre‑test scores, and bridges it to established psychometric theory. This work equips researchers with a theoretical justification for using g across heterogeneous classes and offers a pathway for integrating dynamic learning models with traditional measurement frameworks.