Predictive validities: figures of merit or veils of deception?

The ETS has recently released new estimates of validities of the GRE for predicting cumulative graduate GPA. They average in the middle thirties - twice as high as those previously reported by a number of independent investigators. It is shown in the first part of this paper that this unexpected finding can be traced to a flawed methodology that tends to inflate multiple correlation estimates, especially when the population values are near zero. Secondly, the issue of upward corrections of validity estimates for restriction of range is taken up. It is shown that they depend on assumptions that are rarely met by the data. Finally, it is argued more generally that conventional test theory, which is couched in terms of correlations and variances, is not only unnecessarily abstract but, more importantly, incomplete, since the practical utility of a test depends not only on its validity but also on base-rates and admission quotas. A more direct and conclusive method for gauging the utility of a test involves misclassification rates, and entirely dispenses with questionable assumptions and post-hoc “corrections”. On applying this approach to the GRE, it emerges (1) that the GRE discriminates against ethnic and economic minorities, and (2) that it often produces more erroneous decisions than a purely random admissions policy would.


💡 Research Summary

The paper provides a thorough critique of the Educational Testing Service’s (ETS) recent report on the predictive validity of the Graduate Record Examination (GRE) for cumulative graduate GPA. ETS claims an average multiple‑correlation of roughly .33, nearly double the .15‑.18 range reported by independent researchers. The authors trace this discrepancy to two major methodological flaws. First, ETS’s use of a common‑factor regression approach inflates correlation estimates, especially when the true population correlation is near zero. Through extensive Monte‑Carlo simulations on artificial populations, they demonstrate that the procedure systematically overestimates the multiple‑correlation by about .12 on average, with the bias worsening for smaller samples and when many predictors are entered.
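The inflation mechanism described above can be illustrated with a small simulation (the sample size, number of predictors, and replication count below are illustrative choices, not the paper's actual simulation design): when the true population multiple correlation is exactly zero, the sample multiple correlation R from a least-squares fit is still positive in every sample, and its expected value grows with the number of predictors and shrinking sample size.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_multiple_R(n, k):
    """Sample multiple correlation R when the true population R is zero."""
    X = rng.standard_normal((n, k))
    y = rng.standard_normal(n)  # criterion generated independently of all predictors
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    yhat = design @ beta
    # For OLS with an intercept, corr(y, yhat) is the multiple correlation R
    return np.corrcoef(y, yhat)[0, 1]

# 2000 replications with 3 predictors and a modest sample of 25
Rs = [sample_multiple_R(n=25, k=3) for _ in range(2000)]
print(f"mean sample R = {np.mean(Rs):.2f}  (true population R = 0)")
```

With these (hypothetical) settings the mean sample R lands near .33 even though the true value is zero, because E[R²] under the null is approximately k/(n-1); this is the kind of small-sample, many-predictor bias the Monte-Carlo argument points to.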

Second, the paper scrutinizes the “restriction of range” correction that ETS applies to adjust for the fact that GPA data exist only for admitted students, who were selected partly on their GRE scores and therefore span a narrower score range than the applicant pool. The correction formula assumes (a) accurate knowledge of the variance ratio between the unrestricted population and the selected sample, (b) a linear relationship between GRE scores and GPA, and (c) negligible measurement error. In real‑world GRE data these assumptions are rarely met: the variance of GRE scores is heavily truncated, the GRE‑GPA relationship shows curvature, and measurement error is non‑trivial. When the authors apply the correction to actual GRE‑GPA data, the adjusted validity rises by .05‑.07, indicating that the correction itself can create an artificial inflation rather than a genuine restoration of the underlying relationship.
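The standard range-restriction adjustment referred to here is Thorndike's Case II correction. A minimal sketch, with hypothetical input values (the .15 observed validity and the 0.6 variance ratio are illustrative, not figures from the paper):

```python
import math

def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike's Case II correction: estimate the validity in the
    unrestricted applicant population from the validity r observed in a
    range-restricted (selected) sample.  The formula presupposes a known
    variance ratio, a linear score-criterion relation, and homoscedastic,
    error-free measurement -- exactly the assumptions the paper questions.
    """
    u = sd_unrestricted / sd_restricted  # ratio of score SDs
    return r * u / math.sqrt(1 - r**2 + (r**2) * u**2)

# Hypothetical example: observed validity .15 in a selected sample whose
# GRE score spread is 60% of the full applicant pool's
print(round(correct_for_range_restriction(0.15, 1.0, 0.6), 3))
```

Note how sensitive the corrected value is to the assumed variance ratio: the same observed validity yields very different “corrected” figures as the ratio changes, which is why the correction can manufacture inflation when its assumptions fail.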

Beyond statistical validity, the authors argue that conventional test theory—focused on correlations and variances—fails to capture the practical utility of a test for admission decisions. They propose a decision‑theoretic framework that incorporates the base‑rate (the proportion of applicants who would succeed in graduate school) and the admission quota (the proportion of applicants the program can admit). Using this framework, they construct a 2 × 2 contingency table (true positives, false positives, true negatives, false negatives) based on a cutoff GRE score and actual GPA outcomes (e.g., GPA ≥ 3.0 as “successful”). From this table they compute misclassification rates, sensitivity, specificity, and overall accuracy. Their findings are striking: (1) GRE‑based selection yields an overall error rate of about 28 %, higher than the 25 % error rate that would result from a purely random selection of the same number of students; (2) for under‑represented minorities and low‑income applicants the error rate climbs to roughly 35 %, exceeding the random benchmark by at least 5 percentage points; (3) when the program admits only 20 % of applicants, the GRE selects only about 60 % of the truly high‑performing candidates, whereas a random draw would capture about 70 % of them. In other words, the GRE’s purported predictive power does not translate into superior admission outcomes and can even be counter‑productive.
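The interaction of validity, base rate, and quota can be sketched with a simple simulation under a bivariate-normal model of score and criterion. This is an illustration of the decision-theoretic framework, not a reproduction of the paper's empirical GRE figures; the base rate of 0.6 and quota of 0.2 below are assumed values:

```python
import numpy as np

rng = np.random.default_rng(1)

def misclassification_rate(validity, base_rate, quota, n=200_000):
    """Simulate admissions from a bivariate-normal score/criterion model.
    'Success' = criterion above its (1 - base_rate) quantile; 'admit' =
    score above its (1 - quota) quantile.  An error is either admitting a
    would-be failure or rejecting a would-be success."""
    cov = [[1.0, validity], [validity, 1.0]]
    score, crit = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    success = crit >= np.quantile(crit, 1 - base_rate)
    admit = score >= np.quantile(score, 1 - quota)
    return float(np.mean(admit != success))

# How the overall error rate varies with validity at fixed base rate / quota
for r in (0.0, 0.15, 0.33):
    print(f"validity {r:.2f}: error rate {misclassification_rate(r, 0.6, 0.2):.3f}")
```

At validity 0 the selection is effectively random, and the error rate reduces to q(1-p) + (1-q)p; raising validity lowers the error rate only modestly when the quota and base rate are badly mismatched, which is the structural point behind the misclassification argument.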

The paper concludes with policy implications. It cautions that reliance on GRE scores alone can lead to systematic overestimation of test validity due to statistical artifacts and inappropriate range‑restriction adjustments. More importantly, admission decisions should be evaluated on the basis of actual decision outcomes—misclassification rates and the interaction of test scores with base‑rates and quotas—rather than abstract correlation coefficients. The authors recommend using GRE scores as one component among many (undergraduate GPA, research experience, letters of recommendation) and, where feasible, incorporating a degree of random selection to mitigate adverse impacts on minority and economically disadvantaged groups. By shifting the focus from “validity as a figure of merit” to “utility as a decision‑making tool,” the paper argues for a more transparent, equitable, and empirically grounded admissions process.

