Attenuation Bias with Latent Predictors
Many core concepts in political science are latent and therefore can only be measured with error. Measurement error in a predictor attenuates slope coefficient estimates in regression, biasing them toward zero. We show that widely used strategies for correcting attenuation bias – including instrumental variables and the method of composition – are themselves biased when applied to latent regressors, sometimes even more than simple regression ignoring the measurement error altogether. We derive a correlation-based correction using split-sample measurement strategies. Rather than assuming a particular estimation strategy for the latent trait, our approach is modular and can be easily deployed with a wide variety of latent trait measurement strategies, including additive score, factor, or machine learning models, requiring no joint estimation while yielding consistent slopes under standard assumptions. Simulations and applications show stronger relationships after our correction, sometimes by as much as 50%. Open-source software implements the procedure. Results underscore that latent predictors demand tailored error correction; otherwise, conventional practice can exacerbate bias.
💡 Research Summary
The paper addresses a pervasive problem in political‑science and related social‑science research: the use of latent constructs (e.g., political knowledge, ideology, democracy indices) as predictors in linear regressions. Because these constructs are not directly observable, researchers must estimate them from observed indicators, introducing measurement error that attenuates regression coefficients toward zero. The authors first revisit the classical errors‑in‑variables framework, where an observed regressor (\tilde X = X + U), with measurement error (U) independent of the true (X), leads to the familiar attenuation factor (\lambda = \sigma_X^2/(\sigma_X^2+\sigma_U^2)), the share of observed variance that comes from the true regressor.
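The classical attenuation factor can be checked numerically. The following is a minimal simulation sketch, not code from the paper: the sample size, true slope, and variances are arbitrary illustrative choices, and the errors are assumed to be normal and independent of the true regressor.

```python
import numpy as np

# Illustrative values (assumptions of this sketch, not taken from the paper).
n, beta, sigma_x, sigma_u = 200_000, 1.0, 1.0, 0.8

rng = np.random.default_rng(0)
x = rng.normal(0.0, sigma_x, n)        # true regressor X
u = rng.normal(0.0, sigma_u, n)        # measurement error U, independent of X
y = beta * x + rng.normal(0.0, 1.0, n)
x_obs = x + u                          # observed regressor X~ = X + U

# OLS slope from regressing Y on the error-laden regressor.
ols_slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)

# Classical prediction: beta times lambda = sigma_X^2 / (sigma_X^2 + sigma_U^2).
lam = sigma_x**2 / (sigma_x**2 + sigma_u**2)
predicted_slope = beta * lam

print(f"OLS slope: {ols_slope:.3f}, beta * lambda: {predicted_slope:.3f}")
```

With these settings the estimated slope should land close to the true slope multiplied by (\lambda), well below the true value.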
When the predictor is latent, an additional identification step is required: the estimated latent scores are standardized to have mean 0 and variance 1. This rescaling changes the variance structure so that the effective attenuation factor becomes (\lambda_{\text{latent}} = 1/(1+\sigma_U^2)). Although less severe than the classical case, the bias remains non‑negligible.
The authors then critique two widely‑used correction strategies. First, instrumental‑variables (IV) or two‑stage least squares (2SLS) assume an external instrument (Z) that is correlated with the true (X) but uncorrelated with the measurement error (U). Applied to standardized latent scores, IV over‑corrects because the variance inflation that IV is designed to remove has already been eliminated by the identification rescaling. Consequently, IV estimates are typically too large, sometimes dramatically so.
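The over-correction can be illustrated with a small simulation. This is a sketch under assumed conditions rather than the authors' code: the instrument `z`, its strength, and all numeric values are hypothetical, and with a single instrument the 2SLS estimate reduces to a ratio of sample covariances.

```python
import numpy as np

# Illustrative values (assumptions of this sketch, not taken from the paper).
n, beta, sigma_u = 200_000, 1.0, 0.8

rng = np.random.default_rng(1)
x = rng.normal(size=n)                          # true latent trait, variance 1
y = beta * x + rng.normal(size=n)               # outcome
score = x + rng.normal(scale=sigma_u, size=n)   # noisy estimate of the trait
x_hat = (score - score.mean()) / score.std()    # rescaled to mean 0, variance 1

# Hypothetical instrument: related to X, unrelated to the measurement error
# and to the outcome noise.
z = 0.7 * x + rng.normal(size=n)

# Naive OLS on the standardized score.
naive = np.cov(x_hat, y)[0, 1] / np.var(x_hat, ddof=1)

# Just-identified IV estimate: Cov(Z, Y) / Cov(Z, X_hat).
iv = np.cov(z, y)[0, 1] / np.cov(z, x_hat)[0, 1]

print(f"true slope: {beta:.2f}, naive OLS: {naive:.3f}, IV: {iv:.3f}")
```

In this design the naive estimate falls below the true slope while the IV estimate lands above it, which matches the direction of bias described above.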
Second, the Method of Composition (MOC) draws latent values from their posterior distribution, fits the outcome regression for each draw, and aggregates the resulting coefficient draws. MOC treats the latent draws as independent of the outcome, effectively assuming (p(X|W,Y)=p(X|W)). This ignores the information that (Y) provides about (X) when the true effect is non‑zero, leading to further attenuation rather than correction. Empirical work that has relied on MOC therefore often reports coefficients that are even closer to zero than the naïve OLS estimates.
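A stylized version of this procedure can be simulated to see the extra attenuation. The sketch below rests on simplifying assumptions that are not from the paper: a single noisy score `w`, a standard normal prior on the latent trait, a known error variance, and a normal approximation for each coefficient draw. The helper `ols_slope` is written for this illustration only.

```python
import numpy as np

# Illustrative values (assumptions of this sketch, not taken from the paper).
n, beta, sigma_u, n_draws = 50_000, 1.0, 0.8, 50

rng = np.random.default_rng(2)
x = rng.normal(size=n)                       # true latent trait
y = beta * x + rng.normal(size=n)            # outcome
w = x + rng.normal(scale=sigma_u, size=n)    # noisy measurement of the trait

def ols_slope(pred, out):
    """Slope and standard error from a simple regression of out on pred."""
    pc = pred - pred.mean()
    oc = out - out.mean()
    slope = pc @ oc / (pc @ pc)
    resid = oc - slope * pc
    se = np.sqrt(resid @ resid / (len(out) - 2) / (pc @ pc))
    return slope, se

# Posterior of X given W only (Y is ignored), with prior X ~ N(0, 1):
# mean kappa * W and variance 1 - kappa, where kappa = 1 / (1 + sigma_u^2).
kappa = 1.0 / (1.0 + sigma_u**2)
post_mean, post_sd = kappa * w, np.sqrt(1.0 - kappa)

coef_draws = []
for _ in range(n_draws):
    x_draw = rng.normal(post_mean, post_sd)    # one posterior draw of the trait
    b, se = ols_slope(x_draw, y)               # outcome regression for this draw
    coef_draws.append(rng.normal(b, se))       # one coefficient draw
moc = float(np.mean(coef_draws))

# Naive OLS on the standardized score, for comparison.
w_std = (w - w.mean()) / w.std()
naive, _ = ols_slope(w_std, y)

print(f"true slope: {beta:.2f}, naive OLS: {naive:.3f}, MOC: {moc:.3f}")
```

In this design the pooled MOC coefficient sits further from the true slope than the naive estimate does, consistent with the argument that drawing latent values without conditioning on (Y) adds attenuation.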
To solve these issues, the paper proposes a split‑sample, correlation‑based correction. The full set of indicators (W) is randomly divided into two disjoint subsets (W_1) and (W_2). Each subset yields a separate estimate of the latent trait, (\tilde X_1) and (\tilde X_2), which are then standardized to (\hat X_1) and (\hat X_2). The sample correlation (\hat\rho = \text{Corr}(\hat X_1,\hat X_2)) provides a consistent estimator of the attenuation factor because, under the identification restriction, (\hat\rho) converges to (1/(1+\sigma_U^2)). The corrected slope is then simply the naive OLS slope divided by this estimated attenuation factor.
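The split and correlation steps can be sketched in a few lines. This is an illustration under assumptions that are not from the paper: ten equally noisy indicators, additive (mean) scores for each half, and (\sigma_U^2) read as the error variance of each half-score. It covers only the random split and the correlation (\hat\rho), not the final slope adjustment.

```python
import numpy as np

# Illustrative values (assumptions of this sketch, not taken from the paper).
n, n_items, item_sd = 100_000, 10, 1.5

rng = np.random.default_rng(3)
x = rng.normal(size=n)                                      # true latent trait
items = x[:, None] + rng.normal(scale=item_sd, size=(n, n_items))  # indicators W

# Randomly divide the indicators into two disjoint halves W_1 and W_2.
cols = rng.permutation(n_items)
half_1, half_2 = cols[: n_items // 2], cols[n_items // 2 :]

def standardized_score(block):
    """Additive score for a block of indicators, rescaled to mean 0, variance 1."""
    score = block.mean(axis=1)
    return (score - score.mean()) / score.std()

x_hat_1 = standardized_score(items[:, half_1])
x_hat_2 = standardized_score(items[:, half_2])

# Split-sample correlation between the two standardized estimates.
rho_hat = np.corrcoef(x_hat_1, x_hat_2)[0, 1]

# Error variance of each half-score in this design, and the implied target.
sigma_u2 = item_sd**2 / (n_items // 2)
target = 1.0 / (1.0 + sigma_u2)

print(f"rho_hat: {rho_hat:.3f}, 1 / (1 + sigma_U^2): {target:.3f}")
```

Any other scoring method (a factor model, for example) could replace `standardized_score`, which is what makes the approach modular.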