The purpose of an estimator is what it does: Misspecification, estimands, and over-identification

In over-identified models, misspecification – the norm rather than the exception – fundamentally changes what estimators estimate. Different estimators imply different estimands rather than different efficiency for the same target. A review of recent applications of generalized method of moments in the American Economic Review suggests widespread acceptance of this fact: There is little formal specification testing and widespread use of estimators that would be inefficient were the model correct, including the use of “hand-selected” moments and weighting matrices. Motivated by these observations, we review and synthesize recent results on estimation under model misspecification, providing guidelines for transparent and robust empirical research. We also provide a new theoretical result, showing that Hansen’s J-statistic measures, asymptotically, the range of estimates achievable at a given standard error. Given the widespread use of inefficient estimators and the resulting researcher degrees of freedom, we thus particularly recommend the broader reporting of J-statistics.


💡 Research Summary

The paper “The purpose of an estimator is what it does: Misspecification, estimands, and over‑identification” argues that in over‑identified econometric models the usual assumption of correct specification is rarely justified, and that misspecification fundamentally changes what an estimator actually estimates. Rather than viewing different estimators as merely having different efficiencies for the same target parameter, the authors treat each estimator as converging to its own pseudo‑true value when the model is misspecified.
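
To fix ideas, the pseudo-true value can be written explicitly (generic GMM notation, not necessarily the paper's): for a moment function $g(X, \theta)$ and a positive-definite weighting matrix $W$, the GMM estimator with weighting $W$ converges under standard conditions to

$$
\theta^{*}(W) \;=\; \arg\min_{\theta} \; \mathbb{E}_{P}\!\left[g(X, \theta)\right]' \, W \, \mathbb{E}_{P}\!\left[g(X, \theta)\right].
$$

If some $\theta$ sets all moments to zero under $P$, then $\theta^{*}(W)$ is the same for every $W$; under misspecification it generally is not, so choosing a weighting matrix (or a subset of moments) is effectively choosing an estimand.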

To motivate this perspective, the authors surveyed 36 papers published in the American Economic Review between 2020 and 2024 that employed the Generalized Method of Moments (GMM) or related moment‑based techniques; 22 of these were over‑identified. They found that a large majority of authors deliberately used only a subset of the available moments, applied non‑optimal weighting matrices (14 papers reported no efficiently‑weighted specifications), and rarely reported Hansen’s J‑statistic or formal over‑identifying‑restriction tests (only three papers did so). Instead, many authors relied on informal “eyeball” diagnostics such as plotting the gap between data‑implied and model‑implied moments. This empirical pattern suggests that researchers often accept that the underlying structural model is only an approximation and therefore focus on obtaining the “least‑bad” estimate rather than on achieving statistical efficiency under a correctly specified model.

The paper formalizes the distinction between an econometric model $M$ (the set of admissible $(\theta, P)$ pairs defined by economic theory and structural assumptions) and a statistical model $\mathcal{P}$ (the set of data-generating distributions). Misspecification can be of two types: (i) econometric misspecification, where the true $(\theta, P)$ lies outside $M$ but may still belong to a larger nesting model $M^{*}$; and (ii) statistical misspecification, where the true distribution $P$ does not satisfy the moment conditions at any parameter value. The authors emphasize that many parameters of interest in economics (counterfactuals, welfare measures, policy effects) remain well-defined even when the structural model is misspecified, which justifies studying estimators under misspecification.

A central theoretical contribution is a new result linking Hansen’s J-statistic to the range of estimates that can be obtained with a given standard error when the model is locally misspecified (i.e., the degree of misspecification is of order $1/\sqrt{n}$). They prove that, asymptotically, the set of admissible estimates forms an interval centered on the efficient GMM estimator, with half-width proportional to $\sqrt{2J}$ times the standard error. Consequently, a large J-statistic signals a wide “weight-hacking” window: a researcher can choose alternative weighting matrices to shift the point estimate by up to $\sqrt{J}$ standard errors while reporting essentially the same standard error. This reframes the J-statistic from a mere test of over-identifying restrictions to a diagnostic of researcher degrees of freedom.
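
As a rough numerical illustration of this diagnostic (a minimal sketch on simulated data, not the paper's code), the snippet below sets up two mildly incompatible moment conditions for a scalar mean, computes the two-step efficient GMM estimate and Hansen's J, and then measures how far the point estimate can be pushed by re-weighting the two moments.

```python
# Minimal sketch (assumed toy setup, not the paper's code): scalar parameter theta,
# two moment conditions g(x, theta) = (x1 - theta, x2 - theta), with
# E[x1] != E[x2], so no single theta satisfies both moments exactly.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
x1 = rng.normal(1.00, 1.0, n)
x2 = rng.normal(1.05, 1.0, n)   # small violation of the over-identifying restriction
m = np.array([x1.mean(), x2.mean()])
d = np.array([1.0, 1.0])        # Jacobian of the moments in theta (up to sign)

def gmm(W):
    """Closed-form GMM estimate for these linear moments: theta = (d'Wm)/(d'Wd)."""
    return (d @ W @ m) / (d @ W @ d)

# Two-step efficient GMM: identity weighting first, then W = S^{-1}.
theta1 = gmm(np.eye(2))
S = np.cov(np.column_stack([x1 - theta1, x2 - theta1]), rowvar=False)
W_eff = np.linalg.inv(S)
theta_eff = gmm(W_eff)
se_eff = np.sqrt(1.0 / (n * (d @ W_eff @ d)))

# Hansen's J-statistic at the efficient estimate.
gbar = m - theta_eff * d
J = n * (gbar @ W_eff @ gbar)

# How far can the point estimate be pushed by re-weighting the two moments?
thetas = np.array([gmm(np.diag([w, 1.0 - w])) for w in np.linspace(0.001, 0.999, 999)])
max_shift_in_se = np.abs(thetas - theta_eff).max() / se_eff

print(f"efficient estimate {theta_eff:.4f}, SE {se_eff:.4f}, J = {J:.2f}")
print(f"max shift from re-weighting, in efficient-SE units: {max_shift_in_se:.2f}")
print(f"sqrt(J) for comparison: {np.sqrt(J):.2f}")
```

In this symmetric design the maximum shift comes out close to $\sqrt{J}$ efficient standard errors (up to sampling noise), which illustrates, but does not prove, the sense in which a large J signals a wide weight-hacking window; the exact asymptotic constants should be taken from the paper itself.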

Based on these insights, the authors issue four practical recommendations for empirical work with over‑identified moment‑condition models:

  1. Separate diagnostics for economic vs. statistical specification. Test whether the structural assumptions correctly link the parameter of interest to the data distribution (economic specification) and whether the chosen moments adequately capture the data distribution (statistical specification).

  2. Explicitly state the estimand implied by the chosen estimator. When using non‑optimal weighting or a hand‑picked subset of moments, researchers should explain why the resulting pseudo‑true value is preferred over the efficient GMM target.

  3. Employ misspecification‑robust standard errors. In over‑identified settings, the conventional efficient‑GMM standard‑error formula is generally invalid under misspecification; misspecification‑robust (sandwich‑type) variance formulas or bootstrap methods that re‑estimate the full procedure should be used instead (see the sketch after this list).

  4. Report J‑statistics even when not conducting formal J‑tests. Since J captures the potential range of results obtainable through alternative weighting, its disclosure informs readers about the robustness of the findings and the extent of possible “weight hacking.”
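
As a concrete illustration of recommendation 3 (a hedged sketch reusing the toy two-moment setup from above, not the paper's implementation), one option is a nonparametric bootstrap that redoes the entire two-step GMM procedure on each resample, so the reported standard error describes the estimator actually used rather than relying on the correct-specification formula:

```python
# Hedged sketch: bootstrap SE for a two-step GMM estimator (assumed toy setup).
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
data = np.column_stack([rng.normal(1.00, 1.0, n), rng.normal(1.05, 1.0, n)])

def two_step_gmm(x):
    """Two-step GMM for theta with moments (x1 - theta, x2 - theta)."""
    d = np.array([1.0, 1.0])
    m = x.mean(axis=0)
    theta1 = m.mean()                              # first step: identity weighting
    S = np.cov(x - theta1, rowvar=False)           # covariance of the moment functions
    W = np.linalg.inv(S)
    theta2 = (d @ W @ m) / (d @ W @ d)
    se_textbook = np.sqrt(1.0 / (len(x) * (d @ W @ d)))  # assumes correct specification
    return theta2, se_textbook

theta_hat, se_textbook = two_step_gmm(data)

# Nonparametric bootstrap: redo the whole procedure (including the weighting matrix)
# on each resample, so the SE reflects the estimator as actually computed.
B = 500
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b], _ = two_step_gmm(data[idx])
se_boot = boot.std(ddof=1)

print(f"theta_hat = {theta_hat:.4f}")
print(f"textbook SE = {se_textbook:.4f}  vs  bootstrap SE = {se_boot:.4f}")
```

Analytic sandwich-type formulas that drop the correct-specification assumption are the other route; the bootstrap is shown here only because it is the shortest self-contained illustration.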

Overall, the paper bridges a gap between econometric theory and practice by treating misspecification as the norm, clarifying that the purpose of an estimator is defined by its actual limiting behavior, and providing tools—especially the reinterpretation of J‑statistics—to make empirical research more transparent and less vulnerable to researcher degrees of freedom.

