AI Assisted Economics Measurement From Survey: Evidence from Public Employee Pension Choice
We develop an iterative framework for economic measurement that leverages large language models to extract measurement structure directly from survey instruments. The approach maps survey items to a sparse distribution over latent constructs through what we term a soft mapping, aggregates harmonized responses into respondent-level sub-dimension scores, and disciplines the resulting taxonomy through out-of-sample incremental validity tests and discriminant-validity diagnostics. The framework explicitly integrates iteration into the measurement construction process. Overlap and redundancy diagnostics trigger targeted taxonomy refinement and constrained remapping, ensuring that added measurement flexibility is retained only when it delivers stable out-of-sample performance gains. Applied to a large-scale public-employee retirement plan survey, the framework identifies which semantic components contain behavioral signal and clarifies the economic mechanisms, such as beliefs versus constraints, that matter for retirement choices. The methodology provides a portable measurement audit of survey instruments that can guide both empirical analysis and survey design.
💡 Research Summary
The paper introduces an iterative framework that harnesses large language models (LLMs) to extract latent measurement structures directly from survey instruments and then subjects those structures to rigorous out‑of‑sample econometric validation. Traditional survey measurement often forces a one‑to‑one mapping between items and constructs, which can lead to construct contamination when a single question taps multiple mechanisms such as beliefs, constraints, or knowledge. The authors propose a “soft mapping” approach: an LLM processes each item stem (and response options where relevant) and produces a sparse weight matrix W, where each item loads on a simplex of K sub‑dimensions rather than a single construct. Respondent‑level scores are then computed by aggregating harmonized responses with these weights.
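The soft-mapping step described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the dimensions, the Dirichlet draw standing in for the LLM-proposed weights, and the z-scored responses are all assumptions; only the structure (rows of W on a simplex, scores as weight-aggregated harmonized responses) follows the summary.

```python
import numpy as np

# Illustrative soft-mapping sketch (names and data are hypothetical).
# Each survey item loads on a simplex of K sub-dimensions: rows of the
# sparse weight matrix W are nonnegative and sum to 1, so cross-loading
# is explicit rather than forced into a one-to-one item/construct map.

rng = np.random.default_rng(0)
n_items, K, n_resp = 6, 3, 4

# Stand-in for the LLM-proposed weights: Dirichlet draws with small
# concentration give sparse-ish rows on the K-simplex.
W = rng.dirichlet(alpha=[0.3] * K, size=n_items)   # shape (n_items, K)

# Harmonized responses, e.g. z-scored item answers per respondent.
X = rng.standard_normal((n_resp, n_items))          # shape (n_resp, n_items)

# Respondent-level sub-dimension scores: per-dimension weighted average
# of the item responses, normalized by each dimension's total weight.
scores = X @ W / W.sum(axis=0)                      # shape (n_resp, K)
```

A hard (one-to-one) mapping is the special case where each row of W is a one-hot vector, which makes clear what flexibility the soft mapping adds.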
Crucially, the framework does not treat the LLM output as ground truth. Instead, it embeds the proposed taxonomy into an iterative loop that evaluates each sub‑dimension’s incremental predictive value for key outcomes using held‑out data. The authors apply two diagnostics: (1) incremental out‑of‑sample validity tests, which retain a sub‑dimension only if it improves prediction of the target variable (here, willingness to switch to a defined‑contribution plan and the required employer contribution rate), and (2) discriminant‑validity diagnostics that flag high correlations among ostensibly distinct constructs, indicating overlap or contamination. When diagnostics signal problems, a targeted refinement operator modifies the taxonomy—either by merging overlapping dimensions, tightening weight constraints, or discarding low‑value items—and the process repeats until performance gains plateau and overlap stabilizes.
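The two diagnostics in the loop above can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the paper's estimator, thresholds, and refinement operator may differ. The sketch uses plain OLS with a held-out R² gain for the incremental test, and a pairwise-correlation cutoff for the discriminant check; the synthetic data plants one irrelevant and one near-duplicate sub-dimension so each diagnostic has something to catch.

```python
import numpy as np
from itertools import combinations

def incremental_oos_value(scores_tr, scores_te, y_tr, y_te, keep, j):
    """Held-out R^2 gain from adding sub-dimension j to the kept set.
    Uses a simple OLS fit; the paper's estimator may differ."""
    def oos_r2(cols):
        X_tr = np.column_stack([np.ones(len(y_tr))] + [scores_tr[:, c] for c in cols])
        X_te = np.column_stack([np.ones(len(y_te))] + [scores_te[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
        resid = y_te - X_te @ beta
        tss = (y_te - y_te.mean()) @ (y_te - y_te.mean())
        return 1 - (resid @ resid) / tss
    return oos_r2(keep + [j]) - oos_r2(keep)

def discriminant_flags(scores, threshold=0.8):
    """Pairs of sub-dimensions whose score correlation exceeds the cutoff,
    signalling overlap that the refinement operator should resolve."""
    C = np.corrcoef(scores, rowvar=False)
    K = scores.shape[1]
    return [(a, b) for a, b in combinations(range(K), 2) if abs(C[a, b]) > threshold]

# Synthetic example: dims 0 and 1 predict y, dim 2 is noise,
# dim 3 nearly duplicates dim 0.
rng = np.random.default_rng(1)
S = rng.standard_normal((200, 4))
S[:, 3] = S[:, 0] + 0.05 * rng.standard_normal(200)
y = S[:, 0] + 0.5 * S[:, 1] + 0.1 * rng.standard_normal(200)

tr, te = slice(0, 150), slice(150, 200)
keep = []
for j in range(4):  # greedy pass: retain only dims with positive OOS gain
    if incremental_oos_value(S[tr], S[te], y[tr], y[te], keep, j) > 0.0:
        keep.append(j)

flags = discriminant_flags(S)  # (0, 3) should be flagged as overlapping
```

In the full framework these two checks would drive the refinement operator (merge, tighten, or discard) and the loop would repeat until gains plateau; the greedy pass here is only the simplest version of that idea.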
The methodology is demonstrated on a large‑scale public‑employee pension survey originally analyzed by Giesecke and Rauh (2022). Starting from the raw question stems, the LLM proposes 25 sub‑dimensions; after iterative validation, only 12 survive as stable, predictive constructs. The strongest signals arise from sub‑dimensions related to tenure and career‑stage lock‑in, which substantially improve prediction of both plan‑switch acceptance and contribution‑rate requirements. Financial‑literacy items contribute meaningfully to the acceptance decision but add little incremental information for contribution‑rate prediction, echoing prior findings but now validated through systematic out‑of‑sample testing. The diagnostics also uncover a high correlation between literacy and perceived plan generosity, prompting a refinement that separates objective knowledge from subjective value judgments; this improves discriminant validity without inflating the taxonomy.
Key contributions include: (1) a portable, instrument‑level measurement system that maps items to a sparse distribution over sub‑dimensions, making cross‑loading explicit and auditable; (2) integration of econometric performance criteria directly into the measurement construction loop, ensuring that added flexibility is retained only when it yields stable predictive gains; and (3) a disciplined, repeatable refinement process that uses overlap diagnostics to guide taxonomy adjustments. The authors argue that this approach bridges the gap between AI‑driven text‑as‑data methods and the stringent validation standards required in economic measurement, offering a transparent, replicable pathway to convert high‑dimensional surveys into validated economic metrics. The framework has broad applicability across domains where surveys are used, promising more reliable measurement of latent economic mechanisms and better‑informed policy design.