This paper derives `Scaling Laws for Economic Impacts' -- empirical relationships between the training compute of Large Language Models (LLMs) and professional productivity. In a preregistered experiment, over 500 consultants, data analysts, and managers completed professional tasks using one of 13 LLMs. We find that each year of AI model progress reduced task time by 8%, with 56% of gains driven by increased compute and 44% by algorithmic progress. However, productivity gains were significantly larger for non-agentic analytical tasks compared to agentic workflows requiring tool use. These findings suggest continued model scaling could boost U.S. productivity by approximately 20% over the next decade.
Between the release of GPT-2 in 2019 and the frontier models of 2025, the amount of compute used to train large language models (LLMs) increased by approximately four orders of magnitude. The machine learning literature has derived remarkably consistent "scaling laws" from this explosion in resources, demonstrating that model performance, measured by cross-entropy loss, improves as a predictable power law of compute, data, and parameter size (Kaplan et al., 2020). Yet, for economists and policymakers, the critical question remains unanswered: how does a reduction in a model's mathematical loss translate into tangible economic productivity? While the technological frontier is advancing rapidly, we possess little rigorous evidence on the elasticity of human professional output with respect to these model capabilities.
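For reference, the compute scaling relation documented by Kaplan et al. (2020) takes a power-law form in which cross-entropy loss $L$ falls predictably with training compute $C$; the constant $C_c$ and the small fitted exponent $\alpha_C$ below follow their notation and are quoted only to fix ideas, not as results from this paper:
\[
L(C) \approx \left( \frac{C_c}{C} \right)^{\alpha_C}.
\]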
To address this, we conducted a large-scale randomized controlled trial (RCT) involving over 500 professionals across three high-skill domains: management, data analysis, and consulting. Participants were tasked with completing complex workflows designed to be representative of their professions, ranging from strategic report writing and statistical hypothesis testing to tasks requiring multi-step tool use, such as creating presentation slides or Gantt charts. Workers were randomly assigned to a control group or to a treatment group equipped with one of thirteen different LLMs, spanning a range of compute scales and release dates. We used high-powered incentives, including bonus payments that doubled base earnings for high-quality submissions as evaluated by expert peer graders.

Our primary contribution is the derivation of “Scaling Laws for Economic Impacts”, quantifying the relationship between model inputs and professional productivity. We decompose the progress of frontier AI into two distinct factors: the scaling of training compute and algorithmic innovation (e.g., architectural improvements and better training data). First, we identify a robust “calendar-time” scaling effect, which captures the aggregate of both factors, whereby each year of frontier model progress is associated with an 8% reduction in task completion time (p < 0.05). Second, isolating the effect of scale, we find that a tenfold (10x) increase in model training compute is associated with a 6.3% reduction in time taken. Decomposing these gains, we find that approximately 44% of the observed improvement is attributable to algorithmic progress over time, while the remainder is driven by pure compute scaling.
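A minimal sketch of the kind of log-linear specification that could recover such elasticities, assuming task-level data on completion time, the assigned model's release year, and its (log) training compute; the data frame, variable names, and values below are hypothetical, and the paper's exact specification may differ:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical task-level data: one row per participant-task, with the assigned
# model's release year and (log10) training compute in FLOP.
df = pd.DataFrame({
    "log_minutes":   np.log([52, 44, 40, 35, 61, 48]),
    "release_year":  [2021, 2022, 2023, 2024, 2021, 2023],
    "log10_compute": [23.0, 23.8, 24.5, 25.3, 23.2, 24.8],
})

# Calendar-time scaling: the coefficient on release_year approximates the
# proportional change in completion time per year of frontier progress
# (about -0.08 in the paper's estimates).
calendar = smf.ols("log_minutes ~ release_year", data=df).fit()

# Compute scaling: the coefficient on log10_compute approximates the
# proportional change per tenfold increase in training compute
# (about -0.063 in the paper's estimates).
compute = smf.ols("log_minutes ~ log10_compute", data=df).fit()

# Illustrative decomposition: if frontier training compute grows by g orders of
# magnitude per year, the compute-driven share of the calendar-time effect is
# roughly 0.063 * g / 0.08, with the remainder attributed to algorithmic progress.
print(calendar.params["release_year"], compute.params["log10_compute"])
```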
We further establish the baseline “AI premium” by pooling all treatment groups. Access to any AI model increased base Earnings Per Minute (EPM) by 81.3% (p = 0.001) and raised expert-assessed quality by 0.34 standard deviations. The compounding effects of speed and quality resulted in a 146% increase in Total Earnings Per Minute (TEPM), inclusive of performance bonuses, with nearly equal contributions from greater speed (52.6%) and higher quality (47.4%).
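As an illustration of the outcome metrics, a sketch assuming EPM divides base pay by time on task and TEPM adds quality bonuses to the numerator; the figures and function names are hypothetical, and the exact payment schedule is described later in the paper:

```python
def earnings_per_minute(base_pay: float, minutes: float) -> float:
    """Base Earnings Per Minute (EPM): base pay divided by time on task."""
    return base_pay / minutes


def total_earnings_per_minute(base_pay: float, bonus: float, minutes: float) -> float:
    """Total Earnings Per Minute (TEPM): base pay plus quality bonus, per minute."""
    return (base_pay + bonus) / minutes


# Illustrative comparison: a faster, higher-quality submission raises TEPM
# through both the time channel and the bonus channel.
control = total_earnings_per_minute(base_pay=30.0, bonus=0.0, minutes=60)
treated = total_earnings_per_minute(base_pay=30.0, bonus=30.0, minutes=45)
print(control, treated)
```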
However, we uncover significant heterogeneity in these gains across task types. While AI assistance delivered total earnings gains of $1.58 per minute on non-agentic, analytical tasks, the gain fell to just $0.34 per minute for “agentic” tasks requiring multi-step interactions with external tools, a disparity significant at the 5% level (p = 0.043). This suggests that while current scaling paradigms are rapidly commoditizing analytical cognition, the productivity frontier for tasks requiring procedural agency remains significantly more resistant to automation.
Next, we investigate scaling laws for quality. We find a striking divergence: while the quality of autonomous model output scales linearly with compute, the quality of human-assisted output remains stagnant across model generations. This implies that human users effectively cap the realized capabilities of frontier models, satisficing at a fixed quality threshold rather than maximizing the tool’s potential.
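A sketch of how such a divergence could be examined, assuming expert-graded quality scores are regressed on log compute with an interaction for whether the output was produced autonomously or with a human in the loop; the data and column names below are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical graded outputs: standardized quality, log10 training compute,
# and whether the output was produced autonomously or with human assistance.
df = pd.DataFrame({
    "quality_z":     [-0.4, 0.1, 0.6, 1.1, 0.2, 0.3, 0.25, 0.35],
    "log10_compute": [23.0, 23.8, 24.5, 25.3, 23.0, 23.8, 24.5, 25.3],
    "condition":     ["autonomous"] * 4 + ["assisted"] * 4,
})

# The interaction term tests whether the quality-compute slope differs between
# autonomous output and human-assisted output; a positive autonomous slope with
# a near-zero assisted slope would reproduce the divergence described above.
fit = smf.ols("quality_z ~ log10_compute * C(condition)", data=df).fit()
print(fit.params)
```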
Finally, we utilize these experimental elasticities within an aggregate growth framework (Acemoglu, 2024). We estimate that continued model scaling could boost U.S. productivity by approximately 20% over the next decade, assuming marginal costs of inference remain low. This figure significantly exceeds prior conservative estimates by explicitly incorporating the dynamic gains from predictable advancements in model compute, rather than treating AI capabilities as fixed.
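A back-of-the-envelope sketch of how an experimental time elasticity might feed a decade-long aggregate projection in the spirit of a task-based framework; the exposure share and the compounding assumptions below are placeholders for illustration only, not the calibration used in Section 5:

```python
# Illustrative aggregate projection: compound the annual task-level gain from
# frontier model progress over a decade, then dilute it by the share of
# aggregate work that is exposed to AI assistance.
annual_task_gain = 0.08   # per-year time savings from frontier progress (experimental estimate)
horizon_years = 10
exposed_share = 0.25      # placeholder GDP share of AI-exposed tasks (assumption)

task_level_gain = (1 + annual_task_gain) ** horizon_years - 1
aggregate_gain = exposed_share * task_level_gain

print(f"Cumulative task-level gain over {horizon_years} years: {task_level_gain:.0%}")
print(f"Implied aggregate productivity gain: {aggregate_gain:.0%}")
```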
The remainder of the paper is organized as follows. Section 2 reviews the related literature, and Section 3 outlines the experimental methodology. Section 4 presents the experimental results, establishing the baseline AI premium, deriving the economic scaling laws, and decomposing the drivers of productivity growth. Section 5 uses these elasticities to estimate aggregate productivity gains for the U.S. economy, and Section 6 concludes. Regression outputs for all figures shown, as well as supplementary analyses, are provided in the Appendix.