Payrolls to Prompts: Firm-Level Evidence on the Substitution of Labor for AI


Generative AI has the potential to transform how firms produce output. Yet, credible evidence on how AI is actually substituting for human labor remains limited. In this paper, we study firm-level substitution between contracted online labor and generative AI using payments data from a large U.S. expense management platform. We track quarterly spending from Q3 2021 to Q3 2025 on online labor marketplaces (such as Upwork and Fiverr) and leading AI model providers. To identify causal effects, we exploit the October 2022 release of ChatGPT as a common adoption shock and estimate a difference-in-differences model. We provide a novel measure of exposure based on the share of spending at online labor marketplaces prior to the shock. Firms with greater exposure to online labor adopt AI earlier and more intensively following the shock, while simultaneously reducing spending on contracted labor. By Q3 2025, firms in the highest exposure quartile increase their share of spending on AI model providers by 0.8 percentage points relative to the lowest exposure quartile, alongside significant declines in labor marketplace spending. Combining these responses yields a direct estimate of substitution: among the most exposed firms, a $1 decline in online labor spending is associated with approximately $0.03 of additional AI spending, implying order-of-magnitude cost savings from replacing outsourced tasks with AI services. These effects are heterogeneous across firms and emerge gradually over time. Taken together, our results provide the first direct, micro-level evidence that generative AI is being used as a partial substitute for human labor in production.


💡 Research Summary

This paper provides the first direct, firm‑level evidence that generative artificial intelligence (AI) is being used as a partial substitute for outsourced human labor. Using detailed expense data from Ramp, a large U.S. expense‑management platform, the authors track quarterly spending on two categories from the third quarter of 2021 through the third quarter of 2025: (1) online labor marketplaces such as Upwork, Fiverr, Toptal, PeoplePerHour, Arc, MarketplaceHire and Catalant, and (2) AI model providers, specifically OpenAI and Anthropic. The dataset captures both card transactions and ACH transfers, allowing precise identification of merchant‑level spend.

The identification strategy treats the public launch of ChatGPT as an exogenous shock that sharply raised awareness of AI tooling. The paper dates the launch to October 2022; OpenAI released ChatGPT on November 30, 2022, and either date falls in Q4 2022 in the quarterly data. The authors exploit variation in firms' pre-shock exposure, measured as the share of total spend allocated to online labor marketplaces in Q2 2022, before ChatGPT's release. Firms are grouped into four quartiles of this exposure share, which gives a dose variable reflecting the incentive to explore AI: higher pre-shock labor spend implies larger potential cost savings from AI.
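The exposure measure and quartile grouping described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `exposure_quartiles`, the input dictionaries, and the toy firms are all hypothetical, and the choice of inclusive quantile cut points is an assumption.

```python
from statistics import quantiles

def exposure_quartiles(olm_spend, total_spend):
    """Return {firm: (exposure_share, quartile)} with quartile in 1..4.

    olm_spend / total_spend: hypothetical dicts of pre-shock (Q2 2022)
    dollars per firm; these names are illustrative, not from the paper.
    """
    # Exposure = share of total spend going to online labor marketplaces.
    shares = {
        firm: olm_spend.get(firm, 0.0) / total
        for firm, total in total_spend.items()
        if total > 0
    }
    # Three cut points split firms into four roughly equal groups
    # (ties in the shares can make the groups uneven).
    q1, q2, q3 = quantiles(shares.values(), n=4, method="inclusive")

    def bucket(share):
        if share <= q1:
            return 1
        if share <= q2:
            return 2
        if share <= q3:
            return 3
        return 4

    return {firm: (share, bucket(share)) for firm, share in shares.items()}


# Toy data: eight firms with identical total spend and rising OLM spend.
total = {f: 100.0 for f in "abcdefgh"}
olm = {f: float(i + 1) for i, f in enumerate("abcdefgh")}
groups = exposure_quartiles(olm, total)
for firm, (share, quartile) in sorted(groups.items()):
    print(firm, f"{share:.2%}", quartile)
```

Note that the quartile is a rank-based grouping of firms, so a firm lands in the top quartile by having a higher exposure share than three quarters of the sample, whatever the absolute level of that share.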

A two-way fixed-effects difference-in-differences (DiD) model is estimated separately for the share of spend on online labor marketplaces (OLM) and the share of spend on AI model providers (AI). The specification includes firm fixed effects (α_i) and quarter fixed effects (γ_t) to control for time-invariant firm characteristics and common macro trends. The key coefficients δ_k are quarter-specific: each is the coefficient on the interaction between an exposure quartile and quarter k, so the post-ChatGPT δ_k trace out how the treatment effect evolves over time. Standard errors are clustered at the firm level.
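In symbols, one plausible way to write this specification is shown below. The summary names only α_i, γ_t and δ_k, so the outcome notation Y_it, the quartile indicator Q_i, the omitted reference quarter k_0, and the use of the lowest quartile as the omitted group are assumptions made here for illustration.

```latex
Y_{it} = \alpha_i + \gamma_t
       + \sum_{q=2}^{4} \sum_{k \neq k_0} \delta_k^{(q)} \,
         \mathbf{1}[t = k] \, \mathbf{1}[Q_i = q]
       + \varepsilon_{it}
```

Here Y_it would be either the OLM share or the AI share of firm i's spend in quarter t. Under these assumptions, δ_k^{(q)} for a post-release quarter k measures how quartile q's share in that quarter differs from the lowest quartile's, relative to the reference quarter.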

Results show a clear, monotonic relationship between pre-shock labor exposure and post-shock AI adoption. Firms in the highest exposure quartile (the top quarter of firms ranked by Q2 2022 online labor spend share) increase their AI spend share by 0.8 percentage points by Q3 2025 relative to the lowest quartile, a sizable shift given the overall AI spend share of 2.85 % in that quarter. Over the same period, these high-exposure firms reduce their OLM spend share by roughly 15 percentage points relative to low-exposure firms. The middle quartiles also show significant declines in OLM spending (about 2 pp for the third quartile, spanning the 50th to 75th percentiles of exposure).

To quantify the substitution rate, the authors take the ratio of the AI-spending coefficient to the OLM-spending coefficient (δ_AI / δ_OLM) for each quartile and bootstrap the ratio to obtain confidence intervals. For the highest-exposure quartile, the estimated ratio implies that a $1 reduction in OLM spend is associated with about $0.03 of additional AI spend. Taken at face value, that is $1 / $0.03 ≈ 33 times less spent on AI than on the labor it displaces, an order-of-magnitude cost saving from replacing outsourced tasks with AI services.
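The ratio-plus-bootstrap idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's procedure: it replaces the regression coefficients with simple two-group contrasts of firm-level changes, resamples whole firms with replacement, and uses synthetic data whose parameters were chosen only so that the true ratio is -0.03. All function names are hypothetical.

```python
import random
from statistics import mean

def substitution_ratio(high, low):
    """Ratio of the AI effect to the OLM effect from simple DiD contrasts.

    high / low: lists of (d_ai, d_olm) per firm, where each d_* is that
    firm's post-minus-pre change in spend share (hypothetical inputs).
    """
    delta_ai = mean(f[0] for f in high) - mean(f[0] for f in low)
    delta_olm = mean(f[1] for f in high) - mean(f[1] for f in low)
    return delta_ai / delta_olm  # undefined if the OLM effect is exactly zero

def bootstrap_ci(high, low, n_boot=1000, seed=0):
    """Percentile 95% interval, resampling whole firms with replacement."""
    rng = random.Random(seed)
    draws = sorted(
        substitution_ratio(rng.choices(high, k=len(high)),
                           rng.choices(low, k=len(low)))
        for _ in range(n_boot)
    )
    return draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1]

def simulate(n, mean_ai, mean_olm, rng):
    """Synthetic firm-level changes; the parameters are made up."""
    return [(rng.gauss(mean_ai, 0.002), rng.gauss(mean_olm, 0.05))
            for _ in range(n)]

rng = random.Random(42)
high = simulate(200, 0.003, -0.10, rng)  # most exposed firms
low = simulate(200, 0.0, 0.0, rng)       # least exposed firms

point = substitution_ratio(high, low)
ci = bootstrap_ci(high, low)
print(f"ratio = {point:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The ratio comes out negative because AI spend rises while OLM spend falls; its magnitude is the dollars of additional AI spend per dollar of reduced labor spend. Bootstrapping the ratio directly, rather than each coefficient separately, keeps the interval honest about the noise in the denominator.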

The paper situates its contribution within a growing literature on AI’s labor market effects. Prior work has largely relied on occupational exposure indices, wage and employment data, or job‑posting trends on freelance platforms. Those studies document heterogeneous impacts—skill‑biased demand shifts, modest wage effects, and reductions in postings for tasks like writing or coding. By contrast, this study directly observes firm‑level spending decisions, allowing a concrete measurement of the marginal cost of AI substitution for labor.

Limitations are acknowledged. First, the AI provider sample is restricted to OpenAI and Anthropic, omitting other major vendors (Google, Microsoft, Meta) due to identification challenges, which potentially underestimates total AI adoption. Second, the analysis focuses exclusively on external freelance spend; internal labor costs, consulting, and software development expenditures are not captured, so the measured substitution is a lower bound on total labor-AI substitution. Third, pre-period differences exist: the highest-exposure firms already spent more on OLM before the shock, which is true by construction of the exposure measure. Firm fixed effects absorb such level differences, so the concern for parallel-trend validity is whether these firms' OLM shares were already trending differently (for example, reverting toward the mean) before ChatGPT. Although the post-shock divergence is large and persistent, additional robustness checks (e.g., synthetic control, event-study with leads) would strengthen causal claims.

In sum, the study shows that, after the ChatGPT shock, firms with high reliance on online freelance labor adopted generative AI services earlier and more intensively, with measurable reductions in outsourced labor spend that emerge gradually over time. The estimated substitution ratio (about $0.03 of additional AI spend per $1 of reduced labor spend) gives policymakers and managers a concrete benchmark for evaluating AI's cost-effectiveness as a labor-saving technology. Future research could extend the analysis to broader AI ecosystems, internal labor categories, and longer-run productivity outcomes.

