Beyond the Commit: Developer Perspectives on Productivity with AI Coding Assistants


Measuring developer productivity is a topic that has attracted attention from both academic research and industrial practice. In the age of AI coding assistants, it has become even more important for both academia and industry to understand how to measure their impact on developer productivity, and to reconsider whether earlier measures and frameworks still apply. This study analyzes the validity of different approaches to evaluating the productivity impacts of AI coding assistants by leveraging mixed-method research. At BNY Mellon, we conduct a survey with 2,989 developer responses and 11 in-depth interviews. Our findings demonstrate that a multifaceted approach is needed to measure AI productivity impacts: survey results expose conflicting perspectives on AI tool usefulness, while interviews elicit six distinct factors that capture both short-term and long-term dimensions of productivity. In contrast to prior work, our factors highlight the importance of long-term metrics like technical expertise and ownership of work. We hope this work encourages future research to incorporate a broader range of human-centered factors, and supports industry in adopting more holistic approaches to evaluating developer productivity.


💡 Research Summary

This paper investigates how AI‑powered coding assistants, exemplified by GitHub Copilot, affect developer productivity. The authors adopt a sequential exploratory mixed‑methods design, first administering a large‑scale survey to 2,989 engineers at BNY Mellon and then conducting 11 semi‑structured interviews with developers of varying seniority, role, and business unit.

The survey, built on the Developer Experience (DX) framework, asks two core questions: (1) overall satisfaction with the AI assistant and (2) perceived weekly time saved. Results show a high satisfaction rate (≈86 % “satisfied” or “very satisfied”), but modest time‑saving estimates, with the majority reporting less than two hours per week. A fine‑grained cross‑tabulation reveals only a weak correlation between satisfaction and perceived time savings, suggesting that a single metric cannot capture the full impact of AI tools.

To explore richer dimensions, the authors interview 11 developers selected through purposive and snowball sampling to ensure diversity in career stage (early, mid‑career, management), technical focus (backend, frontend, full‑stack), and functional area (customer products, platform engineering, data). Interviews probe participants’ background, typical use cases of Copilot, and their conceptualizations of productivity. The qualitative analysis yields six distinct factors that shape productivity when AI assistants are used:

  1. Self‑sufficiency – AI can boost a developer’s ability to solve problems independently, yet over‑reliance may erode one’s own coding skills.
  2. Frustration & Cognitive Load – Incorrect or out‑of‑context suggestions increase mental effort and interrupt workflow, reducing net gains.
  3. Task Completion Rate – For repetitive boilerplate tasks AI dramatically speeds up completion, whereas for complex design work it may add verification overhead.
  4. Ease of Peer Review – Consistently styled AI‑generated code can lower review effort; conversely, non‑standard suggestions raise reviewers’ comprehension cost.
  5. Technical Expertise – In the long term AI can flatten learning curves for rapid prototyping but may also limit deep expertise development, a dimension largely absent from existing productivity frameworks.
  6. Ownership of Work – When AI writes substantial code, developers may feel less personal ownership, potentially harming motivation and job satisfaction.

These factors extend beyond traditional quantitative metrics such as lines of code per unit of time, defect density, or deployment frequency, and they highlight the importance of long‑term, human‑centric outcomes. Notably, “technical expertise” and “ownership” are absent from widely used frameworks like SPACE or DORA, underscoring a gap in current evaluation practices for AI‑augmented development.

The authors argue that AI coding assistants’ inherent non‑determinism and broad task coverage demand a multidimensional productivity measurement approach. They propose a three‑axis framework: (1) short‑term efficiency (time saved, code completion quality), (2) cognitive cost (frustration, mental load), and (3) long‑term human capital (skill growth, sense of ownership).

From an industry perspective, the paper offers practical guidance: organizations should combine objective telemetry (e.g., suggestion acceptance rates, auto‑completion frequency) with subjective surveys and interviews to construct comprehensive KPIs. Monitoring long‑term indicators of expertise development and ownership can inform talent‑development strategies and ensure AI tools augment rather than diminish developer growth. Additionally, establishing feedback loops to improve suggestion relevance and providing training on effective human‑AI collaboration can mitigate cognitive overload.
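The combination of objective telemetry and subjective survey signals recommended above could be assembled into a per-team KPI record. A minimal sketch with illustrative field names (none taken from any real assistant's API):

```python
from dataclasses import dataclass

@dataclass
class AssistantTelemetry:
    # Hypothetical per-team counters from tool telemetry.
    suggestions_shown: int
    suggestions_accepted: int

@dataclass
class SurveySignal:
    # Hypothetical aggregates from periodic developer surveys.
    mean_satisfaction: float    # 1-5 Likert average
    median_hours_saved: float   # self-reported, per week

def acceptance_rate(t: AssistantTelemetry) -> float:
    """Objective signal: fraction of shown suggestions that were accepted."""
    return t.suggestions_accepted / t.suggestions_shown if t.suggestions_shown else 0.0

def kpi_row(team: str, t: AssistantTelemetry, s: SurveySignal) -> dict:
    """Combine objective and subjective signals into one KPI record."""
    return {
        "team": team,
        "acceptance_rate": round(acceptance_rate(t), 2),
        "mean_satisfaction": s.mean_satisfaction,
        "median_hours_saved": s.median_hours_saved,
    }

row = kpi_row(
    "payments-platform",  # hypothetical team name
    AssistantTelemetry(suggestions_shown=1200, suggestions_accepted=420),
    SurveySignal(mean_satisfaction=4.2, median_hours_saved=1.5),
)
print(row)  # acceptance_rate = 0.35 alongside the survey signals
```

Keeping the objective and subjective fields side by side in one record makes divergences visible, e.g. high acceptance rates paired with low satisfaction, which is exactly the kind of conflict the paper argues single metrics hide.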

In conclusion, this study demonstrates that evaluating AI‑assisted developer productivity requires more than a single efficiency metric. By integrating large‑scale survey data with deep qualitative insights, the authors identify six nuanced factors that together capture both immediate workflow effects and longer‑term professional impacts. Their findings lay a foundation for future research on AI‑augmented software engineering and for practitioners seeking holistic, human‑centered evaluation of AI coding assistants.

