Who Writes the Docs in SE 3.0? Agent vs. Human Documentation Pull Requests

Notice: This research summary and analysis were generated automatically using AI. For complete accuracy, please refer to the original arXiv source.

As software engineering moves toward SE 3.0, AI agents are increasingly used to carry out development tasks and contribute changes to software projects. It is therefore important to understand the extent of these contributions and how human developers review and intervene, since these factors shape the risks of delegating work to AI agents. While recent studies have examined how AI agents support software development tasks (e.g., code generation, issue resolution, and PR automation), their role in documentation tasks remains underexplored, even though documentation is widely consumed and shapes how developers understand and use software. Using the AIDev dataset, we analyze 1,997 documentation-related pull requests (PRs) authored by AI agents and human developers, where documentation PRs are those that create or modify project documentation artifacts. We find that AI agents submit substantially more documentation-related PRs than humans in the studied repositories. We further observe that agent-authored documentation edits are typically integrated with little follow-up modification from humans, raising concerns about review practices and the reliability of agent-generated documentation. Overall, while AI agents already contribute substantially to documentation workflows, our results point to emerging challenges for documentation quality assurance and human-AI collaboration in SE 3.0.


💡 Research Summary

This paper investigates the emerging role of autonomous AI agents in software documentation within the SE 3.0 paradigm, where agents act as teammates capable of proposing and merging changes via pull requests (PRs). While prior work has examined agent contributions to code‑centric activities, the authors note a gap concerning documentation—a critical yet fragile artifact. Using the AIDev dataset, they extract a focused subset of 1,997 documentation‑related PRs, comprising 1,478 agent‑authored and 519 human‑authored PRs from repositories with more than 500 stars to ensure comparable popularity. Detailed commit‑level data are retrieved via the GitHub API, and documentation files are identified through a heuristic based on extensions (.md, .txt) and path tokens (/docs/, README).
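The documentation-file heuristic described above can be sketched as a small predicate. This is an illustrative reconstruction based only on the extensions and path tokens the summary mentions (.md, .txt, /docs/, README); the paper's exact rule set may be richer.

```python
def is_documentation_file(path: str) -> bool:
    """Heuristic: classify a repository file path as documentation.

    Assumed rules (the paper's actual heuristic may differ):
    - documentation extensions: .md, .txt
    - path tokens: a docs/ directory or a README file name
    """
    p = path.lower()
    if p.endswith((".md", ".txt")):
        return True
    if "/docs/" in p or p.startswith("docs/"):
        return True
    # README with any (or no) extension, e.g. README, README.rst
    if p.rsplit("/", 1)[-1].startswith("readme"):
        return True
    return False
```

Note that any extension- and token-based rule like this will mislabel some files (e.g. a `.txt` data fixture), which is exactly the internal-validity threat the authors acknowledge later.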

The study is structured around two research questions. RQ1 (Prevalence) quantifies how many documentation PRs are generated by agents versus humans and examines file‑level collaboration. Results show that agents produce roughly three times more documentation PRs than humans. At the file level, 66.1 % of changed files are edited exclusively by agents, 30.2 % exclusively by humans, and only 3.7 % are co‑edited. Notably, 29.0 % of agent‑authored “documentation‑related” PRs actually modify only non‑documentation files, indicating a mismatch between dataset labeling and real edits. Approximately half of the PRs from both agents (48.7 %) and humans (47.8 %) touch documentation files exclusively; the remainder include non‑doc files, creating mixed‑scope PRs that are known to increase reviewer error rates.
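The file-level collaboration breakdown above (agent-only, human-only, co-edited) amounts to a simple set computation over who touched each file. The sketch below is illustrative, not the paper's actual pipeline; the flattened `(author_type, file_path)` input is an assumption about how commit-level GitHub API data could be represented.

```python
from collections import defaultdict

def file_edit_shares(commits):
    """Compute the share of changed files edited only by agents,
    only by humans, or by both.

    `commits` is an iterable of (author_type, file_path) pairs,
    where author_type is "agent" or "human".
    """
    editors = defaultdict(set)
    for author_type, path in commits:
        editors[path].add(author_type)
    total = len(editors)
    return {
        "agent_only": sum(1 for e in editors.values() if e == {"agent"}) / total,
        "human_only": sum(1 for e in editors.values() if e == {"human"}) / total,
        "co_edited": sum(1 for e in editors.values() if e == {"agent", "human"}) / total,
    }
```

Applied to the study's data, this kind of grouping yields the reported 66.1 % / 30.2 % / 3.7 % split.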

RQ2 (Integration) assesses how often agent‑generated documentation changes are retained after human review. The authors isolate 119 instances where a human commit follows an agent commit on the same file. They compare lines added by the agent with lines deleted by the subsequent human commit. In 85.7 % of cases, agent additions equal or exceed human deletions; the mean retention of added lines is 86.8 % (median 98.7 %). Moreover, 34.5 % of the cases show zero human deletions after agent additions, suggesting that many agent contributions are merged with minimal scrutiny. Only 14.3 % of cases exhibit human deletions surpassing agent additions, indicating some level of corrective review.
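The retention statistics in RQ2 can be illustrated with a short computation over (agent lines added, human lines deleted) pairs. This is a sketch of the measure as the summary describes it, not the authors' released analysis script.

```python
from statistics import mean, median

def summarize_retention(cases):
    """Summarize agent-then-human follow-up cases.

    Each case is (agent_lines_added, human_lines_deleted) for a
    human commit that follows an agent commit on the same file.
    Retention per case: share of agent-added lines not deleted.
    """
    retentions = []
    zero_deletion = 0
    for added, deleted in cases:
        if deleted == 0:
            zero_deletion += 1
        if added > 0:
            retentions.append(max(added - deleted, 0) / added)
    return {
        "mean_retention": mean(retentions),
        "median_retention": median(retentions),
        "zero_deletion_share": zero_deletion / len(cases),
    }
```

On the paper's 119 cases, this style of summary produces the reported 86.8 % mean (98.7 % median) retention and the 34.5 % share of cases with no human deletions at all.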

The discussion highlights two main risks. First, the high acceptance rate combined with limited human follow‑up raises concerns about the reliability of agent‑generated documentation, potentially propagating outdated or incorrect information. Second, current PR review practices appear insufficient for monitoring agent contributions, implying a need to redesign review workflows and possibly augment automated tools with targeted human oversight. The authors acknowledge threats to validity: external validity is limited by the AIDev sample, which excludes repositories without agents; internal validity may be affected by the simplistic file‑classification heuristic and by measuring “limited follow‑up” solely via line deletions, ignoring human additions, modifications, or review comments.

To support reproducibility, the authors release the derived dataset and all analysis scripts. They outline future work that examines qualitative aspects of review (comments, approval patterns), refines documentation-file identification, and explores mechanisms for accountable human-AI collaboration in documentation quality assurance. Overall, the paper provides the first large-scale empirical evidence that AI agents now dominate documentation PR volume in SE 3.0, but that their contributions often bypass thorough human validation, highlighting an emerging challenge for software engineering practice.

