Context Engineering for AI Agents in Open-Source Software
GenAI-based coding assistants have disrupted software development. The next generation of these tools is agent-based, operating with more autonomy and potentially without human oversight. Like human developers, AI agents require contextual information to develop solutions that are in line with the standards, policies, and workflows of the software projects they operate in. Vendors of popular agentic tools (e.g., Claude Code) recommend maintaining version-controlled Markdown files that describe aspects such as the project structure, code style, or building and testing. The content of these files is then automatically added to each prompt. Recently, AGENTS.md has emerged as a potential standard that consolidates existing tool-specific formats. However, little is known about whether and how developers adopt this format. Therefore, in this paper, we present the results of a preliminary study investigating the adoption of AI context files in 466 open-source software projects. We analyze the information that developers provide in AGENTS.md files, how they present that information, and how the files evolve over time. Our findings indicate that there is no established content structure yet and that there is a lot of variation in terms of how context is provided (descriptive, prescriptive, prohibitive, explanatory, conditional). Our commit-level analysis provides first insights into the evolution of the provided context. AI context files provide a unique opportunity to study real-world context engineering. In particular, we see great potential in studying which structural or presentational modifications can positively affect the quality of the generated content.
💡 Research Summary
The paper investigates the emerging practice of “AI context files” that are used to provide large language model (LLM)‑based coding agents with project‑specific information such as architecture, coding conventions, build and test commands, and contribution guidelines. While early generative AI tools like GitHub Copilot and ChatGPT have focused on code and test generation, the next generation—agentic tools such as Anthropic’s Claude Code—operate with a higher degree of autonomy and can execute commands, edit files, and pursue goals without constant human supervision. To function effectively, these agents need a persistent, machine‑readable source of contextual data, analogous to a README but intended for AI consumption.
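As an illustration, a minimal AGENTS.md-style context file might look like the sketch below. The content is hypothetical (not taken from any repository in the study) and merely combines the kinds of information the paper mentions: project structure, build and test commands, and conventions.

```markdown
# AGENTS.md (hypothetical example)

## Project structure
- `src/` contains the library code, `tests/` the test suite.

## Build and test
- Run `make build` to compile and `make test` before every commit.

## Conventions
- Follow the existing code style.
- Never commit directly to `main`.
```

Unlike a README, which addresses human newcomers, such a file is injected verbatim into the agent's prompt, so brevity and unambiguous imperatives matter more than narrative explanation.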
The authors focus on the AGENTS.md format, a tool‑agnostic convention that consolidates various vendor‑specific formats (e.g., CLAUDE.md, copilot‑instructions.md, GEMINI.md). They ask three research questions: (RQ1) How widely are AI context files adopted in open‑source software (OSS)? (RQ2) What information do developers provide and how is it structured? (RQ3) How do these files evolve over time?
Data collection began with a systematic sampling of GitHub repositories using the SEART search engine. The authors excluded forks and archived or disabled repositories, and required at least two contributors, a valid OSI-approved license, and a minimum activity threshold (commits since June 2024, at least 271 commits, and at least 7 watchers). After multiple filtering steps, they arrived at a curated set of 10,000 repositories that balance popularity (stars, watchers, contributors) and maturity (commit count, age, lines of code). Each repository was cloned and scanned for the four known AI context file types. Only 466 repositories (about 5%) contained at least one such file, indicating that the practice is still nascent. The language distribution among the adopters mirrors the overall sample, with TypeScript, Go, Python, and C# being the most common.
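The scanning step described above is straightforward to reproduce. The following sketch (my own, not the authors' tooling) walks a cloned repository and collects files whose names match the four context-file conventions mentioned in the summary; the exact file-name set used in the study may differ.

```python
from pathlib import Path

# The four context-file names mentioned in the paper (assumed spelling).
CONTEXT_FILES = {"AGENTS.md", "CLAUDE.md", "GEMINI.md", "copilot-instructions.md"}

def find_context_files(repo_root):
    """Return paths of known AI context files anywhere in a cloned repository."""
    root = Path(repo_root)
    return [p for p in root.rglob("*") if p.is_file() and p.name in CONTEXT_FILES]
```

Running this over each of the 10,000 cloned repositories and counting non-empty results yields the adoption rate reported for RQ1.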
For RQ2, the authors isolated the 155 repositories that contained AGENTS.md files (excluding those created before the format’s official introduction in January 2025). They extracted all Markdown headings, normalized them (lower‑casing, removing special characters, lemmatization), and grouped semantically similar headings. After manual coding, 44 distinct headings that appeared in at least three repositories and at least three times at level 1 or 2 were categorized into ten thematic groups: Conventions, Contribution guidelines, Architecture/structure, Build commands, Goals/purposes, Test execution, Metadata, Test strategy, Tech stack, and Setup. The most frequent categories were coding conventions/best practices, contribution processes, and architectural overviews; security or troubleshooting sections were rare.
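The normalization pipeline can be sketched as follows. This is a simplified stand-in for the authors' process: it lower-cases headings, strips special characters, and crudely singularizes words (the paper used proper lemmatization), then counts level-1 and level-2 headings across files.

```python
import re
from collections import Counter

def normalize_heading(heading):
    """Lower-case, strip Markdown markers and special characters, and crudely
    singularize words (a rough stand-in for the paper's lemmatization)."""
    text = heading.lstrip("#").strip().lower()
    text = re.sub(r"[^a-z0-9 ]+", "", text)
    words = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in text.split()]
    return " ".join(words)

def count_headings(markdown_files):
    """Count normalized level-1/2 headings across a list of Markdown strings."""
    counts = Counter()
    for md in markdown_files:
        for line in md.splitlines():
            if re.match(r"^#{1,2}\s", line):  # matches '#' and '##' only
                counts[normalize_heading(line)] += 1
    return counts
```

Grouping the resulting normalized headings (manually, as the authors did) then produces frequency-ranked categories such as "coding convention" or "build command".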
A deeper dive into the “Conventions” category (the most common) revealed five stylistic dimensions: descriptive (simply stating existing practices), prescriptive (imperative instructions), prohibitive (explicit bans), explanatory (providing rationale), and conditional (rules that apply only under certain circumstances). Examples include “This project uses the Linux Kernel Style Guideline” (descriptive), “Follow the existing code style and conventions” (prescriptive), “Never commit directly to the main branch” (prohibitive), “Avoid hard‑coded waits to prevent timing issues in CI environments” (explanatory), and “If you need reflection, use ReflectionUtils APIs” (conditional). This diversity shows that developers are still experimenting with how best to communicate expectations to autonomous agents.
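To make the five dimensions concrete, here is a deliberately naive keyword heuristic (my own illustration; the paper's classification was done by manual annotation, not by rules like these) that assigns one of the five labels to a convention sentence:

```python
def classify_convention(sentence):
    """Rough heuristic labeling along the five stylistic dimensions from the
    paper. Illustrative only -- the actual study used manual coding."""
    s = sentence.lower()
    if s.startswith("if ") or " when " in s:
        return "conditional"
    if any(w in s for w in ("never", "do not", "don't", "avoid")):
        # A ban that states its rationale counts as explanatory.
        return "explanatory" if " to prevent" in s or "because" in s else "prohibitive"
    if s.startswith(("follow", "use", "run", "always")):
        return "prescriptive"
    return "descriptive"
```

Applied to the paper's examples, the heuristic reproduces the intended labels, but it would obviously misfire on less formulaic phrasing, which is precisely why the authors coded the sentences manually.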
For RQ3, commit histories of the 155 AGENTS.md files were examined. Half of the files had never been changed; 23 % were modified once, and 21 % between two and seven times. The authors focused on the ten files with ten or more commits (6 % of the sample), yielding 169 commits for detailed annotation. Modification patterns varied widely: some files saw rapid bursts of changes (e.g., adding new test strategies), others showed gradual evolution aligned with project milestones such as adding new build tools or updating contribution policies. The analysis suggests that AI context files evolve in response to concrete project changes, reinforcing the need for maintainability.
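The per-file commit counts underlying this analysis can be obtained directly from Git. The sketch below (assumed tooling, not the authors' scripts) counts the commits touching a file and maps that count to the evolution buckets reported in the summary; the first commit creates the file, so a single-commit file was "never changed".

```python
import subprocess

def commit_count(repo_path, file_path="AGENTS.md"):
    """Number of commits that touched the given file (requires git)."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--follow", "--oneline", "--", file_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())

def modification_bucket(n_commits):
    """Map a file's commit count to the evolution buckets used in the summary."""
    mods = n_commits - 1  # the first commit creates the file
    if mods == 0:
        return "never changed"
    if mods == 1:
        return "changed once"
    if n_commits >= 10:
        return "10+ commits"
    return "changed 2-7 times"
```

Aggregating `modification_bucket(commit_count(repo))` over the 155 files reproduces the distribution above, and the 10+ bucket identifies the ten files selected for commit-level annotation.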
Overall, the study concludes that AI context engineering is an emerging practice with no established standards. The observed variability in content, structure, and writing style indicates that the community has not yet converged on a canonical schema. The authors propose three avenues for future work: (1) empirically measuring how different structures and styles affect the quality of generated code or other agent outputs; (2) developing automated validation and formatting tools to enforce consistency across projects; and (3) designing a meta‑schema that can accommodate multiple vendor‑specific extensions while preserving tool‑agnostic interoperability. By addressing these gaps, the research community can help turn AI context files from an experimental artifact into a reliable engineering asset that improves collaboration between human developers and autonomous coding agents.