Toward an Agentic Infused Software Ecosystem

Fully leveraging the capabilities of AI agents in software development requires a rethinking of the software ecosystem itself. To this end, this paper outlines the creation of an Agentic Infused Software Ecosystem (AISE), that rests on three pillars.…

Authors: Mark Marron

Mark Marron
mark.marron@uky.edu
University of Kentucky
Lexington, Kentucky, USA

Abstract

Fully leveraging the capabilities of AI agents in software development requires a rethinking of the software ecosystem itself. To this end, this paper outlines the creation of an Agentic Infused Software Ecosystem (AISE) that rests on three pillars. The first, of course, is the AI agents themselves, which in the past 5 years have moved from simple code completion toward sophisticated independent development tasks, a trend which will only continue. The second pillar is the programming language and APIs (or tools) that these agents use to accomplish tasks and that, increasingly, serve as the communication substrate through which humans and AI agents interact and collaborate. The final pillar is the runtime environment and ecosystem that agents operate within, and which provide the capabilities that programmatic agents use to interface with (and effect actions in) the external world. To realize the vision of AISE, all three pillars must be advanced in a holistic manner and, critically, in a manner that is synergistic for AI agents as they exist today¹, those that will exist in the future, and for the human developers that work alongside them.

1 Framing the Problem

This paper takes the view that creating a fully realized agentic infused software ecosystem requires a holistic approach to the entire software stack. Agents provide a powerful mechanism for understanding user intents and suggesting possible actions that are likely good candidates for accomplishing a task. However, these outputs are not strongly grounded and do not directly provide any guarantees about correctness. Without support from the ecosystem for discovering and managing tools, and an environment that is designed to safely execute these actions, the practical usability of these systems is highly constrained.
To address these challenges, this work outlines a complete software stack, from the core programming language, to the tooling ecosystem, to the actual runtime environment, that is focused on supporting the development and operation of agentic software systems. By co-designing these components with the needs of AI agents in mind, we can create a software ecosystem that is uniquely well suited to the challenges of agentic software development, and that can significantly enhance the capabilities of AI agents in this space. Thus, the first action is to outline the core design principles and concepts that guide the decisions and development of the various components of this agentic infused software ecosystem.

Explicit Intents and Behaviors

A key challenge for current LLM-based AI agents is their ability to manage context and to (statistically) guess about possibly relevant information that is not explicitly provided in the context window. In many cases agents are able to successfully infer this information, but there is a long tail of situations where incorrect assumptions can be made. This compounds with the fact that many programming languages and software ecosystems have a wide range of implicit behaviors and special case semantics, which creates a level of incidental complexity [7, 27] in the system. Working on code with these implicit or unexpected behaviors now involves understanding the core intent of the code and then, additionally, explicitly or mentally reviewing a checklist of which implicit behaviors and special case scenarios may be present. This is a significant cognitive load for human developers, particularly when reviewing AI generated code, and a significant source of errors for AI agents, as they must be aware of the full context of a codebase and not just the immediate code they are working on.

¹ For the near term we expect these to remain primarily Large Language Model (LLM) Transformer based.
With this in mind, a core design concept for AISE is to create an ecosystem where the intent and impact of code is explicitly and concisely encoded in the textual (syntactic) representation of the code and that, whenever possible, corner case semantics are eliminated. As we will see, by careful language design choices, we can in fact simultaneously satisfy the somewhat contradictory goals of conciseness and explicitness, and in doing so, produce a software ecosystem that is uniquely well suited to the challenges faced when managing context window sizes and attention degradation. At the same time the pursuit of this design concept leads to a software ecosystem that is resistant to a wide range of error patterns and presents lower friction for human-AI cooperation.

Discoverability

Context window management and tool discovery are major challenges for current AI agentic systems. There is a tension between explicitly loading anything & everything that may be relevant to a task into the context window and how aggressively to prune via document retrieval and ranking heuristics. As the number of tools and context that an agent has access to grows, it becomes increasingly difficult for the agent to effectively manage its context and can lead to major drops in agentic performance. This problem is not unique to transformer based AI agents; human developers also face challenges in discovering (remembering) and managing the tools and information they need to accomplish a task!

From this perspective it is clear that a core design concept for AISE is to create an ecosystem that simplifies the discovery of tools and information for both human and AI agents. In addition to merely improving discoverability, we also want the ecosystem to simplify summarization tasks and explicitly support progressive disclosure.
This is closely related to the previous design concept of explicit intents and impacts – by making the intent and impact of code explicit, it becomes easier for both human and AI agents to discover which tools and information are relevant for a given task.

Consider an API signature wait(duration: Int) vs. the same API in a language that allows unit typed [22, 28] primitive values, wait(duration: MilliSeconds). The token costs of the two signatures are nearly identical, but the second signature provides critical context explicitly in the signature itself, while the first requires either the initial inclusion of the documentation in the context window, the agent to make a probabilistic guess for the unit of time, or the agent to take the additional step of reading the documentation to discover this critical information.

Mechanize Everything

The vision of a "Dark Factory" for software agents is a north-star for fully automated Agentic coding². In this vision, the software ecosystem is mechanized to the point where agents can independently create entire applications without human intervention, operating merely from high level specifications of some form. Recent SotA results in this space have either used an existing application as an oracle (i.e. cloning an existing application) or a full formal specification of the application (i.e. in Lean [23] or Dafny [10]). However, the assumption of an existing application or a full formal specification³ presents serious roadblocks to the realization of this vision at a large scale.

This highlights the need to provide expressive, multi-modal, and cooperative mechanisms for specifying user/developer intent as a core concept in the AISE thinking. By integrating specification and requirements gathering as first-class parts of the system we can significantly reduce the difficulty of expressing what we want to build.
By co-designing the language with a tooling ecosystem we ensure that we can then mechanically and/or cooperatively validate that the Agentic system has correctly built what was requested.

First-Class Cooperation

The problem of specification and requirements is not just a problem for mechanization, but also a major barrier to effective cooperation between human and AI agents. The current SotA in human-AI cooperation in software development is to have a human provide a, mostly normative, specification of what they want to build, then have an AI agent attempt to build it, and then use a combination of testing and manual review to determine if the agent succeeded. This process is not only inefficient but also error prone, as it relies heavily on the human's ability to 1) write a comprehensive test suite and 2) perform careful code reviews, both of which are notoriously monotonous, difficult, and error prone tasks for human developers!

Thus a core design concept is to integrate specification and requirements deeply into the language and system. This involves building effective multi-modal specification features into the programming language – for both formal and informal specifications – then building tooling to mechanize the validation of these specifications and creating workflows that provide simple/digestible feedback for a human developer to interact with.

² The idea of automatic application generation is a perennial one in computer science [18, 20, 26] but recent excitement in the Agentic space highlights the renewed potential in this space with LLM driven systems – https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/ and https://simonwillison.net/2026/Feb/7/software-factory/

³ In practice these formal specifications are often as large (or larger) and as complex [12] as the code itself!
Failure Safety & Resilience

Failure is an inevitable part of software development. This is true for software written by humans, and even more pressing in the context of Agentic software development, where the complexity and unpredictability of the system can lead to a wide range of failure modes. A core design concept for AISE is to create an ecosystem with multi-layered safety and resilience mechanisms. At the language level we want to eliminate common sources of errors and bugs, and to make it easier for developers to write correct code. At the system level we want to build in mechanisms for sandboxing resources, monitoring for data exfiltration, and managing fault logging and diagnostics. These mechanisms ensure that certain classes of failures are impossible or, if they do occur, are constrained to safe aborts. Beyond strict logical failures, we also consider qualitative behavior and recovery – e.g. ensuring progress even in the case of failure, identifying resource leaks in workflows, or workflow failures resulting in inconsistent or un-revertable states.

Contributions

(1) We introduce the concept of an agentic infused software ecosystem and outline the core design concepts that guide the development of this ecosystem.
(2) We extend BOSQUE with features for explicit Agent & API interfaces as well as support for multi-modal intent specification (Section 3).
(3) We describe a mechanized validation tool, SUNDEW, that can be used to validate the correctness of AI generated code against specifications and requirements or used online by an Agent to provide formal introspection (Section 4).
(4) We outline a runtime environment, MINT, a HATEOAS [15] environment that provides progressive discovery as well as support for safety and resilience when executing agentic software systems (Section 5).

2 A Programming Language for AISE

We begin with a review of the BOSQUE language as introduced in [27].
As noted by the authors – BOSQUE is not based on a single big feature, or even a number of small novel features; instead the value comes from a holistic process of simplification and feature selection with a single focus toward what will simplify reasoning about code – for humans, AI agents, and formal systems.

At the core of BOSQUE is a let-based functional language that is focused on eliminating the complexities associated with mutability, aliasing, inductive-invariants, and nondeterminism. A simple BOSQUE program, Figure 1, provides a flavor of the language. The code implements a simple sign function. This code is very similar to the implementation one would expect in Java or TypeScript – in fact just eliminating the explicit 'i' specifier on the literals would make it valid TypeScript.

    function sign(x: Int): Int {
        var y = 1i;
        if (x < 0i) {
            y = -1i;
        }
        return y;
    }

Figure 1: Example sign function in BOSQUE.

This function highlights the use of multiple updates to the same variable and block structured conditional flows. BOSQUE distinguishes between variables declared with let, which are fixed, and those declared with var, which can be updated. In many languages signed numbers have asymmetric ranges and thus negation is unsafe in special cases, i.e. -INT_MIN will error or silently wrap. However, in BOSQUE the dynamic ranges for signed integers are symmetric, and aligned with their unsigned versions as well, so that negation is always safe and corner case issues with signed/unsigned conversions are eliminated.

Another distinctive feature of BOSQUE is the complete elimination of loops for container processing. Instead, BOSQUE provides a rich set of higher-order functions for operating on collections – similar to Java Streams or C# LINQ as shown in Figure 2.
    let l = List{1i, 2i, 3i};
    l.allOf(pred(x) => x >= 0i) %% true
    l.map(fn(x) => x + 1i) %% List{2i, 3i, 4i}

Figure 2: Eliminating the need for loops using higher-order functor operations in BOSQUE.

The use of higher-order functions allows for a natural, concise, and explicit way to express collection processing. Most directly, the representation is strictly more token efficient than the equivalent loop-based code, and we avoid the need for the agent to repeatedly generate common loop indexing and control flow, with the associated risks of off-by-one errors, incorrect variable name selection, and inverted conditions that are regular occurrences in these types of code [21]. Thus, the use of higher-order functions (and BOSQUE) not only supports the goal of token efficiency, but also improves the probability of successful (correct) code generation!

This approach also allows for a more direct and explicit expression of intent. Again, from the perspective of agentic code generation, the mapping from latent intent to semantically meaningful operation names improves the likelihood of correct operation selection and, when "reading code", enables the agent to simply focus attention on a single operation name instead of needing to analyze all components in a multiline looping implementation. Interestingly, human developers also benefit from this increased clarity when reading and reviewing code, as they can also immediately understand the intent of the code without parsing through control flow details. For example, the use of allOf makes it clear that the intent is to check if all elements satisfy a condition, which is more explicit than a loop with a conditional check and a break statement.

The final feature of BOSQUE that we highlight here is the ability to easily create strong type aliases and enforce invariants on both aliased and composite data values.
Consider the code in Figure 3 that declares two type aliases, Fahrenheit and ZipCode, and two composite entities, TempRange and TempForecast.

    type Fahrenheit = Int;
    type ZipCode = CString of /[0-9]{5}('-'[0-9]{4})/c;

    entity TempRange {
        field low: Fahrenheit;
        field high: Fahrenheit;
        invariant $low <= $high;
    }

    entity TempForecast {
        field location: ZipCode;
        field temp: TempRange;
    }

Figure 3: Examples of type aliases and invariants in BOSQUE.

The code in Figure 3 shows how BOSQUE allows developers to create strong type aliases that provide explicit semantic identity to otherwise primitive values. This allows the developer, or AI agent, to express intent more clearly. For example, using Fahrenheit instead of Int makes it clear that the value represents a temperature (in Fahrenheit) and prevents unit-confusion [22] or argument confusion bugs [40]. As we will see in Section 5.1, this is very useful when specifying APIs where communication formats heavily use primitive values.

Beyond simple type aliasing, we can also attach explicit invariants to both aliased and composite data types. In this example, the ZipCode alias uses a regular expression to declare that a ZipCode value is a string and that, to be valid, it must match the specific pattern⁴. Similarly, the TempRange entity declares an invariant that the low field must be less than or equal to the high field. These invariants can be checked at compile time, later as dynamic checks, or in the SUNDEW validator.

This ability to explicitly specify intent and invariant properties as part of the type system greatly enhances both the ability to reason about code and the ability to catch errors early in traditional development.
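To make the alias-plus-invariant idea concrete for readers unfamiliar with BOSQUE, the declarations in Figure 3 can be approximated with runtime-checked wrapper types. The following is a minimal Python sketch, not BOSQUE and not the paper's implementation. The class layout, the error messages, and the choice to treat the four-digit ZIP suffix as optional are assumptions made for illustration; in BOSQUE these checks could also be discharged at compile time or by the validator rather than only at construction.

```python
import re
from dataclasses import dataclass

# Assumption: five digits with an optional "-NNNN" suffix. Making the suffix
# optional is a choice for this sketch, not something Figure 3 states.
_ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


@dataclass(frozen=True)
class Fahrenheit:
    """Distinct wrapper so a temperature cannot be passed as a bare int."""
    value: int


@dataclass(frozen=True)
class ZipCode:
    """String alias whose constructor enforces the format invariant."""
    value: str

    def __post_init__(self) -> None:
        if _ZIP_PATTERN.fullmatch(self.value) is None:
            raise ValueError(f"invalid ZipCode: {self.value!r}")


@dataclass(frozen=True)
class TempRange:
    low: Fahrenheit
    high: Fahrenheit

    def __post_init__(self) -> None:
        # Mirrors `invariant $low <= $high` from Figure 3.
        if self.low.value > self.high.value:
            raise ValueError("TempRange invariant violated: low > high")


@dataclass(frozen=True)
class TempForecast:
    location: ZipCode
    temp: TempRange


# Every value that exists has already passed its invariant.
forecast = TempForecast(ZipCode("40506"), TempRange(Fahrenheit(30), Fahrenheit(50)))
```

Because the wrappers are immutable and validated at construction, a malformed ZipCode or an inverted TempRange can never be observed by downstream code, which is the property the invariants in Figure 3 are meant to provide.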
Explicit invariants also give AI agents a powerful mechanism for generating code that is more likely to be correct: they can guide the initial generation, by making the constraints explicit for the LLM, and they can provide feedback through failing test-cases or static analysis systems.

The validation of string structures is particularly important when APIs and data formats heavily use strings to represent structured data. Tracking the possible data content, including sensitive PII or un-sanitized user controlled input, is otherwise a difficult problem that requires reasoning over flows across the entire program, with oversights leading to potentially serious security vulnerabilities [34] like SQL injection or leaks of sensitive data.

These examples highlight the core design philosophy of the BOSQUE language and how it supports the goals of AISE. By eliminating complexity and providing powerful abstractions, BOSQUE enables developers and AI agents to write code that is easier to generate & reason about, as well as providing multi-layered support for safety & fault detection.

⁴ The regular expression language used by BOSQUE is a specialized design targeted to eliminating common forms of ReDoS attacks on validation [11].

3 Agentic BOSQUE

Fully integrating the BOSQUE language with agents and including them as a first class part of the system requires additional extensions. As currently described, BOSQUE is a general purpose programming language and an excellent target for agentic code generation, but it lacks explicit facilities for calling and orchestrating sub-agent workflows (Section 3.1) and does not provide support for explicit decomposition and modularity of development tasks (Section 3.2).

    %** Transfer amt from payer account to payee account. **%
    api transfer(amt: USD, payer: Account, payee: Account)
        env={ PAYMENT_AUTHORIZATION: OAUTH_TOKEN, PAYMENT_LIMIT: USD }
        permissions={ \account:${payer.routing}/${payer.account}\ }
        requires 0.0 < amt;
        requires amt <= env.PAYMENT_LIMIT || $events.contains(Approve{|payee=payee, amt=amt|});
    ;

    %**
     * Given a natural language (plaintext) message,
     * compute the amount to pay and send to the payee.
     **%
    action splitBill(msg: String, payee: Account) {
        let amt = agent Chat::compute<Option<USD>>(
            env{}, msg, "What is half of the bill?"
        );
        if(amt === none) {
            return fail("Could not get amount from message.");
        }
        api transfer(env{...}, amt, env.account, payee);
        ...
    }

Figure 4: Example of an api definition and an action using api and agent calls in BOSQUE to perform a payment transaction – e.g. splitBill("lunch was $45.50", contacts.get("Tom")) with the expected result being a successful payment transaction of $22.75 to Tom's account.

3.1 Explicit api and agent Calls

In an agentic infused software ecosystem, we expect agents and deterministic workflows to be first class citizens in the system. Thus, we need explicit language features for agents to call workflows, workflows to call agents, and agents to call other agents.

We introduce two new language constructs, api and agent, that allow for explicit calls to deterministic workflows and agents, respectively. The api construct supports calls to deterministic workflows that are defined elsewhere in the system or on remote (RESTful / HATEOAS) style endpoints⁵. The agent construct allows for explicit calls to agents that are defined elsewhere in the system. In theory these constructs could be combined into a single call construct, but this design, with the distinction between them, allows us to provide additional language support for the more free-form textual interaction that exists with agent calls.

⁵ Section 5.1 describes the BAPI system, which provides a powerful framework using BOSQUE types as a literal interchange format for calls/tools and for progressive discovery in Section 5.
Figure 4 shows an example of these constructs and how they can be used as part of a workflow that takes in a natural language message, say from an email or text with a dollar amount, splits the bill in half, and then performs a payment transaction. The first step is to use a chatbot AI agent to extract the amount from the message via the Chat::compute operation. The actual implementation of this agent is entirely open: it could be remote or local, and could use any number of techniques to extract the amount. The key to this design is the agent signature, which allows us to export it into a BOSQUE program, and a packaging system which links in the needed module or sets up the remote invocation⁶.

These api (agent) invoke statements allow concise and explicit descriptions of what an agent should be computing and what it may access. Fundamentally, apis and agents are always operating in some context or environment. This context may include ambient information, such as the current location or, in the case of Figure 4, a limit on spending. The api/agent design allows us to explicitly describe this context as part of the api/agent definition and to specify the exact values provided at invocation. In this example, the agent is explicitly given an empty environment, forbidding it from accessing (and potentially accidentally or maliciously exfiltrating) any information from the ambient context, such as a secret PAYMENT_AUTHORIZATION token.

Additionally, making the agent invocations explicit enables a simple syntax for converting un-structured (or semi-structured) outputs from an AI agent into a structured format. In Figure 4 the agent is asked to compute a value for "half of the lunch bill" from a free-form string and, as it is a Chat agent, it will by default produce a free-form string output. However, the agent call explicitly supports providing a result type shaping signature, in this case Option<USD> (where USD is a type alias of Decimal).
In general this can be accomplished by running the BAPI parser directly on the output of the agent, but if supported by the agent implementation, this can also be used to drive structured [43] (or even typed [36]) output generation without requiring additional data format specifications.

⁶ This design enables us to make agents modular and versionable, and we can support multiple side-by-side models and personas just as with code packages – i.e. NPM [37].

Once the amount is extracted, the workflow uses the transfer API to perform the actual payment transaction. The transfer API is defined with explicit environment variables and permissions that are required to perform the transaction, as well as preconditions that must be satisfied for the transaction to be successful. The api specification includes the environment variables that are required for the API to function, such as the PAYMENT_AUTHORIZATION token and the PAYMENT_LIMIT, as well as the resources that the API will access, in this case the payer's account information. By explicitly declaring these requirements, we ensure that the agent has the necessary context and also prevent it from accessing any information that is not explicitly provided.

The API specification also includes preconditions that must be satisfied for the API call to be successful. In our example the api has the precondition requires 0.0 < amt. However, in an agentic infused software ecosystem, we also need to be able to express and check temporal properties that involve the sequence of events that occur during the execution of a system. In our example, we want to ensure that either the payment amount does not exceed a certain limit or that in some previous event we obtained explicit user approval for the operation. To handle this we introduce the concept of an event log that is implicitly tracked as part of the execution of the system.
This event log can be used to record any events that occur during the execution of the system, such as user interactions, API calls, or agent calls. These events are inserted into an otherwise immutable log by the system, preventing any possibility of tampering by developers or agents, and this log can then be referenced in the pre/post conditions of API calls to check for temporal properties. In our example, we can check if the event log contains an explicit user approval event for the given item. The ability to explicitly express these requirements and conditions is a critical foundation for later mechanized validation and runtime safety.

This explicit information also provides improved discovery, as critical API information is now in a structured and explicit form, which supports improved agent code generation and tool use, as the agent can directly reference the API specification to understand the requirements for successful use. Finally, this system design opens the possibility for more advanced feedback and learning systems, as the system can provide detailed information on why an API call failed, such as unsatisfied preconditions. This information can further be integrated with validation and reasoning tools into the agent's training process via direct reinforcement learning algorithms with tools [8, 14, 44], as well as providing immediate feedback to the agent during training rollout – allowing us to generate action rewards without a full trajectory.

3.2 Holes and Meta-Thunking

A critical problem for agentic code generation is modularity and decomposition. In traditional software development, developers can break down complex problems into smaller, more manageable tasks which can then be handed off (or deferred) for later development. This is critical for managing complexity and enabling collaboration. Strangely, programming languages do not, in general, provide explicit language features to support this.
Instead, developers have to rely on external tools such as issue trackers and markdown documents, or use ad hoc patterns such as writing incomplete functions with TODO comments and aborts to indicate that a particular task needs to be completed later. This is a major gap in the software development process, and it becomes even more problematic when we introduce agents into the mix.

We have extended the BOSQUE language with an explicit hole construct [5, 19, 45] that allows developers and agents to explicitly indicate that a particular task or piece of code is incomplete and needs to be filled in later. From a syntactic standpoint this provides a clear and explicit way to indicate that a particular task is incomplete, and it also allows us to provide additional metadata about the task, such as the expected input and output types, the context in which the task should be completed, and any relevant information that may be helpful for completing the task. From the standpoint of pre-training, this feature also has the potential to support explicit specialization of models for agents (e.g. architect vs. code generation), as holes are now representable by explicit tokens in the training data.

    function sign(x: Int): Int {
        var y = 1i;
        if (x < 0i) {
            y = ?_ -> Int;
        }
        return y;
    }

    %** Compute the absolute value for given integer **%
    function abs(x: Int): Int
        ensures $result >= 0i;
    {
        ?_absbody(examples = true);
    }

Figure 5: An example of using hole expressions in the sign function to indicate that the implementation of the negative number case is incomplete. The hole expression can be a simple un-named expression, with a desired type, or alternatively, can include a doc-comment, a name, and even information on where to find desired input/output examples.

Figure 5 shows an example of using the hole construct in a simple function that computes the sign of an integer. In this example,
we have implemented the positive and zero cases, but the logic to complete the negative case is left as an expression-hole. By using the hole construct, we can explicitly indicate that this part of the code is incomplete and needs to be completed later – without requiring additional annotations or assertion style hacks. The final part of Figure 5 uses a hole for the entire function body and, using the post-condition, can specify the expected behavior of the implementation, along with the docstring and examples for the hole to further specify the desired behavior. This is a powerful way to provide explicit specifications for the desired behavior of the code and, as shown in Section 4, can be used to symbolically validate the behavior of any generated implementation.

This explicit support for modularity and decomposition is critical for agentic code generation, as it allows agents to break down complex problems into smaller, more manageable tasks that can then be completed later, either by the same agent or by a different agent. It also provides a standard structure for embedding additional meta-information about the task, which enables improved discovery and more effective code generation.

The direct integration also enables integration into development tools and runtimes. For example, we can trivially link in the concept of Prorogued Programming [1] or an LLM-extended version of meta-thunking. In this design, when a hole is executed, instead of throwing an error, the system can automatically trigger a call to a human developer or agent to complete the hole. In this workflow, the human (or agent) can manually specify the correct result; in our case, if the value of x is -5, we would provide -1i. In later executions, the system can use these as memoized results for the hole and, if there is an examples file, can load/store these examples for later reference – as unit tests or guides for an agent during code generation.
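The hole execution workflow described above (ask a human or agent on first use, then memoize the answer as an example) can be sketched in a few lines. The following Python sketch is illustrative only and is not the BOSQUE runtime. The Hole class, the resolver callback, and the in-memory examples table are hypothetical names introduced here; a real system would persist the examples to a file and route the request to an interactive prompt or an agent call.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical resolver signature: (hole name, argument tuple) -> value.
Resolver = Callable[[str, Tuple[Any, ...]], Any]


class Hole:
    """Placeholder for unfinished code that defers to a resolver when run."""

    def __init__(self, name: str, result_type: type, resolver: Resolver) -> None:
        self.name = name
        self.result_type = result_type
        self.resolver = resolver
        # Recorded input/output pairs, reusable later as tests or prompts.
        self.examples: Dict[Tuple[Any, ...], Any] = {}

    def __call__(self, *args: Any) -> Any:
        if args not in self.examples:
            value = self.resolver(self.name, args)
            # Check the supplied value against the hole's declared type.
            if not isinstance(value, self.result_type):
                raise TypeError(f"hole {self.name!r} expected {self.result_type.__name__}")
            self.examples[args] = value
        return self.examples[args]


def make_sign(negative_case: Hole) -> Callable[[int], int]:
    """Builds the sign function of Figure 5 with the negative branch as a hole."""
    def sign(x: int) -> int:
        y = 1
        if x < 0:
            y = negative_case(x)
        return y
    return sign


def scripted_resolver(name: str, args: Tuple[Any, ...]) -> int:
    # Stand-in for a human developer or agent answering the request.
    return -1


sign = make_sign(Hole("negative_case", int, scripted_resolver))
```

Calling sign(-5) twice consults the resolver only once; the second call is served from the recorded example, and the examples table is exactly the kind of artifact that could later be replayed as unit tests.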
4 Mechanized Understanding and Validation in AISE

In contrast to other widely used languages⁷ where the semantics are not amenable to mechanized analysis, due to features like loops, mutability, and non-deterministic behaviors, or languages like Lean [23] and Dafny [10] that provide support for full functional verification but require substantial proof engineering expertise, BOSQUE is designed to support fully-mechanized reasoning and validation.

4.1 Mechanized Understanding and Validation

As described by the BOSQUE developers [17], the design of BOSQUE enables us to map it, almost entirely, to efficiently decidable theories supported by a SAT Modulo Theories (SMT) solver [24]. Operations on numbers, data-types, and functions all map to core decidable theories – Integers, Bitvectors, Constructors, Uninterpreted Functions, and Interpreted Functions. In SMT solvers the theories of Strings and Sequences are semi-decision procedures in the unbounded case. However, we can bound the sizes of these kinds as inputs and then the system becomes fully (and efficiently) decidable. As a result validation can be fully automated. There is no need for developers to learn an additional proof language, all checks are encoded as assertions, pre/post-conditions, and invariants in the BOSQUE programming language, and no manual intervention is required to write lemmas or diagnose proof failures!

Given this design, consider the sample sign function from Figure 1 and the resulting SMTLib [9] encoding, Figure 6. This code is a fully decidable encoding of the BOSQUE version.

    (define-fun sign ((x Int)) Int
        (let ((y 1))
            (ite (< x 0)
                (let ((y -1)) y)
                y)
        )
    )

Figure 6: Sign function automatically converted into SMTLib.

The BOSQUE language includes builtin support for various validation workflows using this encoding strategy, including a specialized declaration of a parametric property test.
A simple validation test that checks that the sign function returns a value in the range [−1, 1] for any input (x) is shown in Figure 7. For a simple property like this, the SUNDEW validator can show that the property holds for all inputs and takes 7ms to complete.

    chktest signRange(x: Int): Bool {
        let sgn = sign(x);
        assert -1i <= sgn && sgn <= 1i;
    }

Figure 7: Example validation harness for sign function return range (∈ [−1, 1]) in BOSQUE.

The SUNDEW validator is also able to go over a small application and check, for each possible runtime or user-defined error, that either the error is impossible or that it can be triggered, in which case it also generates a witness input.

Footnote 7: A notable exception is Morgan Stanley's Morphir [17, 35].

4.2 SUNDEW: Workflow Validation

We can extend this approach to support the event log and associated pre/post conditions on APIs and Agent calls from Section 3.1. In these systems the event log is simply an implicit parameter that is passed and, from the viewpoint of the SMT encoding, is simply a Sequence of events.

Consider the example request for an agent to look at the contents of msg and determine what "half of the bill" was, as shown in Figure 4. This code shows a hypothetical agent-generated script to accomplish the task. The script uses an LLM agent to process the semi-structured text in the payment request msg to determine the amount to pay, and attempts to transfer the payment. If the Chat::compute action were unlucky (see footnote 8), or the lunch was particularly expensive, this computed amount could be large enough to exceed the payment limit of the user. The MINT runtime system (Section 5) will catch this error at runtime, with a precondition failure, but we can also statically run symbolic validation on this script to detect that the amt value may exceed the payment limit and that the otherwise required user confirmation check is missing!
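To make the idea of "either the error is impossible or there is a witness input" concrete, the following Python sketch searches a bounded range of amounts for one that violates a transfer precondition. It is only a brute-force stand-in for the SMT-based check that SUNDEW performs. The precondition shape (payment within the limit, or a matching approval in the event log) is taken from the description above, while the function names, the `Approval` record, and the fixed payee "tom" are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Approval:
    payee: str
    amt: int


def transfer_precondition(amt: int, limit: int, payee: str,
                          events: List[Approval]) -> bool:
    # Assumed precondition: small payments pass directly, larger ones
    # need a matching approval event in the log.
    return amt <= limit or Approval(payee, amt) in events


def script_events(amt: int, payee: str, asks_confirmation: bool) -> List[Approval]:
    # Model of the event log the generated script has produced by the time
    # it reaches the transfer call.
    return [Approval(payee, amt)] if asks_confirmation else []


def find_violation(limit: int, max_amt: int,
                   asks_confirmation: bool) -> Optional[int]:
    """Return the smallest amount in [0, max_amt] that breaks the
    precondition (a witness), or None if no such amount exists."""
    for amt in range(max_amt + 1):
        events = script_events(amt, "tom", asks_confirmation)
        if not transfer_precondition(amt, limit, "tom", events):
            return amt
    return None
```

With a limit of 100 and no confirmation step, the search reports 101 as a witness; once the script is modeled as requesting confirmation, no witness exists within the bound. A symbolic validator reaches the same conclusion without enumerating values.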
Unlike test-case generation feedback, which is simply a pointwise failing test and can cause the agent to focus on details of the test case and over-fit, symbolically identifying the possible precondition violation can provide a more general feedback message to the agent. For example, the message could be: The amount to pay may exceed the "PAYMENT_LIMIT" in "env". The validator could even compute weakest-preconditions for various points in the code to help the agent identify the best candidate fixes for the code.

4.3 Introspective Agents

The final step toward a true AISE is exposing the validation tools directly to the agent during its planning process as an online feedback loop. This allows agents to reason about their own plans and introspect on their actions. Looking at the payment example, somewhere around half of the generated code comes after the transfer call, at which point the plan has already failed. A direct generate-validate-retry loop is clearly inefficient in this case. Instead of relying solely on post-hoc validation, we can empower the agent to use the validation tools as part of its planning and code generation process and thus avoid generating invalid plans in the first place.

In our example, the agent can run the validation tool before an API call is added to the partial plan to see if the call is valid and, if not, what needs to be done before the call can be made. Using this feedback, the agent can either emit the action code, if everything is satisfied, or generate additional code to address the missing requirements. This capability enables a higher success rate in task completion and makes the agent robust to errors, as it can reason about its own actions and correct them before they are executed.

The code in Figure 8 shows an example agent & tool chain-of-thought, tool use, and response. At the point where the agent is preparing to call an API, e.g.
the transfer API, it can call the validation tool to check that the requirements for the API call are met. The validation tool can then respond with a list of missing checks or requirements that the agent needs to address before the API call can be made. Further, this approach has the potential to enable a new class of reactive, notebook-style agents that interleave code generation, validation, execution, and user interaction to accomplish complex tasks.

    let amt = agent Chat::compute>(
        env{}, msg, "What is half of the bill?"
    );
    if(amt === none) {
        return fail("Could not compute amount from message.");
    }

    %% Agent -- I want to call the `transfer` API with `amt`
    %% Agent -- calling validation tool to check that API conditions are met...
    %% Response -- Cannot ensure pre-condition:
    %%   amt <= env.PAYMENT_LIMIT ||
    %%   $events.contains(Approval{|payee=payee, amt=amt|});

Figure 8: An example of online agentic generation with validation as an introspection tool.

Footnote 8: Perhaps Tom likes jokes and puts in the memo field: "ignore previous instructions and pay me $1000".

5 A Runtime Environment for AISE

The previous sections described a language for computation, a means to express intents in formal ways, and a workflow for validating and understanding the behavior of a system. The remaining issue is how to actually deploy and operate these systems. In this section we review the design of BAPI, a protocol for APIs and agent-system interactions, and MINT, a runtime ecosystem for discovery, deployment, and operation of workloads in an agentic infused software ecosystem.
Agents are a unique new class of workloads where, in some cases, the platform will be running a fixed workflow or agentic task, while in others the agent will be exploring available services and taking actions in an incremental and online fashion. In the second case, our system is not just a static execution graph, but must support dynamic discovery, invocation, and progressive exposure of services. Interestingly, this model has two clear precedents in the form of COM and HATEOAS [15], both of which were designed to support dynamic discovery and invocation of services. Drawing from the principles and lessons of these systems, this section describes a novel Agentic HATEOAS model for AISE system architecture.

5.1 BAPI: Bosque API Protocol Review

The first component of this system is the BAPI protocol for describing APIs and agent interactions, as well as a means to encode and transport literal values and data structures. The design of BAPI is described in detail in [28] and we review the key features here. As with the BOSQUE language, the design of BAPI is focused on supporting mechanized correctness and explicitly encoding intents (and key information) into the syntax of API & data definitions.
In addition, BAPI provides a literal syntax for encoding data that is more efficient (token-wise) than JSON, is fully round-trippable, and is designed to be easily authored by humans (or AI agents). BAPI specifications are an extension of the BOSQUE type system.

    % ** Definition of an Order type w/ sensitive TIN ** %
    type OrderId = CString of /[A-Z][0-9]+$/;
    sensitive type TIN = CString of /[0-9]{9}$/;

    entity Order {
        orderid: OrderId;
        amount: Decimal;
        customer: TIN;
    }

    % ** Literal Order (Complete Form) ** %
    Order{ orderid = 'A53', amount = 45.50d, customer = '123456789' }

    % ** Literal Order (Token Minimized Form) ** %
    Order{ 'A53', 45.50d, '123456789' }

    % ** Literal Order (Standard Form w/ Leakage Filter) ** %
    Order{ 'A53', 45.50d, '*********' }

    % ** Literal Order (JSON) ** %
    Order{ "orderId": "A53", "amount": 45.50, "customer": "123456789" }

Figure 9: Example of an Order type definition and three literal value representations in BAPI (complete, token minimized, and with sensitive leakage filtering applied), plus the automatically generated JSON equivalent representation.

A simple example of a BAPI specification is shown in Figure 9. The BAPI specification defines a set of types that make up a customer Order. The first declaration is a simple type alias for an OrderId that is a structured string (matching a regular expression), while the second declaration is for a sensitive type for a TIN (Taxpayer Identification Number). The second declaration shows how BAPI supports information control and monitoring by explicitly marking data as sensitive, so that later uses and operations on the data can be mechanically checked for compliance or security risks. These two aliases are then used in the composite Order type, which also includes an amount field. Finally, the example shows various BAPI options for serializing literal Order values.
All of these forms, along with their parsers and serializers, are automatically generated from the type definitions, and the choice of which form to use is up to the user (or agent) based on the needs of the scenario. By design they can be intermixed in the same system, are amenable to linear-time processing, and can be stream/zero-alloc parsed.

In the first form, the literal value is fully annotated with type information. This form is ideal for mechanized reasoning and validation of the data but, like JSON, is verbose and can involve massive redundancies in property names (or other tags), which increases token load and message sizes. The second form is a token-minimized encoding that relies on the order of fields in the type definition to avoid repeating tags and type information. This form is ideal for efficient encoding and transport of data.

The ability to use (and intermix) both forms allows users and agents to work with high-information representations for convenience, or when feeding (small) results of a tool-call into an agent context, while using a token-minimized form for transport, storage, or losslessly compacting large values for agent context management.

The third form is the same as the first form but with a leakage filter automatically applied to the TIN field, which was marked sensitive. This form is the default for emitting data in scenarios like logging or crash dumps. This minimizes the risk of side-channel leakage and, when working with an untrusted agent, provides a means to guarantee that sensitive information cannot be leaked.

The final form is the JSON representation that is automatically generated from the type definition. This allows interop with other systems and tools even if they do not support BAPI natively. By design, the JSON form is fully round-trippable, with encodings that avoid common handling issues around NaN, MAX_SAFE_INTEGER, date formats, etc.
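As a rough illustration of how several literal forms can be derived from a single type definition, the Python sketch below renders an Order in a complete form, a token-minimized form, a leakage-filtered form, and JSON. It is not the BAPI generator: the `sensitive` field metadata, the helper names, and the choice to emit the decimal amount as a JSON string (to sidestep floating-point rounding) are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, field, fields
from decimal import Decimal
from typing import List, Tuple


@dataclass(frozen=True)
class Order:
    orderid: str
    amount: Decimal
    # Marked sensitive so that the leakage filter can mask it.
    customer: str = field(metadata={"sensitive": True})


def _literal(value, sensitive: bool, redact: bool) -> str:
    if sensitive and redact:
        return "'" + "*" * len(str(value)) + "'"
    if isinstance(value, Decimal):
        return f"{value}d"
    return f"'{value}'"


def _parts(order: Order, redact: bool) -> List[Tuple[str, str]]:
    return [
        (f.name, _literal(getattr(order, f.name),
                          f.metadata.get("sensitive", False), redact))
        for f in fields(order)
    ]


def to_complete(order: Order, redact: bool = False) -> str:
    # Every field is tagged with its name.
    body = ", ".join(f"{name} = {lit}" for name, lit in _parts(order, redact))
    return f"Order{{ {body} }}"


def to_minimal(order: Order, redact: bool = False) -> str:
    # Field names are dropped; position in the type definition carries meaning.
    body = ", ".join(lit for _, lit in _parts(order, redact))
    return f"Order{{ {body} }}"


def to_json(order: Order) -> str:
    # The amount is emitted as a string to keep the decimal exact (an assumption).
    return json.dumps({"orderid": order.orderid,
                       "amount": str(order.amount),
                       "customer": order.customer})
```

Because every form is produced from the same field list, adding a field or marking one as sensitive changes all encodings consistently, which is the property the generated BAPI serializers rely on.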
The JSON form allows our AISE stack to work seamlessly with existing tools and systems while enjoying the benefits of the more efficient BAPI specification and literal formats.

5.2 Info Routes and Progressive Discovery

The original formulation of REST and HATEOAS [15] envisioned systems where the description of the platform capabilities was inline with the ability to dispatch these operations. Further, a user could discover and progressively explore details of these capabilities and how to invoke them.

However, modern service-based architectures and platforms, drawing from the heritage of programming languages, split the executable artifact from the documentation and APIs. Further, our objective involves supporting multiple modes of interaction with a single logical service, i.e. both BAPI (Verbose, Minimal, or even Binary Encoded) and JSON. In addition, we want to provide build and deployment systems that declaratively generate all of the appropriate bindings, set the needed routes, set up "well known" names for discoverability, provide integrated search, and are optimized for agentic interaction.

To support this we introduce a novel runtime platform, MINT, to handle the configuration and execution of agentic infused software ecosystem services. A deployed MINT server is set up around a BAPI configuration file that specifies routes based on URI globs. Each route may be a static file (for simple resources) or connected to a BOSQUE task, with middleware options for logging, authorization, and request/response meta-data management, very similar to popular frameworks like express.js [13]. The MINT server can also automatically set up support for multiple encodings (Verbose, Minimal, Binary, JSON) along with special routes for discovery and info.
By convention MINT sets a top-level route named /actions that returns a compact structured specification of all of the endpoints available on the server, filtered by the permissions of the caller, which includes signatures of the handlers, pre/post conditions, and documentation comments. Each endpoint is also set up with a /actions/{endpoint} route that returns more detailed information about the endpoint, including normative usage patterns, examples, URI links to related resources, and more detailed documentation. This allows MINT to satisfy the primary use cases of systems like MCP [3] or Skills [38], including autonomous & progressive discovery of the capabilities of the service in a structured, secure, and efficient manner.

To further support agentic use cases, MINT also sets up a /search route that accepts a query, as plain text, and is intended to facilitate semantic search over the service's capabilities and related information. By default, the search route is implemented to perform a structured semantic search over the specifications of the endpoints and documents, but the design of MINT allows for this to be customized and to include links to other related services or resources (e.g. package manager or websites). This allows agents to ask questions about the service without prior knowledge and autonomously obtain relevant information for a task.

This opinionated design ensures a standardized way to discover the features and functionality a MINT service provides. The structured BOSQUE specifications that are returned ensure that an agent can mechanistically understand the usage of a given feature, and the ability to incrementally query for more details allows for the provision of detailed information, use examples, and normative patterns without overwhelming the agent context.
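The three discovery routes can be pictured with a small in-memory Python sketch. It is not MINT: the `Endpoint` record, the permission model, the example signatures, and the keyword-overlap ranking (standing in for semantic search) are all assumptions, chosen only to show how a compact listing, a detailed per-endpoint view, and a search route relate to one another.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Endpoint:
    name: str
    signature: str
    precondition: str
    doc: str
    required_permission: str
    examples: List[str] = field(default_factory=list)


class Registry:
    """Hypothetical in-memory model of the discovery routes."""

    def __init__(self, endpoints: List[Endpoint]):
        self.endpoints: Dict[str, Endpoint] = {e.name: e for e in endpoints}

    def _visible(self, permissions: Set[str]) -> List[Endpoint]:
        return [e for e in self.endpoints.values()
                if e.required_permission in permissions]

    def actions(self, permissions: Set[str]) -> List[dict]:
        """Model of GET /actions: compact listing filtered by caller permissions."""
        return [{"name": e.name, "signature": e.signature,
                 "requires": e.precondition}
                for e in self._visible(permissions)]

    def action_detail(self, name: str, permissions: Set[str]) -> Optional[dict]:
        """Model of GET /actions/{endpoint}: documentation and examples."""
        e = self.endpoints.get(name)
        if e is None or e.required_permission not in permissions:
            return None
        return {"name": e.name, "signature": e.signature,
                "requires": e.precondition, "doc": e.doc,
                "examples": e.examples}

    def search(self, query: str, permissions: Set[str]) -> List[str]:
        """Model of GET /search: keyword overlap as a stand-in for semantic search."""
        terms = set(query.lower().split())
        scored = []
        for e in self._visible(permissions):
            words = set((e.name + " " + e.doc).lower().split())
            score = len(terms & words)
            if score:
                scored.append((score, e.name))
        return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Illustrative endpoints; the signatures are made up for this sketch.
registry = Registry([
    Endpoint("transfer", "transfer(payee: String, amt: Nat)",
             "amt <= env.PAYMENT_LIMIT || approval in $events",
             "move money to a payee account", "payments"),
    Endpoint("balance", "balance(): Nat", "true",
             "read the current account balance", "accounts"),
])
```

An agent holding only the "accounts" permission would see just the balance endpoint in the listing and would get no detail for transfer, which mirrors the permission filtering described above.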
Integrating search as a dedicated route also supports powerful agentic workflows with a strong separation of concerns: the agent is responsible for deciding what concepts are relevant to the task at hand, while the MINT runtime is responsible for determining the best way to find and return relevant information.

5.3 Exposure, Sandboxing, and Guard-Rails for Agentic Workloads

Once we are running agentic workloads we need to manage the unique security and operational challenges they present. Again our system is able to leverage the design of BOSQUE and BAPI to mechanistically understand the behavior of these workloads and monitor them for compliance and security.

The first level of protection is via the BAPI specifications themselves. Each API specifies the types of data it accepts and returns, including information on sensitive data. Thus, MINT automatically checks that sensitive data is not sent over a public endpoint (by default all endpoints are private), and private endpoints can also be flagged as no-sensitive (or specific levels). This allows for immediate linting of endpoints that may publicly expose sensitive data and also allows for analysis of any workflow to determine if it can access sensitive data and, if so, what data and what the potential risks are. For example, when an agent accesses an endpoint with sensitive data, the runtime can automatically quarantine any outputs from that agent which may now be tainted with sensitive details.

In addition to monitoring data-flows and sensitivity, the MINT runtime also automatically sandboxes execution based on the URI resources it has declared it will access. In Figure 4 the transfer task is declared to access the account resource for the payee alone. Thus, the implementation is unable to maliciously access the balance of, or move money from, the payer's account.
This technique can also be applied to any resource that is mappable to a URI, for example using the glob syntax to whitelist a certain set of REST resources to prevent data exfiltration to an unauthorized endpoint, or prohibiting access outside of the file:///tmp/app_name/ folder to allow intermediate files to be written/read while ensuring that user data is not accessed. This ensures that even if an agent is compromised or behaves maliciously, it cannot access resources outside of its declared scope.

While the idea of sandboxing is well known, the key insight here is using URIs and globs as the means of resource specification and sandboxing. As opposed to custom resource types and access languages [2, 16, 39], which have historically been difficult to standardize, understand, and use, URIs and globs naturally match the model of resources used in RESTful systems. This allows for a simple, uniform, and powerful way to specify and enforce resource access policies across a wide variety of resource types (files, REST endpoints, databases, etc.) without needing custom plugins or extensions for each type.

Finally, the MINT runtime also provides a dynamic monitoring and enforcement system for agentic workloads. As BOSQUE and the BAPI specifications provide efficiently executable pre/post conditions, invariants, and assertions, these can all be monitored and enforced at runtime by the MINT system. Failure of any of these conditions triggers a safe-abort of the offending task, and can be configured to trigger additional mitigation such as rolling back state changes, logging relevant data, and even producing an offline time-travel-debuggable dump [4]. This provides a powerful line of defense against misbehaving agents.
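The URI-and-glob style of sandboxing can be approximated in a few lines of Python. The sketch below is an illustration rather than the MINT enforcement mechanism: the `Sandbox` class, the `bank.example` host, and the normalization step are assumptions. It uses the standard-library `fnmatch` matcher, in which `*` also matches `/`, so a trailing `/*` permits everything beneath a prefix. It also does not decode percent-encoded path segments, which a production check would need to handle.

```python
import posixpath
from fnmatch import fnmatchcase
from typing import Iterable, List
from urllib.parse import urlsplit


def normalize(uri: str) -> str:
    """Collapse '.' and '..' segments so that path traversal cannot escape
    an allowed prefix. Query strings and fragments are dropped."""
    parts = urlsplit(uri)
    path = posixpath.normpath(parts.path) if parts.path else ""
    return f"{parts.scheme}://{parts.netloc}{path}"


class Sandbox:
    """Hypothetical allow-list of URI globs declared by a task."""

    def __init__(self, allowed_globs: Iterable[str]):
        self.allowed_globs: List[str] = list(allowed_globs)

    def permits(self, uri: str) -> bool:
        target = normalize(uri)
        return any(fnmatchcase(target, glob) for glob in self.allowed_globs)

    def access(self, uri: str) -> str:
        # Every resource access is routed through this check before it happens.
        if not self.permits(uri):
            raise PermissionError(f"access outside declared scope: {uri}")
        return normalize(uri)


# A transfer task that declares only the payee's account and a scratch folder.
transfer_sandbox = Sandbox([
    "https://bank.example/accounts/tom/*",
    "file:///tmp/app_name/*",
])
```

The same allow-list covers both a REST resource and a file path, which is the uniformity argument made above: one resource language, with no per-resource-type plugin.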
The multi-layered and explicit approach to security and safety described above means that, even if a malicious actor is able to exploit and work around one layer of protection, they are still limited by the additional layers, which either block the exploit or force attackers to find and chain multiple bypasses.

6 Discussion

A key question is whether, in the future, a sufficiently powerful AGI agent will even need or benefit from the features of the agentic infused software ecosystem or if, by sufficient training and power, it will be able to operate on any language/platform?

Consider a hypothetical limit where we assume that an eventual AGI agent is as capable as an expert human developer, in the top 10% in the world. We can then answer this question by asking the analogous question of whether this expert would be aided by the features outlined in this work. The answer is yes, they would; in fact, for many of the features we have outlined, part of the description and motivation is how they eliminate or help with problems real developers have today.

When going for three (or more) 9's of reliability, even small possibilities for failure (1% or less) are unacceptable as, at this level, almost any mistake would exceed the failure budget. Would we accept a 1% chance that an agent would stop paying our mortgage, delete a customer database, or post medical information to a public forum? Thus, when working to build highly reliable and trustworthy agents we need to have multiple overlapping layers to drive reliability, and it is ill-advised to forgo any possible advantage.

The converse question is whether the design and principles outlined here are overfit for the current state of the art in LLMs and attention-based models. When these models change or are replaced with new architectures, will the design of this system become obsolete or even a hindrance?
As with the previous question, the first response is to again consider the hypothetical limit of replacing an LLM with an expert human developer. Again we see that the principles outlined here are not specific to attention-based models or LLMs, but generalize to humans, as well as earlier (simpler) program synthesis systems (e.g. NLyze [19]).

Additionally, the system design is set up to allow agents to be swapped out transparently. Features like the agent invocations are explicitly designed to avoid committing to plain text as arguments, and the SUNDEW verifier works with any code regardless of who/what produced it. In practice an agent could be a human in the loop and the proposed AISE would still be effective and coherent. Thus, the proposed design will remain viable and effective even as the underlying technology evolves.

7 Related Work

Agentic Programming Languages: Despite the major role that programming language design plays in the capabilities of AI agents, there has been relatively little investigation into programming language design specifically for agentic software development. The most substantial work in this space is by Marron [27, 28] and Meijer [30-32], who have both proposed designs focused on the issue of how programming language design impacts the capabilities of AI programming and Agentic systems. To the best of our knowledge the Universalis language (Meijer) has not been made publicly available or been formally described. The BOSQUE language, as well as the extensions in this work (see data availability statement), are fully open-source [6] and all development is publicly accessible. The extensions in this work go far beyond the state of the art in either system, bringing the design from a core PL or tool into a full ecosystem including deployment and orchestration.
Verifiable AI Programming: The topic of verifiable AI programming has been a large focus of research in the formal methods community, with significant work on using languages like Lean [23] and Dafny [10] to formally specify and verify the correctness of AI-generated code. These systems have shown promise in verifying the correctness of AI-generated code, but they also demonstrate the limits of existing tooling, particularly in terms of the complexity and size of the specifications required. In practice the current state of the art for verified AI program synthesis is limited to small, self-contained, leet-code-style problems [25, 33, 46, 48, 49] and, even then, the specifications are often as large (or larger) and as complex [12] as the code itself. These challenges highlight the need for simplified (partial-correctness focused) specification languages, multi-modal support, and scalable mechanized validation tools as core components of the AISE vision.

Specification and Cooperation: The problem of specification and requirements gathering is a major challenge for effective cooperation between human and AI agents in software development. Of particular interest in this space is prior work on multi-modal specification and interaction with End-User Programming [18, 41] systems, such as FlashFill [18] and NLyze [19], and prorogued programming [1], which have shown promise in allowing users to provide specifications in a variety of forms. Related work by Gulwani et al. has investigated the use of examples and demonstrations as a form of specification for AI agents [29], which is a promising direction for reducing the difficulty of expressing intents (specifications) and for iterating on them with AI agents in a cooperative manner.
The AISE vision builds on this prior work by integrating multi-modal specification and validation deeply into the language and providing a platform for new UX and interaction paradigms for human-AI cooperation in software development.

Agentic Systems: Recent work on long-horizon agentic systems and tool use, such as AppWorld [47], LOOP [8], and ToolFormer [42], has demonstrated the potential of AI agents to perform complex tasks over extended periods of time. However, these systems also highlight the challenges of context management and tool discovery, as well as the need for robust failure handling and recovery mechanisms: the unaugmented success rates are at 70% for simpler tasks but drop off rapidly to the 45% range for more complex (longer-horizon) tasks that involve more complex API (tool) use. This work provides a complementary set of contributions, in the form of language design, mechanized validation, and runtime features, that are designed to work in conjunction with improvements in raw agentic capabilities to create a more robust and effective ecosystem for agentic software development.

8 Onward!

This paper outlined the vision and core design concepts for an agentic infused software ecosystem, and described the key components of this system, from the programming language to the mechanized validation tool to the runtime environment. Our approach to the problem of creating effective and trustworthy agentic software systems is to take a holistic approach to the entire software stack, co-designing the language, tools, and runtime environment to work together in a synergistic manner. As shown in this work, this approach is critical to addressing the challenges of context management, API/tool discovery, specification and requirements gathering, code/action generation, and safety that are critical to the success of agentic software systems.
By advancing the state of the art in these areas, and addressing key limitations to current agentic deployment, we can create a software ecosystem that significantly increases the feasibility, and drives widespread adoption, of agentic software systems.

Data Availability

Experimental versions of all of the systems described in this paper are publicly available via the main BOSQUE GitHub repository https://github.com/BosqueLanguage. These systems are actively being integrated and converted into a production-ready system that brings the agentic infused software ecosystem to developers (and users) everywhere.

References

[1] Mehrdad Afshari, Earl T. Barr, and Zhendong Su. 2012. Liberating the Programmer with Prorogued Programming. In Onward!
[2] Paschal C. Amusuo, Kyle A. Robinson, Tanmay Singla, Huiyun Peng, Aravind Machiry, Santiago Torres-Arias, Laurent Simon, and James C. Davis. 2025. ZTDJAVA: Mitigating Software Supply Chain Vulnerabilities via Zero-Trust Dependencies (Proceedings of the IEEE/ACM 47th International Conference on Software Engineering).
[3] Anthropic. 2024. Model Context Protocol. https://www.anthropic.com/news/model-context-protocol.
[4] Earl T. Barr, Mark Marron, Ed Maurer, Dan Moseley, and Gaurav Seth. 2016. Time-travel Debugging for JavaScript/Node.Js (FSE).
[5] Andrew Blinn, Xiang Li, June Hyung Kim, and Cyrus Omar. 2024. Statically Contextualizing Large Language Models with Typed Holes (OOPSLA).
[6] Bosque Source 2024. Bosque Programming Language. https://github.com/BosqueLanguage/.
[7] Frederick P. Brooks, Jr. 1987. No Silver Bullet Essence and Accidents of Software Engineering. Computer 20 (1987).
[8] Kevin Chen, Marco Cusumano-Towner, Brody Huval, Aleksei Petrenko, Jackson Hamburger, Vladlen Koltun, and Philipp Krähenbühl. 2025. Reinforcement Learning for Long-Horizon Interactive LLM Agents. arXiv:2502.01600 [cs.LG] https://arxiv.org/abs/2502.01600
[9] Clark Barrett, Pascal Fontaine, and Cesare Tinelli 2025. SMT-LIB Standard: Version 2.7. https://smt-lib.org/papers/smt-lib-reference-v2.7-r2025-07-07.pdf.
[10] Dafny Source 2024. Dafny Programming Language. https://dafny.org/.
[11] James C. Davis, Christy A. Coghlan, Francisco Servant, and Dongyoon Lee. 2018. The Impact of Regular Expression Denial of Service (ReDoS) in Practice: An Empirical Study at the Ecosystem Scale (ESEC/FSE 2018).
[12] Dodds, Mike 2025. What Works (and Doesn't) Selling Formal Methods. https://www.galois.com/articles/what-works-and-doesnt-selling-formal-methods.
[13] Express.js 2019. Documentation. https://expressjs.com/.
[14] Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yujia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, and Wanjun Zhong. 2025. ReTool: Reinforcement Learning for Strategic Tool Use in LLMs. arXiv:2504.11536 [cs.CL] https://arxiv.org/abs/2504.11536
[15] Roy Thomas Fielding. 2000. Architectural Styles and the Design of Network-Based Software Architectures. Ph.D. Dissertation.
[16] FreeBSD Foundation. 2024. Introduction to FreeBSD Jails. https://freebsdfoundation.org/freebsd-project/resources/introduction-to-freebsd-jails/.
[17] Stephen Goldbaum, Attila Mihaly, Tosha Ellison, Earl T. Barr, and Mark Marron. 2022. High Assurance Software for Financial Regulation and Business Platforms (VMCAI).
[18] Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-Output Examples. In POPL.
[19] Sumit Gulwani and Mark Marron. 2014. NLyze: Interactive Programming by Natural Language for Spreadsheet Data Analysis and Manipulation. In SIGMOD.
[20] Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh. 2017. Program Synthesis. Foundations and Trends Programming Languages (2017).
[21] Rafael-Michael Karampatsis and Charles Sutton. 2020. How Often Do Single-Statement Bugs Occur? The ManySStuBs4J Dataset (MSR).
[22] Andrew Kennedy. 2009. Types for Units-of-Measure: Theory and Practice (CEFP).
[23] Lean Source 2024. Lean Programming Language. https://lean-lang.org/.
[24] Leonardo de Moura and Nikolaj Bjørner 2024. Z3 SMT Theorem Prover. https://github.com/Z3Prover/z3.
[25] Chloe Loughridge, Qinyi Sun, Seth Ahrenbach, Federico Cassano, Chuyue Sun, Ying Sheng, Anish Mudide, Md Rakib Hossain Misu, Nada Amin, and Max Tegmark. 2024. DafnyBench: A Benchmark for Formal Software Verification. arXiv:2406.08467 [cs.SE] https://arxiv.org/abs/2406.08467
[26] Zohar Manna and Richard Waldinger. 1980. A Deductive Approach to Program Synthesis. ACM Transactions on Programming Language Systems (1980).
[27] Mark Marron. 2023. Toward Programming Languages for Reasoning: Humans, Symbolic Systems, and AI Agents (Onward!).
[28] Mark Marron. 2024. A Programming Language for Data and Configuration! (Onward!).
[29] Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Oleksandr Polozov, Rishabh Singh, Benjamin Zorn, and Sumit Gulwani. 2015. User Interaction Models for Disambiguation in Programming by Example. In UIST.
[30] Erik Meijer. 2025. From Function Frustrations to Framework Flexibility. Commun. ACM (2025).
[31] Erik Meijer. 2025. Unleashing the Power of End-User Programmable AI. Commun. ACM (2025).
[32] Erik Meijer. 2026. Guardians of the Agents. Commun. ACM (2026).
[33] Md Rakib Hossain Misu, Cristina V. Lopes, Iris Ma, and James Noble. 2024. Towards AI-Assisted Synthesis of Verified Dafny Methods. ACM on Software Engineering (2024).
[34] MITRE. 2024. CWE-25: Top 25 Most Dangerous Software Weaknesses. https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html Accessed: 2024-04-08.
[35] Morphir 2021. Morphir. https://github.com/finos/morphir.
[36] Niels Mündler, Jingxuan He, Hao Wang, Koushik Sen, Dawn Song, and Martin Vechev. 2025. Type-Constrained Code Generation with Language Models (PLDI).
[37] NPM 2023. npm.js. https://www.npmjs.com/.
[38] OpenAI. 2024. Skills. https://developers.openai.com/api/docs/guides/tools-skills/.
[39] Oracle. 2024. Java Platform, The Security Manager. https://docs.oracle.com/javase/tutorial/essential/environment/security.html.
[40] Andrew Rice, Edward Aftandilian, Ciera Jaspan, Emily Johnston, Michael Pradel, and Yulissa Arroyo-Paredes. 2017. Detecting Argument Selection Defects. Proceedings ACM Programming Languages (2017).
[41] C. Rich and R.C. Waters. 1988. Automatic programming: myths and prospects. Computer (1988).
[42] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761 [cs.CL] https://arxiv.org/abs/2302.04761
[43] Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. [n. d.]. PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. In EMNLP.
[44] Joykirat Singh, Raghav Magazine, Yash Pandya, and Akshay Nambi. 2025. Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning. arXiv:2505.01441 [cs.AI] https://arxiv.org/abs/2505.01441
[45] Armando Solar-Lezama, Gilad Arnold, Liviu Tancau, Rastislav Bodik, Vijay Saraswat, and Sanjit Seshia. 2007. Sketching Stencils (PLDI).
[46] Chuyue Sun, Ying Sheng, Oded Padon, and Clark Barrett. 2024. Clover: Closed-Loop Verifiable Code Generation (SAIV).
[47] Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, and Niranjan Balasubramanian. 2024. AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents (ACL).
[48] Haoxin Tu, Huan Zhao, Yahui Song, Mehtab Zafar, Ruijie Meng, and Abhik Roychoudhury. 2025. Agentic Program Verification. arXiv:2511.17330 [cs.SE] https://arxiv.org/abs/2511.17330
[49] Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel, Kaiyu Yang, and Dawn Song. 2025. VERINA: Benchmarking Verifiable Code Generation. arXiv:2505.23135 [cs.LG] https://arxiv.org/abs/2505.23135
