Runtime Governance for AI Agents: Policies on Paths


Authors: Maurits Kaptein, Vassilis-Javed Khan, Andriy Podstavnychy

Maurits Kaptein* (1,2), Vassilis-Javed Khan (2), and Andriy Podstavnychy (1)

1 Eindhoven University of Technology, Eindhoven, The Netherlands
2 Kyvvu B.V., Nijmegen, The Netherlands

March 18, 2026

Abstract

AI agents—systems that plan, reason, and act using large language models—produce non-deterministic, path-dependent behavior that cannot be fully governed at design time, where by "governed" we mean striking the right balance between the highest possible rate of successful task completion and the legal, data-breach, reputational, and other costs associated with running agents. We argue that the execution path is the central object for effective runtime governance and formalize compliance policies as deterministic functions mapping agent identity, partial path, proposed next action, and organizational state to a policy violation probability. We show that prompt-level instructions ("system prompts") and static access control are special cases of this framework: the former shape the distribution over paths without actually evaluating them; the latter evaluates deterministic policies that ignore the path (i.e., these can only account for a specific subset of all possible paths). In our view, runtime evaluation is the general case, and it is necessary for any path-dependent policy. We develop the formal framework for analyzing AI agent governance, present concrete policy examples (inspired by the AI Act), discuss a reference implementation, and identify open problems including risk calibration and the limits of enforced compliance.
Keywords: AI agents, runtime governance, compliance policies, execution paths, organizational risk, EU AI Act

* Corresponding author. Email: m.c.kaptein@tue.nl.

1 Introduction

Organizations across industries are deploying AI agents: systems that use large language models to autonomously plan, invoke tools, and take actions with real-world consequences [Xi et al., 2023, Wang et al., 2024, Yao et al., 2023]. The attraction is clear—agents can handle complex, multi-step workflows that previously required human labor—and the pace of adoption is rapid. But governance infrastructure has not kept up. A 2026 KPMG survey of large-enterprise leaders found that 75% cite security, compliance, and auditability as the most critical requirements for agent deployment, while multi-agent orchestration complexity has become the primary bottleneck as organizations move from pilots to production [KPMG LLP, 2026]. However, the EU AI Act's provisions for high-risk AI systems—those making or supporting consequential decisions affecting individuals' rights, safety, or access to services—take effect in August 2026 [European Parliament and Council of the European Union, 2024] and effectively demand proper orchestration. The gap between what organizations are deploying and what they can demonstrably govern is, in our assessment, the central obstacle to responsible agent adoption. It is that gap we aim to address, at a conceptual level, with this paper.

The governance problem for agents is fundamentally different from that of conventional software or single-query AI systems, and the difference is not merely one of scale.
An agent asked to "prepare a quarterly report" may read from a CRM, access financial databases, fetch competitor data from the web, generate visualizations, and email the result—a sequence of autonomous decisions, each with its own compliance implications, determined at runtime by a language model whose outputs are stochastic. The same agent on the same task may follow different sequences on different runs—a fundamental shift from traditional software, where security and control relied largely on predictable, auditable workflows. Hence, the set of possible behaviors is combinatorially large and, for agents with code execution, potentially unbounded. This non-determinism, combined with multi-step tool use and the ability of agents to delegate to other agents, means that the violations organizations care about—data exfiltration, information barrier breaches, unauthorized external communication—are properties of sequences of actions, not of individual actions in isolation. A single database read is innocuous; a database read followed by an external email is a potential exfiltration event. No inspection of either step alone reveals the violation.

The objective of governance in this setting is not new: organizations have always sought to maximize the productive output of automated systems while keeping the expected cost of policy violations within acceptable bounds. This is what role-based access control, data loss prevention, and information security frameworks have always aimed to do. What is new is that these mechanisms—designed for deterministic, statically specified systems—cannot express or enforce the path-dependent constraints that agents require. Prompting—a seemingly popular governance method—reduces the probability of bad paths but provides no strict enforcement.
A ccess control eliminates action categories (i.e., remo ves p ossible agent paths) unconditionally but hardly ev er conditions on prior actions which are essen tial to judge the p oten tial costs. Simply put, neither prompting nor standard access control will reliably detect that an agent has crossed an information barr ier b y com bining the outputs of t w o individually permitted steps — and restricting agent b ehavior p ost-ho c ma y not help, as constraints can inadv ertently push agen ts to w ard alternativ e paths that were nev er an ticipated. This pap er provides a formal framework for runtime agen t gov ernance. W e define compliance p olicies as deterministic functions on execution paths, sho w that existing gov ernance mechanisms are special cases (or, in the case of prompting, not cases at all), and connect p er-step ev aluation to an organizational risk ob jective. The framew ork is deliberately minimal: it specifies the structure of go v ernance without prescribing specific risk taxonomies or policy languages, which will v ary across organizations and ev olv e as the field matures. 1.1 Prior w ork The gro wing recognition that agen ts require gov ernance b ey ond what existing mec hanisms pro- vide has pro duced sev eral lines of work. W e briefly surv ey the most relev an t con tributions and iden tify the gaps our framew ork addresses. W ang et al. [2025] in tro duce MI9, com bining an agency-risk index with FSM-based confor- mance engines for individual agents; Agen tSp ec [W ang et al., 2026] defines a domain-sp ecific language for run time constrain ts. Both target individual agents and do not formalize the rela- tionship b et ween p er-step ev aluation and the organizational risk ob jective. Gaurav et al. 
[2025] propose Governance-as-a-Service, an external enforcement layer architecturally closest to our approach, but it seems primarily focused on content-level violations (misinformation, hate speech) rather than behavioral trajectory violations. Bhardwaj [2026] develops Agent Behavioral Contracts with probabilistic compliance guarantees and a Drift Bounds Theorem—the most formally rigorous contribution in this space, complementary to our organizational framing: ABC provides per-agent guarantees; we provide the fleet-level framework within which such guarantees operate.

On the broader landscape, the ArbiterOS paradigm [ArbiterOS Contributors, 2025] proposes reframing the LLM as a "Probabilistic CPU," sharing our view that agents are fundamentally different from traditional software. Threat taxonomies from Deng et al. [2024], OWASP Foundation [2025], and the MAESTRO framework [Cloud Security Alliance, 2025] inform our adversarial scenarios. Cihon et al. [2025] propose code-based measurement of agent autonomy levels, providing a complementary design-time assessment that does not address runtime enforcement. Ivliev and Furman [2025] analyze systemic risk from self-improving agents and argue that static defenses are insufficient against self-modifying systems—a conclusion that aligns with our framework's emphasis on runtime evaluation, though our scope is the governance of current-generation enterprise agents rather than containment of recursively self-improving systems.
What is missing across this body of work is threefold: (i) a unified model in which existing governance approaches are special cases of a single framework, making precise what each current mechanism can and cannot enforce; (ii) organizational scope, connecting per-step evaluation to an aggregate risk objective across a fleet of agents; and (iii) concrete policy specifications tied to regulatory requirements, demonstrating that the framework is not merely theoretical.

1.2 Aims and outline

This paper does not present an implementation or experimental results. Its contribution is conceptual: a framework precise enough to guide implementation, general enough to survive changes in the underlying technology, and concrete enough to be immediately useful. Admittedly, this paper is written at a time when new agent paradigms and capabilities are quickly emerging: we do not expect this paper to be the "tell all" of agent governance; we merely strive to provide a meaningful contribution to the literature and help practitioners navigate this space to build agents that are both useful and well-behaved.

Section 2 introduces the governance challenge somewhat informally: we discuss what agents are, what effective governance means, why agents require a different approach, and why current methods fail (or do not address all challenges). This section uses no formal notation and is intended to make the case for runtime governance (as opposed to mere design-time governance) accessible to a broad audience, including policymakers and enterprise architects. Section 3 makes the argument more precise: formal definitions of execution paths, the policy function, the Policy Engine and its governance objective, existing approaches as special cases, and a concrete instantiation with worked policy examples. We have tried to capture generically all the cases that we are currently aware of.
Next, Section 4 discusses how to build a system that realizes this framework, including a reference implementation that illustrates the trade-offs concretely. We discuss potential architectures, practical policy authoring, and the challenges of shared states. Section 5 reflects on what the framework implies for organizations seeking to comply with the EU AI Act; we hope to provide a meaningful lens. Section 6 summarizes the argument, identifies open problems, and outlines next steps.

2 The Governance Challenge

2.1 What are agents

For the purposes of this paper, an AI agent is a system that receives a task (prompted or otherwise) described in natural language and autonomously executes a sequence of actions to accomplish the task, where the actions, their order, and their number are determined at runtime by a language model. The key components often are:

• a language model that decides what to do next, given the task, the history of prior actions, and any retrieved context;
• a set of tools that the agent can invoke—database queries, API calls, web requests, code execution, email, messaging;
• "long-term" memory, including conversation history and retrieval-augmented context; and, in many implementations,
• internal guardrails—developer-specified checks that the agent is instructed to apply to its own behavior.

The behavior of an agent on a task produces an execution path: a sequence of discrete steps, each consisting of an action type (language model call, tool invocation, delegation to another agent), an input, and an output. A customer service agent might retrieve a support ticket, look up the customer's account, draft a response, check it against a policy, and send it. A financial analyst agent might query a database, fetch market data from an external API, run a calculation, and email the result. Each of these defines a path—a specific sequence of steps taken on a specific execution.
Two features of this path are consequential for governance. First, the path is not predetermined: the language model decides at each step what to do next, and different runs of the same agent on the same task may produce different paths. We have seen paths on tasks range from a handful of steps to thousands of steps. Second, the steps in the path interact: the output of an early step (e.g., retrieved data) becomes part of the context for later steps (e.g., what the language model decides to include in an email). This means that the governance implications of a step depend on what happened before it.

The objective of governance in this setting is not new: organizations have always sought to maximize the productive output of automated systems, i.e., a system following a path that leads to successful task completion, while keeping the expected cost of policy violations within acceptable bounds. This is what role-based access control, data loss prevention systems, and information security frameworks have always aimed to do. What is new, as we develop in the following sections, is the difficulty of achieving it.

2.2 Why agents are different

Five properties make agents qualitatively different from the systems that existing governance mechanisms were designed for, and together they explain why the familiar objective has become substantially harder to achieve.

1. Non-determinism. The same agent given the same task may follow different paths on different runs. This is not a bug—it is constitutive of the flexibility that makes agents useful. But it means that design-time verification of "the" behavior is impossible: there is no single behavior to verify.

2. Dynamic tool use. Which tools the agent invokes, in what order, and with what arguments are runtime decisions by the language model, not a predetermined sequence specified in code.
A traditional workflow automation system calls a fixed sequence of APIs; an agent decides which APIs to call based on what it has observed so far.

3. Variable-length paths. Different executions of the same agent on the same task may involve different numbers of steps. The decision surface that governance must cover varies across runs.

4. Self-modification. Agents with code execution capabilities can write new functions, modify their own prompts, or create persistent tools at runtime [Manik and Wang, 2026]. An agent that writes a helper function to send emails directly—bypassing a governed email tool—has altered its own capabilities in a way that was not anticipated at design time. This is an emerging rather than universal concern: most currently deployed enterprise agents do not have unconstrained code execution. It is, however, a governance-relevant risk for any agent that does.

5. Multi-agent interaction. Agents delegate to other agents, receive results from them, and share workspaces. This means that the behavior of one agent can create compliance constraints for another. Consider a financial institution with an advisory-side agent and a trading-side agent separated by an information barrier. If the advisory agent accesses pending deal data and then delegates a task to the trading agent, the trading agent's response may contain deal-adjacent information that the advisory agent now holds alongside restricted data. Neither agent individually violated a rule; the violation is a property of the interaction. Detecting it requires visibility across both agents' paths—which is why governance must be organizational, not per-agent.

Sadly, in the sense that it complicates our governance of agents, these properties all interact.
An agent that non-deterministically decides to fetch external data (dynamic tool use), then writes a script to process it (self-modification), then delegates to another agent for a summary (multi-agent interaction) has produced a path whose governance implications could not have been specified at design time, whose length was not predictable, and whose compliance depends on the full sequence of steps, including interactions with other agents. This is the governance challenge in fairly concrete terms.

We use three scenarios, to which we return later in the paper, to make things concrete:

1. A customer service agent retrieves a support ticket containing injected instructions [Greshake et al., 2023, Perez and Ribeiro, 2022] and then acts on those instructions, disclosing account data. Neither step—retrieving a ticket, answering a query—is inherently violating; the violation is in the sequence.

2. An agent preparing a report reads from a CRM, accesses financial data, fetches competitor pricing from the web, and emails the report externally. The external email may violate policy—but only because of what data was accessed in prior steps.

3. Two agents, each acting within their permissions, produce a combined path that violates a cross-organizational constraint (the information barrier scenario described above).

In the formalization of Section 3, each of these scenarios becomes a concrete policy.

A natural—and common—response to these scenarios is to require human approval before consequential actions. This is valuable—and the framework we present in Section 3 accommodates it—but it does not dissolve the governance problem.
Human approval addresses the question of whether a specific action is acceptable at the moment it is proposed; it does not address whether the path leading to that point already contains a violation, whether the volume of approval requests is manageable, or whether a reviewer presented with a wall of context under time pressure is providing meaningful oversight or a rubber stamp. Human approval will not be fault-proof and should simply be regarded as another step in the potentially risky agent's execution path.

The potentially more interesting question is not whether human approval solves our governance problem, but rather when, on what information, and decided by what mechanism, human approval should be one of the steps in the agent's task execution. These are exactly the types of questions a runtime governance layer should aim to answer.

2.3 Why current methods fall short

We now examine five categories of existing, common governance mechanisms and, for each, identify what it can and cannot do in the context of agents. We would argue that all are useful; none are complete.

• Prompt-level control. System prompts, injected instructions, and in-context examples shape the agent's behavior by modifying its input. An agent told "never send emails containing customer data without approval" is statistically less likely to do so. But prompt compliance is probabilistic, not guaranteed: the agent may ignore, misinterpret, or be manipulated into overriding its instructions, as the prompt injection scenario illustrates. More fundamentally, prompting cannot enforce anything—there is no mechanism that prevents an action if the agent decides to take it. Prompting reduces the probability of policy-violating paths; it provides no enforcement when violations occur.

• Access control (RBAC, IAM, tool whitelisting).
Role-based access control and tool whitelisting restrict which actions an agent can take, unconditionally. An agent without email permissions cannot send emails; an agent without database access cannot query databases. This is genuine enforcement—the action is categorically prevented, and hence any possible task execution paths containing that action node are cut. But it is (at least in most implementations) context-free: access control does not know, and cannot condition on, what the agent has done before. An agent with both "database read" and "email send" permissions has everything it needs to exfiltrate data, and access control cannot distinguish the innocuous use of these permissions from the problematic one. Access control prevents action categories; it cannot prevent action sequences.

• Agent-level guardrails. Many agent frameworks include developer-specified checks: output filters, content classifiers, self-critique steps. These operate under the agent's own authority—they are part of the agent's code, executed within the agent's process. For agents with code execution capabilities, this is a fundamental limitation: an agent that can write code can, in principle, write code that bypasses its own guardrails. However, even without self-modification, guardrails embedded within the agent cannot be audited, updated, or enforced independently of the agent. They are not governance; they are (potentially useful) self-regulation.

• Content-level filtering. External content filters [Gaurav et al., 2025] can intercept and evaluate the content of individual steps—checking for PII, hate speech, or misinformation. This is valuable but addresses a different problem: content violations at individual steps, not behavioral trajectory violations across steps.
A content filter can detect that an email contains a social security number; it cannot detect that the email was sent after the agent read confidential deal data from a restricted database. The true violation is in the trajectory, not the content, and hence content-level filtering will not suffice.

• Human approval. Requiring a human to approve consequential actions is often treated as a catch-all solution, and it is genuinely useful: a human who sees the accumulated context of an agent's path can exercise judgment that no policy function can replicate. But human approval is not a governance mechanism in its own right—it is an action that a governance mechanism can invoke. The policy that decides when to request approval, what context to surface to the reviewer, and what the approval unlocks in subsequent steps is where the governance work lives. Approval without that surrounding structure is unscalable (a fleet of agents generates more requests than humans can meaningfully review), incomplete (the path leading to the approval point may already contain a violation), and gameable (an agent can be manipulated into reaching the approval gate via a sequence of individually innocuous steps that together constitute a violation). Used well—invoked selectively by a policy that has already evaluated the path, and presented to a reviewer with the relevant context made explicit—human approval is a powerful component of the governance system. It should, however, not be mistaken for the system itself.

Taken together, these mechanisms form a layered defense that every organization deploying agents should maintain: prompting reduces the base rate of violations; access control eliminates entire action categories; guardrails provide agent-level checks; content filtering catches per-step content issues. Each is valuable.
None, however, can express or enforce a policy that depends on the sequence of prior actions. For that, a different mechanism is needed: one that evaluates the proposed next action in the context of the full execution path, operates externally to the agent, and applies uniformly across all agents in the organization. This is the runtime governance framework we formalize in Section 3.

3 A Formal Framework for Agent Governance

We proceed as follows. We first define the execution path of an individual agent executing a task—the object we govern—and the step types that constitute it. We then introduce the policy function, which maps a partial path and a proposed next action to a violation probability. Here we intentionally refrain from discussing policy implementations in detail; we, for the time being, merely assert that such policies can meaningfully be constructed. Next, we introduce a Policy Engine, the organizational component that evaluates policies across a fleet of agents and keeps expected violations within an acceptable bound. We close by showing how prompting and access control relate to this framework.

Throughout, we distinguish the ideal formalization—what a complete governance system should do in principle—from more workable instantiations that approximate the ideal under practical constraints. The gap between the two is not a weakness of the framework; it is the precise statement of the open organizational and engineering problems we face when governing agents.

3.1 The Execution Path

An agent A is a computational entity with a persistent identity and a registered purpose. For the purposes of the formal framework, A is simply an identifier; Section 3.5 discusses what metadata a concrete system attaches to it. The execution of a task by agent A produces an execution path P = (s_1, s_2, ..., s_n), a finite sequence of discrete steps.
Each step is a triple s_i = (τ_i, d_in,i, d_out,i), where τ_i is the step type, d_in,i is the input data to that step, and d_out,i is the observed output. We distinguish three step types, each presenting a qualitatively different governance concern:

1. A stochastic step is a call to a language model: its output is non-deterministic, conditioned on the full input context, which may have been altered by prior steps—for example, by retrieved content containing injected instructions.

2. A deterministic step is a call to an external tool or system according to a defined schema—database queries, API calls, memory reads and writes—where the governance concern is what data is accessed, modified, or transmitted.

3. A composite step is a delegation to another agent, which will itself produce a sub-path P'; this type combines the concerns of the previous two and introduces the additional question of how governance of the sub-path propagates to the primary path, a question we revisit in Section 6.1.

This taxonomy is motivated by governance, not by implementation: a concrete system will likely subdivide these categories further when policies require finer distinctions. One might expect a fourth type for human inputs—approval responses or corrections provided by a person during execution. We do not treat this as a primitive type because a human input is, from a governance perspective, either a deterministic step (a structured approval with a defined schema) or a stochastic step (free-form input whose content is unpredictable). Its governance significance lies not in constituting a separate type but in its presence in P_i, which approval-gating policies can verify.

What constitutes a step is a design choice of the implementer.
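As an illustration only (the paper prescribes no data model; all identifiers here are our own hypothetical choices), the step triple and the path it accumulates into could be represented as:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

class StepType(Enum):
    STOCHASTIC = "stochastic"        # a language-model call
    DETERMINISTIC = "deterministic"  # a tool/system call with a defined schema
    COMPOSITE = "composite"          # a delegation to another agent

@dataclass(frozen=True)
class Step:
    """One step s_i = (tau_i, d_in_i, d_out_i) of an execution path."""
    step_type: StepType
    d_in: dict            # input data to the step
    d_out: Any = None     # observed output; None for a proposed action s*

@dataclass
class ExecutionPath:
    """P = (s_1, ..., s_n) for one task executed by agent `agent_id`."""
    agent_id: str
    steps: list = field(default_factory=list)

    def append(self, step: Step) -> None:
        self.steps.append(step)

# Scenario 2 from Section 2.2 as a partial path: a database read has
# happened; an external email is proposed but not yet executed.
path = ExecutionPath(agent_id="report-agent-01")
path.append(Step(StepType.DETERMINISTIC,
                 {"tool": "crm.read", "record": "Q3-deals"},
                 d_out="<confidential rows>"))
proposed = Step(StepType.DETERMINISTIC,
                {"tool": "email.send", "to": "partner@external.com"})
```

The distinction between a completed step (with observed d_out) and a proposed action (d_out still unknown) matters for the prospective evaluation introduced in Section 3.2.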
A coarser granularity reduces the precision at which policies can intervene; we assume throughout that granularity is fine enough that the governance-relevant decisions—tool invocations, external communications, data accesses, delegations—are individually visible.

Informally, the space of all possible executions of an agent on a task can be thought of as an implicit graph, with paths as walks from the initial state to a terminal outcome; the aim is to keep agents on paths that are both successful and low-cost. Policies are functions on observed (and prospective) partial paths, not properties of the graph itself. For realistic agents the true graph is unknown, potentially infinite, and changes at runtime.

We assume, however, that each execution path terminates in a terminal state: either success (⊤), meaning the task completed, or failure (⊥), meaning the task did not—whether due to natural task failure, human intervention, or automatic policy-engine-induced termination. We associate with each completed task a utility u(P) ≥ 0, where higher values indicate greater task value; the specific utility function is task-dependent and left to the instantiation. [1]

3.2 The Policy Function

At step i, the partial path P_i = (s_1, ..., s_i) records what the agent has done so far. We subsequently introduce an agent's intended next action s* = (τ*, d*_in). Note that the output d*_out of this next action is unknown, since s* is what the agent intends to do, not what it has done. In the ideal framework we intend to sketch, we believe the governance system should evaluate s* before it executes, using the full partial path P_i as context. This prospective evaluation is what makes preemptive enforcement possible: a violation can be prevented rather than merely recorded.
The alternative—evaluating solely s_i after execution—is retrospective and can only detect and record violations, not prevent them.

An organization's agent governance is a set of policies J. Each policy j ∈ J is a deterministic function

    π_j(A, P_i, s*, Σ) → [0, 1],

whose output is the probability that executing s* constitutes a violation of policy j, given agent A, partial path P_i, and shared governance state Σ. The input A carries whatever registered metadata the organization associates with the agent. The input P_i records completed steps; in practice, most policies would compress this into a small state vector updated incrementally rather than re-examining the full sequence at each step. The input s* is the proposed action, consisting of its type τ* and input d*_in. The input Σ represents the shared governance state maintained by the Policy Engine across all agents: it captures governance-relevant facts that no single agent's path contains, such as which agents have accessed which data categories or which information barriers have been activated. Most policies in practice likely ignore Σ; policies encoding cross-agent constraints, however, require it.

In our view, the policy function has to be deterministic: identical inputs always produce identical outputs. This is a deliberate design constraint. If policy evaluation were non-deterministic, the same execution path could produce different enforcement decisions on different runs, making audit logs unverifiable and compliance guarantees impossible to state. Determinism is also what makes an audit record meaningful: given P_i, s*, and the value of Σ at evaluation time, any auditor can reproduce the scores that led to an enforcement decision. [2]

It is worth noting that the input space of π_j is very large—P_i alone may be arbitrarily long—so in practice every policy maps large regions of the input space to the same output.
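To make the signature concrete, here is one minimal, hypothetical instantiation of a path-dependent policy (the read-then-external-email pattern of Scenario 2). The tag names, tool identifiers, and domain are illustrative assumptions, not part of the framework:

```python
# A path is modeled minimally as a list of dicts; a real system would use
# richer step records. The policy itself contains no randomness and no
# model calls: identical inputs always produce identical outputs.

def pi_exfiltration(agent_id, partial_path, proposed, sigma):
    """Deterministic policy pi_j: probability that executing `proposed`
    violates the exfiltration policy, given the partial path P_i."""
    # Has any completed step read data tagged as confidential?
    touched_confidential = any(
        "confidential" in step.get("data_tags", ())
        for step in partial_path
    )
    # Is the proposed next action an external communication?
    is_external_send = (
        proposed["tool"] == "email.send"
        and not proposed["to"].endswith("@ourcorp.example")
    )
    # Each step alone is innocuous; the sequence is the violation.
    return 1.0 if (touched_confidential and is_external_send) else 0.0

path = [{"tool": "db.read", "data_tags": ("confidential",)}]
proposed = {"tool": "email.send", "to": "partner@external.com"}
score = pi_exfiltration("report-agent-01", path, proposed, sigma={})
# With an empty path, or with an internal recipient, the same policy
# returns 0.0: the evaluation conditions on P_i, not on the action alone.
```

Note that this policy happens to ignore Σ; a cross-agent information-barrier policy would instead consult Σ for facts recorded by other agents' paths.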
For the same task, different executions may produce different P_i because stochastic steps are non-deterministic, and hence different policy outputs. The apparent variability of policy outputs across runs reflects the stochasticity of the agent, not of the policy.

[1] A natural baseline is u(P) = 1[terminal state = ⊤], i.e., binary success, which is sufficient to make the governance objective well-defined and to expose the trivial-block pathology.

[2] Non-deterministic policy functions seem "all the rage"; i.e., we use an agent to evaluate the behavior of another agent. This is easy to incorporate in our framework: the "policy agent" call is simply a step in P.

3.3 The Policy Engine and Governance Objective

Policies are functions; something must evaluate them, act on the results, and maintain the shared state they depend on. We call this the Policy Engine: the organizational component responsible for intercepting proposed actions, evaluating all applicable policies, maintaining Σ, and issuing interventions. [3]

Given a proposed action s* at step i, the Policy Engine evaluates all active policies and combines their outputs into a step-level violation score,

    v_i = 1 − ∏_{j∈J} (1 − π_j(A, P_i, s*, Σ)),

the probability that at least one policy is violated by executing s*. Because each π_j conditions on the full partial path P_i, v_i is already path-level: it reflects not just the proposed action but the history of steps that led to it. This quantity is bounded in [0, 1] regardless of the number of policies and does not grow artificially with |J|. The quantity of interest per task is the terminal violation score v_T, the step-level score at the terminal step of the task.
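The combination rule is a one-line product; a sketch, treating the per-policy outputs as given numbers:

```python
def step_violation_score(policy_scores):
    """Combine per-policy outputs into v_i = 1 - prod_j (1 - pi_j).

    `policy_scores` holds the values pi_j(A, P_i, s*, Sigma) of all
    active policies for a single proposed action s*.
    """
    prod = 1.0
    for p in policy_scores:
        prod *= (1.0 - p)
    return 1.0 - prod

# Three policies flag the proposed action with probabilities 0.1, 0.2, 0.0:
v_i = step_violation_score([0.1, 0.2, 0.0])
# v_i = 1 - 0.9 * 0.8 * 1.0, i.e. about 0.28; the score stays in [0, 1]
# however many policies are active, and any single certain violation
# (pi_j = 1.0) drives v_i to 1.0.
```

Reading the per-policy scores as independent violation probabilities, the product is the probability that no policy is violated, so its complement is the probability that at least one is.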
The fleet-level governance objective is a constrained optimisation: maximise expected task utility across the fleet while keeping the expected terminal violation score within an acceptable bound,

    max E[ Σ_{a ∈ A(t)} Σ_{tasks of a} u(a) ]   subject to   E[ Σ_{a ∈ A(t)} Σ_{tasks of a} v_T(a) ] ≤ B,

where B is the organization's risk budget. This formulation makes the governance tradeoff explicit: a policy engine that simply blocks all tasks achieves E[Σ v_T] = 0 trivially, but at the cost of E[Σ u] = 0. The budget B is directly interpretable and can be monitored in real time: B = 0.1 means the organization tolerates an expected 0.1 policy-violating tasks completing at any moment, a quantity a risk officer can calibrate against regulatory exposure.[4]

Finally, the Policy Engine applies a decision function δ that maps the step-level violation score v_i to an intervention. An intervention either terminates the task—moving it to a failure terminal state ⊥ and thereby capping v_T = v_i—or modifies the path in a way that allows execution to continue with reduced violation risk. The latter includes any action that results in a new step being appended to P_i before the agent proceeds, since subsequent π_j evaluations condition on the updated P_i and may therefore return materially lower scores. Crucially, δ faces a genuine tradeoff: terminating a task at step i caps v_T but reduces u(P); the organization must therefore calibrate δ so that the fleet-level objective is maintained while aggregate utility remains acceptably high.[5]

3.4 Existing Approaches as Special Cases

Having defined the policy function and the Policy Engine, we can locate existing governance mechanisms precisely within the framework. Prompt-level control—system prompts, injected instructions, in-context examples—does not instantiate π_j at all.
Prompts modify the agent's input, shifting the distribution over future paths towards (hopefully) those with lower π_j values. An agent instructed "never send emails without approval" will produce proposed actions s* for which approval-related policies return lower violation probabilities. But this is a shift in the distribution over possible execution paths, not enforcement: there is no mechanism that prevents a high-probability action if the agent proposes one. Prompting makes costly paths less likely; it does not remove them.

Access control—role-based permissions, tool whitelisting, API authentication—implements a degenerate case of π_j that (often) uses only A and the proposed action type τ*:

    π_access(A, P_i, s*, Σ) = 0 if τ* ∈ Allowed(A), and 1 otherwise.

The partial path P_i, the proposed input d*_in, and the shared state Σ are all ignored; the output is binary. Access control eliminates action categories unconditionally but cannot express context-dependent constraints. It is a special case of π_j with P_i, d*_in, and Σ fixed to null.

Runtime evaluation instantiates the full π_j(A, P_i, s*, Σ), using all four inputs, and is the general case. Access control is the special case where path and input are ignored; prompting is not a case at all.

[3] A complete implementation also maintains an audit trail. From the framework's perspective this is simply the stored sequence of tuples (A, P_i, Σ, s*, v_i, δ(v_i)) at each step—a natural byproduct of runtime evaluation rather than a separate concern.
[4] It is trivial to include policy-violation-specific costs in this framework; we intentionally put forward only the core parts we deem necessary for a richer understanding.
[5] Concrete interventions—blocking, steering, requesting human approval—are implementation choices; we discuss them in Section 4.
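The degenerate access-control policy can be written down directly; the sketch below uses illustrative names (a whitelist for a hypothetical "reporting" agent) and represents the unused inputs as ordinary arguments that the body never reads:

```python
def access_control_policy(allowed_types: frozenset):
    """pi_access as a degenerate policy: the partial path P_i, the
    proposed input d_in, and the shared state Sigma are accepted but
    ignored; only the proposed action type tau* is consulted, and the
    output is binary."""
    def policy(agent, partial_path, proposed_type, proposed_input, sigma):
        return 0.0 if proposed_type in allowed_types else 1.0
    return policy

# Role-based whitelist for a hypothetical reporting agent.
reporting_policy = access_control_policy(frozenset({"read_db", "render_pdf"}))
```

Because the body never inspects `partial_path`, the same action receives the same score regardless of history, which is exactly why context-dependent constraints are out of reach for this class of policy.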
Any policy whose violation condition depends on what happened in prior steps can only be enforced at runtime.

Finally, note that compliance may be voluntary—the agent calls the Policy Engine of its own accord and may skip the call—or enforced—the Policy Engine is architecturally interposed so that no proposed action executes without passing through π_j. Only enforced compliance provides verifiable governance guarantees. Voluntary compliance shifts behavior in the right direction; it should not be mistaken for governance.

3.5 Concrete Instantiation

The framework above is stated abstractly, at the level of ideal objects. We now show how each abstract component maps to a concrete choice by working through the three scenarios introduced in Section 2.2. To this end we assume that in a concrete system, the agent identity A carries a metadata record M_A specifying registered purpose, risk classification, owner, tool configuration, and a cryptographic hash h(A) of the agent's definition at registration time. The hash enables detection of self-modification: at task start, the Policy Engine recomputes the hash of the running definition and compares it to h(A). Agent definitions evolve—prompts are revised, tools added—and each change constitutes a re-registration producing a new hash and retiring the old version. The shared governance state Σ is instantiated as a ledger maintained by the Policy Engine recording, for each active task, the maximum sensitivity level of data accessed, flags for information-barrier tags that have been activated, and cross-agent delegation records; it is updated each time a deterministic or composite step completes.
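One possible in-memory shape for such a ledger, sketched with illustrative field names (the paper does not prescribe a representation):

```python
from dataclasses import dataclass, field

@dataclass
class GovernanceLedger:
    """Shared state Sigma maintained by the Policy Engine across all
    agents; updated each time a deterministic or composite step completes."""
    max_sensitivity: dict = field(default_factory=dict)  # task_id -> float
    barrier_tags: dict = field(default_factory=dict)     # agent_id -> set of tags
    delegations: list = field(default_factory=list)      # (parent, child, task_id)

    def record_access(self, task_id, agent_id, sensitivity, tags=()):
        # Keep only the running maximum per task; the ledger stays small.
        cur = self.max_sensitivity.get(task_id, 0.0)
        self.max_sensitivity[task_id] = max(cur, sensitivity)
        self.barrier_tags.setdefault(agent_id, set()).update(tags)
```

The point of the sketch is that Σ holds aggregates (maxima, tags, records), not full paths, which is what later makes the consistency problem manageable.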
With these instantiations in place, the following policies each correspond to one of the three scenarios from Section 2.2:

• Agent integrity: at task start, the Policy Engine compares the hash of the running definition to h(A), returning 1 if they differ and 0 otherwise; depends only on A and can be evaluated before any step executes.

• Purpose and documentation: returns 1 if required fields (purpose, risk classification, owner) are absent from M_A; depends only on A; evaluated at deployment time.

• PII predecessor requirement: returns 1 if s* would access personal data and no PII_Check step appears in P_i; addresses the prompt-injection scenario by requiring that a classification step precede any action on personal data.

• Approval before external actions: for high-risk agents, returns 1 if s* is an external action and no Human_Approval step appears in P_i; addresses emergent tool chaining by blocking the terminal external send until approval is recorded.

• Data exfiltration prevention: assumes deterministic steps are labeled with a sensitivity level σ ∈ [0, σ_ceiling] written into Σ on completion; returns σ_max / σ_ceiling if s* sends data externally, and 0 otherwise—a graduated policy whose score reflects what has been touched in prior steps.

• Information barrier: if Σ records that A has accessed data tagged as one side of a named barrier and s* involves the other side, returns 1; this policy cannot be evaluated from A's path alone, making it the clearest illustration of why Σ is necessary.

• Execution bounds: returns a score increasing linearly with |P_i|, reaching 1 at a configured maximum.

• Time restriction: returns 1 if the current time falls outside permitted operating hours for the agent's risk classification; depends only on A.
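Two of these policies can be sketched directly. The representation is our own simplification: a path is a tuple of step-type strings, a proposed action is a dict with a `"type"` key, and Σ is a dict carrying the per-task maximum sensitivity.

```python
def pii_predecessor_policy(agent, partial_path, proposed, sigma):
    """Binary: returns 1 if the proposed action would access personal
    data and no PII_Check step appears anywhere in the partial path P_i."""
    if proposed["type"] == "access_personal_data":
        return 0.0 if "PII_Check" in partial_path else 1.0
    return 0.0

def data_exfiltration_policy(sigma_ceiling: float):
    """Graduated: returns sigma_max / sigma_ceiling when the proposed
    action sends data externally, 0 otherwise."""
    def policy(agent, partial_path, proposed, sigma):
        if proposed["type"] == "external_send":
            return min(sigma["max_sensitivity"] / sigma_ceiling, 1.0)
        return 0.0
    return policy
```

The first policy reads only P_i and s*; the second reads s* and Σ—matching the input columns these two rows occupy in the summary that follows.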
Table 1 summarizes which inputs each policy uses, whether its output is binary or graduated, and whether it can be evaluated before the task starts. The pre-task column points to a practically important observation: all policies that depend only on A can be evaluated once at deployment or task start, without per-step interception, reducing runtime overhead to those policies that genuinely require P_i, s*, or Σ.

Table 1: Concrete policy examples, their inputs, output type, and whether they can be evaluated before the task starts.

    Policy                A    P_i   s*    Σ    Output      Pre-task
    Agent integrity       ✓                     binary      yes
    Documentation         ✓                     binary      yes
    PII predecessor            ✓     ✓          binary      no
    Approval required     ✓    ✓     ✓          binary      no
    Data exfiltration          ✓     ✓     ✓    graduated   no
    Information barrier   ✓    ✓     ✓     ✓    binary      no
    Execution bounds           ✓                graduated   no
    Time restriction      ✓                     binary      yes

The table also makes the paper's central argument concrete. Access control corresponds to policies that use only A and τ*—the path, proposed input, and shared state are ignored, and the output is binary. Runtime evaluation is required for every other row. No amount of prompting or access restriction can enforce a policy whose violation condition depends on what happened in prior steps.

4 Implementation

The framework of Section 3 describes what a governance system should do in principle. This section discusses in more detail what it takes to realize that in practice. We do not present a complete system; the contribution of this paper is intentionally conceptual. Instead, we discuss the key architectural choices that determine how closely an implementation can approach the ideal, the authoring of policies, and a reference implementation that illustrates the tradeoffs concretely.

4.1 Deployment modes

The most consequential implementation decision is whether the Policy Engine evaluates proposed actions before or after they execute.
This single distinction determines what the engine can prevent versus what it can only detect, and it maps directly onto the prospective/retrospective distinction already hinted at in Section 3.2.

In prospective mode, the Policy Engine intercepts each proposed action s* before the underlying call is made, evaluates all applicable policies, computes v_i, and applies δ before execution proceeds. This is enforced compliance: violations can be prevented, not merely recorded. Prospective interception can be achieved at different layers of the stack—wrapping tool APIs, instrumenting the agent framework, or interposing at the model endpoint—but these are engineering choices within the mode, not governance-relevant distinctions. What matters is that no proposed action executes without passing through π_j.

In retrospective mode, the Policy Engine receives execution logs after steps complete and evaluates policies post hoc. No interception occurs; δ can flag violations and generate alerts but cannot prevent actions. This mode is purely detective: it supports audit, agent profiling, and after-the-fact escalation, but it cannot satisfy regulatory requirements that demand prevention. It is, however, the only mode available when the organization does not control the agent's execution environment—for example, when using third-party agent services that expose logs but not interception hooks.

The two modes are not mutually exclusive: a prospective engine also produces an audit trail, and a retrospective engine can feed its findings back into the registration-phase policy evaluations that gate future task starts. But they are strictly ordered by governance strength: prospective mode is the target; retrospective mode is a fallback when interception is not available.

A practical caveat applies to both modes.
As noted in Section 2.2, agents with code execution capabilities can in principle generate actions that bypass whatever interception layer is in place—writing code that calls tool APIs directly, spawning subprocesses, or modifying their own execution environment. Prospective interception is enforced under the assumption that the agent operates within the governed execution environment; it is an architectural assumption, not a proven invariant. The completeness of enforced compliance under self-modification is an open problem (see Section 6.1).

4.2 System architecture

Regardless of deployment mode, the Policy Engine operates in two phases that correspond directly to the pre-task/runtime distinction established in Table 1.

In the registration phase, when an agent is deployed or re-registered, the Policy Engine evaluates all policies that depend only on A. An agent that fails these checks—because it is undocumented, its definition hash has changed, or it is scheduled outside permitted hours—is rejected before any task begins. This eliminates an entire class of violations without per-step overhead and provides a natural enforcement point for deployment-time organizational requirements.

In the per-step phase, the Policy Engine intercepts each proposed action s*, evaluates all remaining policies, computes v_i, applies δ, and records the step to the audit trail. The key to making this tractable is that most policies do not require re-examination of the full path P_i at each step. Instead, the Policy Engine maintains a compact governance state vector per task—a small set of incrementally updated values such as the maximum data sensitivity level encountered, a boolean recording whether approval has occurred, and the current step count. Updating this vector at each step is constant-time; evaluating most policies against it is equally cheap.
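A governance state vector of this kind might look as follows; the fields are illustrative, chosen to serve the example policies above:

```python
from dataclasses import dataclass

@dataclass
class TaskStateVector:
    """Compact per-task state, updated incrementally in constant time;
    most policies condition on this instead of re-reading the full path."""
    max_sensitivity: float = 0.0
    approval_recorded: bool = False
    pii_check_done: bool = False
    step_count: int = 0

    def update(self, step_type: str, sensitivity: float = 0.0):
        # One constant-time update per completed step.
        self.step_count += 1
        self.max_sensitivity = max(self.max_sensitivity, sensitivity)
        if step_type == "Human_Approval":
            self.approval_recorded = True
        if step_type == "PII_Check":
            self.pii_check_done = True
```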
The state vector is a sufficient statistic for each policy: rather than conditioning on the full path, each policy conditions on the state vector, which captures everything it needs from prior steps. This is exact for all policies in Table 1 and for the large majority of practically relevant policies.

Computing v_i = 1 − ∏_j (1 − π_j(·)) across a policy set J is linear in |J| per step. Policies that depend only on A and τ* can be short-circuited before the state vector is consulted, and independent policies can be evaluated in parallel. For typical policy sets of tens to low hundreds of policies, per-step evaluation adds modest overhead relative to the language model inference that often dominates agent execution time.

Note that audit logging is a natural byproduct of the per-step phase: the full state tuple (A, P_i, Σ, s*, v_i, δ(v_i)) at each step is already available and can be persisted. The audit trail is itself a high-value asset requiring appropriate access controls and retention policies; see Section 4.6.

4.3 Policy authoring

In practice, the large majority of organizationally relevant policies are binary threshold rules on path state: has a particular step type appeared, has a sensitivity level been exceeded, has the step count reached a limit. These are cheap to evaluate, straightforward to audit, and simple to express. The graduated policies in Table 1 are the exception rather than the rule, and even these reduce to simple arithmetic on the state vector. This observation motivates a template-based approach to policy authoring.
Rather than expecting compliance officers to write policy functions from scratch, organizations can maintain a library of parameterized templates covering the common governance patterns: approval-gating (which action types require prior approval for which agent risk classifications), data sensitivity thresholds (at what sensitivity level an external action requires intervention), information barrier enforcement (which data categories are barrier-separated), and execution bounds (maximum step counts by agent type). Authoring a policy becomes a matter of selecting a template and configuring its parameters—a task that requires governance judgment but not programming.

Testing policies before deployment is non-trivial. A policy that is too permissive fails to catch violations; one that is too restrictive blocks legitimate task paths and reduces agent utility. Policies also interact via the composition rule v_i = 1 − ∏_j (1 − π_j): a new policy that individually appears reasonable can, combined with existing policies, push v_i high enough to trigger interventions on most useful paths. The recommended practice is to deploy new policies in flag-only mode first—computing v_i and logging the results without acting on them—and to validate against a representative sample of execution traces before enabling enforcement. Policy changes take effect across all agents immediately and should be versioned: the audit trail records the active policy version at each step, so that enforcement decisions can be reproduced and reviewed against the policy set that was in force at the time.

4.4 Concrete intervention outcomes

The decision function δ maps v_i to an intervention at each step. In practice, implementations typically realize this as one of three concrete outcomes. Pass: the proposed action executes and the agent continues unmodified.
Steer: execution pauses; the Policy Engine may inject a compliance hint into the agent's context, request human approval, or alert a responsible person, with P_i persisted and execution resuming from the stored state when resolved. Block: the proposed action is prevented, the task terminates at a failure state ⊥, and the incident is escalated. The thresholds at which δ produces each outcome are organizational choices, calibrated so that the fleet-level objective is maintained while aggregate utility remains acceptably high.

4.5 A reference implementation

To illustrate how the framework translates into a working system, we describe the implementation developed by Kyvvu B.V., documentation for which is available at https://docs.kyvvu.com (the platform is in active development). We describe it not as a definitive realization of the framework but as a concrete instantiation that makes the tradeoffs visible.

Kyvvu's Policy Engine operates in prospective mode: agents call the engine before each step, passing the proposed action and receiving a governance decision before proceeding. The engine integrates with the LangChain and LangGraph agent frameworks, and with Microsoft Copilot Studio agents, covering a large section of current enterprise agent deployments. The registration phase evaluates agent-level policies at task start, rejecting agents that fail documentation, integrity, or scheduling checks before any step executes. The per-step phase maintains a governance state vector per task and evaluates the active policy set at each proposed action, producing one of the three intervention outcomes described in Section 4.4.
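A generic decision function of this shape can be sketched in a few lines. This is illustrative only, not Kyvvu's implementation; the threshold values are placeholders that an organization would calibrate against its fleet-level objective.

```python
def decide(v_i: float, steer_threshold: float = 0.3,
           block_threshold: float = 0.8) -> str:
    """Map the step-level violation score v_i to one of the three
    intervention outcomes; thresholds are organizational choices."""
    if v_i >= block_threshold:
        return "Block"   # prevent the action, terminate the task, escalate
    if v_i >= steer_threshold:
        return "Steer"   # pause; inject a hint or request human approval
    return "Pass"        # execute the proposed action unmodified
```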
Currently, policy scores are treated as severity indicators rather than calibrated probabilities—the outputs of π_j reflect the seriousness of the concern but have not been empirically calibrated against actual violation rates, which requires operational data at a scale that is not yet available. The shared governance state Σ is maintained per organization with eventual consistency across concurrent agent executions, which is sufficient for information barrier policies under normal operating conditions but may miss violations in high-concurrency edge cases.

What the implementation does address, beyond most current alternatives, is the combination of prospective enforcement, path-level policy evaluation, and organizational scope: policies condition on the full governance state vector rather than just the proposed action type, the engine observes all agents in the organization, enabling information barrier enforcement, and the audit trail records the complete state tuple alongside policy scores and decisions at each step. Whether this is sufficient for any given regulatory context depends on the specific requirements and the calibration of δ against the organization's risk budget B.

4.6 Challenges

Several challenges arise in any implementation of the framework and are worth stating explicitly, both as guidance for practitioners and as a record of what the framework does not resolve:

1. Fail-closed behavior. The Policy Engine is a single point of failure in the governance architecture. If it becomes unavailable, the organization must choose between blocking all agent execution (fail-closed) or allowing agents to proceed ungoverned (fail-open). Fail-closed is the only acceptable default for any deployment where governance guarantees are a regulatory requirement. Production deployments require redundancy, health monitoring, and timeouts that default to Block.

2. Shared state consistency.
The governance state Σ is updated concurrently by multiple agents and must be read consistently during policy evaluation. For most policies, eventual consistency is sufficient. For the most sensitive barrier policies, strict consistency may be required, introducing coordination overhead. The state itself is small—a set of tags and counters, not full path records—so the consistency problem is manageable, but it must be designed for explicitly.

3. Audit trail privacy. The audit trail contains the sensitive data it aims to protect. Organizations must treat it as a regulated asset: access controls limiting who can read raw step inputs and outputs, retention policies aligned with regulatory requirements, and potentially a two-tier structure that separates governance metadata (always retained, broadly accessible for audit) from step content (retained under stricter controls, accessible only for incident investigation).

4. Delegation provenance. When a primary agent delegates to a sub-agent via a composite step, the sub-agent's execution produces its own path P′. What governance information from P′ should propagate back into the primary agent's state vector—and at what granularity—is an unsolved design problem that we discuss further in Section 6.1.

5 Implications for the EU AI Act

The development of this framework was motivated in part by the governance requirements of the EU AI Act [European Parliament and Council of the European Union, 2024], whose obligations for high-risk AI systems take effect in August 2026. This section reflects on what the framework implies for organizations deploying agents under the Act. Applying its requirements to agents requires interpretation, and that interpretation is still being worked out by regulators and practitioners.
What the framework provides is machinery: a precise structure for recording, evaluating, and enforcing governance decisions that can support whatever interpretation ultimately prevails. We list the most consequential articles:

1. Risk management throughout the lifecycle (Article 9). The Act requires providers of high-risk AI systems to establish, implement, and maintain a risk management system throughout the entire lifecycle of the system. For agents, lifecycle risk management translates directly to the governance objective of Section 3.3: the Policy Engine's continuous evaluation of v_T against the organizational budget B is a runtime instantiation of lifecycle risk management. The registration phase addresses deployment-time risk (documentation, integrity, scheduling); the per-step phase addresses operational risk; agent profiling closes the loop by feeding runtime observations back into design-time decisions. The Act's requirement that the risk management system be documented and repeatable is satisfied by the deterministic policy functions and the versioned audit trail: given the trail, any risk assessment decision can be reproduced.

2. Automatic logging (Article 12). The Act requires that high-risk AI systems be designed to automatically generate logs of their operation, to the extent this is technically feasible. The audit log described in Section 4.2 directly addresses this requirement: every step, policy evaluation, score, decision, and outcome is recorded. Crucially, the log records not just what the agent did but what the governance system decided about it, including the policy version in force at the time. This constitutes active oversight documentation rather than passive execution recording.
The privacy tension noted in Section 4.6—the log contains the data it aims to protect—is itself a compliance concern under the Act's data protection requirements and must be addressed in any compliant implementation.

3. Human oversight (Article 14). The Act requires that high-risk AI systems be designed so that natural persons can effectively oversee them during use and intervene when necessary. A possible intervention (action) that pauses execution for human approval is a direct implementation of this requirement: for policies that require human approval before high-risk actions, the Policy Engine pauses execution, surfaces the relevant path context to a responsible person, and awaits their decision before proceeding. The override mechanism—a human approving or rejecting a proposed action, with the decision logged and the path updated accordingly—is precisely the "meaningful human oversight" the Act envisions. The practical caveat of Section 2.3 applies here too: human oversight is only meaningful if the governance layer that decides when to invoke it is well-calibrated. Routing every agent action to a human reviewer satisfies the letter of Article 14 while defeating its purpose.

4. Transparency and documentation (Articles 13 and 16). The Act requires that high-risk AI systems be sufficiently transparent and that providers maintain technical documentation. The policies requiring documented purpose, risk classification, and ownership as preconditions for agent registration address Articles 13 and 16 directly: an agent that cannot pass the documentation policy cannot run. The audit log provides the technical documentation of operational behavior. Together, these implement a documentation regime that is enforced rather than merely required.

5. Accuracy, robustness, and cybersecurity (Article 15).
The Act requires that high-risk AI systems achieve appropriate levels of accuracy and robustness, including resilience against attempts to alter their behavior through adversarial manipulation. The prompt injection scenario of Section 2.2 is precisely such an attempt: retrieved content that manipulates the agent into violating a policy. The PII predecessor requirement policy addresses this by requiring a classification step before any action on personal data, regardless of how the agent's context was constructed. More broadly, the runtime governance layer provides robustness against a class of adversarial inputs that design-time measures cannot anticipate, because those inputs arrive at runtime via retrieved context. The Act's authors likely did not anticipate that the compliance infrastructure itself would become an attack surface—a compromised Policy Engine could approve prohibited actions—but this is a genuine robustness concern for any production deployment.

The Act's requirements map naturally onto the framework's components, which is not coincidental: the framework was developed with these requirements in view. What the framework cannot provide is the calibration that makes compliance claims credible: the translation of B into a specific risk budget, the empirical validation of π_j outputs as genuine probabilities, and the demonstration that δ is set appropriately for the organization's risk profile. These require operational data and regulatory engagement that go beyond what any framework can specify in advance.

6 Discussion

This paper has argued that the governance of AI agent fleets reduces to a single object: a policy function π_j(A, P_i, s*, Σ) → [0, 1] evaluated before each proposed action. The argument has three parts.

The first is diagnostic.
Agents are different from prior automated systems in ways that matter for governance: their behavior is non-deterministic, their tool use is decided at runtime, their paths vary in length, they can modify themselves, and they interact with other agents in ways that create organizational compliance constraints no individual agent can resolve. These properties mean that the violations organizations care about—data exfiltration, information barrier breaches, unauthorized external communication—are properties of sequences of actions, not of individual actions in isolation. Existing mechanisms (prompting, access control, guardrails, content filtering, human approval) are each valuable and each insufficient for the same reason: none can express or enforce a policy whose violation condition depends on prior steps.

The second is formal. We defined the execution path as the sequence of stochastic, deterministic, and composite steps an agent takes on a task, and the policy function as a deterministic map from path, proposed action, agent identity, and shared governance state to a violation probability. We showed that prompting is not an instance of this function at all (it modifies path distributions without evaluating them), and that access control is a degenerate instance (it uses only agent identity and action type, ignoring path and context). Runtime evaluation is the general case. We introduced the Policy Engine as the organizational component that evaluates policies across the fleet, maintains shared governance state, and calibrates δ so that E[Σ v_T] ≤ B as the governance objective. Per-step evaluation is the mechanism; fleet-level risk management is the objective.

The third is practical. We showed how the abstract framework instantiates into concrete policies, a two-phase architecture (registration and per-step), a reference implementation, and a mapping to EU AI Act requirements.
The framework is not a complete system; it is a precise specification of what a complete system must do, general enough to survive changes in the underlying technology and concrete enough to guide implementation today.

6.1 Open problems

The framework identifies several problems it does not solve, each of which is a substantive research and engineering challenge.

• Risk calibration. The policy function π_j outputs a value in [0, 1] interpreted as a violation probability. For this interpretation to support meaningful fleet-level governance, the outputs must be calibrated: a policy that returns 0.7 should correspond to a genuinely 70%-likely violation, not an arbitrary severity score. Current implementations often treat policy outputs as severity indicators rather than calibrated probabilities. Calibrating π_j from operational data requires labeled execution traces—paths annotated with ground-truth violation judgments—which are not yet available at the scale needed. Deriving appropriate thresholds for δ from an organizational risk budget B faces the same data requirement. This is the central open problem: without calibration, the probabilistic framework is a useful approximation, not a rigorous guarantee.

• Strategic circumvention. The framework assumes that agents are not modeling the governance layer and planning paths to stay under thresholds while achieving prohibited outcomes. For current-generation agents, this is a reasonable assumption: they have no persistent self-model across tasks and no incentive structure that would lead them to strategically circumvent governance. But it is a fragile assumption. An agent with long-term memory, access to its own policy scores, and a task objective that conflicts with governance constraints could, in principle, learn to produce paths that individually score low but collectively achieve a prohibited outcome.
Runtime governance as described here catches incidental violations; it cannot anticipate strategy. A promising direction is multi-step lookahead: extending π_j to score not only s* but a simulated continuation of the path, evaluating the projected trajectory before permitting the current step. Whether this can be made robust without reintroducing model-based stochasticity into the policy function is an open question.

• Completeness of enforced compliance. Prospective interception is enforced under the assumption that the agent operates within the governed execution environment. Agents with code execution capabilities can violate this assumption by spawning processes, making raw API calls, or modifying their own execution context in ways the Policy Engine does not observe. Enforced compliance is therefore an architectural constraint on agent capabilities, not a proven invariant of the governance system. For high-risk deployments, limiting agent access to code execution—or sandboxing it within the governed environment—is a practical requirement that must be addressed at the infrastructure level, not the policy level. A related granularity concern arises when a stochastic step generates executable code that is then run as a deterministic step: the step boundary as defined in Section 3.1 may be too coarse to detect violations buried inside the generated script. This is a further instance of the completeness problem, resolved by treating code execution either as a composite step subject to its own governance, or via sub-step instrumentation.

• Behavioral drift. Rath [2026] demonstrates measurable degradation in multi-agent systems over extended interactions: agents whose behavior gradually shifts from their initial configuration in ways that individually appear within tolerance but cumulatively represent significant deviation.
Per-step evaluation with fixed policies may not detect drift if each individual step scores below intervention thresholds. Agent profiling—tracking the distribution of v_T across repeated executions—is the natural detection mechanism, but it requires a baseline to compare against and a criterion for when drift has become governance-relevant. Connecting per-step evaluation to long-run behavioral monitoring is an open design problem.

• Delegation provenance. When a primary agent delegates to a sub-agent via a composite step, the sub-agent produces its own path P′ under its own governance evaluation. What should propagate from P′ back into the primary agent's state vector is not obvious. At minimum, the maximum sensitivity level and any barrier tags activated during P′ should propagate, so that the primary agent's subsequent policies have access to relevant context. Whether v_T^(sub) should contribute to the primary agent's v_T, and how to account for sub-agent violations in the fleet-level budget B, depends on how the organization wants to attribute responsibility across agent hierarchies. This is both a technical and a governance question without a settled answer.

• Generated-code completeness. The step taxonomy treats a deterministic step as an atomic triple (τ, d_in, d_out). When the output of a stochastic step is a script executed as a single deterministic step, violations inside the generated code are invisible to the policy function unless the execution environment decomposes the script into sub-steps. This is a specific instance of the completeness problem: governance at step granularity cannot detect violations at sub-step granularity. Sandboxed execution environments that expose individual system calls as governed steps are the natural mitigation, but their integration with the Policy Engine is an open engineering problem.
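As an illustration of the sub-step instrumentation discussed above, the sketch below routes each operation parsed from a generated script through a per-step policy check before it is allowed to execute. The `policy_score` stub, the step representation, and the threshold value are hypothetical placeholders, not the framework's actual Policy Engine interface.

```python
# Minimal sketch: governing generated code at sub-step granularity.
# Each parsed operation of a generated script is evaluated as its own
# governed step instead of executing the script as one opaque action.

THRESHOLD = 0.5  # illustrative stand-in for an intervention threshold delta

def policy_score(op: dict) -> float:
    """Hypothetical per-step evaluation; a real Policy Engine would aggregate
    the full policy set pi_j over (identity, path, action, state)."""
    return 0.9 if op["type"] == "raw_api_call" else 0.05

def run_governed(script_ops: list[dict]) -> list[str]:
    """Execute sub-steps in order, intercepting prospectively: the first
    operation scoring at or above THRESHOLD is blocked and nothing after
    it runs, so the side effect never occurs."""
    executed = []
    for op in script_ops:
        if policy_score(op) >= THRESHOLD:
            executed.append(f"BLOCKED:{op['type']}")
            break
        executed.append(f"OK:{op['type']}")
    return executed

ops = [{"type": "read_file"}, {"type": "raw_api_call"}, {"type": "write_file"}]
print(run_governed(ops))  # the raw API call is intercepted; the write never runs
```

The design point is that interception happens before the offending sub-step's side effect, which is exactly what step-level governance cannot do once the script runs as a single deterministic step.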
• Fleet prioritization under budget exhaustion. The objective constrains E[violations(t)] ≤ B but does not specify which tasks to terminate when the budget is reached. Any prioritization rule—by task value, agent risk class, or arrival order—is a policy choice outside the framework. Organizations deploying at scale will need an explicit scheduling policy for this case; the framework provides the constraint and the monitoring signal, not the optimization logic.

• Policy interaction at scale. The composition rule v_i = 1 − ∏_j(1 − π_j) means that adding policies increases v_i even if each individual policy has low scores. In a large policy set, the aggregate step-level violation probability can be driven high by many individually low-scoring policies simultaneously firing on the same action, potentially blocking actions that no individual policy would block on its own. Testing policies for interaction effects requires simulation over realistic path distributions, which is expensive and requires representative traces. Organizations deploying large policy sets will need tooling, which does not yet exist, to detect and manage policy interactions.

6.2 Next steps

The most immediate need is empirical validation: deploying the framework in production environments, measuring per-step overhead, and beginning the process of calibrating π_j outputs against ground-truth violation judgments. Without operational data, the probabilistic framework rests on untested assumptions. Calibration requires not only labeled traces but agreement on what constitutes a violation—which for many policies is a regulatory and organizational question as much as a technical one. A natural formal extension is to connect the fleet-level framework to per-agent guarantees of the kind developed by Bhardwaj [2026].
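As one concrete form the calibration exercise could take, the sketch below bins policy scores against ground-truth violation labels and reports, per bin, the gap between predicted and observed violation rates; a well-calibrated π_j shows small gaps. The scores, labels, and bin count here are synthetic illustrations, not operational data or part of the framework.

```python
# Reliability-check sketch: compare predicted violation probabilities
# from a policy pi_j against labeled execution traces, bin by bin.

def reliability_gaps(scores: list[float], labels: list[int], bins: int = 4):
    """For each occupied score bin, return (mean predicted score, observed
    violation rate, count). Large |predicted - observed| gaps indicate
    that pi_j outputs are severity rankings, not calibrated probabilities."""
    buckets = [[] for _ in range(bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * bins), bins - 1)  # clamp s == 1.0 into the top bin
        buckets[idx].append((s, y))
    out = []
    for b in buckets:
        if b:
            mean_pred = sum(s for s, _ in b) / len(b)
            obs_rate = sum(y for _, y in b) / len(b)
            out.append((round(mean_pred, 2), round(obs_rate, 2), len(b)))
    return out

# Synthetic traces: a policy roughly calibrated at low scores but
# overconfident at the high end (predicts 0.9, violates half the time).
scores = [0.1, 0.1, 0.2, 0.6, 0.7, 0.7, 0.9, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 0]
for pred, obs, n in reliability_gaps(scores, labels):
    print(f"predicted={pred} observed={obs} n={n}")
```

At production scale the same comparison would be run over thousands of labeled traces and would feed directly into the choice of thresholds δ against the risk budget B.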
Bhardwaj's agent behavioral contracts provide probabilistic compliance guarantees for individual agents under specific behavioral assumptions; the fleet-level framework provides the organizational context within which those guarantees must hold. Combining the two would yield a framework with per-agent formal guarantees and fleet-level risk management—a more complete governance architecture than either provides alone.

Finally, the framework as presented governs agents within a single organization. Many enterprise deployments involve agents that span organizational boundaries—supply chain agents, inter-organizational automation, third-party agent services integrated into internal workflows. Extending the shared governance state Σ and the fleet-level objective to multi-organization settings raises questions of trust, liability, and information sharing that the current framework does not address. This is a longer-term research direction, but one with significant practical importance as agent deployment matures.

References

ArbiterOS Contributors. ArbiterOS: The LLM as a probabilistic CPU. https://arbiter-os.com, 2025. White paper.

Varun Pratap Bhardwaj. Agent behavioral contracts: Formal specification and runtime enforcement for reliable autonomous AI agents, 2026. First submitted February 2026.

Peter Cihon, Markus Anderljung, and Allan Dafoe. Measuring AI autonomy levels from code, 2025.

Cloud Security Alliance. MAESTRO: Multi-agent environment, safety, and threat reference overview. https://cloudsecurityalliance.org/research/working-groups/artificial-intelligence, 2025.

Zehang Deng, Yongjian Guo, Changchang Liu, Wenbo Guo, Bo Li, and Dawn Song. AI agents under threat: A survey of key security challenges and future pathways, 2024.

European Parliament and Council of the European Union. Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act).
Technical report, Official Journal of the European Union, 2024. OJ L, 2024/1689, 12.7.2024.

Suyash Gaurav, Jukka Heikkonen, and Jatin Chaudhary. Governance-as-a-service: A multi-agent framework for AI system compliance and policy enforcement, 2025. First submitted August 2025.

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection, 2023.

Dmitrii Ivliev and Daniil Furman. Systemic risk from self-improving agents, 2025.

KPMG LLP. AI at scale: How 2025 set the stage for agent-driven enterprise reinvention in 2026. Q4 AI quarterly pulse survey. https://kpmg.com/us/en/media/news/q4-ai-pulse.html, January 2026. Survey of 130 U.S.-based C-suite leaders from organisations with annual revenue $1B+. Published 15 January 2026.

Md Motaleb Hossen Manik and Ge Wang. OpenClaw agents on Moltbook: Risky instruction sharing and norm enforcement in an agent-only social network, 2026. First submitted February 2026.

OWASP Foundation. OWASP top 10 for LLM applications and generative AI: Agentic security risks. https://owasp.org/www-project-top-10-for-large-language-model-applications/, 2025.

Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models, 2022.

Abhishek Rath. Agent drift: Quantifying behavioral degradation in multi-agent LLM systems over extended interactions, 2026. First submitted January 2026.

Charles L. Wang, Trisha Singhal, Ameya Kelkar, and Jason Tuo. MI9: An integrated runtime governance framework for agentic AI, 2025. First submitted August 2025.

Haoyu Wang, Christopher M. Poskitt, and Jun Sun. AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents.
In Proceedings of the 48th International Conference on Software Engineering (ICSE 2026), Rio de Janeiro, Brazil, 2026.

Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yan, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Ji-Rong Wen. A survey on large language model based autonomous agents, 2024.

Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, and Tao Gui. The rise and potential of large language model based agents: A survey, 2023.

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models, 2023.
