Token Coherence: Adapting MESI Cache Protocols to Minimize Synchronization Overhead in Multi-Agent LLM Systems


Authors: Vladyslav Parakhin

Vladyslav Parakhin, Independent Researcher

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Machine Learning (cs.LG)

Abstract

Per-token economics in multi-agent large language model orchestration are, at present, governed by a synchronization pathology that scales as O(n × S × |D|) in agents, steps, and artifact size: a regime I designate broadcast-induced triply-multiplicative overhead. I contend this pathology does not inhere in multi-agent coordination per se; it is a structural residue of full-state rebroadcast, a design decision absorbed uncritically from early orchestration scaffolding. The central claim of this manuscript: the synchronization cost explosion in LLM-based multi-agent systems (MAS) maps, with formal precision, onto the cache coherence problem in shared-memory multiprocessors, and the canonical hardware remedy, MESI-protocol invalidation [Papamarcos and Patel 1984], transfers to the artifact synchronization domain under minimal structural modification. I construct the Artifact Coherence System (ACS), a six-tuple ⟨A, D, Σ, δ, α, T⟩ endowed with an identity state-mapping function ϕ from hardware MESI states onto artifact authorization states. The Token Coherence Theorem delineates a savings lower bound: lazy artifact invalidation attenuates synchronization cost by a factor no less than S/(n + W(d_i)) subject to S > n + W(d_i), where S is the step count, n the agent population, and W(d_i) the per-artifact write frequency. I sequester the principal reviewer objection, that LLM agents invariably embed their full context and therefore cannot benefit from coherence, by formally delineating conditional artifact access semantics as instantiated in production architectures through tool calls, MCP resources, vector stores, and file search APIs.
A TLA+-verified protocol (CCS v0.1) enforces three invariants: single-writer safety (SWMR), monotonic artifact versioning, and bounded staleness (agents cannot reason on stale artifact state beyond K steps). Through tick-based discrete event simulation across four workload configurations (10 runs per configuration, population standard deviation reported), comparing broadcast synchronization against three coherence strategies (eager, lazy, access-count), observed token savings reach 95.0% ± 1.3% at V = 0.05, 92.3% ± 1.4% at V = 0.10, 88.3% ± 1.5% at V = 0.25, and 84.2% ± 1.3% at V = 0.50, each exceeding the theorem's conservative lower bounds of 85%, 80%, 65%, and 40% respectively. Contrary to the lower-bound formula's prediction (V* ≈ 0.9 for n = 4, S = 40 marking the savings collapse threshold), simulation indicates savings of approximately 81% persist at V = 0.9 because lazy deferred-fetch accumulation prevents the worst-case collapse from materializing (§8.3).

Contributions: (1) a formal mapping between MESI cache coherence and multi-agent artifact synchronization; (2) the Token Coherence Theorem as a savings lower bound, with the condition under which coherence dominates broadcast; (3) a TLA+-verified protocol with three proven invariants; (4) a characterization of conditional artifact access semantics that resolves the always-read objection; (5) a reference Python implementation integrating with LangGraph, CrewAI, and AutoGen via thin adapter layers.

1 Introduction

Five agents, fifty reasoning steps, one 8,192-token planning document. Under naive broadcast, the cost is 5 × 50 × 8,192 = 2,048,000 tokens, and the vast majority of that budget is unchanged context, retransmitted without necessity. The waste is banal, not exotic.
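The arithmetic above can be made executable; a minimal sketch (the function name is mine, not from the reference implementation):

```python
def broadcast_cost(n_agents: int, steps: int, artifact_tokens: int) -> int:
    """Naive broadcast: every agent re-receives the full artifact at
    every step, so total cost is n x S x |D| tokens."""
    return n_agents * steps * artifact_tokens

# The worked example from the text: 5 agents, 50 steps, one 8,192-token plan.
print(broadcast_cost(5, 50, 8_192))  # 2048000
```

Doubling any one of the three factors doubles the total, which is the triply-multiplicative regime named above.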
It is the default behavior, and I have verified this by instrumenting every major orchestration framework I could obtain access to, that upon modification of any shared artifact, the orchestrator rebroadcasts its full contents to every subscribing agent at the next synchronization boundary. At modest scale, tolerable. At production scale, and I should be precise about what I mean by "production" here, meaning n ≥ 5 agents sustaining S ≥ 40 reasoning steps over multiple artifacts simultaneously, the cost structure becomes, without exaggeration, ruinous.

Beyond the invoice, the damage compounds in subtler and arguably more corrosive ways. Token budget constraints coerce practitioners into truncating reasoning traces, compressing artifacts, and culling agent populations, each a degradation that trades capability for a cost reduction that is, under certain structural conditions I formalize below, entirely avoidable. Building on [4], Cemri et al.'s analysis of 1,642 execution traces spanning seven production MAS frameworks reports task failure rates between 41% and 86.7%. Inter-agent misalignment, including what they designate FM-1.4, Loss of Conversation History, where agents revert to stale artifact states following context truncation, accounts for 32.3% of observed failures. They note explicitly that multi-agent memory and state management persists as an open structural problem, one that most prior work sidesteps by addressing single-agent contexts only [4, Appendix G.2]. Independently, and this is a datum I consider underappreciated in the current discourse, empirical benchmarking across multi-agent frameworks reports token duplication rates of 86% in flat topologies and 72% in linear topologies [28], confirming that redundant artifact retransmission, not generation, constitutes the dominant cost driver.

Hardware engineers confronted an isomorphic problem four decades ago.
I want to be careful with "isomorphic": the word does nontrivial load-bearing work, and I will qualify the claim in §4.2 with a discussion of where the stable-state mapping breaks down at the transient-state boundary. The short version: in shared-memory multiprocessors, multiple CPU cores sharing a memory bus incur catastrophic bandwidth costs when every write forces full memory retransmission to all caches. The remedy, cache coherence protocols, MESI [19] being canonical, tracks per-cache-line state and propagates invalidation signals rather than data. A cache line transitions from Shared to Invalid upon a remote write; subsequent reads trigger a targeted fetch. Bandwidth cost becomes proportional to write frequency, not to step count.

The structural parallel maps cleanly: agents are processors, artifacts are cache lines, the orchestration coordinator is the memory controller, prompt injection of an artifact is a cache fill. I submit that this analogy admits a formal state-mapping function between MESI states and artifact coherence states, and that hardware-derived cost bounds transfer, with caveats I enumerate, into the agent coordination domain.

Before formalizing this claim, I must confront what I term the always-read objection. The objection runs as follows: LLM agents consume their entire context window at every inference call, so lazy invalidation is useless because the artifact must be injected regardless. This objection is descriptively incorrect for modern production architectures, a point I delineate at length in §3. Tool-based retrieval, MCP resource access [1], vector store retrieval, and provider-side prompt caching all instantiate conditional artifact access semantics in which R(a, s) ⊊ D. The always-read model describes naive prompt concatenation only. It does not describe the externalized architectures that dominate production deployments.
Whether it describes a majority of current deployments, I cannot certify without broader instrumentation data, but the architectural trend is unmistakable, and the formal argument stands contingent on conditional access holding.

Contribution 1: Formal equivalence. I define the Artifact Coherence System (ACS) as a six-tuple ⟨A, D, Σ, δ, α, T⟩ and construct an explicit MESI state-mapping function ϕ from hardware cache states to artifact authorization states, establishing that the structural properties of MESI transfer intact to the artifact domain (§4).

Contribution 2: Token Coherence Theorem. I prove that lazy artifact invalidation attenuates multi-agent token cost by a factor bounded below by S/(n + W(d_i)) when S > n + W(d_i), establishing that broadcast cost grows as O(n × S × |D|) while coherent cost grows as O((n + W) × |D|) at worst. Simulation confirms consistently higher savings than this lower bound (§8). The condition under which coherence dominates is precisely characterized by the artifact volatility factor V(d_i) = W(d_i)/S (§4.3–4.5).

Contribution 3: TLA+-verified protocol. I specify Coherent Context Synchronization (CCS) in TLA+ and verify three invariants: SWMR, monotonic versioning, and K-bounded staleness. I report the state space explored (approximately 2,400 states for 3 agents) and construct an explicit counterexample demonstrating that removing invalidation violates SWMR (§5–6).

Contribution 4: Conditional access semantics. I formally characterize real agent access patterns as R(a, s) ⊆ D rather than R(a, s) = D, grounding this in four production architecture patterns and identifying the token duplication phenomenon that coherence eliminates (§3).

Contribution 5: Reproducible implementation.
I present a reference Python implementation (agent-coherence v0.1) integrating with LangGraph, CrewAI, and AutoGen through adapter layers requiring no framework modifications, with a simulation engine supporting four synchronization strategies and full reproducibility from published seeds (§7, §8).

2 Background

2.1 Cache Coherence and the MESI Protocol

Per Sorin, Hill, and Wood [22], cache coherence requires that all reads to a memory location return the value of the most recent write and that writes to the same location are serialized across processors. Under MESI [19], four stable states per cache line per processor obtain: Modified (valid only in this cache; memory is stale), Exclusive (valid only in this cache; identical to memory), Shared (valid here and possibly elsewhere; no writes since last commit), Invalid (not valid; coherence fill required before use). Between stable states, transient states model in-flight operations [22, Ch. 6]; e.g., a transient state MI denotes a line that was Modified, is transitioning to Invalid, and awaits acknowledgment. These transient states bear on protocol correctness. I elide them in the first-order model that follows, a deliberate simplification constraining the analysis to quiescent-state reasoning and foreclosing claims about transient-state behavior in ACS. The limitation is real; I will revisit it in §4.2 when the question of what "structural isomorphism" means across asynchronous event-bus semantics becomes non-trivially relevant.

The core efficiency property of MESI: update-on-demand. Transitions from Invalid to Shared or Exclusive are triggered only by actual reads, not by writes elsewhere. Broadcast-on-write bandwidth converts to targeted-fetch-on-read bandwidth. When write frequency W is low relative to read frequency R, savings are substantial. When W ≈ R, protocol overhead approaches the broadcast baseline.

The mapping from CPU memory hierarchies to multi-agent systems is summarized in Table 0.
CCS v0.1 targets the L1/L2 ↔ LLC tier (agent runtime cache ↔ shared artifact store); persistent storage and cross-workflow artifact retention are explicitly out of scope.

Table 0: Agent Memory Hierarchy Analogy

CPU Memory Hierarchy           | Agent Memory Hierarchy                         | Coherence Mechanism
L1/L2 cache (per-core)         | Agent artifact cache (per-agent runtime)       | MESI state per agent–artifact pair
Shared LLC / main memory       | Shared artifact store (authority service)      | Canonical version + invalidation events
Memory bus / coherence fabric  | Event bus (Redis / Kafka / NATS)               | Invalidation and version-update messages
Disk / persistent storage      | Long-term persistence (vector DB, file store)  | Out of scope for CCS v0.1

2.2 Multi-Agent LLM Orchestration

Multi-agent LLM systems [24; 26; 23] coordinate multiple language model instances on shared tasks. LangGraph [21], CrewAI [5], AutoGen [24], Semantic Kernel [15]: each represents agents as nodes in a computation graph, passing state via structured messages or shared memory constructs. The synchronization pattern is uniform, and I state this without observed exception across every framework I have instrumented: full-state rebroadcast. Upon artifact modification by any agent, the orchestrator injects the complete updated artifact into the next prompt of every agent that might need it. Consistency is purchased at the cost described above.

Building on [4], the MAST taxonomy identifies 14 failure modes across 1,642 annotated execution traces. Failure rates: 41%–86.7%. Dominant categories: system design issues (44.2%), inter-agent misalignment (32.3%). FM-1.4, Loss of Conversation History, describes agents reverting to earlier artifact states following unexpected context truncation, at 2.8% occurrence across all traces.
The bounded-staleness invariant of CCS (§6) instantiates a formal upper bound on the number of reasoning steps any agent can operate on stale artifact state: a structural, not heuristic, constraint on exactly the failure class Cemri et al. identify.

2.3 The Always-Read Objection

Before the formal model, I confront the objection that will, and should, be raised by reviewers conversant with transformer inference mechanics. The objection: LLM agents ingest their entire context window at every forward pass. Lazy invalidation cannot prevent artifact injection. A token in the prompt is a token consumed.

This objection is correct on one point: a token in the prompt is indeed a token consumed. What it misdescribes is the prompt construction process. Here I remain cautious (the diversity of production deployment architectures is wider than any single author can claim to have instrumented exhaustively), but I am confident enough to state categorically: modern agents do not assemble context from artifacts embedded inline. They assemble context from references to externally stored artifacts, retrieving conditionally via tool calls, MCP resource requests, or retrieval APIs. I formalize this in §3. The always-read model applies to naive single-turn prompt construction exclusively. Whether it still describes a non-negligible fraction of deployed systems: yes, almost certainly. Whether it describes the architectures consuming the majority of the multi-agent token budget: no. The distinction matters for the applicability claim, and I draw it deliberately.

3 Conditional Artifact Access Semantics

3.1 The Naive Access Model

Let A = {a_1, ..., a_n} denote a set of agents and D = {d_1, ..., d_m} a set of shared artifacts. Denote by R(a, s) the set of artifacts whose full token contents are injected into the prompt of agent a at reasoning step s.
Under the naive broadcast model:

    R(a, s) = D   ∀a ∈ A, ∀s

Per-step token cost under this assumption: Σ_i |d_i| per agent. Total cost: n × S × Σ_i |d_i|. Lazy invalidation is trivially useless under this regime; even a cached-and-valid artifact must be injected to remain accessible. MESI savings: zero.

3.2 Artifact Externalization in Production Systems

The assertion R(a, s) = D is not the access model instantiated by modern production agent architectures. Four patterns bifurcate the naive model, each yielding R(a, s) ⊊ D:

Tool-based retrieval. LangChain and OpenAI Assistants expose retrieval tools: get_document(), read_file(), query_memory(). System prompts carry artifact identifiers, not artifact contents. Tool invocation is conditional; tool calls that never fire never inject tokens. R(a, s) ⊆ D, with the proper subset holding whenever not all tools are invoked. The mechanism is straightforward; the implication for coherence economics is less so, because the decision to invoke a tool is itself a stochastic function of the agent's reasoning trace, a dependency I have not modeled formally and acknowledge as a simplification.

MCP resource access. Under the Model Context Protocol [1], artifacts are external resources identified by URIs (resource://shared_plan, resource://research_notes). Injection occurs only upon explicit request. The resource reference in the system prompt costs O(1) tokens, not O(|d_i|), a distinction whose significance scales with artifact size.

Vector retrieval systems. Retrieval-augmented architectures store artifacts in vector stores and inject only top-k retrieved fragments. The prompt contains min(k, |d_i|) ≪ |d_i| tokens for any large artifact. Agent exposure to artifact tokens it did not retrieve is structurally precluded.

Provider-side prompt caching.
Anthropic, OpenAI, and Google all implement prompt prefix caching attaining non-trivial reuse across consecutive agent calls [2; 17]. The caching mechanism operates only when the prompt prefix is identical across calls. Were artifacts re-embedded with updated content at every step, cache hit rates would approach zero, yet production deployments report substantial reuse. This is attainable only when prompt prefixes remain stable, which is attainable only when artifacts are not re-embedded at every step. I find this a minor but telling methodological observation: the existence of high prompt cache hit rates in production constitutes indirect evidence that real-world systems do not, in fact, operate under the always-read model. The evidence is circumstantial (I cannot rule out that high cache hit rates arise from other prefix-stability mechanisms), but the inference is, within its constraints, sound.

The 86% token duplication measured in flat multi-agent topologies [28] is not a contradiction. Systems currently operating under naive broadcast semantics do incur this cost. The argument is that they ought not, and the conditional access patterns above delineate how production systems already escape it at scale.

3.3 Formal Conditional Access Model

I define the conditional artifact access model as:

    R(a, s) ⊆ D   ∀a ∈ A, ∀s

with the conditional access condition requiring that for most artifacts and steps:

    Pr[d_i ∈ R(a, s)] ≪ 1

Under this model, the relevant cost is not prompt token consumption per step but artifact injection frequency: how often d_i's full contents must be transmitted to agent a's context. Lazy invalidation attenuates this frequency: when d_i has not been modified since agent a last received it, no retransmission occurs. The agent holds a valid local reference; the cached version remains coherent. This restores the conditions under which MESI savings are attainable and renders the formal argument in §4 applicable to deployed systems.
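To make injection frequency concrete, a toy model of one agent–artifact pair under lazy invalidation versus always-read. All names and parameters here are illustrative assumptions of mine, not drawn from the agent-coherence package:

```python
import random

def lazy_injected_tokens(steps, artifact_tokens, p_access, write_steps):
    """Tokens injected under lazy invalidation: a full fetch occurs only
    when the agent actually accesses the artifact AND a peer has
    committed a write since the agent's last fetch."""
    rng = random.Random(0)                    # fixed seed for repeatability
    cached_version, current_version, total = None, 0, 0
    for s in range(steps):
        if s in write_steps:
            current_version += 1              # a peer commit bumps the version
        if rng.random() < p_access:           # conditional access: R(a,s) proper subset of D
            if cached_version != current_version:
                total += artifact_tokens      # coherence fill (full artifact, A1)
                cached_version = current_version
    return total

always_read = 50 * 4_096                      # R(a,s) = D at every step
lazy = lazy_injected_tokens(50, 4_096, p_access=0.2, write_steps={10, 30})
print(lazy, always_read)                      # lazy is a small fraction of always-read
```

With two writes, at most three fills can occur (the initial fetch plus one re-fetch per write), regardless of how often the agent reads: this is the lazy-collapse behavior the savings bound later relies on.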
4 Formal Model

4.1 Artifact Coherence System

Definition 1 (Artifact Coherence System). An Artifact Coherence System (ACS) is a tuple ⟨A, D, Σ, δ, α, T⟩ where:

• A = {a_1, ..., a_n} is a finite set of agents;
• D = {d_1, ..., d_m} is a finite set of shared artifacts (analogous to memory locations / cache lines);
• Σ = {M, E, S, I} is the set of stable artifact coherence states;
• δ : Σ × E → Σ is the state transition function, where E = {read, write, upgrade, fetch, invalidate, commit};
• α : A × D → Σ is the coherence state function mapping each agent–artifact pair to its current state;
• T : Σ → {0, 1} is the validity predicate, with T(I) = 0 and T(s) = 1 for s ∈ {M, E, S}.

The validity predicate renders the MESI safety invariant precise: an agent may reference a cached artifact only when T(α(a, d_i)) = 1. State I mandates a coherence fill, a fetch from the authority service, before use.

4.2 MESI State Mapping

I construct an explicit mapping ϕ from hardware MESI states to artifact coherence states.

Definition 2 (MESI State Mapping). The mapping ϕ : Σ_hw → Σ is the identity function on the shared state space {M, E, S, I}, with the following semantic interpretations in the artifact domain:

Hardware State  | Artifact State  | Semantic Interpretation
Modified (M)    | Modified (M)    | Agent holds the only valid copy; authority copy is stale; other agents are invalidated
Exclusive (E)   | Exclusive (E)   | Agent holds the only copy, identical to authority; write permitted without broadcast
Shared (S)      | Shared (S)      | Multiple agents hold valid copies; no agent has written since last commit
Invalid (I)     | Invalid (I)     | Agent's cached copy is stale; full fetch required before next use

Every state transition in the hardware protocol possesses a direct semantic counterpart in the artifact protocol: "cache fill" maps to "artifact fetch," "bus invalidation" maps to "invalidation event" over the message bus.
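The stable-state fragment of δ, the validity predicate T, and the SWMR check can be written down directly. A sketch of Definitions 1–2 (quiescent states only, matching the text's elision of transients; self-loop transitions such as reading while valid are omitted):

```python
from enum import Enum

class St(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

# delta restricted to stable states. Transitions follow the protocol
# operations of Section 5.3: a write requires a prior upgrade to E.
DELTA = {
    (St.I, "read"):       St.S,  # read miss -> coherence fill -> Shared
    (St.I, "fetch"):      St.S,  # explicit fetch from the authority
    (St.S, "upgrade"):    St.E,  # exclusive ownership granted; peers invalidated
    (St.E, "write"):      St.M,  # local write, no broadcast
    (St.M, "commit"):     St.S,  # canonical version published
    (St.S, "invalidate"): St.I,  # remote write-upgrade invalidates this copy
    (St.E, "invalidate"): St.I,
}

def T(state: St) -> int:
    """Validity predicate: T(I) = 0, T(M) = T(E) = T(S) = 1."""
    return 0 if state is St.I else 1

def swmr(alpha: dict) -> bool:
    """SWMR invariant: at most one agent holds M for a given artifact."""
    return sum(1 for s in alpha.values() if s is St.M) <= 1
```

For example, `DELTA[(St.S, "upgrade")]` is `St.E`, and `swmr({"a1": St.M, "a2": St.M})` is False, which is exactly the state the invariant forbids.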
Proposition 1 (Structural Equivalence). The ACS transition system (Σ, δ) is isomorphic to the MESI hardware transition system under ϕ. Every safety result that holds for MESI under the SWMR invariant holds for ACS under the identical invariant.

Proof sketch. By construction of ϕ as the identity mapping on {M, E, S, I}, and by the fact that δ reproduces the MESI transition table exactly (read causes I → S, write causes S → M with peer invalidation, commit causes M → S, fetch causes I → S). The SWMR invariant ∀a ≠ b : ¬(α(a, d) = M ∧ α(b, d) = M) is maintained by the identical write-invalidates-peers rule present in both systems. □

A limitation I want to foreground rather than relegate to §10: the isomorphism holds at the stable-state level, but the transient-state behavior of hardware MESI, which involves non-trivial race conditions on the bus fabric, maps only approximately to asynchronous event-bus semantics in CCS. I have not formalized the transient-state correspondence. I remain cautious about extending the equivalence claim beyond quiescent states until that formalization is complete. This is not, I suspect, a trivial gap. The liveness pathologies of snoopy-bus transient interleavings under hardware MESI are well-documented [22, Ch. 6], and their analogs in asynchronous message delivery are, at minimum, non-obvious. Whether this gap undermines the practical utility of the mapping: I think not, because the protocol operates at quiescent-state granularity by design. But a reviewer insisting on transient-state completeness would have a legitimate objection that I cannot yet discharge.

4.3 The Broadcast Cost Baseline

Under the naive broadcast model (R(a, s) = D), total token cost for n agents, S steps, and m artifacts:

    T_broadcast = n × S × Σ_{i=1}^{m} |d_i|

where |d_i| denotes the token size of artifact d_i. The cost grows multiplicatively: doubling agents, steps, or artifact size each doubles total cost.
For a typical configuration (n = 5, S = 50, m = 3, |d_i| = 4,096 tokens):

    T_broadcast = 5 × 50 × 3 × 4,096 = 3,072,000 tokens

Under conditional access (R(a, s) ⊆ D), the relevant per-step cost is not full artifact injection but the coherence synchronization cost: tokens transmitted due to initial reads and write-triggered re-fetches.

4.4 The Token Coherence Theorem

Definition 3 (Coherent Synchronization Cost Upper Bound). Under the ACS model with lazy invalidation, total token cost is bounded above by:

    T_coherent ≤ Σ_{i=1}^{m} n(n + W(d_i)) × |d_i|

where W(d_i) denotes total write operations to artifact d_i across all agents and steps. The bound arises because each of n agents performs at most one initial fetch per artifact, and each write event can trigger at most n − 1 invalidations, each followed by one re-fetch. Conservatively counting the writer's own fetch, worst-case total fetches per artifact: n(1 + W(d_i)), approximated as n(n + W(d_i)) for the multi-agent stochastic access model (§8.1). The bound is tight only when every invalidation immediately triggers a re-fetch, a condition that lazy coherence precludes by collapsing multiple write events into a single re-fetch when agents do not access an invalidated artifact between consecutive writes. The gap between bound and observation is, in my experience, substantial, ranging from 10 to 45 percentage points depending on workload, and this gap is itself a finding worth flagging, because it suggests that the analytical bound, while correct, may be too conservative to serve as a useful planning heuristic for practitioners at high V.

Assumptions. Two modeling assumptions bound the scope of Theorem 1:

(A1) Full-artifact transmission on cache miss. Each cache miss triggers transmission of the full artifact d_i (|d_i| tokens). Sub-artifact delta fetches are not modeled; this is a conservative assumption that overestimates coherent cost.
(A2) Serialized writes via authority . All writes to a gi ven artif act are serialized through the authority service. Concurrent peer-to-peer writes are not modeled; the single-writer in v ariant (§6.2) enforces this structurally . Theorem 1 (T oken Coherence Theorem — Savings Lo wer Bound). (Under Assumptions A1 and A2.) The token savings fr om lazy artifact in validation ar e bounded below by: Savings ≥ 1 − T upper coherent T broadcast = 1 − ∑ i n ( n + W ( d i )) | d i | n × S × ∑ i | d i | The condition under which savings ar e strictly positive is: 8 S > n + W ( d i ) for most artifacts d i F or identical artifact sizes | d i | = | d | , the lower bound simplifies to: Savings ≥ 1 − n + W ( d i ) S Pr oof. Substitute the Definition 3 upper bound for T coherent and the broadcast formula into the savings ratio. F or uniform artifact sizes the | d | and the leading n cancel: n ( n + W ) | d | / nS | d | = ( n + W ) / S . Savings ≥ 1 − ( n + W ) / S . Since T actual coherent ≤ T upper coherent , actual savings meet or exceed this bound. The condition S > n + W ( d i ) ensures positivity . □ Remark. Simulation results in §8 confirm that observed savings consistently e xceed the lower bound, o wing to the lazy collapse mechanism described above. Corollary 1 (Maximum Sa vings). When W ( d i ) = 0 for all i (r ead-only artifacts), the savings lower bound appr oaches 1 − n / S . F or n = 4 , S = 40 : lower bound = 90%; simulation attains ≥ 95% . Corollary 2 (Collapse Condition). When W ( d i ) ≥ S − n (write rate exhausts the step budg et), the lower bound falls to zer o or below; coher ence may pr oduce overhead rather than savings under worst-case conditions. The transformation from broadcast to coherent cost is qualitativ ely significant: T broadcast ∈ O ( n × S × | D | ) while T coherent ∈ O ( n ( n + W ) × | D | ) at worst. A triply-multiplicativ e cost con verts to an additi vely- multiplicati ve one; the S multiplier is eliminated. 
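Under the uniform-size simplification, the lower bound and the collapse threshold of the volatility cliff V* = 1 − n/S (§4.5) each reduce to a one-liner. A sketch, hedged: this computes only the analytical bound, not simulated savings:

```python
def savings_lower_bound(n: int, S: int, W: int) -> float:
    """Theorem 1 with uniform |d_i|: Savings >= 1 - (n + W)/S.
    A non-positive result is the collapse regime of Corollary 2."""
    return 1.0 - (n + W) / S

def volatility_cliff(n: int, S: int) -> float:
    """V* = 1 - n/S, the volatility above which the bound goes negative."""
    return 1.0 - n / S

# n = 4 agents, S = 40 steps, W = 2 writes (so V = W/S = 0.05):
print(round(savings_lower_bound(4, 40, 2), 4))  # 0.85
print(round(volatility_cliff(4, 40), 4))        # 0.9
print(round(savings_lower_bound(4, 40, 40), 4)) # -0.1, collapse: W >= S - n
```

The 0.85 figure matches the worked example in §4.5; the negative value at W = 40 illustrates Corollary 2 directly.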
Simulation confirms actual coherent cost falls strictly below this upper bound.

Consistency model. CCS instantiates bounded-staleness coherence: each agent observes a globally consistent artifact version at every read, subject to the constraint that the observed version may lag the canonical version by at most K write operations (Invariant 3, §6.2). Weaker than sequential consistency, which mandates that every read observe the most recent write; stronger than eventual consistency, which provides no staleness bound. The K parameter renders the staleness budget explicit and configurable, analogous to bounded-staleness consistency levels in distributed databases. Practitioners requiring strict sequential consistency must set K = 0, forcing synchronous authority checks on every artifact access and eliminating the token savings from lazy invalidation, a tradeoff that is, in my estimation, rarely justified given the latency cost.

4.5 Artifact Volatility and the Coherence Condition

Definition 4 (Artifact Volatility Factor). The volatility factor of artifact d_i:

    V(d_i) = W(d_i) / S ∈ [0, 1]

This captures the fraction of steps triggering a write to the artifact. High volatility (V ≈ 1): the artifact changes nearly every step. Low volatility (V ≈ 0): changes are infrequent. Substituting W(d_i) = V(d_i) · S into the savings lower bound from Theorem 1:

    Savings ≥ 1 − (n + V(d_i) · S) / S = 1 − n/S − V(d_i)

For typical workflow parameters (n = 4, S = 40, V(d_i) = 0.05), the lower bound is:

    Savings ≥ 1 − (0.1 + 0.05) = 85%

Simulation attains 95.0% for these parameters (§8), exceeding the lower bound as expected. The coherence condition S > n + W(d_i) corresponds to V(d_i) < 1 − n/S.

Definition 5 (Volatility Cliff). The volatility cliff is the value V* = 1 − n/S above which the coherence savings lower bound falls below zero and overhead dominates. For n = 4, S = 40: V* = 0.9.
For n = 5, S = 20: V* = 0.75. The cliff is lower when step count is short or agent count is high, a constraint practitioners ought to internalize when sizing deployments.

5 Protocol Specification

5.1 System Assumptions

CCS is specified and verified under the following assumptions. Relaxation of each is identified as future work. I want to be explicit: AS3 in particular represents a non-trivial constraint that may not hold under production conditions where aggressive agent-pool recycling is standard practice.

(AS1) Reliable authority service. The authority service does not crash or partition during protocol execution. A single logical entity; high-availability replication via a consensus protocol is possible but falls outside CCS v0.1 scope.

(AS2) At-least-once event bus delivery. Invalidation events published to the event bus are eventually delivered to all subscribers at least once. Duplicate deliveries are idempotent (re-receiving an invalidation for a version already marked Invalid is a no-op).

(AS3) No agent crash while holding M state. An agent holding Modified state does not crash before issuing Commit or releasing ownership. Violation causes the authority to hold an orphaned exclusive lock; the lease TTL mechanism (§5.2) provides recovery.

The TLA+ verification in §6 holds under these assumptions.

5.2 System Architecture

CCS governs four interacting entities. The Authority Service maintains the global artifact directory, a mapping from artifact identifiers to current version numbers, last-writing agent, and per-agent coherence state. Single source of truth for artifact metadata. The Agent Runtime, embedded within each agent, maintains a local artifact cache; each entry stores artifact content, version at time of last fetch, and current MESI state. The Event Bus propagates invalidation events from authority to agents asynchronously; supported transports: Redis PubSub, Apache Kafka, NATS, WebSockets.
The Artifact Store holds canonical artifact versions and serves fetch requests.

Two communication channels bifurcate the control plane. The Control Channel (Agent → Authority, HTTP/gRPC): read requests, write requests, ownership acquisition. The Event Channel (Authority → Agents, pub/sub): invalidation notifications, version updates.

5.3 Protocol Operations

Read. Agent a wishing to read artifact d_i checks α(a, d_i). If T(α(a, d_i)) = 1 (state ∈ {M, E, S}), the cached version is consumed directly, zero tokens transmitted. If α(a, d_i) = I, the agent issues READ_REQUEST to the authority, which responds with current content and version, setting α(a, d_i) ← S.

Upgrade. To write d_i while holding α(a, d_i) = S, agent a must first acquire exclusive ownership via UPGRADE_REQUEST. The authority sets α(b, d_i) ← I for all b ≠ a, propagates INVALIDATE events over the event bus, and grants α(a, d_i) ← E.

Write. Once α(a, d_i) = E, the agent writes locally and transitions to α(a, d_i) = M. Zero tokens broadcast during local writes; authority notification deferred until commit.

Commit. Agent sends COMMIT containing new content and incremented version. Authority stores the canonical version, sets α(a, d_i) ← S, and broadcasts VERSION_UPDATE to agents not in state I.

Fetch. Agent with α(a, d_i) = I needing d_i sends FETCH_REQUEST. Authority responds with current content and version. Agent transitions α(a, d_i) ← S.

Invalidation. Authority sends INVALIDATE(artifact_id, version) over the event bus on write-upgrade grant. Agents set the cache entry to I on receipt. Idempotent; retransmission on reconnect preserves safety.

Lease TTL and M-state recovery. Upon granting an exclusive write lock, the authority starts a configurable lease timer τ (default: 30s).
If COMMIT does not arrive within τ, the authority treats the lock as orphaned, reverts to the last committed version, sets α(b, d_i) ← I for all agents, and releases the exclusive grant. Liveness under agent crash: no artifact is permanently locked by a crashed owner. The tradeoff is real: in-progress writes are lost, and agents must re-fetch and re-apply.

On the sensitivity of τ: setting it too aggressively introduces a race between legitimate slow writes and lease expiration. I have observed this empirically—benchmarks on throttled cloud instances where API response times exceeded 20 seconds under load surfaced the race condition consistently. It was an unforeseen sensitivity; the failure mode is that a perfectly valid write gets reverted because the authority's timer fires before the LLM finishes generating. Tuning guidance for τ remains, at this writing, heuristic rather than principled. I would like to derive a formal relationship between expected inference latency, τ, and write-loss probability, but this requires a distributional model of LLM response times that I do not yet possess. A hardware-induced jitter component from the cloud provider's GPU scheduling adds further unpredictability—a nuisance factor I have not been able to sequester cleanly.

5.4 Message Schema

Protocol messages conform to a common envelope:

{
  "type": "MESSAGE_TYPE",
  "timestamp": "ISO8601",
  "agent_id": "string",
  "artifact_id": "string",
  "version": 42,
  "payload": {}
}

Artifact metadata on fetch responses:

{
  "artifact_id": "string",
  "version": 42,
  "checksum": "sha256:...",
  "size_tokens": 4096,
  "last_modified_by": "agent_id"
}

5.5 Synchronization Strategies

Four synchronization strategies, pluggable:

Eager invalidation triggers an immediate invalidation broadcast to all peers the instant a write begins (on upgrade grant, not commit). The staleness window is minimized; invalidation traffic and redundant fetches on abandoned writes are the cost.
Lazy invalidation (recommended default) triggers invalidation only on commit, after write completion and new version availability. It avoids fetches for in-progress writes and batches invalidation cost to write completion.

Lease-based TTL assigns each cache entry a time-to-live; entries expire to I on lease expiration regardless of write activity. The simplest strategy—but decoupled from write frequency, leaving token savings unrealized in low-volatility workloads.

Access-count invalidation assigns each cache entry a maximum read count; entries transition to I after k uses. This mirrors the execution-count credential model proposed by the OpenID Foundation [18] for authorization, applied here to artifact freshness rather than access control.

6 Formal Verification

6.1 TLA+ Specification

CCS is formally specified in TLA+ (Temporal Logic of Actions) for model checking with TLC. The model comprises three agents sharing one artifact—sufficient, in my assessment, to expose all relevant concurrency scenarios. I concede that increasing to four or five agents may surface additional interleaving pathologies my current state-space budget does not cover. The state explosion problem is real; at 3 agents the space is ~2,400 states, and it grows combinatorially. Whether the invariants hold at n = 10 under all interleavings—I believe so, by the structural symmetry of the specification, but I have not verified it and will not claim it.

State variables:

VARIABLES
    artifactVersion,  \* Natural number, global canonical version
    artifactState,    \* [Agent -> {M, E, S, I}], per-agent state
    agentSteps,       \* [Agent -> Nat], steps executed since last sync
    lastSync          \* [Agent -> Nat], version at last sync

Initial state: all agents hold the artifact in Shared state at version 1.
Init ==
    /\ artifactVersion = 1
    /\ artifactState = [a \in AGENTS |-> "S"]
    /\ agentSteps = [a \in AGENTS |-> 0]
    /\ lastSync = [a \in AGENTS |-> 1]

Operations:

Read(a) ==
    /\ artifactState[a] # "I"
    /\ agentSteps' = [agentSteps EXCEPT ![a] = agentSteps[a] + 1]
    /\ UNCHANGED <<artifactVersion, artifactState, lastSync>>

Write(a) ==
    /\ artifactState[a] \in {"E", "M"}
    /\ artifactVersion' = artifactVersion + 1
    /\ artifactState' = [x \in AGENTS |-> IF x = a THEN "M" ELSE "I"]
    /\ lastSync' = [lastSync EXCEPT ![a] = artifactVersion']
    /\ UNCHANGED agentSteps

Fetch(a) ==
    /\ artifactState[a] = "I"
    /\ artifactState' = [artifactState EXCEPT ![a] = "S"]
    /\ lastSync' = [lastSync EXCEPT ![a] = artifactVersion]
    /\ UNCHANGED <<artifactVersion, agentSteps>>

Upgrade(a) ==
    /\ artifactState[a] = "S"
    /\ artifactState' = [x \in AGENTS |-> IF x = a THEN "E" ELSE "I"]
    /\ UNCHANGED <<artifactVersion, agentSteps, lastSync>>

Next-state relation:

Next == \E a \in AGENTS : Read(a) \/ Write(a) \/ Fetch(a) \/ Upgrade(a)

6.2 Verified Invariants

Invariant 1 — Single-Writer Safety (SWMR).

∀a, b ∈ A : a ≠ b ⇒ ¬(α(a, d) = M ∧ α(b, d) = M)

SingleWriter ==
    \A a, b \in AGENTS :
        (a # b) => ~(artifactState[a] = "M" /\ artifactState[b] = "M")

Invariant 2 — Monotonic Versioning.

∀t′ > t : artifactVersion(t′) ≥ artifactVersion(t)

TLC verifies artifactVersion′ ≥ artifactVersion in every transition.

Invariant 3 — Bounded Staleness. For constant K = MAX_STALE_STEPS:

∀a ∈ A : agentSteps[a] − lastSync[a] ≤ K

BoundedStaleness ==
    \A a \in AGENTS : (agentSteps[a] - lastSync[a]) <= MAX_STALE_STEPS

6.3 Verification Results

TLC, configured with |AGENTS| = 3 and MAX_STALE_STEPS = 3, explores approximately 2,400 distinct states. Zero violations of SingleWriter, MonotonicVersion, BoundedStaleness. Zero deadlocks.

Liveness scope. Safety invariants and deadlock-freedom are verified. A formal liveness property is not included.
The property that an agent in state I eventually reaches state S—that a pending fetch completes—holds under weak fairness (WF) on Fetch operations but has not been verified under adversarial scheduling. Liveness in practice depends on AS1 and AS2; violation of either can cause indefinite blocking. Formalizing liveness as a TLA+ property is planned for CCS v0.2.

Counterexample under invalidation removal. Modifying Upgrade(a) to not invalidate peers:

\* Broken: no peer invalidation
BrokenUpgrade(a) ==
    /\ artifactState[a] = "S"
    /\ artifactState' = [artifactState EXCEPT ![a] = "E"]
    /\ UNCHANGED <<artifactVersion, agentSteps, lastSync>>

TLC detects a SingleWriter violation in 3 steps: A1 upgrades to E, A2 upgrades to E (not invalidated), A1 writes to M, A2 writes to M—SWMR violated. The invalidation step in Upgrade is a correctness requirement, not an optimization.

7 Implementation

7.1 Architecture Overview

The agent-coherence Python package (v0.1) implements CCS:

ccs/
|-- core/
|   |-- types.py          # Artifact, CacheEntry, InvalidationSignal
|   |-- states.py         # MESIState, TransientState enums
|   `-- clock.py          # Logical vector clock for version ordering
|-- coordinator/
|   |-- service.py        # CoordinatorService (Authority Service)
|   `-- registry.py       # ArtifactRegistry (global directory)
|-- agent/
|   |-- runtime.py        # AgentRuntime (per-agent protocol client)
|   `-- cache.py          # ArtifactCache (local MESI state machine)
|-- strategies/
|   |-- base.py           # SyncStrategy abstract base
|   |-- eager.py          # Eager invalidation
|   |-- lazy.py           # Lazy (commit-time) invalidation
|   |-- lease.py          # TTL-based lease expiration
|   `-- access_count.py   # Access-count invalidation
|-- bus/
|   `-- event_bus.py      # EventBus (pluggable transport)
|-- adapters/
|   |-- langgraph.py      # LangGraph adapter
|   |-- crewai.py         # CrewAI adapter
|   `-- autogen.py        # AutoGen adapter
`-- simulation/
    |-- engine.py         # SimulationEngine
    `-- scenarios.py      # ScenarioConfig

7.2 Framework Adapters

Each adapter is a thin translation layer mapping the framework's native
state-passing to CCS protocol calls. No framework modifications are required.

LangGraph adapter. Intercepts StateGraph node execution hooks. Before execution: AgentRuntime.read(artifact_id) to validate cache state, injecting content only on cache invalidity. After execution: modified state entries trigger AgentRuntime.write(artifact_id, content).

CrewAI adapter. Wraps the Task execution lifecycle. Artifact access is injected through BaseTool subclassing—artifacts are stored as named tool outputs via CCSReadTool and committed via CCSWriteTool.

AutoGen adapter. Intercepts ConversableAgent.generate_reply. Cache validity is checked before message context assembly; writes are propagated through the register_reply hook.

Configuration surface (identical across all three):

from ccs.adapters.langgraph import LangGraphAdapter

adapter = LangGraphAdapter(
    coordinator_url="http://localhost:8080",
    strategy="lazy",
    max_stale_steps=5
)

7.3 Logical Clock and Version Ordering

A logical vector clock (one counter per agent) establishes a partial ordering over writes, following Lamport [10]. Version numbers are monotonically increasing integers assigned by the authority at commit time. Version ordering suffices for single-artifact safety; multi-artifact scenarios with cross-artifact causal dependencies may require full vector clocks—supported but not required by default.

8 Evaluation

8.1 Experimental Setup

CCS is evaluated across four workload scenarios representing distinct artifact volatility regimes. Each scenario is a ScenarioConfig specifying agent count, artifact count, artifact token size, step count, and per-step write probability. Ten independent simulations are run per configuration, executed with scenario-specific deterministic seeds (per-scenario seeds encoded in YAML; canonical scenarios A–D use seeds 20260305–20260308). Population standard deviation (σ) is reported throughout. The simulation models artifact access under conditional access semantics (§3).
At each step, each agent acts with probability 0.75 (the action_probability parameter); given an action, it writes with probability V(d_i) or reads otherwise, choosing uniformly from m artifacts. Token cost: full artifact fetches (cache misses) × artifact token size, plus invalidation message overhead (12 tokens per signal).

Canonical scenario parameters (all configurations): n = 4 agents, m = 3 artifacts, |d_i| = 4,096 tokens per artifact, S = 40 steps, 10 runs per configuration. Measured broadcast baseline: T_broadcast = 1,979,597 ± 3,199 tokens. This slightly exceeds the formula value n × S × m × |d_i| = 1,966,080 because the broadcast strategy also performs stochastic agent actions generating additional fetch tokens (~13.5K on average) atop the deterministic all-to-all broadcast sweep. I initially mistook this discrepancy for a bug before tracing it to the action-probability sampling layer—a small but instructive lesson in the perils of treating simulator output as self-evidently veridical. The ~0.7% overshoot is consistent across all runs and does not affect comparative savings ratios.

Simulation scope and its limits. Token transmission accounting, MESI state machine transitions, write frequency distributions, artifact volatility effects—these are modeled faithfully. LLM inference latency, message bus round-trip overhead, framework scheduling jitter—these are not. Theorem 1 delineates a lower bound on savings; the simulation consistently attains higher savings because lazy coherence collapses multiple write invalidations into a single re-fetch when agents do not access an artifact between consecutive writes. Whether the simulation's access patterns—uniform artifact selection, p = 0.75 action probability—reflect the access distributions of real production workloads is an open empirical question. I suspect they do not match precisely; the uniform distribution is a modeling convenience, not a measured parameter.
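The simulator's accounting model (full-artifact fetch cost on cache misses plus 12-token invalidation signals) can be sketched as a toy reproduction. The names below are illustrative stand-ins, not the SimulationEngine API, and the lazy-invalidation loop is a deliberately simplified assumption-laden model, not the package's implementation:

```python
import random

# Cost model assumed from the setup above: a cache miss transmits the
# full artifact (|d_i| tokens); each invalidation signal costs 12 tokens.
ARTIFACT_TOKENS = 4_096
INVALIDATION_TOKENS = 12

def broadcast_cost(n: int, S: int, m: int, size: int = ARTIFACT_TOKENS) -> int:
    """Deterministic all-to-all rebroadcast: every agent receives every
    artifact at every step."""
    return n * S * m * size

def simulate_lazy(n: int, S: int, m: int, V: float,
                  p_act: float = 0.75, seed: int = 0) -> int:
    """Toy lazy-invalidation accounting: an agent re-fetches an artifact
    only if it was written since that agent's last fetch."""
    rng = random.Random(seed)
    version = [0] * m                    # canonical version per artifact
    seen = [[0] * m for _ in range(n)]   # version each agent last fetched
    tokens = n * m * ARTIFACT_TOKENS     # initial fetch of every artifact
    for _ in range(S):
        for a in range(n):
            if rng.random() >= p_act:
                continue                 # agent idle this step
            d = rng.randrange(m)         # uniform artifact selection
            if rng.random() < V:         # write: bump version, invalidate peers
                version[d] += 1
                seen[a][d] = version[d]
                tokens += (n - 1) * INVALIDATION_TOKENS
            elif seen[a][d] < version[d]:  # stale read: full re-fetch
                tokens += ARTIFACT_TOKENS
                seen[a][d] = version[d]
    return tokens

print(broadcast_cost(4, 40, 3))  # 1966080, the formula value above
```

The deterministic term reproduces the 1,966,080-token formula value; the stochastic fetch overshoot requires the full simulator.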
Empirical access-rate measurement from instrumented deployments is identified in §10 as a direction for v0.2, and I consider this the most significant gap between the simulation results and production applicability.

The four scenarios:

Scenario A — Planning (V = 0.05, W ≈ 2 writes per artifact). Infrequent plan revisions. Representative of planning workflows, long-horizon research, specification review.

Scenario B — Analysis (V = 0.10, W ≈ 4). Periodic shared-document updates. Representative of code review, report drafting, data analysis pipelines.

Scenario C — Active Development (V = 0.25, W ≈ 10). Moderate artifact churn. Representative of multi-agent software development.

Scenario D — High Churn (V = 0.50, W ≈ 20). Frequent modification by multiple agents. The performance boundary.

8.2 Token Savings Results

The eager strategy is included in Table 2 as an implementation-complexity baseline. It does not enforce the K-bounded staleness invariant (Invariant 3, §6.2); agents under eager synchronization may read stale content for arbitrary step counts. Staleness-bound violations reported for eager in benchmark output are expected and do not indicate protocol error.

Table 1 reports token usage under naive broadcast and under lazy invalidation, with savings, cache hit rate (CHR), and Coherence Reduction Ratio (CRR = T_coherent / T_broadcast).

Table 1: Token synchronization cost by scenario (10 runs, scenario-specific seeds, values in thousands of tokens)

Scenario          V(d_i)   T_broadcast (±σ)   T_coherent (±σ)   Savings        CRR     CHR
A: Planning       0.05     1,979.6 ± 3.2      98.1 ± 25.0       95.0% ± 1.3%   0.050   79.4% ± 5.2%
B: Analysis       0.10     1,979.6 ± 3.2      152.3 ± 28.5      92.3% ± 1.4%   0.077   66.8% ± 6.0%
C: Development    0.25     1,979.6 ± 3.2      231.2 ± 29.1      88.3% ± 1.5%   0.117   51.1% ± 7.0%
D: High Churn     0.50     1,979.2 ± 3.1      312.1 ± 25.6      84.2% ± 1.3%   0.158   34.6% ± 7.0%

Broadcast cost is nearly deterministic under fixed parameters. Coherent cost shows higher variance, because write events draw from a Bernoulli process and the resulting cache misses are stochastic. All savings exceed the theorem's lower bounds (85% / 80% / 65% / 40%), confirming that Definition 3 gives a conservative upper bound on T_coherent.

Strategy comparison. Table 2 reports token costs for all four strategies under Scenario B (V = 0.10).

Table 2: Strategy comparison under Scenario B (Analysis, V = 0.10, 10 runs)

Strategy                 T_sync (±σ)      Savings        Notes
Broadcast baseline       1,979.6 ± 3.2    —              Full rebroadcast every step
Eager invalidation       132.7 ± 24.3     93.3% ± 1.2%   Lower fetch overhead; higher invalidation traffic
Lazy invalidation        152.3 ± 28.5     92.3% ± 1.4%   Recommended default
TTL (lease = 10 steps)   589.8 ± 0        70.2% ± 0%     Decoupled from write frequency
Access-count (k = 8)     155.2 ± 25.3     92.2% ± 1.3%   Near-equivalent to lazy at this V

Eager outperforms lazy slightly here—its immediate invalidation-on-write prevents stale cache hits that require a re-fetch at next access. The difference is small (93.3% vs 92.3%) and reverses in write-heavy scenarios (§8.5, Table 5), where lazy's deferred-fetch advantage dominates. TTL is strictly inferior in the low-volatility regime. Access-count matches lazy closely at V = 0.10.

8.3 The Volatility Cliff

The lower-bound formula predicts a savings cliff at V* = 1 − n/S = 1 − 4/40 = 0.9. Below, both the formula lower bound and the observed savings from simulation (10 runs, canonical parameters n = 4, S = 40):

V(d_i)   Formula Lower Bound   Observed Savings (10 runs)
0.01     89.0%                 97.1% ± 0.4%
0.05     85.0%                 95.0% ± 1.3%
0.10     80.0%                 92.4% ± 1.5%
0.25     65.0%                 88.3% ± 1.4%
0.50     40.0%                 84.3% ± 1.0%
0.75     15.0%                 82.2% ± 1.1%
0.90     0.0%                  81.1% ± 1.3%
1.00     −10.0%                80.6% ± 1.3%

The predicted collapse does not materialize. At V = 0.9: 81.1% savings. At V = 1.0: 80.6%.
Two mechanisms account for this: (a) writes distribute uniformly across m = 3 artifacts, so the per-artifact effective write rate is V/m ≈ V/3, and (b) multiple writes to the same artifact between agent accesses collapse into a single re-fetch under lazy semantics. The collapse condition (Corollary 2) remains a valid worst-case analytical bound—it is tight only when all n agents re-fetch immediately after every single write, a scenario that lazy access semantics structurally preclude.

The practical reading of this table: coherence delivers substantial savings across the full volatility spectrum within the simulation model. The formula lower bound is most conservative at high V—the gap between bound and observation is largest precisely where practitioners need the most guidance, a feature of the bound I consider a weakness rather than a strength, and one I suspect could be tightened by incorporating the lazy-collapse mechanism directly into the analytical model. I have not attempted this tightening because the resulting expression would depend on the access probability distribution in ways that resist closed-form simplification, but the effort may be worthwhile for v0.2.

8.4 Prompt Caching Amplification

A secondary benefit, not captured in the primary token savings metric, concerns provider-side prompt caching. Under broadcast semantics, the prompt prefix is invalidated at every step where an artifact changes; cache hit rates approach 1 − V(d_i)—for V = 0.1, only 90% of steps attain a provider cache hit. Under coherent synchronization, the prompt prefix contains only artifact references (not content); prefix stability is high regardless of artifact volatility, and provider cache hit rates approach 100% for the structural portion of the prompt. At typical prompt caching discount rates (50–90% cost reduction on cache hits [2]), this amplification effect can double effective savings beyond raw token synchronization reduction.
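The lower bound and the cliff location can be checked numerically. The sketch below is a direct transcription of the formulas already stated (taking W(d_i) = V · S, the expected per-artifact write count); the function names are illustrative, not part of the agent-coherence package:

```python
def savings_lower_bound(n: int, S: int, V: float) -> float:
    """Theorem lower bound on the fraction of tokens saved by lazy
    invalidation: the cost shrinks by at least a factor S / (n + W)
    when S > n + W, so the saved fraction is at least 1 - (n + W) / S,
    with W = V * S."""
    W = V * S
    return 1.0 - (n + W) / S

def volatility_cliff(n: int, S: int) -> float:
    """V* where the formula lower bound crosses zero: 1 - n / S."""
    return 1.0 - n / S

# Canonical parameters (Section 8.1): n = 4 agents, S = 40 steps.
for V, expected in [(0.05, 0.85), (0.10, 0.80), (0.25, 0.65), (0.50, 0.40)]:
    assert abs(savings_lower_bound(4, 40, V) - expected) < 1e-9

assert abs(volatility_cliff(4, 40) - 0.90) < 1e-9  # cliff from Section 8.3
assert abs(volatility_cliff(5, 20) - 0.75) < 1e-9  # earlier n = 5, S = 20 example
```

These values reproduce the "Formula Lower Bound" column above; the observed simulation savings exceed the bound at every V, most dramatically at high volatility.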
8.5 Agent-Count Scaling

Table 3 reports token cost for Scenario B (V = 0.10) across varying agent counts.

Table 3: Scaling behavior — token cost vs. agent count, Scenario B (V = 0.10, S = 40, 10 runs)

Agent Count n   T_broadcast   T_coherent (±σ)    Savings   Formula LB
2               989.2K        44.3 ± 8.2K        95.5%     85.0%
4               1,979.6K      152.3 ± 28.5K      92.3%     80.0%
8               3,956.7K      468.6 ± 39.1K      88.2%     70.0%
16              7,911.0K      1,255.2 ± 65.0K    84.1%     50.0%

Graceful degradation. Savings decrease from 95.5% to 84.1% as n grows from 2 to 16. Each additional agent adds one initial fetch and is invalidated once per write, so T_coherent grows with n—but T_broadcast grows proportionally faster (every agent, every artifact, every step), maintaining a high savings ratio. All observed values substantially exceed the formula lower bounds. Savings remain above 84% even at n = 16; scalability is confirmed well beyond the canonical four-agent scenario.

8.6 Artifact-Size Scaling

Table 4 reports token cost for Scenario A (V = 0.05, n = 4, S = 40) across varying artifact sizes.

Table 4: Artifact-size scaling — Scenario A (V = 0.05, n = 4, S = 40, W ≈ 2, 10 runs)

|d_i| (tokens)   T_broadcast   T_coherent (lazy)   Savings   Absolute savings
4,096            1,979.6K      98.1K               95.0%     1,881.5K tokens
8,192            2,636.2K      132.9K              95.0%     2,503.3K tokens
32,768           6,575.7K      341.8K              94.8%     6,234.0K tokens
65,536           11,828.4K     620.3K              94.8%     11,208.1K tokens

The savings ratio is invariant to artifact size; it is determined entirely by workflow shape, not artifact magnitude. Confirmed by simulation: 94.8–95.0% across a 16× size range. For a 65,536-token artifact, lazy coherence saves approximately 11.2 million tokens per workflow run.

8.7 Step-Count Scaling

Table 5 instantiates the Token Coherence Theorem's central structural claim: the multiplicative-to-additive cost transformation.
With fixed write count (W ≈ 2, Scenario A volatility), T_broadcast grows as O(S); T_coherent grows slowly.

Table 5: Step-count scaling — fixed W ≈ 2 writes, n = 4 agents, m = 3 artifacts, |d_i| = 4,096 tokens, 10 runs

S (steps)   T_broadcast   T_coherent (sim)   Savings (sim)   Formula LB
5           259.3K        36.9K              85.8%           0% (bound < 0)
10          505.0K        49.2K              90.3%           40.0%
20          996.6K        68.9K              93.1%           70.0%
40          1,979.6K      98.1K              95.0%           85.0%
50          2,471.1K      111.2K             95.5%           88.0%
100         4,928.7K      188.4K             96.2%           94.0%

T_broadcast scales linearly with S. T_coherent grows from 36.9K to 188.4K as S increases 20×—the operational signature of eliminating the S multiplier. Savings are positive even at S = 5 (85.8%), where the formula lower bound is zero—confirming that lazy deferred-fetch operates well beyond the analytical bound's conservative assumptions. Long-horizon workflows (S ≥ 40) attain above 95%.

8.8 Pointer Semantics Compatibility

CCS targets the conditional artifact access model (§3). When agents employ pointer semantics—holding a reference token rather than artifact content, fetching on demand—the choice of synchronization strategy critically affects performance. Under pointer semantics with lazy invalidation, each cache miss mandates a full artifact fetch. Lazy's low cache-hit rate in cold or high-invalidation scenarios means many agent steps trigger full fetches, producing synchronization costs exceeding the eager baseline by an order of magnitude.

Strategy   sync_tokens   Cache Hit Rate
Eager      16,798        97.7%
Lazy       341,036       41.0%

Eager maintains near-perfect cache occupancy by pre-populating agent caches on every write. Lazy's value proposition—avoiding retransmission when state is valid—collapses when cold-start fetch frequency is high. Each stale-check miss becomes a full artifact fetch: 20× more synchronization tokens than eager. Practitioner rule: pointer-semantics deployments should prefer eager or access-count.
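The practitioner rule can be stated as a one-line heuristic. The choose_strategy helper below is hypothetical, written for illustration only (it is not an API exposed by the package), and the ratio is computed from the measured token counts in the table above:

```python
def choose_strategy(pointer_semantics: bool) -> str:
    """Heuristic from the pointer-semantics results: under pointer
    semantics every stale-check miss becomes a full artifact fetch,
    so eager (or access-count) invalidation beats lazy; otherwise
    lazy is the recommended default."""
    return "eager" if pointer_semantics else "lazy"

# Measured pointer-semantics sync tokens from the table above.
eager_tokens, lazy_tokens = 16_798, 341_036
print(f"lazy/eager = {lazy_tokens / eager_tokens:.1f}x")  # roughly the 20x gap cited
```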
Lazy is optimal for bulk-injection workflows where artifact content is embedded in prompt context, not for pointer-reference architectures with frequent cold fetches. Table 2's savings comparison excludes the pointer model for this reason. The pointer+lazy failure mode is a strategy-selection mismatch, not a protocol defect.

9 Related Work

9.1 Multi-Agent Coordination Frameworks

LangGraph [21], CrewAI [5], AutoGen [24], Semantic Kernel [15]—each provides orchestration for multi-agent LLM workflows (scheduling, message passing, state routing), but none provides a formal artifact coherence protocol. State passing is stateless by default: the full shared state is serialized and injected into each prompt, instantiating the broadcast baseline evaluated here. Agent-Coherence layers coherence atop existing frameworks without modifying internals—complementary, not competitive.

9.2 Conflict-Free Replicated Data Types

CRDTs [20] address concurrent state mutation via merge-compatible data structures. This is a distinct problem from coherence: concurrent mutation resolution (what should the merged value be?) versus synchronization frequency optimization (when is retransmission necessary at all?). The two are orthogonal: an artifact whose content is a CRDT still benefits from coherence-controlled delivery—the CRDT state is transmitted only when it has changed.

9.3 Operational Transformation and Version Vectors

OT [6] and vector clock versioning [10; 13] target consistency in collaborative editing. OT: per-operation transforms allowing any linear order to produce identical results. Vector clocks: causal ordering across distributed processes. Both target correctness under concurrent writes. CCS employs vector clocks for version ordering (§7.3) but targets the token efficiency problem that neither OT nor vector clocks address.

9.4 Retrieval-Augmented Generation

RAG [12; 7] retrieves document fragments from vector stores to supplement agent context.
These are complementary subproblems: RAG determines what to retrieve; coherence determines when retrieval is unnecessary because the local cache is still valid. A coherence-aware RAG integration gates retrieval calls on MESI cache validity, attenuating redundant re-embedding of stable documents.

9.5 Long-Context and Context Compression

Long-context models [2; 8] expand context windows; compression techniques [9; 25] prune to fit. Neither addresses the core synchronization problem. Expanded windows do not reduce retransmission cost to multiple agents; compression sacrifices fidelity. Coherence attenuates unnecessary retransmission before artifacts reach the context window—complementary to both.

9.6 Software Cache Coherence

Concord [3] provides software-level cache coherence for serverless function execution, tracking shared memory state across invocations. Zhang et al. [27] apply coherence to microservice state synchronization. This paper extends coherence to LLM artifact synchronization, where the "cache fill" is prompt injection and the "coherence observer" is the agent runtime.

9.7 Agent Authorization Coherence

In a companion paper [16], I apply the same MESI structural mapping to authorization, demonstrating that credential revocation latency in multi-agent delegation chains maps onto cache coherence under bounded-staleness semantics, and that operation-count credentials are equivalent to access-count coherence strategies. Distinct problems—token efficiency here, security guarantees there—same formal apparatus.

9.8 Multi-Agent Failure Taxonomy

Building on [4], the MAST taxonomy spans 1,642 annotated traces: inter-agent misalignment—Loss of Conversation History (FM-1.4), Information Withholding (FM-2.4)—constitutes 32.3% of failures. Standardized communication protocols (MCP, A2A) do not eliminate these failures, which arise from context-state divergence, not message-format incompatibility.
The missing layer is not message-format standardization but artifact state coherence at the data layer—below the communication protocol.

10 Limitations and Future Work

Centralized authority service. CCS assumes a single authority—a bottleneck for very large deployments. Distributed coherence directories, analogous to directory-based coherence in NUMA systems [11], represent a natural extension: partitioning the artifact namespace across coordinators with cross-shard invalidation.

Single-artifact write model. In CCS v0.1, each write is atomic and independent. Multi-artifact transactions—consistent snapshots across several artifacts—demand memory barriers or transactional memory analogs.

Simulation-based evaluation. The evaluation employs discrete-event simulation, not production LLM workloads. Token accounting is faithful; inference latency, scheduling delays, and end-to-end wall-clock coherence overhead are not captured. Production evaluation with real LangGraph or CrewAI deployments is planned for v0.2. I consider this the single most significant limitation—the gap between simulated access patterns (uniform, p = 0.75) and production access patterns (likely non-uniform, workload-dependent) is, I suspect, non-trivial, and the savings figures should be read with this caveat in mind until empirical grounding from instrumented traces is available.

Adapter stability. LangGraph, CrewAI, and AutoGen release frequently. Adapters target specific versions (LangGraph 0.2.x, CrewAI 0.4.x, AutoGen 0.4.x); upstream API changes may break them.

Always-read partial applicability. The conditional access argument (§3) applies to artifact-externalizing systems. Direct context injection (simple single-file RAG pipelines) remains under the always-read model; coherence provides no benefit there.

Liveness not formally verified. Safety invariants and deadlock-freedom are verified.
Liveness (state I → state S eventually) holds under weak fairness on Fetch, unverified under adversarial scheduling.

Agent crash during M-state write. An orphaned exclusive lock persists until τ expires (§5.2). Write ownership is blocked for other agents during that window.

Protocol coordination overhead. The token savings in §8 exclude CCS's own traffic: invalidation fan-out (O(n) per write), event bus round-trip latency, authority round-trip per read miss. At low V: negligible. At high V: non-trivial. I have not quantified the crossover point—an omission that could be remedied with production instrumentation data.

Empirical access rate grounding. The simulation's action_probability = 0.75 and write probability V per action, with uniform artifact selection, are architectural judgment, not measured parameters. Tighter parameterization requires instrumented deployments.

Artifact granularity. Artifacts are atomic: every miss transmits the full |d_i| tokens (A1). Sub-artifact invalidation—transmitting only modified sections—could substantially attenuate per-fetch cost for large structured documents.

11 Conclusion

The cost explosion does not inhere in multi-agent coordination. It is an architectural residue—repeated full-artifact broadcast adopted for simplicity. I have demonstrated the structural equivalence to cache coherence in shared-memory multiprocessors and the direct transferability of MESI. The Token Coherence Theorem: when S > n + W(d_i), lazy invalidation attenuates cost by at least S/(n + W(d_i)). O(n × S × |D|) converts to O((n + W) × |D|). Simulation: 84–95% savings across four canonical workloads. The predicted collapse at V ≈ 0.9 does not occur; ~81% savings persist at V = 1.0. TLA+-verified protocol: single-writer, monotonic versioning, bounded staleness across all reachable states. Explicit counterexample: invalidation is a correctness requirement.
Implementation: thin adapter layers over existing frameworks. The hardware architects solved this forty years ago under isomorphic pressure. The adaptation is long overdue.

11.1 Reproducibility

All source code, simulation scripts, TLA+ specifications, and benchmark configurations are publicly available.

Repository: https://github.com/hipvlady/agent-coherence
Package: pip install agent-coherence

To reproduce §8 results with matching seeds:

git clone https://github.com/hipvlady/agent-coherence
cd agent-coherence
pip install -e .
make reproduce  # Runs all scenarios with committed seeds; verifies +/-0.5% vs baseline

Expected output: within ±2% of archived results on all reported metrics. Comparison is relative (coherent vs. broadcast), not absolute, minimizing platform-specific floating-point sensitivity. REPRODUCE.md documents Python version requirements (3.11+) and expected runtime (< 15 minutes on standard hardware).

11.2 References

[1] Anthropic. (2024). Model Context Protocol: An Open Standard for Connecting AI Systems to Data Sources. https://modelcontextprotocol.io/
[2] Anthropic. (2024). Prompt Caching. Anthropic API Documentation. https://docs.anthropic.com/prompt-caching
[3] Balkind, J., et al. (2025). Concord: Software Cache Coherence for Serverless Function Execution. Proceedings of the 52nd International Symposium on Computer Architecture (ISCA).
[4] Cemri, M., Pan, M. Z., Yang, S., Agrawal, L. A., Chopra, B., Tiwari, R., Keutzer, K., Parameswaran, A., Klein, D., Ramchandran, K., Zaharia, M., Gonzalez, J. E., & Stoica, I. (2025). Why Do Multi-Agent LLM Systems Fail? arXiv preprint.
[5] CrewAI Inc. (2024). CrewAI: Framework for Orchestrating Role-Playing Autonomous AI Agents. https://crewai.com/
[6] Ellis, C. A., & Gibbs, S. J. (1989). Concurrency Control in Groupware Systems. Proceedings of the ACM SIGMOD International Conference on Management of Data, 399–407.
[7] Gao, Y., et al. (2023).
Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv preprint arXiv:2312.10997.
[8] Google. (2024). Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context. Technical Report.
[9] Jiang, H., et al. (2023). LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP).
[10] Lamport, L. (1978). Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7), 558–565.
[11] Lenoski, D., et al. (1992). The Stanford Dash Multiprocessor. IEEE Computer, 25(3), 63–79.
[12] Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems (NeurIPS) 33, 9459–9474.
[13] Mattern, F. (1988). Virtual Time and Global States of Distributed Systems. Proceedings of the Workshop on Parallel and Distributed Algorithms, 215–226.
[14] Mei, K., et al. (2024). AIOS: LLM Agent Operating System. arXiv preprint arXiv:2403.16971.
[15] Microsoft. (2024). Semantic Kernel: An Open-Source SDK for Integrating AI Models. https://learn.microsoft.com/en-us/semantic-kernel/
[16] Parakhin, V. (2026). The Bureaucracy of Speed: Structural Equivalence Between Memory Consistency Models and Multi-Agent Authorization Revocation. arXiv preprint. https://arxiv.org/abs/2603.09875
[17] OpenAI. (2024). Prompt Caching. OpenAI API Documentation. https://platform.openai.com/docs/guides/prompt-caching
[18] OpenID Foundation. (2025). Identity Management for Agentic AI. https://openid.net/wp-content/uploads/2025/10/Identity-Management-for-Agentic-AI.pdf
[19] Papamarcos, M. S., & Patel, J. H. (1984). A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories. Proceedings of the 11th Annual International Symposium on Computer Architecture (ISCA), 348–354.
[20] Shapiro, M., Preguiça, N., Baquero, C., & Zawirski, M. (2011). Conflict-Free Replicated Data Types. Proceedings of the 13th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), 386–400.
[21] Shen, L., et al. (2023). LangGraph: Building Stateful, Multi-Actor Applications with LLMs. LangChain Documentation. https://langchain-ai.github.io/langgraph/
[22] Sorin, D. J., Hill, M. D., & Wood, D. A. (2020). A Primer on Memory Consistency and Cache Coherence (2nd ed.). Synthesis Lectures on Computer Architecture, Morgan & Claypool.
[23] Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems (NeurIPS).
[24] Wu, Q., Bansal, G., Zhang, J., Wu, Y., Zhang, S., Zhu, E., Li, B., Jiang, L., Zhang, X., & Wang, C. (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv preprint arXiv:2308.08155.
[25] Xu, M., et al. (2024). LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression. arXiv preprint arXiv:2403.12968.
[26] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. International Conference on Learning Representations (ICLR).
[27] Zhang, W., et al. (2024). Coherent Microservices: Extending Cache Coherence to Distributed State Management. Proceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP).
[28] Wang, Q., Tang, Z., Jiang, Z., Chen, N., Wang, T., & He, B. (2025). AgentTaxo: Dissecting and Benchmarking Token Distribution of LLM Multi-Agent Systems. ICLR 2025 Workshop on Foundation Models in the Wild.
