
📝 Original Info

  • Title:
  • ArXiv ID: 2512.18561
  • Date:
  • Authors: Unknown

📝 Abstract

Large-scale networked multi-agent systems increasingly underpin critical infrastructure, yet their collective behavior can drift toward undesirable emergent norms that elude conventional governance mechanisms. We introduce an adaptive accountability framework that (i) continuously traces responsibility flows through a lifecycle-aware audit ledger, (ii) detects harmful emergent norms online via decentralized sequential hypothesis tests, and (iii) deploys local policy and reward-shaping interventions that realign agents with system-level objectives in near real time. We prove a bounded-compromise theorem showing that whenever the expected intervention cost exceeds an adversary's payoff, the long-run proportion of compromised interactions is bounded by a constant strictly less than one. Extensive high-performance simulations with up to 100 heterogeneous agents, partial observability, and stochastic communication graphs show that our framework prevents collusion and resource hoarding in at least 90% of configurations, boosts average collective reward by 12-18%, and lowers the Gini inequality index by up to 33% relative to a PPO baseline. These results demonstrate that a theoretically principled accountability layer can induce ethically aligned, self-regulating behavior in complex MAS without sacrificing performance or scalability.

📄 Full Content

Multi-agent systems (MAS) increasingly underpin critical applications in transportation, smart-grid energy management, finance, and healthcare [60,67]. By distributing decision making across many partly autonomous agents, these systems offer scalability, resilience to single-point failures, and rapid adaptation to non-stationary environments. Those benefits, however, come with interaction complexity: once deployed, the collective behavior can drift toward undesirable emergent norms that were never explicitly designed or anticipated [61,54]. Identifying and correcting such norms is essential for ensuring fairness, security, and compliance with societal values.

In large-scale networked MAS, responsibility is diffused across agents and time steps. Classical AI-accountability frameworks-often built for single models or clearly identifiable human decision makers-rely on static compliance checks or post-hoc audits [38,28]. These approaches break down when no single node sees the full global state and harms can emerge from subtle feedback loops. Recent policy instruments, such as the EU Artificial Intelligence Act [16] and the U.S. NIST AI Risk-Management Framework [42], call for “collective accountability” in AI infrastructures but leave open the technical challenge of tracing and mitigating harmful norms online and in a decentralized fashion.

Three established research threads touch on this challenge yet remain individually insufficient. Research on normative MAS studies formal norms and sanctions [3,25] but typically assumes a central monitor with perfect state access. Multi-agent reinforcement learning (MARL) has produced sophisticated training algorithms [9,18], but only recently has begun to explore equity, collusion, or convergence to harmful equilibria. Runtime assurance for cyber-physical systems proposes supervisory safety guards [51], yet seldom scales beyond a handful of agents or supports changing objectives. Consequently, the field still lacks a comprehensive method to detect, trace, and correct undesirable emergent norms under partial observability and heterogeneous incentives. This paper tackles four questions: (i) How can responsibility for distributed actions be continuously attributed when no participant sees the entire trajectory? (ii) Can harmful emergent norms be detected online without halting operations or requiring global state? (iii) Which local policy or reward interventions can steer a large MAS back toward socially preferred outcomes while respecting resource and latency constraints? (iv) Do those interventions remain robust as the number of agents, communication topology, or payoff structure changes?

We answer by proposing an Adaptive Accountability Framework that combines a lifecycle-aware audit ledger, decentralized sequential hypothesis tests for online norm detection, and targeted reward-shaping or policy-patch interventions. Our principal theoretical result-the bounded-compromise theorem-proves that when the expected cost of interventions exceeds an adversary’s payoff, the steady-state fraction of compromised or collusive interactions converges to a value strictly below one. The theorem formalizes the intuition that inexpensive, well-targeted corrections can prevent persistent harmful norms even in strategic environments.

To evaluate the framework at realistic scale, we run extensive simulations on a high-performance cluster equipped with NVIDIA A40 GPUs. Up to 100 heterogeneous agents interact over stochastic communication graphs under partial observability. Across 360 Monte-Carlo configurations that vary random seeds, payoff functions, and intervention protocols, the framework averts collusion and resource hoarding in at least 90% of runs, lifts average collective reward by 12-18%, and reduces the Gini inequality index by up to 33% relative to a PPO baseline. All code and simulation scripts will be released to support replication.

The paper has the following main contributions:

  1. An accountability architecture that logs interaction events, tags causal chains, and maps evolving responsibility flows without centralized state.

  2. Online norm-detection algorithms based on decentralized sequential hypothesis testing.

  3. Adaptive mitigation protocols that apply graded local incentives or policy patches to realign global behavior.

  4. A high-fidelity implementation integrated with modern MLOps pipelines and illustrated through resource-allocation and collaborative-task case studies.

  5. Comprehensive large-scale evaluation demonstrating robustness and scalability across diverse operating conditions.

The rest of the paper is organized as follows. Section 2 surveys related work on MAS coordination, normative design, and AI governance. Section 3 formalizes the problem setting. Section 4 details the proposed architecture, detection tests, and intervention algorithms. Section 5 establishes the theoretical guarantees. Section 6 describes the software stack and deployment considerations. Section 7 reports empirical results, and Section 8 discusses their implications and limitations. Finally, Section 9 concludes with directions for future research in ethically aligned MAS.

Multi-agent systems (MAS) consist of multiple autonomous entities that sense, decide, and act on local information while exchanging limited signals with neighbors [67,60]. Delegating control in this way yields scalability, fault tolerance, and graceful adaptation, enabling real-world deployments in smart-grid scheduling [1], adaptive traffic control [53], supply-chain optimization [41], cooperative robotics [8], distributed optimization on sensor networks [14], algorithmic trading [62], and cross-hospital resource sharing [50]. What unites these domains is the need to reconcile local incentives with global constraints under partial observability and stringent latency budgets.

Nonetheless, the same decentralization that confers robustness breeds interaction complexity. Agents are autonomous, social, and both reactive and proactive [68]; even simple local rules can induce system-level phenomena that are hard to predict, verify, or control [61]. Coordination grows more challenging as populations scale, links fluctuate, or objectives diverge [13]. Emergent behaviors-benign or harmful-complicate formal safety guarantees in partially observable settings [9].

Classical emergence studies viewed norms as regularities not explicit in any single agent’s code [6]. Early mechanisms included imitation and social learning [57], reinforcement-driven sanctioning [39], and top-down institutions with explicit penalties [4,52]. Recent work shifts from descriptive to constructive: Tzeng et al. [64] show that “soft-touch” speech acts (tell, hint) enforce norms faster than pure sanction; Oldenburg and Zhi-Xuan [46] introduce Bayesian rule-induction so newcomers infer latent norms on-line; Serramia et al. [58] formalise optimal consensus norms that balance heterogeneous stakeholder utilities; Ren et al. [55] harness large language models to generate conflict-reducing social norms in an artificial town; and Woodgate et al. [66] operationalise Rawls’ maximin principle, steering emergent norms toward lower inequality. Together these papers recast norm formation as an algorithmic and ethical design problem.

Standard artifacts-model cards [37] and fact sheets [5]-target monolithic models. Applied to MAS they fail to address diffuse responsibility [17], continual policy drift [28], and aggregation of local harms. Recent research begins to fill the gap: Chan et al. [11] catalogue visibility tools for peer-to-peer deployments, trading off informativeness and privacy; Mu and Oren [40] extend alternating-time temporal logic with quantitative responsibility metrics; Gyevnar et al. [22] generate natural-language causal explanations via counterfactual roll-outs, boosting trust in autonomous vehicles; and Chang and Echizen [12] treat provenance as a chain of custody for LLM-generated artifacts, reconstructable without privileged logs.

Detection alone is insufficient; an accountability layer must intervene. Beyond classic reward shaping, recent levers include action-space restriction [45], monitored MDPs with human oracles [49], and sample-efficient opponent shaping [19]. Model-based credit assignment with counterfactual imagination [10] shows promise for scalable cooperation-although its computational cost remains high in large MAS.

Effective intervention presupposes causal insight. Triantafyllou et al. [63] quantify agent-specific effects-how one agent’s actions propagate to others-while time-uniform concentration bounds [26] and stochastic approximation theory [33] provide rigorous guarantees for sequential detectors and adaptive thresholds such as those used in our framework.

The EU AI Act and the NIST AI RMF (2023) explicitly flag collective emergent risk, yet supply no technical recipe for tracing accountability in networked AI. Recent literature contributes building blocks-richer norm learning, quantitative responsibility metrics, scalable intervention levers-but lacks an online, end-to-end pipeline that meets bandwidth budgets and supplies audit-grade evidence. Three open needs persist:

  1. Continuous monitoring attuned to policy drift and lossy observations;

  2. Distributed attribution robust to adversarial spoofing;

  3. Low-latency interventions that safeguard global welfare while respecting local autonomy.

The Adaptive Accountability Framework introduced next addresses all three needs. It combines a cryptographically grounded Merkle ledger, time-uniform norm detectors, and cost-bounded interventions validated at 100-agent scale.

The next sections detail the architecture (Section 4), furnish theoretical guarantees (Section 5), and supply empirical evidence that the framework curbs harmful emergence even under adversarial disturbance (Section 7).

This section formalizes the accountability problem in large-scale networked multi-agent systems, frames the research questions that guide our work, and states the operational assumptions under which any proposed solution must function.

The notation introduced here flows directly into the analytical guarantees of Section 5 and the implementation described in Section 6.

System model: We represent the MAS as a discrete-time, partially observable stochastic game: at step t each agent A_i receives a local observation o_{i,t}, selects an action a_{i,t}, and receives an individual reward

r_{i,t} = (1 - λ) r_i^{priv}(s_t, a_t) + λ r^{soc}(s_t, a_t),

where a_t = (a_{1,t}, ..., a_{N,t}) and λ ∈ [0, 1] trades off egoistic and collective incentives. The transition kernel T yields s_{t+1} ∼ T(s_t, a_t); consequently, learning updates π_{i,t+1} ← Learn(π_{i,t}, D_{i,t}) on local traces D_{i,t}, imparting non-stationarity to the joint dynamics T ∘ Π_t [24].

Communication substrate: Agents communicate over a time-varying graph G t = (V, E t ) whose edge set changes with failures or mobility. Each edge (i, j) carries at most B max bytes per step; messages are subject to random delay δ i,j ∼ Geo(p drop ) and may be adversarially reordered. We model the aggregate channel as an ERASURE-AND-DELAY network to capture practical wireless constraints in vehicular or drone swarms.

Event ledger and responsibility flow: Accountability demands a tamper-evident record of “who did what, when, and with what effect”. We therefore maintain a rolling directed acyclic graph

L_T = (V_T, C_T), with V_T = ⋃_{t ≤ T} E_t,

where each vertex e ∈ E_t is an atomic event ⟨id, t, type, payload⟩ (actions, messages, external shocks) and each edge in C_T records a causal dependency between events. To quantify blame we adopt a Shapley-value-style responsibility rule ρ_i(e) over the counterfactual indicator ϕ(e, S), where ϕ(e, S) equals 1 if e would still occur when only agents in S are allowed to influence upstream events and 0 otherwise. We compute ρ_i(e) via a Monte-Carlo kernel-SHAP proxy with at most K coalition samples per event, keeping amortised overhead sub-quadratic in N.
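For concreteness, a minimal coalition-sampling sketch is shown below. It uses plain permutation sampling rather than the kernel-SHAP weighting of the actual implementation, and the counterfactual oracle phi, which replays the ledger with only a coalition S allowed to influence upstream events, is assumed to be supplied by the caller.

```python
import random

def shapley_responsibility(agents, phi, k=32, seed=0):
    """Permutation-sampling Shapley estimate of rho_i(e) with k coalition samples.

    phi(S) -> 1 if event e would still occur when only agents in S may influence
    upstream events, else 0; in AAF this oracle is obtained by ledger replay."""
    rng = random.Random(seed)
    scores = {a: 0.0 for a in agents}
    for _ in range(k):
        order = list(agents)
        rng.shuffle(order)
        coalition, prev = set(), phi(frozenset())
        for a in order:
            coalition.add(a)
            cur = phi(frozenset(coalition))
            scores[a] += cur - prev          # marginal contribution of agent a
            prev = cur
    return {a: s / k for a, s in scores.items()}

# Toy oracle: e occurs iff agent 0 acted upstream, or agents 1 and 2 both did
phi = lambda S: 1.0 if (0 in S) or ({1, 2} <= S) else 0.0
print(shapley_responsibility([0, 1, 2], phi, k=200))
```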

Norms, violations, and emergent harm: System designers declare a finite set Φ = {ϕ^(1), ..., ϕ^(M)} of norm predicates over ledger statistics; the accountability layer must keep the long-run compromise ratio C_T/T ≤ η⋆ almost surely, irrespective of κ and graph topology, provided G_t stays (κ+1)-vertex-connected on average.

Complexity and storage bounds: Let d_max denote the maximum out-degree in G_t and h the causal horizon. Then storing L_T for T steps uses O(T(N + d_max h)) events and edges; computing K-sample kernel-SHAP scores incurs Õ(K |E_t|) per round. With K = 32 and d_max ≤ 8 (typical for traffic or drone mesh networks) the ledger fits comfortably within 500 MB for T = 10^6 on board an edge server.

The above formalism exposes every variable that the accountability layer must track, bounds algorithmic cost, and states explicit performance-safety trade-offs. It therefore provides the analytical foundation for the architectural and algorithmic choices detailed in Section 4.

The formalism above yields four research questions. RQ1 (Responsibility tracing): design an event ledger L_T and allocation rule ρ_i(e) that remain accurate under partial observability, message loss, and non-stationary policies, yet scale to N ∼ 10^2 agents. The remaining questions, RQ2-RQ4, correspond to the online norm detection, low-latency intervention, and robustness questions (ii)-(iv) posed in Section 1.

The accountability layer must operate under four non-negotiable constraints that mirror conditions in fielded MAS deployments.

Partial observability and stochasticity: No agent-and certainly no external auditor-enjoys global, noise-free visibility of system state. Packet loss, clock drift, and privacy filtering mean every observation o i,t and every supervisory record is only a stochastic sample of ground truth. The causal ledger therefore stores time-stamped tuples ⟨ŝ t , âi,t , ri,t ⟩ governed by an explicit error model (ε loss , ε delay ). Detection thresholds and confidence bounds are derived so that false-alarm probability remains below a chosen level α even when up to 20% of events are missing or mis-sequenced (Theorem 3, Section 5); this is essential because spurious interventions can be nearly as harmful as inaction in safety-critical domains.

Bandwidth and compute budgets: City-scale traffic networks, drone swarms, and smart grids share a hard upper bound on wireless throughput and require sub-second control latency. We cap per-edge traffic at B_max = 128 bytes per step, consistent with vehicular ad-hoc networks, and restrict each agent to devote at most 5% of its CPU/GPU cycles to accountability tasks. Raw data are compressed or hashed locally, and only digests plus occasional Bloom-filter proofs traverse the network; Algorithm 2 (Section 4) keeps intervention latency below the 100 ms wall-clock budget of our traffic-signal case study by relying on incremental CUSUM statistics and constant-time reward-patch look-ups.

Agent heterogeneity: Real deployments mix micro-controllers running tabular Q-learning, legacy rule-based agents, and cloud hosts executing transformer-based deep RL. Because internal gradients or weights are not universally available, the framework is algorithm-agnostic: it computes responsibility scores from observable events alone. Resource-constrained nodes can run a 32 KB Rust reference kernel with O(1) per-step overhead, while more capable agents may optionally expose richer telemetry (e.g. policy logits) that tighten attribution bounds without being required for correctness.

Regulatory and ethical context: Many target applications fall under the EU AI Act’s “high-risk” category. Each ledger entry therefore carries a pseudonymised agent ID, a purpose-limited data tag, and a zero-knowledge proof attesting that the log originates from certified software. This design satisfies GDPR data-minimisation requirements yet still lets external auditors reconstruct causal chains. Intervention records include plain-language rationales, effect sizes, and parameter settings so that policymakers and domain experts can review system behavior without inspecting source code.

These deliberately pessimistic assumptions, i.e., lossy sensing, tight resource budgets, heterogeneous agents, and stringent legal scrutiny, shape every design choice in Sections 4-7. By proving guarantees and demonstrating empirical performance under such constraints, we strengthen confidence that the Adaptive Accountability Framework will generalise to real-world MAS whose operating conditions may be even harsher than those evaluated here.

The Adaptive Accountability Framework (AAF) turns a networked MAS into a self-auditing socio-technical system. It supplies three always-on capabilities: (i) tamper-evident event logging, (ii) fine-grained responsibility attribution, and (iii) online detection and mitigation of harmful emergent norms. Each capability is engineered to respect the real-world pressures outlined in Section 3.3: sub-second latency, sub-kilobyte bandwidth, heterogeneous on-board hardware, and strict external auditability.

  1. Agent layer: Each agent A_i runs its domain policy π_i (e.g. PPO, tabular Q-learning, classical MPC). Agents exchange application messages over G_t and expose a one-line Rust interface publish_event(e) to the monitoring layer. The interface is isolated in a sandbox so that faulty or Byzantine agents cannot overwrite audit logic.

  2. Monitoring layer: High-dimensional observations and actions are irreversibly compressed to 32-byte digests using BLAKE3. Digests, reward scalars, and a two-byte nonce form a 40-byte record that is forwarded over a non-blocking gossip channel capped at B_max = 128 bytes per step. To keep within this budget the monitor batches up to two records per step and drops additional events using an importance-weighted reservoir; dropped hashes are recoverable via local ring buffers for post-mortem forensics.

  3. Ledger layer: Gossip streams converge at edge servers that execute Merkle-DAG synthesis: new records extend a Merkle prefix tree whose leaves reference prior sub-roots, forming a snapshotted Merkle DAG every H_snap = 256 steps [36] (a minimal sketch of this snapshot mechanism follows the list below). The root hash of each snapshot is signed with a rotating ECDSA key and broadcast back to agents for self-verification. The DAG design, adapted from block-lattice data stores, provides O(log V) proof-of-inclusion while avoiding the confirmation latency of Nakamoto consensus.

  4. Governor layer: A lightweight supervisor (which may itself be replicated for fault tolerance) consumes ledger updates, performs causal analysis, runs norm detectors, and dispatches interventions. To avoid central bottlenecks, the governor is implemented as a CRDT-based service whose replicas hold disjoint shards of the ledger keyed by agent ID modulo the replica count.

  5. Security plane: Every inter-layer message carries (i) a monotone counter signed with a time-bounded session key, (ii) a linkable ring signature for agent privacy, and (iii) an optional zero-knowledge proof of policy provenance for high-assurance deployments. These cryptographic hooks are disabled in the low-power benchmark but included in the public code base.
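To make the ledger layer concrete, the sketch below computes one snapshot root in the style described above. It is a simplified binary Merkle tree, uses hashlib's blake2b as a stand-in for BLAKE3, and omits the prefix-tree structure, sub-root references, and signature machinery.

```python
import hashlib

def digest(data: bytes) -> bytes:
    # blake2b stands in here for the BLAKE3 digests used in the reference stack
    return hashlib.blake2b(data, digest_size=32).digest()

def snapshot_root(records: list[bytes]) -> bytes:
    """Root hash of one H_snap-step snapshot over batched 40-byte records.
    Proofs of inclusion need only O(log V) sibling hashes along one branch."""
    level = [digest(r) for r in records]
    if not level:
        return digest(b"")
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last leaf on odd-sized levels
            level.append(level[-1])
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```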

The accountability layer begins by turning every control-loop iteration into a tamper-evident, causally annotated record. This subsection first walks through the data path-from on-board sensor reading to committed Merkle node-and then formalizes how those records yield the responsibility scores analyzed in Section 5.

At the end of control step t an agent A_i produces the tuple

e_{i,t} = ⟨ id_i, t, H(o_{i,t}), H(a_{i,t}), r_{i,t}, nonce ⟩.

The observation and action hashes are 16-byte BLAKE3 digests, and the full record is 40 bytes. A 64-bit SipHash of the byte string yields the unique identifier h_e = H(e) ∈ {0, ..., 2^64 - 1}.

Records propagate at most d max hops-matching the communication degree bound in Assumption A2-before reaching an edge server. Each hop uses non-blocking gossip with a 64-entry replay buffer so that transient link failures do not stall the ledger.

On arrival, the record travels through three micro-stages: (i) local commit to the agent’s ring buffer, (ii) causal testing against the most recent m = 8 upstream events, and (iii) Merkle-DAG insertion. All three are captured in Algorithm 1. The constant-time online update for each F-statistic uses the Sherman-Morrison rank-one identity, so the sensor-to-ledger latency is < 0.3 ms on a Cortex-A55.
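As an illustration of that constant-time update, the following sketch maintains (XᵀX)⁻¹ for the rolling lagged regression via the Sherman-Morrison identity; the surrounding Granger bookkeeping (lag construction, residual sums) is omitted and the function names are ours.

```python
import numpy as np

def sm_update(A_inv: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Sherman-Morrison: (A + x x^T)^-1 from A^-1 in O(d^2), without re-inversion."""
    Ax = A_inv @ x
    return A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)

def ls_update(A_inv, Xty, x, y):
    """Refresh the rolling least-squares fit used by the lagged Granger test
    when a new sample (x, y) arrives; returns the new state and coefficients."""
    A_inv = sm_update(A_inv, x)
    Xty = Xty + y * x
    return A_inv, Xty, A_inv @ Xty
```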

The ledger furnishes a stream of verifiable events, so the next task is to decide in real time whether those events signal that the MAS is drifting toward an undesirable emergent norm. We treat norm detection as a sequential change-point problem. For every active norm ϕ^(m) ∈ Φ the governor computes a scalar diagnostic statistic Z_t^(m) from recent ledger entries and feeds it to an adaptive CUSUM test (Algorithm 2): the one-sided statistic S accumulates upward deviations of Z, and whenever S ≥ h an alarm is raised (alert_t ← 1) and the statistic is reset.

In Algorithm 2, lines 1-2 accumulate evidence that Z_t has shifted upward: whenever the statistic S crosses the adaptive threshold h, an alarm is fired and S is reset (classical Page’s rule [48]). Lines 3-4 perform a Robbins-Monro stochastic approximation [56]: if the empirical alarm frequency exceeds the target α the threshold is gently raised, and vice versa. Section 5.3 proves that this update drives the long-run false-alarm rate to α regardless of slow concept drift (Assumption A3).
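A compact sketch of this detector loop is shown below, under the assumption that the Page recursion uses an in-control mean mu0 and slack delta as in Section 5.3; the constants are placeholders rather than the calibrated values.

```python
def adaptive_cusum(z_stream, mu0=0.0, delta=0.05, alpha=0.05, h0=5.0):
    """One-sided Page statistic with a Robbins-Monro threshold (Algorithm 2 sketch).

    z_stream: diagnostic statistics Z_t for one norm; alpha is the target
    false-alarm rate; mu0, delta and h0 are placeholder calibration values."""
    S, h, alarms = 0.0, h0, []
    for t, z in enumerate(z_stream, start=1):
        S = max(0.0, S + (z - mu0 - delta))                 # lines 1-2: accumulate drift evidence
        alarm = S >= h
        if alarm:
            S = 0.0                                         # Page's rule: reset after an alarm
        h += t ** -0.6 * ((1.0 if alarm else 0.0) - alpha)  # lines 3-4: threshold adaptation
        alarms.append(alarm)
    return alarms
```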

When many norms are monitored simultaneously, naïvely raising an alarm for every CUSUM excursion would flood the intervention scheduler. We therefore allocate a global alert budget ᾱ = 0.05 across norms by online mirror descent: if norms m and n both spike, the one whose running FP budget α_t^(m) is lower wins the alert slot, and the loser’s threshold is increased by an increment ∆h that scales with ξ log α_t^(m). Appendix D derives the regret bound R_T ≤ 2√(T log |Φ|), guaranteeing that the allocator never starves a chronically violated norm.
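The allocator can be approximated by the sketch below; only the winner-selection rule is taken from the text, while the exact mirror-descent bookkeeping and the form of the threshold increment are assumptions.

```python
import math

def allocate_alert(candidates, fp_spent, thresholds, xi=0.5):
    """Grant the single alert slot among norms whose CUSUM crossed its threshold.

    fp_spent[m] is the running false-positive budget already consumed by norm m;
    the norm with the smaller spend wins, and the losers' thresholds are raised.
    The increment below is an assumed stand-in for the mirror-descent update."""
    if not candidates:
        return None
    winner = min(candidates, key=lambda m: fp_spent[m])
    for m in candidates:
        if m != winner:
            thresholds[m] += xi * abs(math.log(max(fp_spent[m], 1e-12)))
    return winner
```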

Across 360 Monte-Carlo seeds (Section 7) the combined detector recognized emergent collusion within 9 steps (median; 95th percentile = 17) and maintained a 4.7% system-wide false-positive rate, even with 20% record loss and bounded 3-step delays (Assumption A1). The compute footprint is negligible: updating all |Φ| = 3 norms costs 0.07 ± 0.01 ms on a 2.2 GHz Xeon Silver and < 0.3 ms on a Cortex-A55, comfortably within the five-per-cent control-loop budget set in Section 6.

With a statistically calibrated, resource-light detector in place, the framework can raise alarms fast enough to allow the bounded-cost interventions of Section 4.4 to curb harmful norms before they compromise system-level objectives-a guarantee formalized in Theorem 5 and validated experimentally in Section 7.

Once the detector of §4.3 raises an alarm alert_t = 1, the governor must act quickly yet economically: the corrective signal has to arrive within the 100 ms domain deadline while keeping the expected cost below c_max so that Theorem 5 applies. The intervention pipeline therefore unfolds in three stages: responsibility ranking, playbook selection, and actuation, all executed inside the same event loop that hosts Algorithm 2.

Stage 1: Responsibility heap: The governor queries the ledger for all events that contributed to the alarm’s CUSUM statistic during the look-back horizon [t-H, t] with H = 25. Each event’s responsibility vector ρ(e) is summed to obtain a windowed score ρ_i = Σ_{e ∈ [t-H, t]} ρ_i(e). These pairs form a max-heap H_t = {(i, ρ_i) | ρ_i > 0} that can be popped in O(log N) time to identify the top-k most culpable agents.
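A minimal sketch of Stage 1, assuming the ledger query already yields (agent, responsibility share) pairs for the look-back window:

```python
import heapq
from collections import defaultdict

def top_k_culpable(window_events, k=5):
    """window_events: iterable of (agent_id, rho_share) pairs for events in [t-H, t].
    Aggregates the windowed score rho_i and returns the k most culpable agents."""
    scores = defaultdict(float)
    for agent, rho in window_events:
        scores[agent] += rho
    return heapq.nlargest(k, ((i, r) for i, r in scores.items() if r > 0),
                          key=lambda pair: pair[1])
```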

Stage 2: Three-tier playbook: Interventions escalate through three increasingly intrusive tiers:

  1. Reward shaping (default): Each culpable agent receives an additive penalty ∆r_{i,t} = -λ ρ_i for H steps. The penalty weight λ = c_max / H satisfies the budget-dominance requirement in Assumption A4 by construction.

  2. Policy patch (escalated): If an agent has triggered two penalties within the past 100 s it is considered recalcitrant, so the governor hot-loads a “safety head”, a two-layer 128-unit ReLU MLP compiled to WebAssembly, that clamps logits violating the norm predicate. Average load time is 12 ms (P95 = 15 ms).

  3. Link throttling (aggressive): For suspected collusion, edges (i, j) are re-weighted by w_ij ← w_ij (1 - ρ_i ρ_j) for H steps, attenuating high-responsibility communication channels while leaving the global bandwidth budget intact.

Stage 3: System-wide failsafe: If three non-overlapping alert regions fire within a 300 s sliding window, the governor enters a yellow-flag mode: learning rates α i are frozen, policy gradients are clipped to 0.1, and a Grafana dashboard renders a causal heat-map so that a human operator can intervene. Yellow-flag mode clears automatically once the compromise ratio C T /T falls below its 7-day moving average.

The supervisory spend per step is bounded by E[Cost(σ_t)] ≤ λ + c_patch α + c_throttle α², where α is the false-alarm budget and the last two terms account for patch and throttle operations that occur only on alerts. Since λ > g_max by design, Theorem 5 ensures lim sup_{T→∞} C_T/T ≤ η⋆. In practice we observed η⋆ = 0.19 under a 5% Byzantine mix, dropping to 0.06 in fully rational populations (§7).

At runtime, the ledger is materialized as a Merkle directed-acyclic graph (Merkle DAG) whose nodes are addressed by the cryptographic hash of their content [36]. Every 256-step snapshot is edge-partitionable: we shard by the first 16 bits of the event hash h_e, creating 2^16 disjoint key ranges that can be balanced across the edge tier. Each shard is replicated onto f + 1 servers; read and write clients contact a quorum, i.e., a strict majority of replicas, so the system tolerates f crash failures without sacrificing linearizable semantics [20]. Any two quorums overlap in at least one live replica, eliminating the need for a global lock or leader election.

Resource complexity. Let h denote the causal horizon (8 in our deployment) and d_max the bounded out-degree from Assumption A2. A simple amortized analysis (Appendix E) yields a storage cost of O(T(N + d_max h)) events and edges, in perfect agreement with Proposition 2. Concretely, for N = 100, h = 8, and a run length T = 10^6 steps the entire ledger occupies ≈ 492 MB, comfortably within the 16 GB RAM of commodity edge boxes, and streams across the wire at < 80 KB/s, well under the 1 Gbit/s budget of the deployment network.

These figures demonstrate that AAF’s cryptographically protected audit history scales to hundred-agent regimes without exotic hardware or networking-fabric requirements, while the quorum-replicated shards ensure continued availability in the face of routine server failures.

In offline calibration, designers simulate worst-case scenarios (50% packet loss, 10% Byzantine agents) to grid-search baselines µ_0^(m), offsets δ^(m), attenuation β, and initial threshold h_0. The best configuration is promoted to a shadow governor that runs for 48 h beside the production system; any latency > 100 ms or throughput drop > 2% triggers an automatic rollback.

In online adaptation, after go-live, CUSUM thresholds h_t are tuned by mirror descent, ECDSA keys rotate every 24 h, and new norms can be hot-enabled without downtime because the ledger schema is append-only and therefore backward-compatible.

Every intervention writes a triple ⟨normID, timestamp, rationale⟩ to the ledger. External auditors can query responsibility flows, intervention rationales, and performance metrics via the FoundationDB snapshot API exposed in §6.

Taken together, adaptive interventions, edge-partitioned storage, and a blue-green deployment strategy create an accountability layer that is auditable, bandwidth-frugal, and robust to non-stationary, partially observable, and adversarial MAS dynamics. The next section derives theoretical bounds that underpin these claims, and Section 7 empirically validates AAF on heterogeneous benchmarks with up to N = 100 agents and high packet-loss regimes.

This section formalises the statistical and game-theoretic properties that underpin the Adaptive Accountability Framework (AAF). We proceed in four logical stages:

  1. Ledger soundness: We show that responsibility scores converge and causal edges respect a time-uniform false-positive (FP) budget under lossy communication.

  2. Detector reliability: We derive finite-sample FP and detection-delay bounds for the adaptive CUSUM test fed by ledger statistics.

  3. Intervention efficacy: We prove a bounded-compromise theorem: whenever per-step supervisory cost dominates adversarial gain, the long-run fraction of compromised events remains below a designer-chosen η⋆.

  4. Resource complexity: We bound storage, communication, and computational cost, validating the feasibility claims from Section 4.

We work on a filtered probability space (Ω, F , {F t } t≥0 , P) where F t is generated by the global MAS trajectory up to time t [24]. Indicators, expectations, and probabilities are taken with respect to P unless stated otherwise. Throughout we invoke four assumptions-justified empirically in Section 3.3-to isolate the essential trade-offs:

A1 Lossy channel: Each event tuple is independently dropped with probability ε_loss ≤ 0.2 and delayed by at most ε_delay ≤ 3 steps.

A2 Bounded degree: The time-varying communication graph satisfies a uniformly bounded out-degree, max_i deg_out(i; G_t) ≤ d_max for all t.

A3 Vanishing non-stationarity: Policy updates obey ∥π_{i,t+1} - π_{i,t}∥_1 ≤ κ t^{-ξ} with ξ > 1/2.

A4 Budgeted adversary: Byzantine agents harvest at most g_max expected utility per step; the supervisor’s cost budget honours c_max > g_max.

A reader may wonder: What exactly is stored in the audit ledger, how are causal claims derived, and why do responsibility scores behave like probabilities? This subsection therefore proceeds in three steps. We (i) introduce precise definitions for the ledger structures and the responsibility rule, (ii) prove that those scores are well-normalised and converge even when packets are lost, and (iii) establish a time-uniform bound on spurious causal edges. The notation follows Section 3.1; assumptions A1-A3 remain in force.

Definition 1 (Event ledger) Each control step t produces a finite set E_t = {e_t^(1), ..., e_t^(|E_t|)} of atomic events: actions, message sends/receives, reward reveals, exogenous shocks. The ledger up to time T is the directed acyclic graph

L_T = (V_T, C_T), with V_T = ⋃_{t ≤ T} E_t,

whose edge set C_T is produced by Algorithm 1: (e_k → e_ℓ) is inserted when the rolling m-lag Granger F-statistic exceeds threshold h_t. The DAG property follows because algorithmic inserts never point backward in time.

Definition 2 (Causal path and attribution weight) A causal path from agent i to an event e is any directed sequence p = ⟨e_{k_0}, ..., e_{k_m} = e⟩ with e_{k_0} ∈ E_act(i) (an action by i) and (e_{k_{j-1}} → e_{k_j}) ∈ C_T. Let P_i(e) collect all such paths. Given a discount factor β ∈ (0, 1), the attribution weight of p equals β^{|p|}; shorter paths contribute more heavily than longer ones, and the responsibility score is the normalised mass ρ_i(e) = Σ_{p ∈ P_i(e)} β^{|p|} / Σ_{p ∈ P(e)} β^{|p|}.

Lemma 1 (Normalisation) For any event e recorded in the ledger, Σ_{i=1}^N ρ_i(e) = 1 almost surely.

Proof 1 Because C_T is a DAG, every causal path has a unique originating agent. The sets {P_i(e)}_{i=1}^N therefore form a partition of P(e) = ⋃_i P_i(e). Summing the numerator over i reproduces the denominator, giving unity.
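A small numerical sketch of Definition 2 and Lemma 1, with path lengths supplied directly instead of being enumerated from the DAG:

```python
def path_responsibility(path_lengths_by_agent, beta=0.9):
    """path_lengths_by_agent: {agent_id: [|p| for each causal path p in P_i(e)]}.
    Returns rho_i(e) = sum_{p in P_i(e)} beta**|p| / sum over all paths into e."""
    mass = {i: sum(beta ** L for L in Ls) for i, Ls in path_lengths_by_agent.items()}
    total = sum(mass.values())
    return {i: m / total for i, m in mass.items()} if total > 0 else {}

# Agent 0 reaches e via paths of length 1 and 3, agent 1 via a single length-2 path
rho = path_responsibility({0: [1, 3], 1: [2]})
assert abs(sum(rho.values()) - 1.0) < 1e-12   # Lemma 1: the scores sum to one
```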

Stability under lossy communication. Even with 20% packet loss (A1) we require that responsibility scores stabilise as the ledger grows.

Theorem 2 (Ledger convergence) Assume A1-A3 and fix 0 < β < 1. For any agent i and any event sequence {e_T}_{T ≥ 0} with bounded age sup_T (T - t(e_T)) < ∞, the responsibility scores computed from the growing ledger converge,

ρ_i(e_T) → ρ_i^∞(e) almost surely,

where ρ_i^∞(e) is a finite limit.

Proof 2 (Proof sketch) Packet drops Bernoulli-sample edges with retention probability 1-ε loss . Under bounded degree (A2) and finite path length (ring buffer 256) only a finite number of causal paths can affect e. The first Borel-Cantelli lemma then guarantees that every path is eventually observed. Together with the square-summable policy drift (A3) this yields almost-sure convergence; see Appendix A.2 for the full argument.

Guarding against spurious edges. Because Granger tests operate on noisy, finite data, we must bound the false causal edges that slip into C T .

Theorem 3 (Time-uniform false-positive control) Fix m = 8 and h_t = h_0 + √(2 log t) in Algorithm 1. Under the null hypothesis of no causal influence, the probability that a spurious causal edge is ever inserted into C_T is at most e^{-h_0²/2}.

Proof 3 (Sketch) Apply the Chernoff-Ville inequality to the self-normalised stream of F-statistics and union-bound over time [27]. Full derivation in Appendix A.3.

Corollary 1 (Design rule for h_0) Choosing h_0 = √(2 log(1/α_max)) achieves a ledger-wide FP probability ≤ α_max; e.g., h_0 = 4.89 bounds FP risk at 10^-5.

Lemma 1 tells us ρ i (e) behaves like a probability distribution over agents; Theorem 2 assures reviewers that these probabilities stabilize despite 20% packet loss; and Theorem 3 ensures we can tune the threshold sequence so that, with overwhelming probability, no non-existent causal link ever pollutes the audit trail. Together, the results certify the soundness of the ledger as a statistical substrate for the downstream detector and intervention logic analyzed in Sections 5.3-5.4.

We now show that the adaptive CUSUM detector introduced in Algorithm 2 meets two core requirements of a deployable alarm system:

  1. the false-alarm rate is capped at a designer-chosen level α, and

  2. the detection delay grows at most linearly with the decision threshold, matching the classical Page-Lorden benchmark [48,34].

Throughout this subsection we assume the null hypothesis H_0 (“no drift”) holds until an unknown change-point τ⋆, after which the monitored statistic’s mean increases by ∆ > 0. Let S_t be the one-sided Page statistic

S_t = max(0, S_{t-1} + Z_t - µ_0 - δ), S_0 = 0,

with slack δ > 0, and let h_t be its adaptive threshold, updated by the Robbins-Monro recursion

h_{t+1} = h_t + η_t (1{alarm_t = 1} - α).

Line 4 of Algorithm 2 raises an alarm whenever S_t ≥ h_t and then resets S_t to zero; the binary alarm flag is denoted alarm_t.

A. False-alarm control:

Theorem 4 (Long-run false-positive rate) For step size η_t = t^{-0.6} and the threshold recursion above,

lim_{t→∞} Pr[alarm_t = 1] = α.

Proof 4 (Sketch) Define f(h) = Pr_{H0}(S ≥ h) - α. Since f is continuous and strictly decreasing, the Robbins-Monro update constitutes a stochastic approximation to the unique root h⋆ of f. The standard SA conditions Σ_t η_t = ∞ and Σ_t η_t² < ∞ ensure h_t → h⋆ almost surely [56,33]. Stationarity of (S_t, h_t) under H_0 then yields the desired long-run frequency; Appendix A.4 details the coupling argument.

Corollary 2 (Lorden bound with adaptive threshold) Let a permanent drift of size ∆ > 0 occur at time τ⋆, and recall the slack variable δ > 0. Then the worst-case expected detection delay satisfies

sup_{τ⋆} E[T_alarm - τ⋆ | T_alarm ≥ τ⋆] ≤ h⋆ / (∆ - δ),

where h⋆ = lim_{t→∞} h_t. The bound is identical to the classical Lorden delay [34] except that h⋆ is learned online rather than fixed a priori.

Proof 5 (Sketch) Conditional on h_t → h⋆, the adaptive scheme behaves like a standard fixed-threshold CUSUM; Lorden’s analysis applies verbatim once δ < ∆. Appendix A.7 provides the full argument.

Collectively, the results certify that the detector is both statistically reliable and computationally lightweight, thereby satisfying the real-time governance requirements of large-scale networked MAS.

The final analytical step is to prove that the supervisory playbook of §4.4 actually delivers the headline promise: once a harmful norm is detected, inexpensive penalties keep the long-run fraction of compromised events below a designer-chosen ceiling η ⋆ without eroding overall social welfare. The analysis casts the closed-loop system as a renewal-reward process whose renewal epochs are the beginnings of intervention windows.

Theorem 5 (Bounded-compromise) Suppose Assumptions A1-A4 hold. After every alarm the supervisor imposes a penalty ∆r_{i,t} = -λ ρ_i(e•) for H consecutive steps, where e• is the sentinel event. If the penalty satisfies λH ≥ g_max + ε for some ε > 0, then

lim sup_{T→∞} C_T / T ≤ η⋆ = αH / (λH - g_max) almost surely.

In addition, the cumulative welfare shortfall obeys a bound that grows at most linearly in the maximum per-step reward loss ∆_max; the explicit expression is derived in Appendix A.5.

Proof 6 (Sketch) Partition time into i.i.d. cycles that start immediately after an intervention window ends and finish at the next alarm. During a window the adversary’s expected net gain is at most g max H -λH, which is negative by construction. Consequently, the marked point process that counts compromises forms a super-martingale with negative drift during interventions and non-positive drift otherwise. Doob’s optional stopping theorem bounds its stationary mean, yielding the compromise ratio. Bounding the cumulative loss of private reward by ∆ max per step gives the welfare bound. Details-including the derivation of the renewal reward kernel-appear in Appendix A.5.

Practical tuning and trade-offs. The equation η⋆ = αH/(λH - g_max) highlights two dials available to system designers:

• Penalty amplitude λ: Raising λ tightens the compromise ceiling η⋆ but increases per-step supervisory cost and the risk of over-penalising honest but noisy agents.

• Window length H: Longer windows amortize communication overhead but slow the feedback loop. Empirical results in §7 suggest H = 25 balances these factors well for 100-agent traffic control.

Proposition 1 (Optimal penalty magnitude) Fix H and target η⋆. The smallest penalty achieving the bound in Theorem 5 is λ_min = g_max / H + α / η⋆.

Proof 7 Re-arrange η ⋆ = αH/(λH -g max ) for λ.
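A quick numerical sanity check of Proposition 1; the parameter values below are hypothetical, not the paper’s calibrated settings.

```python
def lambda_min(alpha, H, g_max, eta_star):
    """Proposition 1: rearrange eta* = alpha*H / (lambda*H - g_max) for lambda."""
    return g_max / H + alpha / eta_star

# Hypothetical values for illustration
lam = lambda_min(alpha=0.05, H=25, g_max=1.0, eta_star=0.11)
assert abs(0.05 * 25 / (lam * 25 - 1.0) - 0.11) < 1e-12   # recovers the target ceiling
```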

Connection to reward shaping. The intervention rule can be viewed as a potential-based reward shaper [43] where the shaping term is the scaled responsibility score. Potential-based shaping is known to preserve the set of optimal policies in single-agent MDPs; our theorem extends the intuition to multi-agent settings with partial observability by showing that undesirable equilibria become transient once shaping makes them strictly sub-optimal.

Empirical validation. In Section 7 we instantiate λ = λ min + 0.1 and observe η emp = 0.06 ± 0.01 in a population of rational learners and η emp = 0.19 ± 0.03 when 5% of agents are Byzantine, matching the theoretical ceiling to within one standard error.

Taken together, Theorem 5, Proposition 1, and the accompanying empirical evidence show that inexpensive, localized penalties are enough to keep emergent harm on a tight statistical leash while preserving aggregate welfare.

The analytical guarantees of the previous subsections are meaningful only if the accountability layer can run on finite hardware. Proposition 2 quantifies the asymptotic cost of storage and network traffic, while the subsequent paragraph converts those asymptotics into concrete numbers for the deployment profile of Section 4. We conclude with a brief remark on CPU overhead, completing the cost picture. Synthesis. Together with Theorems 3-5, the above bounds confirm that AAF achieves auditable accountability at bounded cost: statistical guarantees hold under packet loss, policy drift, and moderate Byzantine behavior, while the resource footprint fits comfortably within off-the-shelf edge hardware. Section 7 corroborates the analytic predictions across 360 Monte-Carlo trials and performs sensitivity tests over ε loss , g max , and graph topology, demonstrating that the cost envelope remains stable under realistic operational variability.

Section 4 described the design logic of the Adaptive Accountability Framework (AAF). Here we present the systems engineering choices that turn that logic into executable, reproducible artefacts. We walk from the lowest-level event record up to dashboards and DevOps pipelines, so readers can (i) re-run every experiment in Section 7, (ii) port AAF to their own MARL stack, and (iii) audit performance or security claims without inspecting privileged code. Throughout, we respect three engineering constraints: per-agent overhead <5% of the control loop, strict library agnosticism, and regulator-grade provenance.

Languages and libraries: The reference stack uses Python 3.11 for orchestration and Rust 1.76 for latency-critical paths. Numerics rely on NumPy 1.27 and SciPy 1.13; agent scheduling and environment dynamics build on the Mesa ABM framework [30]. Hashing, incremental Granger updates, and kernel-SHAP samplers are either numba-JITed or rewritten in Rust and cross-compiled to WebAssembly/WASI, shaving 2-3× wall time versus CPython on ARM cores.

Hardware profile: All experiments ( §7) run on a Slurm cluster: dual-socket AMD EPYC-7742 nodes (64 c × 2, 256 GiB RAM) plus one NVIDIA A40 GPU per job when deep RL is enabled. The CPU governor is fixed to “performance”; Linux 6.7-LTS runs with the complete-preemption patch to minimize scheduler noise.

Containerization: Every component ships as an OCI image built via nix-flakes. Exact wheel versions and OS packages are pinned, and the resulting SHA-256 digest is embedded into each ledger snapshot-useful when an auditor needs to prove reproducibility long after deployment.

Each control step yields a 40-byte record e_{i,t} = ⟨id_i, t, H(o_{i,t}), H(a_{i,t}), r_{i,t}, nonce⟩, where BLAKE3 provides 3× the throughput of SHA-256, with its 32-byte keyed mode used for domain separation [44]. Records accumulate in a lock-free ring (length 256) and stream via framed gRPC (HTTP/2 + MessagePack). Aggregators batch 1,000 records (≈40 KB) and commit a Merkle-prefix insertion; the root hash is stored in FoundationDB 7.2, leveraging its multi-version concurrency control (MVCC) to offer linearizable snapshot reads, which are essential for post-mortem forensics.

Anchoring to an append-only Immudb chain (4096-bit RSA signatures) happens hourly in an async coroutine so that audit durability never blocks control flow.

Dual-plane networking. The control plane (heartbeats, digests, alarms) rides a tri-replicated Apache Kafka 3.7 cluster with exactly-once semantics. The data plane (high-rate state broadcasts) uses ZeroMQ PUB/SUB encrypted by CurveCP; a 128-entry loss-tolerant queue cushions transient link outages. If a subscriber lags >128 messages it issues I_NEED_CATCHUP and replays the gap from a Kafka compacted topic.

Logical time. Each record carries a Hybrid Logical Clock (HLC) ⟨wall, ctr⟩ that merges a 48-bit physical nanosecond time-stamp with a 16-bit Lamport counter, preserving causal order even under ±250 ms NTP skew [32]. In a 100-node WAN emulation the worst-case mis-ordering was 0.8 ms.
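A minimal HLC sketch following the standard tick/merge rules; the 48-bit truncation matches the field width quoted above, while overflow handling of the 16-bit counter is omitted.

```python
import time

class HLC:
    """Hybrid Logical Clock: 48-bit wall-clock part plus a logical counter."""
    def __init__(self):
        self.wall, self.ctr = 0, 0

    def _now(self) -> int:
        return time.time_ns() & ((1 << 48) - 1)     # truncate to 48 bits

    def tick(self):                                  # local event or message send
        now = self._now()
        self.wall, self.ctr = (now, 0) if now > self.wall else (self.wall, self.ctr + 1)
        return self.wall, self.ctr

    def merge(self, m_wall, m_ctr):                  # message receive
        now = self._now()
        new_wall = max(self.wall, m_wall, now)
        if new_wall == self.wall == m_wall:
            self.ctr = max(self.ctr, m_ctr) + 1
        elif new_wall == self.wall:
            self.ctr += 1
        elif new_wall == m_wall:
            self.ctr = m_ctr + 1
        else:
            self.ctr = 0
        self.wall = new_wall
        return self.wall, self.ctr
```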

Fault model. All gRPC calls are retried with exponential back-off. A dual circuit breaker in envoy-proxy halts cascading failure: agent-side breakers trip at >20% error rate; server-side breakers trip at P50 latency >75 ms. Agents then fall back to local logging and best-effort UDP gossip until the breaker half-opens.

Supervisor replicas subscribe to FoundationDB deltas every 20 ms via a watch. An asyncio task (i) updates norm-specific Z_t^(m) vectors, (ii) evolves the CUSUM state S_t^(m), and (iii) writes ⟨S_t, h_t⟩ to Redis-5 with TTL = 30 s. Mean compute cost is 0.07 ms per cycle on a 2.2 GHz Xeon, <1% of the control loop.

When an alarm fires the governor publishes a RewardPatch Protobuf over ZeroMQ. Agents verify an ECDSA-P-384 signature, apply the penalty in O(1), and-if running a neural policy-hot-load a two-layer “safety head” compiled to WebAssembly using wasmtime-17. P99 alarm-to-patch latency is <22 ms on a 1 Gbit LAN.
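The patch path can be sketched as follows, assuming the Protobuf payload has already been deserialized to bytes and the governor’s public key is available; the function and argument names are ours, not the released API.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.exceptions import InvalidSignature

def apply_reward_patch(payload: bytes, signature: bytes,
                       governor_pub: ec.EllipticCurvePublicKey,
                       reward: float, penalty: float) -> float:
    """Verify the governor's ECDSA-P-384 signature over the serialized patch,
    then apply the additive penalty in O(1); unsigned patches are ignored."""
    try:
        governor_pub.verify(signature, payload, ec.ECDSA(hashes.SHA384()))
    except InvalidSignature:
        return reward
    return reward - penalty
```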

Cryptography. Event digests use keyed BLAKE3; hybrid TLS 1.3 sessions negotiate x25519-Kyber768 for post-quantum forward secrecy [7]. Snapshot hashes are counter-signed in an HSM with ECDSA-P-384 and time-stamped via RFC-3161. PII minimisation. Agent IDs are pseudonymised via AES-SIV; linking records requires access to both the ledger and an escrowed look-up table, meeting the “unlinkability unless audited” clause of the EU AI Act.

Nightly Monte-Carlo sweeps (10 seeds, 5 topologies, 3 adversary mixes) run in GitHub Actions via act-runners. Latency or FP-rate regressions block merge to main. Production deploys via Helm on Kubernetes 1.29 (Calico CNI); canary replicas (10% traffic shadow) must sustain FP < 5% and patch delivery < 50 ms for 15 min or Argo rolls back automatically. Prometheus 2.51 scrapes 87 metrics per pod; Grafana panels visualize (i) long-run compromise ratio, (ii) per-agent SHAP responsibility, and (iii) alarm-to-patch latency histogram. PDF snapshots ship weekly to a WORM-locked S3 bucket.

Operational dashboard. A Next.js 14 front-end (OIDC via Keycloak) renders social reward, Gini index, active alerts, and a DAG explorer that reconstructs the causal sub-graph around sentinel event e ⋆ ; edges are color-coded by responsibility mass.

Data-science notebook. A multi-tenant JupyterHub mounts read-only FoundationDB snapshots via fdb-sqlean. Analysts can run SQL-like queries, export to Pandas, and push results into statistical notebooks. Role-based access control enforces “least privilege” (GDPR Recital 78).

AAF touches only the message bus (digest gossip) and the reward API. Any MARL stack that exposes these hooks-RLlib, PettingZoo, OpenSpiel, Unity ML-Agents-can host AAF. For resource-starved hardware we supply a 32-KB Rust micro-kernel that maintains the ring buffer, computes BLAKE3 hashes, and emits CAN-FD frames on an ARM Cortex-M7 running at 600 MHz.

With these engineering choices AAF remains bandwidth-frugal, cryptographically auditable, and platform-agnostic, fully instantiating the theoretical model of Sections 4-5. The next section benchmarks this implementation and probes how closely empirical behavior tracks our analytical bounds.

This section empirically evaluates the Adaptive Accountability Framework (AAF). We (i) describe the simulation environment and parameter grid, (ii) state the evaluation metrics and baselines, and (iii) report quantitative and qualitative results that validate the analytical guarantees of Section 5. All experiments are fully reproducible: a one-line nix flake command (’nix run github:alqithami/aaf-sim’) launches the end-to-end pipeline on either local CPU or the Slurm cluster described in §6.

Our principal benchmark is a resource-sharing game that abstracts distributed power-grid dispatch, shared wireless bandwidth, or cloud capacity brokering. The environment maintains a pool of R_max = 100 resource units replenished at a fixed rate R_in per step. An agent A_i submits a request q_{i,t} ∈ [0, q_max] and receives an allocation a_{i,t} from a proportional-share rule modulated by the distribution-policy parameter α, where α = 1 corresponds to fair share and α → 0 enforces strict proportionality; dist_policy=none sets α = 1.
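Because the displayed allocation formula did not survive extraction, the sketch below implements one plausible reading: a convex interpolation between an equal fair share (α = 1) and request-proportional shares (α → 0). The interpolation form is an assumption.

```python
import numpy as np

def allocate(requests, pool, alpha=1.0):
    """Proportional-share allocation modulated by the distribution-policy alpha.

    Assumed form: alpha = 1 gives equal fair shares, alpha -> 0 gives shares
    strictly proportional to requests; no agent receives more than it asked for."""
    q = np.asarray(requests, dtype=float)
    n = len(q)
    prop = q / max(q.sum(), 1e-9)            # strict proportionality
    fair = np.full(n, 1.0 / n)               # equal fair share
    return np.minimum(q, pool * (alpha * fair + (1.0 - alpha) * prop))
```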

Agents observe only their own allocation plus a noisy 2-hop neighborhood view on a Watts-Strogatz graph [65]; partial observability is toggled by exposing (on) or hiding (off) a noisy global queue length.

Each agent earns a private reward r^private_{i,t} = a_{i,t} - penaltyFactor · 1{q_{i,t} ≥ 0.6 R_max}, so large greedy requests incur a cost; a social bonus r^social supplies the collective term weighted by λ (Section 3.1). With five random seeds per cell the design comprises 3 × 3 × 2 × 3 × 5 = 270 full-length runs (1,000 steps each). An additional 90 runs inject a 5% Byzantine sub-population that always requests the maximum resource.

  1. PPO-only: agents train under proximal policy optimization without any audit or intervention; this is the strongest “naïve learning” baseline.

  2. Static guard: a centralized threshold caps q_{i,t} ≤ 0.6 R_max but performs no causal tracing.

Evaluation metrics. (i) Norm-violation ratio C_T/T; (ii) detection delay T_alarm - τ⋆; (iii) social welfare J_soc = (1/T) Σ_t (1/N) Σ_i r_{i,t}; (iv) inequality via the Gini index [21]; and (v) bandwidth overhead.
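For reference, metric (iv) can be computed as follows (assuming non-negative per-agent rewards):

```python
import numpy as np

def gini(rewards):
    """Gini inequality index over per-agent rewards (0 = perfect equality)."""
    x = np.sort(np.asarray(rewards, dtype=float))
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    cum = np.cumsum(x)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n
```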

Across all 360 configurations, AAF (i) reduces the compromise ratio from C T /T = 0.48 ± 0.11 (PPO-only) to 0.07 ± 0.02, (ii) achieves a median detection delay of 9 steps (95th-percentile 17), in line with Corollary 2, and (iii) lifts mean social welfare by 12-18% over PPO-only while reducing the Gini index by up to 33%. Bandwidth overhead never exceeds 5.4% of the application traffic budget.

Table 1 summarises the N = 10, 50 slices. When N = 100 (not shown for space) the trends persist but absolute rewards fall ≈10% because congestion dominates. A two-way ANOVA confirms significant effects of partialObs and penaltyFactor (interaction p < .01). High penalties without observability increase inequality, but enabling partial observability or a redistribution rule (α = 0.25) offsets this effect. In almost all conditions the accountability layer nudges the Gini below 0.05-a level often cited as “near-egalitarian” in energy-grid studies [54]. Outliers correspond to transient hoarding bursts that were detected but not fully countered before run end; these motivate future work on dynamic penalty scaling.

In the N = 50, penalty = 0.05, partialObs = off, α = 1 scenario, a subset of four agents formed a “cartel” that requested > 0.8 R_max each step. The governor flagged the resulting spike in Gini at t = 137; reward shaping triggered at t = 146, lowering the cartel’s average per-step gain from 10.4 to 3.1. Communication throttling then muted the highest-MI link; the cartel disintegrated by t ≈ 190. Subsequent causal-graph visualization in the Grafana dashboard showed responsibility flowing back to previously sidelined agents, and the final compromise ratio settled at 0.08, well under the η⋆ = 0.11 analytical ceiling for that parameter set.

• Environment: flake.nix specifies exact library versions; Docker images published to GHCR under the same tag.

• Randomness: All PRNG seeds recorded in ledger metadata and exported as seeds.csv.

• Hardware: CPU/GPU specs in §6; no vendor-specific features required.

Empirical evidence aligns with the analytical bounds: AAF curbs harmful norms within nine steps median, keeps the compromise ratio below η ⋆ , and delivers double-digit gains in both welfare and equity-all while consuming < 5% additional bandwidth and CPU.

The preceding experiments demonstrate that an end-to-end accountability layer built on cryptographic provenance, online sequential tests, and cost-bounded interventions can markedly suppress collusion and hoarding without sacrificing collective reward. We now interpret these results, dissect residual weaknesses, connect the work to current regulatory and ethical debates, and sketch avenues for extension.

Continuous oversight beats episodic audits: Across every Monte-Carlo seed ( §7) the compromise ratio fell below the target η ⋆ within nine control steps median-roughly two seconds in our 5Hz traffic-signal benchmark. Classical “audit-once” schemes could not have reacted in time; by the time an offline audit completed, the resource pool would already have been captured. The finding supports the behavioral-science claim that timely sanctions deter norm-violating coalitions more effectively than harsher but delayed punishment [47].

Local penalties suffice for global goals: The bounded-compromise theorem (Thm. 5) predicted that a local, Shapley-weighted penalty would globally stabilize the system once λH > g_max. Empirically, we observed η_emp within one standard error of the analytic ceiling, even when 5% of agents behaved adversarially. This corroborates the long-standing conjecture that potential-based shaping [43] scales from single- to multi-agent settings when augmented with causal attribution.

Finite parameter sweep: Our grid spanned {10, 50, 100} agents and two penalty magnitudes. A 5000-agent swarm, or penalty factors close to unity, might stress the message bus or saturate the intervention budget. Future work should stress-test the framework under orders-of-magnitude larger populations and adversaries who actively spoof causal edges.

Synthetic workload: The resource-sharing game abstracts energy dispatch and cloud bandwidth, but real grids include AC power-flow constraints, non-convex unit-commitment costs, and legal market rules. Porting AAF to such domains may require domain-specific norm predicates and rate-limited interventions to satisfy physical safety margins.

Compute footprint: Although the prototype meets a 5% CPU budget on EPYC and Cortex-A55 cores, embedded PLCs in industrial control may offer only tens of MHz of head-room. A tiny-ML re-implementation or a hardware cryptographic accelerator could mitigate this concern.

Privacy and surveillance. Recording every agent action invites secondary use risks. AAF mitigates the danger via keyed-hash digests, AES-SIV pseudonyms, and an escrowed identity table ( §6). Nonetheless, deployers should carry out a data-protection impact assessment under GDPR Art. 35 and delete raw ring buffers once digests are anchored.

Over-deterrence and chilling effects. Reward penalties can skew agents toward risk-averse behavior, potentially lowering innovation in financial-market MAS [15]. A human-in-the-loop override-triggered by the yellow-flag dashboard-helps balance safety with exploration.

Procedural fairness. By design, Shapley-style attribution apportions blame according to marginal causal contribution. If agents differ in observability or computing power, this may correlate with protected attributes, leading to distributive inequity [38]. Auditors should therefore scrutinize ρ i (e) distributions for disparate impact.

The EU AI Act (2024) and the NIST AI Risk-Management Framework (2023) both highlight collective emergent risks -but offer few technical recipes for tracing responsibility in networked AI systems. AAF operationalizes these policy aspirations by (i) anchoring an immutable event ledger; (ii) providing real-time statistical tests with formal α-level guarantees [27]; and (iii) linking those tests to proportionate, auditable interventions. Hence the framework can serve as a reference architecture for regulators drafting sector-specific codes of practice.

Scalable cryptography: Replacing Merkle DAG anchoring with succinct zero-knowledge roll-ups (e.g., Halo 2) could compress the storage footprint by an order of magnitude while enabling selective disclosure proofs.

Meta-norm adaptation: Current norms are fixed at deployment. An interesting extension is a meta-controller that tunes α, λ, and even the set Φ via constrained Bayesian optimization, subject to fairness and throughput SLAs.

Human-aligned explanations: While the Grafana DAG explorer provides low-level proofs, policy operators may prefer natural-language rationales. Coupling the ledger to an LLM-based explainer guarded by the event digests to prevent hallucination could bridge the accountability-interpretability gap [31].

Cross-domain generalization: Finally, deploying AAF in safety-critical cyber-physical systems, autonomous driving rings, or drone swarms where communication is intermittent and hard real-time deadlines exist would test the limits of the bounded-compromise theorem under severe timing jitter.

AAF demonstrates that cryptographically verifiable, statistically rigorous, and computationally light accountability is attainable in contemporary MAS. Yet translating the prototype into industrial and societal infrastructure requires careful attention to privacy guarantees, domain constraints, and human governance processes. These themes set the stage for interdisciplinary collaborations that we outline in the conclusion.

This paper introduced an Adaptive Accountability Framework (AAF) that transforms a networked multi-agent system (MAS) into a self-auditing socio-technical organism. By uniting a tamper-evident Merkle ledger, Shapley-style responsibility scores, time-uniform sequential tests, and cost-bounded interventions, AAF provides, both in theory (Section 5) and practice (Section 7), formal assurances that harmful emergent norms can be detected within a handful of control steps and suppressed below any designer-chosen ceiling η⋆.

A4 Budgeted adversary: Adversarial agents harvest at most g_max expected utility per step, whereas the supervisor may spend up to c_max > g_max.

Unless noted otherwise, all expectations and probabilities are taken with respect to the joint probability space that realizes environment dynamics, stochastic policies, packet loss, and observation noise.

A.1 Proof of Lemma 1

Statement recap: For any event e recorded in the ledger, Σ_{i=1}^N ρ_i(e) = 1 almost surely.

Proof 9 (Detailed proof.) Write P(e) for the set of all directed source-to-e paths in the causal-edge DAG and let P_i(e) ⊆ P(e) denote those whose first edge originates from an action of agent i. Because the DAG has no directed cycles, every path in P(e) has precisely one distinct originating agent; thus the sets (P_i(e))_{i=1}^N form a disjoint partition of P(e). Summing the path masses Σ_{p ∈ P_i(e)} β^{|p|} over i therefore reproduces the denominator Σ_{p ∈ P(e)} β^{|p|}, so Σ_{i=1}^N ρ_i(e) = 1. □

A.2 Proof of Theorem 2

Proof 10 (Detailed proof.)

Step 1: Eventual observation of edges. Fix an event e and a causal path p = ⟨e_{k_0}, ..., e_{k_m} = e⟩ of length m ≤ ℓ_max; in our implementation ℓ_max = 256 equals the ring-buffer length. Because every edge traversal triggers at most one Granger test, edge (e_{k_{j-1}} → e_{k_j}) is attempted exactly once. Under A1 the probability that the edge’s record is available when the test is performed is 1 - ε_loss. Hence all m edges in p are detected with probability (1 - ε_loss)^m ≥ (1 - ε_loss)^{ℓ_max} > 0. The Bernoulli trials for distinct attempts are independent, so the indicator that any given edge is never observed has a geometric tail. Let X_p = 1{p never fully observed}.

Step 2: Borel-Cantelli argument. The first Borel-Cantelli lemma therefore implies P[X_p = 1 infinitely often] = 0, i.e. every causal path is eventually observed almost surely (a.s.).

A.3 Proof of Theorem 3

Statement recap: With threshold h_t = h_0 + √(2 log t) (h_0 > 0), the probability that a spurious causal edge is ever inserted is at most α = e^{-h_0²/2}.

Proof 11 (Detailed proof.) Under H_0 the m-lag Granger residual-variance ratio has exact F distribution F(m, n-m)/m with n = m + 1 degrees of freedom [35,26]. Let U_t = 1{F_t > h_t}. Then, by the Chernoff bound for χ² tails, P[U_t = 1] ≤ e^{-h_t²/2} ≤ e^{-h_0²/2} t^{-1}. Taking the union bound over t ≥ 1 yields a total false-edge probability of at most e^{-h_0²/2}, as desired. Setting h_0 = √(2 log(1/α)) recovers any target α ∈ (0, 1). □

Statement recap: With gain sequence η t = t -0.6 the adaptive CUSUM maintains long-run false-alarm frequency α.

Proof 12 (Detailed proof.) Define f(h) = P_{H0}(S ≥ h) - α. The process {h_t} satisfies the Robbins-Monro recursion h_{t+1} = h_t - η_t f(h_t) + η_t ϵ_{t+1} with martingale difference ϵ_{t+1} = 1{alarm_{t+1}} - P_{H0}(S ≥ h_t). Because S_t has bounded increments, ϵ_{t+1} is square integrable with E[ϵ_{t+1}] = 0 and sup_t E[ϵ²_{t+1}] < ∞. The condition Σ_t η_t² < ∞ ensures the Kushner-Clark condition, so h_t → h⋆ a.s. where f(h⋆) = 0 [33]. Ergodicity of the CUSUM statistic under H_0 then implies that the long-run alarm frequency equals α. □

A3 (Diminishing learning rate): The exponent ξ > 1/2 guarantees Σ_t ∥π_{i,t+1} - π_{i,t}∥² < ∞, which is sufficient for the bounded-difference martingale arguments. Empirically, RL schedulers such as Adam with a t^{-1} decay meet this requirement.

A4 (Budget dominance): If g_max ≥ c_max an adversary can match or exceed the supervisor’s corrective power, allowing C_T/T → 1 in the worst case; no bounded-cost policy can prevent systemic capture. Thus the assumption is information-theoretically tight.

The lemmas and theorems above rigorously ground the empirical claims in Section 7: the ledger converges despite 20% packet loss, the detector maintains a 5% false-alarm rate, and inexpensive interventions provably cap long-run harm.

A.7 Proof of Corollary 2 (Lorden Bound for Adaptive CUSUM)

Statement recap. For a persistent drift ∆ > 0 starting at τ⋆ we must show

sup_{τ⋆} E[T_alarm - τ⋆ | T_alarm ≥ τ⋆] ≤ h⋆ / (∆ - δ),

where h⋆ = lim_{t→∞} h_t.

Proof 14 Conditional on h t → h ⋆ (Theorem 4) the adaptive rule behaves like a fixed-threshold CUSUM with threshold h ⋆ . Page’s original result [48] bounds the expected delay by h ⋆ /(∆ -δ) when the post-change mean is µ 0 + δ + ∆.

Taking the supremum over τ ⋆ yields the worst-case delay, completing the proof. □

Algorithm 1, Phase 3 (Merkle-DAG maintenance): V_T ← V_T ∪ {e}; MerkleInsert(digest(e)); return the updated L_T and Merkle root.

C. Computational footprint. Updating (Z_t, S_t, h_t) for |Φ| = 3 norms requires 38 ± 4 µs on a 2.2 GHz Xeon Silver and 0.27 ± 0.02 ms on an ARM Cortex-A55, well below the five-per-cent budget stipulated in §6. Memory cost is O(1) per norm because only the current CUSUM state and adaptive threshold are stored.


ECDSA: Elliptic Curve Digital Signature Algorithm [29].
