ESAA-Security: An Event-Sourced, Verifiable Architecture for Agent-Assisted Security Audits of AI-Generated Code
AI-assisted software generation has increased development speed, but it has also amplified a persistent engineering problem: systems that are functionally correct may still be structurally insecure. In practice, prompt-based security review with large language models often suffers from uneven coverage, weak reproducibility, unsupported findings, and the absence of an immutable audit trail. The ESAA architecture addresses a related governance problem in agentic software engineering by separating heuristic agent cognition from deterministic state mutation through append-only events, constrained outputs, and replay-based verification. This paper presents ESAA-Security, a domain-specific specialization of ESAA for agent-assisted security auditing of software repositories, with particular emphasis on AI-generated or AI-modified code. ESAA-Security structures auditing as a governed execution pipeline with four phases (reconnaissance, domain audit execution, risk classification, and final reporting) and operationalizes the workflow into 26 tasks, 16 security domains, and 95 executable checks. The framework produces structured check results, vulnerability inventories, severity classifications, risk matrices, remediation guidance, executive summaries, and a final markdown/JSON audit report. The central idea is that security review should not be modeled as a free-form conversation with an LLM, but as an evidence-oriented audit process governed by contracts and events. In ESAA-Security, agents emit structured intentions under constrained protocols; the orchestrator validates them, persists accepted outputs to an append-only log, reprojects derived views, and verifies consistency through replay and hashing. The result is a traceable, reproducible, and risk-oriented audit architecture whose final report is auditable by construction.
💡 Research Summary
The paper introduces ESAA‑Security, a domain‑specific specialization of the Event‑Sourced, Auditable Architecture (ESAA) designed to address the shortcomings of current large‑language‑model (LLM) based security reviews, especially for code that is generated or heavily modified by AI. Traditional prompt‑driven reviews suffer from uneven coverage, poor reproducibility, unsupported findings, and a lack of immutable audit trails. ESAA‑Security tackles these problems by separating the heuristic reasoning of autonomous agents from deterministic state mutation through an append‑only event log, contract‑bound structured outputs, and replay‑based verification.
The architecture organizes a security audit into a four‑phase pipeline:
- Reconnaissance – automatic discovery of the technology stack, architecture, data flows, and attack surfaces, with all observations recorded as events.
- Domain Audit Execution – execution of playbook‑driven checks across 16 security domains (secrets, authentication, authorization, input validation, supply‑chain, API security, file upload, cryptography, AI/LLM security, DevSecOps, etc.). A total of 95 concrete checks are defined, each producing a structured JSON object containing check identifier, status, evidence, severity, explanation, and remediation guidance.
- Risk Classification – aggregation of check‑level findings into a vulnerability inventory, assignment of CIA‑based severity levels (CRITICAL, HIGH, MEDIUM, LOW, INFO), and construction of a risk matrix that maps impact and remediation priority.
- Final Reporting – generation of technical remediation steps, best‑practice guidance, an executive summary with a 0‑100 security score, and a final audit report delivered in both markdown and JSON formats.
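The per-check JSON object described in the Domain Audit Execution phase can be sketched as a small data structure. The field names below follow the summary (check identifier, status, evidence, severity, explanation, remediation guidance), but the exact key spellings, the status vocabulary, and the check-identifier format are assumptions, not the paper's schema:

```python
from dataclasses import dataclass, asdict
import json

SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"}

@dataclass
class CheckResult:
    check_id: str      # e.g. "SEC-SECRETS-01" (identifier format is an assumption)
    status: str        # e.g. "PASS" | "FAIL" (status vocabulary is an assumption)
    severity: str      # one of the CIA-based levels listed above
    evidence: str      # file/line excerpt supporting the finding
    explanation: str
    remediation: str

    def to_json(self) -> str:
        # Reject invalid severity labels before serializing.
        if self.severity not in SEVERITIES:
            raise ValueError(f"invalid severity: {self.severity}")
        return json.dumps(asdict(self))

result = CheckResult(
    check_id="SEC-SECRETS-01",
    status="FAIL",
    severity="HIGH",
    evidence="config/settings.py:12 hard-coded API key",
    explanation="A credential is committed to the repository.",
    remediation="Move the key to a secrets manager and rotate it.",
)
print(result.to_json())
```

Because every check emits the same structure, downstream phases (risk classification, reporting) can aggregate findings mechanically rather than parsing free-form LLM prose.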
All agent actions are mediated by an orchestrator that validates intents against pre‑defined JSON schemas (contracts). The orchestrator enforces a strict set of invariants: claim‑before‑work, complete‑after‑work, prior‑status consistency, lock ownership, boundary discipline, and immutability of completed tasks. Any violation results in immediate rejection (fail‑closed), preventing malformed or malicious events from contaminating the log.
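A minimal sketch of this fail-closed gate is shown below, covering three of the invariants named above (claim-before-work, lock ownership, immutability of completed tasks). The intent field names, action vocabulary, and in-memory bookkeeping are illustrative assumptions standing in for the paper's JSON-schema contracts:

```python
class IntentRejected(Exception):
    """Any invariant violation rejects the intent before state mutates (fail-closed)."""

class Orchestrator:
    def __init__(self):
        self.claimed = {}       # task_id -> agent_id (lock ownership)
        self.completed = set()  # completed tasks are immutable

    def validate(self, intent: dict) -> None:
        # Contract check: required fields must be present (schema stand-in).
        for field in ("agent_id", "task_id", "action"):
            if field not in intent:
                raise IntentRejected(f"missing field: {field}")
        task, agent, action = intent["task_id"], intent["agent_id"], intent["action"]
        if task in self.completed:
            raise IntentRejected("completed tasks are immutable")
        if action == "claim":
            if task in self.claimed:
                raise IntentRejected("task already claimed")
        elif action == "complete":
            # Claim-before-work: only the lock owner may complete.
            if self.claimed.get(task) != agent:
                raise IntentRejected("lock not owned by agent")
        else:
            raise IntentRejected(f"unknown action: {action}")

    def accept(self, intent: dict) -> None:
        self.validate(intent)  # rejection happens before any mutation
        if intent["action"] == "claim":
            self.claimed[intent["task_id"]] = intent["agent_id"]
        else:
            self.completed.add(intent["task_id"])
```

The key design point is that `validate` runs to completion before any state changes, so a malformed intent can never leave the orchestrator in a partially mutated state.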
The event‑sourcing core ensures that the append‑only log is the single source of truth. Current audit state is reconstructed by replaying the log into materialized read models (CQRS pattern). Each event is cryptographically chained, guaranteeing tamper‑evidence. At audit termination, the orchestrator replays the entire log to verify that the derived state matches the final report, making the report auditable by construction.
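The chaining-and-replay idea can be illustrated with a short sketch. The concrete hashing scheme (SHA-256 over the previous hash plus a canonical JSON body) and the event payload shapes are assumptions; the paper specifies only that events are cryptographically chained and replay-verified:

```python
import hashlib
import json

class EventLog:
    """Append-only, hash-chained event log (illustrative sketch)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.events = []

    def append(self, payload: dict) -> None:
        prev = self.events[-1]["hash"] if self.events else self.GENESIS
        body = json.dumps(payload, sort_keys=True)  # canonical serialization
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.events.append({"payload": payload, "prev": prev, "hash": digest})

    def verify(self) -> bool:
        # Replay: recompute the whole chain and compare every stored hash.
        prev = self.GENESIS
        for e in self.events:
            body = json.dumps(e["payload"], sort_keys=True)
            if e["prev"] != prev:
                return False
            if hashlib.sha256((prev + body).encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True

log = EventLog()
log.append({"type": "check_completed", "check_id": "SEC-SECRETS-01", "status": "FAIL"})
log.append({"type": "severity_assigned", "check_id": "SEC-SECRETS-01", "severity": "HIGH"})
print(log.verify())
```

Tamper-evidence follows directly: mutating any stored payload changes its recomputed hash, which breaks the chain at that event and at every event after it.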
A notable contribution is the inclusion of an AI/LLM security domain, which adds checks for prompt injection, model extraction, data leakage, and other threats unique to AI‑augmented development. This extends traditional OWASP Top 10 and ASVS coverage to address emerging risks in AI‑generated software.
The authors outline three research questions:
- RQ1: Can an event‑sourced execution model make agent‑assisted security audits replay‑verifiable and traceable at the level of findings, classifications, and final reports?
- RQ2: Can security review be operationalized as a structured audit process with explicit domain coverage, check‑level evidence, and risk‑oriented outputs rather than free‑form prompting?
- RQ3: Do the artifacts produced (severity classification, risk matrix, technical fixes, executive report) provide practical value for prioritizing remediation in AI‑generated software?
Evaluation is designed around five dimensions: protocol compliance, replay‑verifiable integrity, coverage completeness, artifact completeness, and usefulness of the risk report. The plan includes two case studies of differing scale to assess both the orchestrator run and the full artifact chain.
Key insights from the paper include:
- Governed State Transitions: By forcing agents to emit only contract‑validated intents, the system confines the nondeterminism of LLM text generation to the reasoning layer, keeping state transitions themselves deterministic.
- Immutable Audit Trail: The append‑only log, combined with cryptographic hashing, provides a tamper‑evident record that can be independently verified.
- Deterministic Re‑creation: Replay of the event log guarantees that any stakeholder can reconstruct the exact audit state, ensuring reproducibility.
- Risk‑Centric Output: Structured findings feed directly into a risk matrix and executive summary, turning raw vulnerability data into actionable business decisions.
- Extensibility: The layered design (roadmap, playbooks, contracts, event store, read models) allows the same framework to be adapted for other compliance or quality‑assurance domains.
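The read-model side of this design can be sketched as a pure projection function: replaying the same events always rebuilds the same derived view, which is what makes deterministic re-creation possible. The event shapes and the inventory structure below are illustrative assumptions, not the paper's actual read models:

```python
def project_inventory(events: list[dict]) -> dict:
    """Rebuild a vulnerability-inventory read model by replaying events (CQRS sketch)."""
    inventory = {}
    for e in events:
        if e["type"] == "check_completed" and e["status"] == "FAIL":
            # A failed check opens an inventory entry awaiting classification.
            inventory[e["check_id"]] = {"severity": None}
        elif e["type"] == "severity_assigned" and e["check_id"] in inventory:
            inventory[e["check_id"]]["severity"] = e["severity"]
    return inventory

events = [
    {"type": "check_completed", "check_id": "SEC-API-02", "status": "FAIL"},
    {"type": "severity_assigned", "check_id": "SEC-API-02", "severity": "MEDIUM"},
    {"type": "check_completed", "check_id": "SEC-CRYPTO-01", "status": "PASS"},
]
print(project_inventory(events))
```

Because the projection has no state outside the event stream, any stakeholder holding the log can regenerate the inventory and compare it against the published report.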
In conclusion, ESAA‑Security reframes security auditing from an ad‑hoc LLM conversation to a rigorously governed, event‑sourced workflow. This shift delivers traceability, reproducibility, and objective risk assessment, addressing the unique challenges posed by AI‑generated code while providing a foundation that can be extended to broader software governance contexts.