Emergence-as-Code for Self-Governing Reliable Systems
SLO-as-code has made per-service reliability declarative, but user experience is defined by journeys whose reliability is an emergent property of microservice topology, routing, redundancy, timeouts/fallbacks, shared failure domains, and tail amplification. As a result, journey objectives (e.g., “checkout p99 < 400 ms”) are often maintained outside code and drift as the system evolves, forcing teams to either miss user expectations or over-provision and gate releases with ad-hoc heuristics. We propose Emergence-as-Code (EmaC), a vision for making journey reliability computable and governable via intent plus evidence. An EmaC spec declares journey intent (objective, control-flow operators, allowed actions) and binds it to atomic SLOs and telemetry. A runtime inference component consumes operational artifacts (e.g., tracing and traffic configuration) to synthesize a candidate journey model with provenance and confidence. From the last accepted model, the EmaC compiler/controller derives bounded journey SLOs and budgets under explicit correlation assumptions (optimistic independence vs. pessimistic shared fate), and emits control-plane artifacts (burn-rate alerts, rollout gates, action guards) that are reviewable in a Git workflow. An anonymized artifact repository provides a runnable example specification and generated outputs.
💡 Research Summary
The paper addresses a fundamental gap in modern cloud‑native reliability engineering: while SLO‑as‑code lets teams version per‑service availability and latency targets, the user‑facing experience is defined by end‑to‑end journeys whose reliability emerges from a complex interplay of topology, routing, retries, timeouts, fallbacks, shared failure domains, and tail‑latency amplification. Because journey objectives (e.g., “checkout p99 < 400 ms”) are typically maintained outside code—often in product specs or dashboards—they drift as services evolve, creating a recurring “sync tax” that either hides regressions or forces over‑provisioning.
To solve this, the authors propose Emergence‑as‑Code (EmaC), a vision that makes journey‑level reliability declarative, computable, and governable. EmaC separates intent (what the team wants: a journey objective, a control‑flow expression built from a small operator set, and a governance policy) from evidence (the actual runtime artifacts such as distributed traces, service‑mesh configuration, and deployment metadata). An intent file declares a journey as a tree of operators—Series, Parallel, Cond, Race, K‑of‑N, Timeout—each leaf bound to an atomic SLO (e.g., OpenSLO) and telemetry source.
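To make the "tree of operators" idea concrete, here is a minimal Python sketch of how such an intent tree might be represented in memory. It is an illustration only, not the paper's spec format: the class names `Leaf` and `Op`, the fields `slo_ref`, `telemetry`, and `params`, and the example service names are all assumptions; only the six operator names come from the summary above.

```python
from dataclasses import dataclass, field

# Operator names taken from the summary; "KofN" is a hypothetical spelling of K-of-N.
OPERATORS = {"Series", "Parallel", "Cond", "Race", "KofN", "Timeout"}


@dataclass
class Leaf:
    slo_ref: str    # hypothetical reference to an atomic SLO (e.g. an OpenSLO document)
    telemetry: str  # hypothetical identifier of the telemetry source


@dataclass
class Op:
    kind: str                                   # one of OPERATORS
    children: tuple                             # child nodes: Leaf or Op
    params: dict = field(default_factory=dict)  # e.g. k for KofN (assumed)

    def __post_init__(self):
        if self.kind not in OPERATORS:
            raise ValueError(f"unknown operator: {self.kind}")


def leaves(node):
    """Yield every Leaf in the tree, left to right."""
    if isinstance(node, Leaf):
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


# Hypothetical checkout journey: a cart step followed by redundant payment providers.
checkout = Op("Series", (
    Leaf("cart-availability", "traces:cart"),
    Op("Parallel", (
        Leaf("payment-primary", "traces:pay-a"),
        Leaf("payment-backup", "traces:pay-b"),
    )),
))

print([leaf.slo_ref for leaf in leaves(checkout)])
```

Walking the leaves like this is one way a tool could check that every leaf is actually bound to an atomic SLO before the journey is compiled.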
A runtime inference component continuously consumes evidence and synthesizes a candidate journey model: the effective operator graph, branch probabilities, redundancy sets, and hypothesized failure domains, each annotated with provenance and confidence scores. The model need not be perfect; manual operator specifications can be mixed with inferred data, and any mismatch between declared and inferred failure domains is surfaced as a reviewable delta.
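The "reviewable delta" between declared and inferred failure domains could be computed along the following lines. This is a sketch under stated assumptions, not the authors' implementation: the `Inferred` record, the `failure_domain_delta` function, the `min_confidence` threshold, and the sample service and zone names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inferred:
    domain: str        # hypothesized failure domain, e.g. a zone name
    provenance: str    # which evidence produced the hypothesis (assumed label)
    confidence: float  # score in [0.0, 1.0]


def failure_domain_delta(declared, inferred, min_confidence=0.8):
    """Return {service: (declared_domain, Inferred)} for confident mismatches.

    The confidence threshold is an assumption of this sketch; the summary
    does not say how low-confidence hypotheses are treated.
    """
    delta = {}
    for service, hypothesis in inferred.items():
        if hypothesis.confidence < min_confidence:
            continue  # too weak to raise for review
        if declared.get(service) != hypothesis.domain:
            delta[service] = (declared.get(service), hypothesis)
    return delta


declared = {"pay-a": "zone-1", "pay-b": "zone-2"}
inferred = {
    "pay-a": Inferred("zone-1", "mesh-config", 0.95),
    "pay-b": Inferred("zone-1", "traces", 0.90),  # contradicts the declaration
}
delta = failure_domain_delta(declared, inferred)
print(delta)
```

In this example only `pay-b` appears in the delta, because the evidence suggests it shares a zone with `pay-a` even though the spec declares them separate.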
From the last accepted model, an EmaC compiler/controller derives bounded journey SLOs. Availability is expressed as an interval (
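As a rough illustration of how an availability interval can arise from the two correlation assumptions named in the abstract (optimistic independence vs. pessimistic shared fate), consider a single redundancy set. The sketch below is not the paper's derivation; the function names are hypothetical, and it covers only the case where a request succeeds if at least one replica is up.

```python
from math import prod


def parallel_independent(avails):
    """Optimistic: replicas fail independently, so the set fails only if all fail."""
    return 1.0 - prod(1.0 - a for a in avails)


def parallel_shared_fate(avails):
    """Pessimistic: failures are fully correlated, so the set is no better
    than its single best replica."""
    return max(avails)


def availability_interval(avails):
    """Return (lower, upper) availability for one redundancy set."""
    return parallel_shared_fate(avails), parallel_independent(avails)


# Two replicas, each with 99% availability (illustrative numbers).
lower, upper = availability_interval([0.99, 0.99])
print(f"availability in [{lower:.4f}, {upper:.4f}]")
```

With these inputs the shared-fate bound stays at the single-replica availability while the independence bound gains two extra nines, which shows how much of the claimed benefit of redundancy rests on the correlation assumption.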