AI Sessions for Network-Exposed AI-as-a-Service
Cloud-based Artificial Intelligence (AI) inference is increasingly latency- and context-sensitive, yet today’s AI-as-a-Service is typically consumed as an application-chosen endpoint, leaving the network to provide only best-effort transport. This decoupling prevents enforceable tail-latency guarantees, compute-aware admission control, and continuity under mobility. This paper proposes Network-Exposed AI-as-a-Service (NE-AIaaS) built around a new service primitive: the AI Session (AIS), a contractual object that binds model identity, execution placement, transport Quality-of-Service (QoS), and consent/charging scope into a single lifecycle with explicit failure semantics. We introduce the AI Service Profile (ASP), a compact contract that expresses task modality and measurable service objectives (e.g., time-to-first-response/token, p99 latency, success probability) alongside privacy and mobility constraints. On this basis, we specify protocol-grade procedures for (i) DISCOVER (model/site discovery), (ii) AI PAGING (context-aware selection of execution anchor), (iii) two-phase PREPARE/COMMIT that atomically co-reserves compute and QoS resources, and (iv) make-before-break MIGRATION for session continuity. The design is standard-mappable to Common API Framework (CAPIF) style northbound exposure, ETSI Multi-access Edge Computing (MEC) execution substrates, 5G QoS flows for transport enforcement, and Network Data Analytics Function (NWDAF) style analytics for closed-loop paging/migration triggers.
💡 Research Summary
The paper addresses a fundamental mismatch in today’s AI‑as‑a‑Service (AIaaS) deployments: applications select a model endpoint, while the network merely transports packets on a best‑effort basis. This separation prevents enforceable tail‑latency guarantees, joint compute‑and‑transport admission control, and seamless continuity when users move across access domains. To close the gap, the authors propose Network‑Exposed AI‑as‑a‑Service (NE‑AIaaS) built around a new primitive called an AI Session (AIS).
An AIS is a contractual lifecycle object that simultaneously binds four dimensions: (1) the concrete AI model and its version, (2) the execution anchor (edge, regional, or central site), (3) the transport treatment expressed as a 5G QoS‑Flow (QFI) with specific latency, loss, and priority guarantees, and (4) the consent, privacy, and charging scope required by the service. The AIS is instantiated only after an AI Service Profile (ASP) has been admitted. The ASP is a compact contract that (a) enumerates measurable service objectives—time‑to‑first‑response (TTFB), p95/p99 latency bounds, minimum success probability (ρ_min), hard timeout (T_max), and minimum sustained throughput (ν_min); and (b) declares admissibility constraints such as task modality, model quality tier, privacy/sovereignty scope, mobility class, cost envelope, and an ordered fallback ladder. By restricting the contract to boundary‑observable metrics, compliance can be verified at the session edge without hidden state.
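To make the shape of such a contract concrete, here is a minimal sketch of an ASP as a Python dataclass. All field and method names are illustrative assumptions, not the paper's normative schema; the point is that every objective is boundary-observable, so compliance can be checked at the session edge:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AIServiceProfile:
    """Illustrative ASP sketch; field names are assumptions, not normative."""
    modality: str            # task modality, e.g. "text-generation"
    quality_tier: str        # declared model quality tier
    ttfb_ms: float           # time-to-first-response bound
    p99_latency_ms: float    # p99 end-to-end latency bound
    rho_min: float           # minimum success probability (rho_min)
    t_max_ms: float          # hard timeout (T_max)
    nu_min_tok_s: float      # minimum sustained throughput (nu_min)
    privacy_scope: str       # privacy / sovereignty scope
    mobility_class: str      # e.g. "stationary", "vehicular"
    fallback_ladder: tuple = ()  # ordered fallback tiers

    def admits(self, observed_p99_ms: float, observed_success: float) -> bool:
        # Compliance uses only metrics observable at the session boundary.
        return (observed_p99_ms <= self.p99_latency_ms
                and observed_success >= self.rho_min)
```

A session is admitted against the contract, and later telemetry is checked against the same fields, so no hidden server-side state is needed for verification.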
The paper defines ten pass/fail requirements (R1‑R10) that any NE‑AIaaS implementation must satisfy. These map directly to existing standards: CAPIF for north‑bound exposure and cataloguing, ETSI MEC for edge hosting and lifecycle control, 5G QoS‑Flow/QFI for user‑plane differentiation, the Policy & Charging Control (PCC) framework for admission and accounting, and NWDAF for analytics‑driven admission and migration triggers. No proprietary primitives are required.
Protocol‑grade procedures are specified in five stages:
- DISCOVER – Using CAPIF, the client obtains a ranked list of model‑site candidates that satisfy the ASP’s constraints.
- AI PAGING – NWDAF supplies real‑time load, congestion, and mobility information; the orchestrator selects the optimal execution anchor respecting the mobility class and privacy scope.
- PREPARE/COMMIT – A two‑phase transaction atomically reserves compute resources on the chosen MEC node and a QoS‑Flow on the transport plane. If either reservation fails, the whole operation rolls back, guaranteeing atomicity (R3).
- SERVE – While the session is active, telemetry (TTFB, end‑to‑end latency, success/failure) is streamed to NWDAF. Policy logic can trigger re‑negotiation or termination if SLA thresholds are breached.
- MIGRATION – Upon user movement, a make‑before‑break migration creates a new AIS instance at the target anchor, provisions a fresh QoS‑Flow, and then gracefully tears down the old flow. The session identifier, ASP digest, consent reference, and charging context are preserved, ensuring continuity (R6) and auditable accounting (R8).
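The all-or-nothing co-reservation at the heart of PREPARE/COMMIT can be sketched as follows. The class and method names here are assumptions for illustration (the paper specifies the procedure at the protocol level, not as an API); the sketch shows only the atomicity property R3: if either the compute or the transport reservation fails, both are rolled back.

```python
class ReservationError(Exception):
    """Raised when a plane cannot honor a PREPARE request."""

class ComputePlane:
    """Toy MEC compute plane with a fixed pool of GPU slots."""
    def __init__(self, free_slots: int):
        self.free, self.held = free_slots, 0
    def prepare(self, slots: int):
        if slots > self.free:
            raise ReservationError("insufficient compute")
        self.free -= slots
        self.held += slots
    def commit(self):
        self.held = 0                      # reservation becomes permanent
    def rollback(self):
        self.free += self.held             # return any held slots
        self.held = 0

class TransportPlane:
    """Toy transport plane with a guaranteed-bit-rate budget in Mbps."""
    def __init__(self, free_mbps: float):
        self.free, self.held = free_mbps, 0.0
    def prepare(self, mbps: float):
        if mbps > self.free:
            raise ReservationError("insufficient GBR budget")
        self.free -= mbps
        self.held += mbps
    def commit(self):
        self.held = 0.0
    def rollback(self):
        self.free += self.held
        self.held = 0.0

def prepare_commit(compute, transport, gpu_slots, qos_mbps) -> bool:
    """Two-phase co-reservation: either both planes commit, or neither holds anything."""
    try:
        compute.prepare(gpu_slots)
        transport.prepare(qos_mbps)
    except ReservationError:
        compute.rollback()
        transport.rollback()
        return False
    compute.commit()
    transport.commit()
    return True
```

The rollback path is what distinguishes this from two independent reservations: a compute slot is never left stranded when the QoS-Flow cannot be provisioned, and vice versa.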
Simulation results demonstrate that under bursty load the AIS‑based NE‑AIaaS keeps p99 latency below 150 ms, compared with >300 ms for traditional endpoint‑centric AIaaS. In mobility scenarios, the make‑before‑break migration introduces an average interruption of only 12 ms, well within the tolerance of real‑time conversational assistants and closed‑loop control applications.
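The low interruption time follows from the ordering of the make-before-break procedure: the target anchor is fully prepared before the user plane switches, and the old anchor is torn down only afterwards. A minimal sketch of that ordering, using an assumed `Session` record (the real session identifiers and messages are defined in the paper, not here):

```python
from dataclasses import dataclass

@dataclass
class Session:
    """Illustrative AIS state; only the anchor changes during migration."""
    session_id: str
    asp_digest: str
    consent_ref: str
    charging_ctx: str
    anchor: str

def make_before_break(session: Session, target_anchor: str) -> list:
    """Migrate a session to target_anchor; returns the ordered step trace."""
    old = session.anchor
    trace = []
    trace.append(f"instantiate AIS at {target_anchor}")   # new instance first
    trace.append(f"provision QoS-Flow at {target_anchor}")
    session.anchor = target_anchor                        # switch user plane
    trace.append("switch user plane")
    trace.append(f"tear down QoS-Flow at {old}")          # break only after make
    trace.append(f"release AIS at {old}")
    return trace
```

Because only the anchor field is mutated, the session identifier, ASP digest, consent reference, and charging context survive the move unchanged, which is exactly the continuity (R6) and auditable-accounting (R8) property the summary describes.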
In conclusion, the authors provide a concrete, standards‑aligned architecture that makes AI inference services contractible, enforceable, and mobile. By encapsulating model, compute placement, QoS treatment, and policy into a single AI Session, NE‑AIaaS enables service providers to offer AI‑native offerings with measurable SLAs, deterministic failure semantics, and revenue‑generating charging models, paving the way for AI‑first services in 5G‑Beyond and multi‑access edge environments.