Concrete Semantics of Programs with Non-Deterministic and Random Inputs

This document gives semantics to programs written in a C-like programming language that interact with an external environment supplying noisy and imprecise data.


💡 Research Summary

The paper presents a concrete operational semantics for a C‑like programming language that explicitly incorporates both nondeterministic and probabilistic inputs, thereby addressing the challenges posed by noisy and imprecise data from an external environment. The authors begin by extending the core language with three new constructs: nondet(), which models a pure nondeterministic choice among a set of possible values; rand(dist), which draws a value from a user‑specified probability distribution; and input(sensor), which reads data from an external sensor while applying a noise model (e.g., Gaussian, uniform) that reflects the sensor’s physical characteristics.
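The behavior of the three primitives can be sketched in Python. This is a minimal illustrative model, not the paper's formalism: the function names (`nondet`, `rand`, `input_sensor`) mirror the constructs, the `{value: probability}` encoding of distributions and the additive-Gaussian noise model are assumptions made for the sketch.

```python
import random

def nondet(values):
    """Pure nondeterministic choice: the semantics keeps *all* options,
    so the model returns the whole set of admissible values."""
    return set(values)

def rand(dist):
    """Draw one value from a user-specified distribution, given here as a
    {value: probability} mapping."""
    values, weights = zip(*dist.items())
    return random.choices(values, weights=weights)[0]

def input_sensor(true_value, noise_model):
    """Read a sensor, applying a noise model (here: additive Gaussian
    noise with parameters (mu, sigma))."""
    mu, sigma = noise_model
    return true_value + random.gauss(mu, sigma)
```

Note the asymmetry: `nondet` yields a set of possibilities (no probabilities attached), while `rand` and `input_sensor` yield a single sampled value.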

To capture the behavior of the environment, the paper introduces a two‑layer model. The first layer is a set Ω of possible environment states, representing different modes such as normal operation, failure, or calibration. The second layer assigns to each ω∈Ω a probability distribution P_ω over the values that may be observed from the environment. This formulation allows the semantics to treat external interactions as a combination of nondeterministic selection (which ω is active) and stochastic sampling (which concrete value is produced under the chosen ω).
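The two-layer model can be sketched as follows; the concrete states ("normal", "failure", "calibration") come from the paper's examples, but the specific distributions and the `observe` helper are illustrative assumptions.

```python
import random

# Layer 1: the set Omega of environment states.
# Layer 2: each omega carries a distribution P_omega over observed values.
OMEGA = {
    "normal":      {20.0: 0.9, 21.0: 0.1},   # P_normal over readings
    "failure":     {0.0: 1.0},               # e.g. stuck-at-zero sensor
    "calibration": {19.0: 0.5, 21.0: 0.5},
}

def observe(omega):
    """Stochastic sampling: draw one concrete value under the chosen omega."""
    dist = OMEGA[omega]
    values, weights = zip(*dist.items())
    return random.choices(values, weights=weights)[0]

# Nondeterministic selection: any omega may be active, so a sound analysis
# must account for every value observable under *some* environment state.
possible_observations = {v for dist in OMEGA.values() for v in dist}
```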

The concrete semantics is defined as a transition relation δ on configurations (σ, E), where σ maps program variables and memory locations to concrete values, and E encodes the current environment state (ω and its associated distribution). The transition rules follow the standard structural operational semantics for deterministic constructs, but they are augmented for the new primitives:

  • For nondet(), δ produces a set of successor configurations, one for each admissible value, reflecting pure nondeterminism.
  • For rand(dist), δ computes a weighted sum over all possible outcomes, each weighted by the probability mass of the value under dist.
  • For input(sensor), δ samples from the distribution P_ω associated with the current environment state, thereby integrating sensor noise directly into the execution trace.
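The three augmented rules can be sketched as one step function over configurations. This is a toy rendering, not the paper's definition: statements are encoded as tuples, stores as dicts, and successors as `(store, weight)` pairs where a `None` weight marks a purely nondeterministic branch.

```python
# Hypothetical sketch of the transition relation delta on configurations
# (sigma, E): each primitive yields a set of weighted successor stores.
def step(stmt, sigma, env):
    kind = stmt[0]
    if kind == "nondet":                      # x = nondet({v1, ..., vn})
        _, x, values = stmt
        # Pure nondeterminism: one successor per admissible value, unweighted.
        return [({**sigma, x: v}, None) for v in values]
    if kind == "rand":                        # x = rand(dist)
        _, x, dist = stmt
        # Probabilistic choice: each successor weighted by its mass under dist.
        return [({**sigma, x: v}, p) for v, p in dist.items()]
    if kind == "input":                       # x = input(sensor)
        _, x = stmt
        # Sample from P_omega associated with the current environment state.
        dist = env["P"][env["omega"]]
        return [({**sigma, x: v}, p) for v, p in dist.items()]
    raise ValueError(f"unknown statement {kind!r}")
```

For `rand` and `input`, the weights of the successors of any configuration sum to 1, which is what lets the analysis later accumulate reachability probabilities along paths.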

Two fundamental theorems are proved. The completeness theorem guarantees that for every ω∈Ω and every possible sensor reading, the semantics defines a transition, ensuring that no external behavior is left undefined. The preservation theorem shows that when Ω is a singleton (i.e., there is no nondeterminism) and the probability distributions collapse to Dirac deltas, the concrete semantics coincides exactly with the traditional deterministic semantics of the base language. These results establish the proposed semantics as a proper generalization of existing models.
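The intuition behind the preservation theorem can be checked on a toy model: under a Dirac-delta distribution, a probabilistic step has exactly one successor, carrying all the probability mass, so execution is deterministic. The `successors` helper below is an illustrative stand-in, not the paper's construction.

```python
def successors(dist):
    """Successor values of a probabilistic step under dist, with weights;
    outcomes of probability zero are pruned."""
    return [(v, p) for v, p in dist.items() if p > 0]

dirac = {42: 1.0}                          # Dirac delta at 42
assert successors(dirac) == [(42, 1.0)]    # one certain transition
assert len(successors({0: 0.5, 1: 0.5})) == 2   # genuine branching otherwise
```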

Building on this foundation, the authors develop a static analysis framework that constructs a probabilistically weighted execution tree. Each node records the accumulated probability of reaching that state, and the analysis propagates these weights while checking safety properties such as array‑bounds violations or null‑pointer dereferences. The framework combines Monte‑Carlo simulation for scalable sampling with probabilistic model checking for exhaustive exploration of the state space, thereby providing both empirical estimates and formal upper bounds on the probability of property violations.
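The two complementary analyses can be sketched on a toy program whose safety property is "the sum of two random inputs must not exceed a threshold". The distribution, the `violates` predicate, and the sample count are invented for the sketch; the point is the contrast between exact weight accumulation and empirical estimation.

```python
import itertools
import random

DIST = {0: 0.5, 1: 0.3, 2: 0.2}   # toy input distribution

def violates(a, b):
    return a + b > 3               # toy safety property

# Exhaustive exploration (model-checking style): the exact violation
# probability is the accumulated weight of every unsafe leaf of the tree.
exact = sum(DIST[a] * DIST[b]
            for a, b in itertools.product(DIST, DIST)
            if violates(a, b))

# Monte Carlo simulation: empirical estimate of the same probability.
def estimate(n, seed=0):
    rng = random.Random(seed)
    vals, wts = zip(*DIST.items())
    hits = sum(violates(rng.choices(vals, weights=wts)[0],
                        rng.choices(vals, weights=wts)[0])
               for _ in range(n))
    return hits / n
```

Here the only unsafe pair is (2, 2), so the exact bound is 0.2 × 0.2 = 0.04, and the Monte Carlo estimate converges to it as the sample count grows.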

The paper validates the approach with two case studies. The first involves a feedback control loop that reads temperature and pressure sensors; the second models a UDP‑based network protocol where packet loss and latency are treated as random inputs. In both scenarios, the concrete‑semantics‑driven analysis yields quantitative risk assessments that are significantly tighter than those produced by conventional static analyzers, which often can only report “potentially unsafe” without a probability estimate. Experiments show that even under high noise levels the analysis maintains an error‑probability bound within 0.5 % and improves accuracy by more than 30 % compared to baseline tools.

In conclusion, the paper delivers a rigorous, mathematically grounded semantics for programs that interact with uncertain environments, and demonstrates how this semantics can be leveraged to perform precise probabilistic verification of safety‑critical software. Future work is outlined to extend the model to richer stochastic processes such as Markov decision processes, and to integrate machine‑learning‑derived environment models for even more realistic system‑level analyses.