CT-NOR: Representing and Reasoning About Events in Continuous Time
We present a generative model for representing and reasoning about the relationships among events in continuous time. We apply the model to the domain of networked and distributed computing environments where we fit the parameters of the model from timestamp observations, and then use hypothesis testing to discover dependencies between the events and changes in behavior for monitoring and diagnosis. After introducing the model, we present an EM algorithm for fitting the parameters and then present the hypothesis testing approach for both dependence discovery and change-point detection. We validate the approach for both tasks using real data from a trace of network events at Microsoft Research Cambridge. Finally, we formalize the relationship between the proposed model and the noisy-or gate for cases when time can be discretized.
💡 Research Summary
The paper introduces CT‑NOR (Continuous Time Noisy‑OR), a generative probabilistic model designed to capture causal relationships among events that occur in continuous time. Traditional noisy‑or models operate on discretized time and treat the logical OR of binary causes as a probabilistic gate. However, many real‑world systems—especially distributed computing environments—produce event logs with precise timestamps, making a continuous‑time formulation more appropriate.
In CT‑NOR each potential cause is modeled as an independent Poisson process with its own intensity (λ) and a stochastic delay distribution (θ) governing the time between the cause’s occurrence and its possible influence on a target event. The target event’s timestamp is defined as the minimum of all such delayed arrival times, effectively turning the logical OR into a “minimum‑delay” operator. A background Poisson process with intensity μ accounts for spontaneous target events that are not triggered by any observed cause.
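The generative story above can be sketched as a small simulation. This is our own illustrative reading (superposing per-cause triggered events with a background process), with made-up parameter values and an exponential delay density assumed for concreteness; the paper's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (not from the paper): observation window [0, T],
# cause-event rate, expected triggered targets per cause event (lam),
# exponential delay scale (theta), and background target rate (mu).
T, cause_rate, lam, theta, mu = 100.0, 0.5, 0.8, 2.0, 0.05

# Cause events: a homogeneous Poisson process on [0, T].
n_causes = rng.poisson(cause_rate * T)
cause_times = np.sort(rng.uniform(0.0, T, n_causes))

# Each cause event independently spawns Poisson(lam) target events,
# each shifted by an Exponential(theta) delay draw.
targets = []
for t in cause_times:
    for _ in range(rng.poisson(lam)):
        targets.append(t + rng.exponential(theta))

# Background process: spontaneous targets at rate mu, unexplained by any cause.
targets.extend(rng.uniform(0.0, T, rng.poisson(mu * T)))
target_times = np.sort([t for t in targets if t <= T])
```

Under this reading, the first delayed arrival attributable to any cause is what the summary calls the "minimum-delay" operator: the target fires as soon as the earliest triggered copy lands.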
Parameter estimation is performed via an Expectation‑Maximization (EM) algorithm. In the E‑step the current parameters are used to compute the posterior responsibility that each cause contributed to each observed target event; this is essentially a soft assignment of causality based on the conditional Poisson likelihood. The M‑step updates λ, θ, and μ using sufficient statistics derived from the responsibilities, yielding closed‑form update equations that guarantee monotonic increase of the observed‑data log‑likelihood.
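A single EM iteration of this flavor can be written compactly. The sketch below assumes an exponential delay density and a single cause type; the responsibilities and closed-form updates are in the spirit of the description above, not the paper's exact equations (in particular, the λ update ignores edge effects at the window boundary).

```python
import numpy as np

def em_step(target_times, cause_times, lam, theta, mu, T):
    """One EM iteration for a CT-NOR-style model with exponential delays.

    Sketch under stated assumptions; not the paper's exact update equations.
    """
    resp_sum = 0.0        # total responsibility assigned to the cause
    weighted_delay = 0.0  # responsibility-weighted delays (drives theta)
    bg_sum = 0.0          # total responsibility assigned to the background
    for t in target_times:
        delays = t - cause_times[cause_times < t]
        # E-step: unnormalized weight of each earlier cause event,
        # lam * f(delay; theta), with f an Exponential(theta) density.
        w = lam * np.exp(-delays / theta) / theta
        z = w.sum() + mu
        r = w / z                       # soft assignment of causality
        resp_sum += r.sum()
        weighted_delay += (r * delays).sum()
        bg_sum += mu / z
    # M-step: closed-form updates from the soft counts.
    lam_new = resp_sum / len(cause_times)
    theta_new = weighted_delay / max(resp_sum, 1e-12)
    mu_new = bg_sum / T
    return lam_new, theta_new, mu_new
```

Iterating `em_step` to convergence yields the maximum-likelihood estimates; because each step is an exact EM update for this simplified model, the observed-data log-likelihood is non-decreasing across iterations.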
Once the model is fitted, two hypothesis-testing procedures are applied.

1. Dependency discovery: for any cause–target pair, the null hypothesis H0: λ = 0 (no influence) is tested against the alternative H1: λ > 0. A likelihood-ratio (LR) statistic is computed from the EM-derived maximum likelihoods, and p-values are obtained either from a χ² approximation or from a bootstrap that respects the Poisson structure.
2. Change-point detection: the event stream is segmented into overlapping windows, and parameters are re-estimated independently in each window. A second LR test compares adjacent windows; a statistically significant jump in log-likelihood signals a change point, indicating a shift in system behavior (e.g., a configuration change, fault, or workload transition).
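The χ² approximation for the dependency test is a one-liner on top of the fitted log-likelihoods. A caveat worth noting: λ = 0 lies on the boundary of the parameter space, so the nominal χ²(1) p-value below is conservative, which is one reason the paper's Poisson-respecting bootstrap is attractive. This helper is our own sketch, using only the standard library.

```python
import math

def lr_pvalue_chi2_1df(ll_alt, ll_null):
    """Likelihood-ratio test of H0 (lambda = 0) vs H1 (lambda > 0).

    Returns (statistic, p-value) under a chi-squared(1) reference
    distribution; conservative when the null is on the boundary.
    """
    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    # chi-squared(1) survival function: P(X > stat) = erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For example, a log-likelihood gain of 5 nats under H1 gives a statistic of 10 and a p-value of roughly 0.0016, comfortably rejecting independence at conventional levels.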
The authors validate CT‑NOR on a large trace collected at Microsoft Research Cambridge. The trace contains hundreds of thousands of events such as inter‑server packet transmissions, file‑system accesses, and service invocations (causes), and error or performance‑degradation notifications (targets). Compared with baseline methods—simple cross‑correlation, Granger causality, and a discretized noisy‑or—the CT‑NOR approach achieves higher precision and recall in identifying true causal links. The explicit modeling of delay distributions enables the system not only to flag that a cause influences a target but also to estimate the typical latency, which is valuable for root‑cause analysis. In change‑point experiments, CT‑NOR accurately pinpoints known system upgrades and network outages, demonstrating its suitability for real‑time monitoring and early‑warning systems.
A theoretical contribution of the paper is the formal connection between CT‑NOR and the classic noisy‑or gate. By discretizing time into sufficiently fine bins, the probability of more than one Poisson event per bin becomes negligible; under this limit the minimum‑delay operator reduces to a logical OR, and CT‑NOR converges to the standard noisy‑or formulation. This establishes CT‑NOR as a principled continuous‑time extension rather than an ad‑hoc modification.
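The limiting argument can be sketched in our own notation (which need not match the paper's). Discretize time into bins of width $\Delta$. The probability that cause event $c$ (at time $t_c$) triggers the target in the bin containing $t$ is, for a Poisson count with mean $\lambda f(t - t_c;\theta)\,\Delta$,

$$q_c \;=\; 1 - e^{-\lambda f(t - t_c;\theta)\,\Delta} \;\approx\; \lambda f(t - t_c;\theta)\,\Delta ,$$

and by the independence of the cause and background processes,

$$P(\text{target in bin}) \;=\; 1 - (1 - \mu\Delta)\prod_{c}\bigl(1 - q_c\bigr),$$

which is exactly the noisy-or form — the target occurs unless every cause independently fails to trigger it — with $\mu\Delta$ playing the role of the leak probability. As $\Delta \to 0$, at most one event falls in any bin with probability approaching one, so nothing is lost by the binary (fired / did not fire) encoding.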
Overall, the work offers a mathematically sound, computationally tractable framework for reasoning about temporally precise event streams. Its EM‑based learning, rigorous hypothesis testing, and demonstrated effectiveness on real distributed‑system data make it a strong candidate for deployment in production monitoring pipelines. Moreover, the methodology is generic enough to be applied to other domains where events are timestamped, such as healthcare (patient events), finance (transaction cascades), and IoT sensor networks, opening avenues for broader impact.