Rigorous Computing of Rectangle Scan Probabilities for Markov Increments
Extending recent work of Corrado, we derive an algorithm that computes rigorous upper and lower bounds for rectangle scan probabilities for Markov increments. We experimentally examine the closeness of the bounds computed by the algorithm and we examine the range of tractable input variables.
💡 Research Summary
The paper addresses the problem of computing rectangle scan probabilities for Markov increments, a class of probabilities that arise in scan statistics such as “does there exist a run of ℓ consecutive boxes containing more than k events?”. Formally, for a multinomial random vector N∼Mn,p the task is to evaluate
P(N₁+…+N_ℓ ≤ k, …, N_{d‑ℓ+1}+…+N_d ≤ k).
A naïve approach would enumerate the support D={x∈ℕ^d : Σ_i x_i=n}, but |D| grows combinatorially, making direct summation infeasible for realistic n and d.
The key insight is to view the multinomial vector as a Markov increment Y=(Y₁,…,Y_d), where each Y_k is the count in box k and the cumulative sums S_k=Σ_{i=1}^k Y_i form a Markov chain. This structure allows the probability of interest to be expressed as a joint event of the form
P(Y₁∈A₁, …, Y_d∈A_d)
with appropriately chosen finite sets A_k.
Theorem 2.1 establishes a recursion for the joint probabilities
p(k,x)=P(X_k=x, Y₁∈A₁,…,Y_k∈A_k)
as
p(k,x)=∑{y∈A_k} P(X_k=x | X{k‑1}=x·y⁻¹) · p(k‑1, x·y⁻¹).
Here (X,·) is a group, and the mapping f_k(x_{k‑1},x_k)=x_{k‑1}⁻¹·x_k is a bijection, guaranteeing that the recursion is exact. The recursion is a dynamic‑programming scheme that only needs addition and multiplication.
To handle the rectangle scan (ℓ‑window) case, the paper bundles ℓ consecutive increments into a vector V_k=(Y_k,…,Y_{k+ℓ‑1}) and shows (Lemma 3.1) that the sequence (V_k){k=1}^{d‑ℓ+1} is itself a Markov chain on X^ℓ. The original event becomes
P(V₁∈B₁, …, V{d‑ℓ+1}∈B_{d‑ℓ+1})
where B_k consists of ℓ‑tuples whose sum lies in A_k. When B_k is infinite, the authors prove that a finite subset M_k⊂B_k can be chosen without changing the probability (e.g., for X=ℤ with addition, M_k={ (y₁,…,y_ℓ)∈ℕ^ℓ : Σ y_i ≤ k }).
Algorithm A implements the recursion:
- Initialise p(1,x)=P(X₁=x) for x∈A₁.
- For k=2,…,d compute p(k,·) using the recursion over the Cartesian product A₁·…·A_k.
- Return the sum of p(d,·) over A₁·…·A_d.
All operations are positive, so the algorithm can be executed under two opposite IEEE‑754 rounding modes: “round‑up” (producing an upper bound) and “round‑down” (producing a lower bound). Lemma 5.1 guarantees that the interval formed by these two results always contains the exact probability. The paper derives explicit formulas for absolute and relative errors of such interval estimates and shows that, in double precision, the maximal relative error is bounded by 1/(2⁵³+1)≈1.1·10⁻¹⁶ for probabilities in the range
Comments & Academic Discussion
Loading comments...
Leave a Comment