RESID: A Practical Stochastic Model for Software Reliability
A new approach called RESID is proposed in this paper for estimating the reliability of software while allowing for imperfect debugging. Unlike earlier approaches based on counting the number of bugs or modelling inter-failure time gaps, RESID focuses on the probability that different parts of a program are buggy. This perspective offers an easy way to incorporate the structure of the software under test, as well as imperfect debugging. A main design objective behind RESID is ease of implementation in practical scenarios.
💡 Research Summary
The paper introduces RESID (Reliability Estimation for Software under Imperfect Debugging), a stochastic framework designed to estimate software reliability while explicitly accounting for imperfect debugging and program structure. Traditional reliability models fall into two categories: time‑based models that treat inter‑failure times as exponential and assume that each detected bug is perfectly fixed, and count‑based models that treat bug occurrences as Poisson processes but ignore the control‑flow structure of the program. Both approaches have serious practical limitations—time‑based models are unrealistic because debugging is rarely perfect, and count‑based models cannot exploit structural information that could improve estimation accuracy.
RESID addresses these gaps by decomposing a program into “chunks,” defined as sequences of consecutive instructions without internal branching. Each chunk is assumed to be buggy with an a‑priori probability p, and the presence of bugs in different chunks is taken to be independent. When a chunk is debugged, the probability that it remains buggy is multiplied by a factor α∈(0,1), which quantifies debugging inefficiency. After k debugging attempts on a particular chunk, its residual bug probability becomes p·α^k. Thus, α close to 0 models highly effective debugging, while α close to 1 models almost no improvement after each attempt.
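The residual bug probability described above can be sketched in a few lines. This is an illustrative helper, not code from the paper; the function name is ours.

```python
def residual_bug_prob(p: float, alpha: float, k: int) -> float:
    """Probability that a chunk is still buggy after k imperfect
    debugging attempts: p * alpha**k, with alpha in (0, 1).

    alpha near 0 -> highly effective debugging;
    alpha near 1 -> little improvement per attempt.
    """
    return p * alpha ** k
```

For example, with an a-priori bug probability p = 0.1 and α = 0.5, two debugging attempts leave a residual bug probability of 0.1 · 0.25 = 0.025.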
Data collection proceeds by instrumenting each chunk with a lightweight logging statement that records the chunk identifier whenever control enters it. For each run of the program with independent random input, three pieces of information are gathered: (1) whether the run terminated successfully, (2) if a bug was observed, the identity of the buggy chunk, and (3) the sequence of visited chunks. In the presence of loops, the exact iteration that first triggers a bug may be unknown; consequently, the log is truncated at the first occurrence of the buggy chunk, because any subsequent execution after a bug is considered unreliable.
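The truncation rule for logs can be sketched as follows; this is a hypothetical illustration of the rule described above, with names of our choosing.

```python
def truncate_log(visited, buggy_chunk=None):
    """Keep the visited-chunk trace only up to (and including) the first
    occurrence of the buggy chunk, since execution after a bug is
    considered unreliable. A successful run keeps its full trace."""
    if buggy_chunk is None:
        return list(visited)
    out = []
    for chunk in visited:
        out.append(chunk)
        if chunk == buggy_chunk:  # stop at first entry into the buggy chunk
            break
    return out
```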
From the accumulated logs, the following statistics are extracted: m – the total number of distinct bugs that were detected and (imperfectly) fixed; k – the maximum number of debugging attempts made on any chunk; and n_i – the number of runs in which a chunk that had been debugged i times executed without encountering a bug. The likelihood of the observed data, given p, can be expressed as
L(p) ∝ p^m · ∏_{i=0}^k (1 − p·α^i)^{n_i}
and the log‑likelihood is
ℓ(p) = m log p + Σ_{i=0}^k n_i log(1 − p·α^i) .
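Assuming α is known and the counts n_i are stored in a list indexed by the number of debugging attempts, the log-likelihood above translates directly into code (a sketch, not the authors' implementation):

```python
import math

def log_likelihood(p, m, n, alpha):
    """ell(p) = m*log(p) + sum_i n[i]*log(1 - p*alpha**i),
    where n[i] counts bug-free runs of chunks debugged i times."""
    return m * math.log(p) + sum(
        n_i * math.log(1.0 - p * alpha ** i) for i, n_i in enumerate(n)
    )
```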
The authors prove that ℓ(p) is strictly concave on (0,1) because log p and log(1 − p·α^i) are concave and the coefficients are non‑negative. Consequently, if at least one bug is observed (m>0) and at least one chunk runs correctly on its first attempt (n_0>0), the log‑likelihood has a unique maximum in (0,1). This guarantees the existence and uniqueness of the maximum‑likelihood estimator (MLE) ˆp. The condition n_0>0 is mild; in large programs it holds with probability approaching one.
To obtain ˆp, the paper suggests a simple bisection algorithm: since ℓ(p) is strictly concave on (0,1), its derivative changes sign exactly once, so bisecting on the sign of ℓ′(p) converges to the unique maximizer.
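A minimal sketch of the bisection idea, assuming known α and counts n_i as before (function and parameter names are ours, not the paper's):

```python
def mle_bisection(m, n, alpha, lo=1e-9, hi=1.0 - 1e-9, tol=1e-10):
    """Find the unique root of ell'(p) in (0, 1) by bisection.
    Strict concavity of ell guarantees ell' crosses zero at most once."""

    def dell(p):
        # ell'(p) = m/p - sum_i n[i]*alpha**i / (1 - p*alpha**i)
        return m / p - sum(
            n_i * alpha ** i / (1.0 - p * alpha ** i)
            for i, n_i in enumerate(n)
        )

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dell(mid) > 0:   # ell still increasing: maximizer lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check: with m = 2 bugs and n_0 = 8 bug-free first-attempt runs (no repeated debugging), ℓ(p) = 2 log p + 8 log(1 − p), whose maximizer is p = 0.2, and the bisection recovers it.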