Noise-Resilient Group Testing: Limitations and Constructions


We study combinatorial group testing schemes for learning $d$-sparse Boolean vectors using highly unreliable disjunctive measurements. We consider an adversarial noise model that only limits the number of false observations, and show that any noise-resilient scheme in this model can only approximately reconstruct the sparse vector. On the positive side, we take this barrier to our advantage and show that approximate reconstruction (within a satisfactory degree of approximation) allows us to break the information theoretic lower bound of $\tilde{\Omega}(d^2 \log n)$ that is known for exact reconstruction of $d$-sparse vectors of length $n$ via non-adaptive measurements, by a multiplicative factor $\tilde{\Omega}(d)$. Specifically, we give simple randomized constructions of non-adaptive measurement schemes, with $m=O(d \log n)$ measurements, that allow efficient reconstruction of $d$-sparse vectors up to $O(d)$ false positives even in the presence of $\delta m$ false positives and $O(m/d)$ false negatives within the measurement outcomes, for any constant $\delta < 1$. We show that, information theoretically, none of these parameters can be substantially improved without dramatically affecting the others. Furthermore, we obtain several explicit constructions, in particular one matching the randomized trade-off but using $m = O(d^{1+o(1)} \log n)$ measurements. We also obtain explicit constructions that allow fast reconstruction in time $\operatorname{poly}(m)$, which would be sublinear in $n$ for sufficiently sparse vectors. The main tool used in our construction is the list-decoding view of randomness condensers and extractors.


💡 Research Summary

The paper investigates combinatorial group testing for learning $d$‑sparse Boolean vectors when the disjunctive measurements are highly unreliable. The authors adopt an adversarial noise model that only bounds the total number of false observations: at most $\delta m$ of the $m$ test outcomes may be false positives (for a constant fraction $\delta < 1$), and at most $O(m/d)$ may be false negatives. Under this model they first prove a negative result: no scheme can exactly reconstruct the original $d$‑sparse vector once any non‑trivial amount of noise is allowed. Consequently, they shift the goal to approximate reconstruction, where the output vector may differ from the true vector in a bounded number of positions (up to $O(d)$ false positives).
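The intuition behind the impossibility result can be seen in a toy example (the $3\times 4$ matrix below is hypothetical, not from the paper): if some item's column weight is at most the adversary's noise budget $t$, then corrupting exactly those outcomes makes two sparse vectors that differ in that item produce identical observations.

```python
# Toy illustration (hypothetical matrix): an adversary with noise budget t
# can hide any item whose column weight is at most t.
def outcomes(matrix, support):
    """Disjunctive (OR) measurements: a test reads positive iff it
    contains at least one supported item."""
    return [any(row[i] for i in support) for row in matrix]

matrix = [
    [1, 0, 0, 0],  # test 0 contains item 0
    [0, 1, 0, 0],  # test 1 contains item 1
    [0, 0, 0, 1],  # test 2 contains item 3; item 3's column weight is 1
]
x = {0}     # true support
y = {0, 3}  # differs from x only in item 3
diff = sum(a != b for a, b in zip(outcomes(matrix, x), outcomes(matrix, y)))
print(diff)  # prints 1: a single corrupted outcome makes x and y indistinguishable
```

Since the two outcome vectors differ in only one test, an adversary allowed a single false observation can erase that difference, so no decoder can tell $x$ and $y$ apart exactly.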

The core technical contribution is a set of non‑adaptive measurement schemes that achieve this approximate reconstruction with dramatically fewer tests than the known lower bound for exact recovery, which is $\tilde\Omega(d^{2}\log n)$. By exploiting the list‑decoding viewpoint of randomness condensers and extractors, the authors construct measurement matrices with $m = O(d\log n)$ rows. Two families of constructions are presented:

  1. Randomized construction – each entry of the $m\times n$ matrix is set to 1 independently with probability $\Theta(1/d)$. With $m = O(d\log n)$, the matrix tolerates up to $\delta m$ false‑positive test outcomes and $O(m/d)$ false‑negative outcomes. A simple reconstruction algorithm counts, for each item, the number of positive tests it participates in and declares the item positive if the fraction of its tests that read positive exceeds a fixed threshold. This algorithm runs in polynomial time and guarantees that the Hamming distance between the recovered vector and the true vector is $O(d)$.
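A minimal simulation of this randomized scheme can be sketched as follows. The parameters, noise levels, and the 0.93 threshold here are illustrative choices of mine, not the paper's constants: each item joins each test with probability $1/d$, the noise flips up to $\delta m$ negative outcomes and about $m/(4d)$ positive ones, and the decoder keeps every item for which a large enough fraction of its tests read positive.

```python
import random

def simulate(n=2000, d=20, delta=0.2, threshold=0.93, seed=1):
    """Sketch of the randomized scheme; parameters and threshold are illustrative."""
    rng = random.Random(seed)
    m = 8 * d * n.bit_length()           # m = O(d log n) tests
    # Random measurement matrix: item i joins test t independently w.p. 1/d.
    membership = [[] for _ in range(n)]  # tests containing each item
    tests = [set() for _ in range(m)]
    for i in range(n):
        for t in range(m):
            if rng.random() < 1.0 / d:
                tests[t].add(i)
                membership[i].append(t)
    support = set(rng.sample(range(n), d))                # hidden d-sparse vector
    outcome = [bool(tests[t] & support) for t in range(m)]
    # Noise: up to delta*m false positives and about m/(4d) false negatives.
    negs = [t for t in range(m) if not outcome[t]]
    for t in rng.sample(negs, min(len(negs), int(delta * m))):
        outcome[t] = True
    poss = [t for t in range(m) if outcome[t]]
    for t in rng.sample(poss, min(len(poss), m // (4 * d))):
        outcome[t] = False
    # Threshold decoder: keep items whose tests are almost all positive.
    recovered = {
        i for i in range(n)
        if membership[i]
        and sum(outcome[t] for t in membership[i]) >= threshold * len(membership[i])
    }
    return support, recovered

support, recovered = simulate()
print(len(recovered - support), len(support - recovered))  # both error counts stay O(d)
```

A defective item's tests are positive unless individually hit by one of the $\approx m/(4d)$ false negatives, while a non-defective item's tests are positive only at the (noisy) background rate, so a high threshold separates the two populations with only $O(d)$ misclassifications.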

  2. Explicit construction – using deterministic condensers and extractors (e.g., variants of the Reed–Solomon based condensers or expander‑based extractors), the authors obtain a matrix with $m = O(d^{1+o(1)}\log n)$ rows that matches the randomized trade‑off. The explicit design enables reconstruction in $\operatorname{poly}(m)$ time, which is sublinear in $n$ for sufficiently sparse vectors. The algorithm leverages the structured hash‑like placement of items into tests to resolve collisions efficiently.

The paper also provides essentially tight information‑theoretic lower bounds for the approximate setting. Any scheme that achieves $O(d)$ false positives while tolerating a constant fraction of false‑positive test noise must use $\Omega(d\log n)$ measurements, and driving the approximation error down to $o(d)$ false positives pushes the measurement count back up toward $\tilde\Omega(d^{2}\log n)$. Similarly, increasing the tolerated noise fraction $\delta$ inevitably increases the number of false positives in the reconstruction. Thus the presented parameters are essentially optimal up to polylogarithmic factors.


In summary, the work establishes a new paradigm: exact recovery may be impossible under realistic noisy measurements, but approximate recovery is both information‑theoretically feasible and practically efficient. By breaking the $\tilde\Omega(d^{2}\log n)$ barrier and reducing the required number of tests to $\tilde O(d\log n)$, the results open the door to scalable, noise‑resilient group testing in applications such as large‑scale biological screening, network traffic monitoring, and sparse signal detection in massive databases, where measurement cost and noise are unavoidable constraints.

