Almost Asymptotically Optimal Active Clustering Through Pairwise Observations
We propose a new analysis framework for clustering $M$ items into an unknown number $K$ of distinct groups using noisy and actively collected responses. At each time step, an agent is allowed to query pairs of items and observe bandit binary feedback. If the pair of items belongs to the same (resp.\ different) cluster, the observed feedback is $1$ with probability $p>1/2$ (resp.\ $q<1/2$). Leveraging the ubiquitous change-of-measure technique, we establish a fundamental lower bound on the expected number of queries needed to achieve a desired confidence in the clustering accuracy, formulated as a sup-inf optimization problem. Building on this theoretical foundation, we design an asymptotically optimal algorithm whose stopping criterion compares an empirical version of the inner infimum – the Generalized Likelihood Ratio (GLR) statistic – to a threshold. We develop a computationally feasible variant of the GLR statistic and show that its performance gap to the lower bound can be accurately estimated empirically and remains within a constant multiple of the lower bound.
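The feedback model described above can be illustrated with a minimal simulation sketch. It only mimics the noisy pairwise oracle; it is not the paper's algorithm. The names `make_oracle` and `empirical_mean`, the ground-truth labels, and the values $p = 0.8$, $q = 0.2$ are all assumptions chosen for illustration.

```python
import random


def make_oracle(labels, p, q, rng):
    """Build a noisy pairwise oracle (hypothetical helper, not from the paper).

    labels[i] is the hidden cluster of item i. A query on (i, j) returns 1
    with probability p if the items share a cluster, and with probability q
    otherwise, where p > 1/2 > q.
    """
    def query(i, j):
        same_cluster = labels[i] == labels[j]
        prob_one = p if same_cluster else q
        return 1 if rng.random() < prob_one else 0

    return query


def empirical_mean(oracle, i, j, n):
    """Average of n independent responses for the pair (i, j)."""
    return sum(oracle(i, j) for _ in range(n)) / n


# Assumed toy instance: M = 5 items in K = 3 clusters.
labels = [0, 0, 1, 1, 2]
oracle = make_oracle(labels, p=0.8, q=0.2, rng=random.Random(0))

same_rate = empirical_mean(oracle, 0, 1, 2000)  # items 0 and 1 share a cluster
diff_rate = empirical_mean(oracle, 0, 2, 2000)  # items 0 and 2 do not
print(f"same-cluster rate: {same_rate:.3f}, cross-cluster rate: {diff_rate:.3f}")
```

With enough repeated queries the empirical rate of a pair concentrates above or below $1/2$, which is what makes the same-cluster and different-cluster hypotheses distinguishable. Deciding which pairs to query and when to stop is the part the paper's lower bound and GLR-based stopping rule address.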
💡 Research Summary
The paper studies the problem of clustering M items into an unknown number K of groups by actively querying pairs of items and observing noisy binary feedback. When a queried pair belongs to the same cluster, the oracle returns 1 with probability p (> ½); otherwise it returns 1 with probability q (< ½). Both p and q are unknown constants. The goal is to design a δ‑correct algorithm (i.e., it outputs the true clustering with probability at least 1 − δ) that minimizes the expected number of queries E