Online Clustering of Data Sequences with Bandit Information


We study the problem of online clustering of data sequences in the multi-armed bandit (MAB) framework under the fixed-confidence setting. There are $M$ arms, each providing i.i.d. samples from a parametric distribution whose parameters are unknown. The $M$ arms form $K$ clusters based on the distance between the true parameters. In the MAB setting, one arm can be sampled at each time step. The objective is to estimate the clusters of the arms using as few samples as possible, subject to an upper bound on the error probability. Our setting allows arms within a cluster to have non-identical distributions, vector parameters, vector observations, and $K \le M$ clusters. We propose and analyze the Average Tracking Bandit Online Clustering (ATBOC) algorithm. ATBOC is asymptotically order-optimal for multivariate Gaussian arms: its expected sample complexity grows at most twice as fast as the lower bound as $\delta \rightarrow 0$, and this guarantee extends to multivariate sub-Gaussian arms. For single-parameter exponential family arms, ATBOC is asymptotically optimal, matching the lower bound. We also propose two computationally more efficient alternatives: the Lower and Upper Confidence Bound based Bandit Online Clustering algorithm (LUCBBOC) and Bandit Online Clustering-Elimination (BOC-ELIM). We derive the computational complexity of the proposed algorithms and compare their per-sample runtimes through simulations; LUCBBOC and BOC-ELIM require lower per-sample runtime than ATBOC while achieving comparable performance. All the proposed algorithms are $\delta$-probably correct, i.e., the error probability of the cluster estimate at the stopping time is at most $\delta$. We validate the asymptotic optimality guarantees and compare our algorithms with related work through simulations on both synthetic and real-world datasets.


💡 Research Summary

The paper addresses the problem of online clustering of data sequences in a multi‑armed bandit (MAB) framework under a fixed‑confidence setting. There are M arms, each generating d‑dimensional i.i.d. samples from an unknown parametric distribution. The arms are partitioned into K clusters according to the distances between their true parameters, using the single‑linkage (SLINK) algorithm. The learner knows K but not the cluster assignments and must identify the cluster index vector while minimizing the total number of arm pulls, subject to an error probability not exceeding a prescribed δ.
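The fixed-confidence sampling setting described above follows a standard loop: pull one arm per round, update empirical estimates, and stop once a rule certifies the clustering at confidence level δ. A generic sketch of that loop (the structure, names, and round-robin initialization are our illustration, not the paper's algorithm):

```python
import numpy as np

def fixed_confidence_loop(sample, M, stop, pick, max_pulls=100_000):
    """Generic fixed-confidence bandit loop (a sketch of the setting,
    not a specific algorithm from the paper): pull one arm per round,
    update empirical means, stop once the rule certifies the clustering."""
    counts = np.zeros(M)
    sums = np.zeros(M)
    for a in range(M):               # round-robin init: one sample per arm
        sums[a] += sample(a)
        counts[a] += 1
    while counts.sum() < max_pulls:
        means = sums / counts
        if stop(means, counts):      # stopping rule guarantees delta-PC
            break
        a = pick(means, counts)      # sampling rule chooses the next arm
        sums[a] += sample(a)
        counts[a] += 1
    return sums / counts, counts
```

The proposed algorithms differ precisely in the `pick` (sampling) and `stop` (stopping) rules plugged into this skeleton.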

Two families of distributions are considered: (i) multivariate σ‑sub‑Gaussian distributions parameterized by their mean vectors, and (ii) single‑parameter exponential‑family distributions (e.g., Bernoulli, Poisson). A separation condition d_INTRA < d_INTER guarantees that the true clustering is uniquely defined by SLINK.
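The role of SLINK here is that, once the parameter estimates are accurate enough, single-linkage merging on the estimated parameters recovers the true partition whenever d_INTRA < d_INTER. A minimal, self-contained sketch of single-linkage clustering into K groups (our illustration, not the paper's implementation):

```python
import numpy as np

def slink_clusters(params, K):
    """Single-linkage agglomerative clustering of arm parameter vectors
    into K clusters (a minimal sketch, not the paper's code): repeatedly
    merge the two clusters with the smallest minimum pairwise distance."""
    clusters = [[i] for i in range(len(params))]
    while len(clusters) > K:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: cluster distance = min pairwise distance
                d = min(np.linalg.norm(params[i] - params[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return [sorted(c) for c in clusters]

# toy example: 4 arms with well-separated means form 2 clusters
params = np.array([[0.0], [0.1], [5.0], [5.2]])
print(slink_clusters(params, K=2))  # [[0, 1], [2, 3]]
```

The separation condition guarantees that the merge order is unambiguous, so the output partition is unique.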

The authors propose three algorithms. The first, Average Tracking Bandit Online Clustering (ATBOC), is a tracking‑based method that, instead of directly tracking the unknown optimal sampling proportions (as in D‑Tracking), tracks the average of the estimated optimal proportions over time. This “average tracking” is robust to cases where the optimal proportion vector is not unique. Three variants are analyzed:

  • ATBOC‑Gauss: for multivariate Gaussian arms. An upper bound on expected sample complexity is derived that is at most twice the asymptotic lower bound, establishing order‑optimality.
  • ATBOC‑subGauss: for general multivariate sub‑Gaussian arms. The same upper bound holds, though the constant factor may be larger.
  • ATBOC‑1pExp: for single‑parameter exponential‑family arms. By using a refined stopping threshold, the algorithm matches the lower bound exactly, achieving asymptotic optimality.

All three ATBOC variants are proved δ‑probably correct (δ‑PC), i.e., the probability that the final clustering is incorrect does not exceed δ.
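The average-tracking sampling rule can be sketched as follows. It is a minimal illustration of the principle described above (the function and variable names are ours, and the paper's actual rule also includes components such as forced exploration): rather than chasing the latest estimated proportion vector, the learner tracks the running average of all past estimates.

```python
import numpy as np

def average_tracking_pull(w_history, pull_counts):
    """Sketch of the average-tracking rule: track the running average of
    past estimated optimal proportions rather than the latest estimate
    alone, and pull the arm that lags its target allocation the most."""
    t = pull_counts.sum()
    w_bar = np.mean(w_history, axis=0)       # average of estimates so far
    deficit = (t + 1) * w_bar - pull_counts  # how far each arm lags its target
    return int(np.argmax(deficit))
```

Averaging smooths out oscillations between multiple optimal proportion vectors, which is exactly the non-uniqueness issue that trips up plain D-Tracking.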

Because the sampling‑proportion optimization in ATBOC requires solving a quadratically constrained quadratic program (QCQP) at each round, the authors also develop two gap‑based, computationally lighter algorithms:

  • LUCBBOC (Lower‑Upper Confidence Bound BOC) computes upper and lower confidence bounds for each arm and selects the arm with the largest confidence interval width, similar to LUCB in best‑arm identification.
  • BOC‑ELIM (Bandit Online Clustering Elimination) iteratively eliminates arms whose confidence intervals are clearly separated from others, progressively fixing cluster memberships.

For both LUCBBOC and BOC‑ELIM, the authors provide explicit upper bounds on expected sample complexity and prove δ‑PC. Their per‑sample computational complexity is dominated by O(M²) operations, substantially lower than ATBOC’s O(M³) cost due to the QCQP solved via an ADMM scheme.
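The elimination logic behind these gap-based rules can be sketched with a simple confidence-interval test (our illustration in one dimension; the radius below is a generic sub-Gaussian bound, not the paper's exact threshold): a pair of arms is resolved once their intervals certify that the gap is clearly above or clearly below the separation threshold.

```python
import numpy as np

def resolved(mu_i, mu_j, n_i, n_j, sigma, delta, gap_threshold):
    """Sketch of an elimination test: declare two arms 'different' when
    their confidence intervals are separated by more than the threshold,
    'same' when the intervals certify a small gap, else keep sampling.
    Radii use a generic sub-Gaussian bound (our assumption)."""
    r_i = sigma * np.sqrt(2 * np.log(1 / delta) / n_i)
    r_j = sigma * np.sqrt(2 * np.log(1 / delta) / n_j)
    gap = abs(mu_i - mu_j)
    if gap - r_i - r_j > gap_threshold:
        return "different"      # intervals certify a large gap
    if gap + r_i + r_j < gap_threshold:
        return "same"           # intervals certify a small gap
    return "unresolved"         # needs more samples
```

Once every pair of arms is resolved, the cluster memberships are fixed and sampling stops, which is what drives the O(M²) per-sample cost.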

The paper also presents a problem‑dependent lower bound on the expected sample complexity, derived from a multi‑hypothesis testing perspective. The bound involves Kullback‑Leibler divergences between arms belonging to different clusters and reflects the intrinsic difficulty of distinguishing clusters.
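Such change-of-measure lower bounds typically take the following generic form (this is the standard fixed-confidence structure; the paper's exact alternative set and characteristic constant are problem-specific and may differ):

```latex
% Generic fixed-confidence lower bound (standard form, shown for context;
% the paper's alternative set Alt(nu) and constant are problem-specific)
\mathbb{E}_{\nu}[\tau_\delta] \;\ge\; T^*(\nu)\,\mathrm{kl}(\delta, 1-\delta),
\qquad
T^*(\nu)^{-1} \;=\; \sup_{w \in \Delta_M}\;
\inf_{\nu' \in \mathrm{Alt}(\nu)}\;
\sum_{a=1}^{M} w_a\, \mathrm{KL}(\nu_a, \nu'_a),
```

where $\mathrm{Alt}(\nu)$ denotes the set of instances whose true clustering differs from that of $\nu$, and $w$ ranges over sampling proportions on the $M$ arms.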

Extensive simulations on synthetic data (varying M, K, dimension d, intra‑ and inter‑cluster gaps) and on real‑world datasets (e.g., user‑feedback logs and biological sequence data) validate the theoretical results. ATBOC‑Gauss consistently outperforms existing fixed‑budget and round‑robin baselines, requiring 30‑50 % fewer samples to achieve the same δ. LUCBBOC and BOC‑ELIM achieve comparable sample efficiency while being 2–3× faster per sample. When the assumption of identical means within a cluster is violated, the proposed methods dramatically outperform prior BOC algorithms that rely on that assumption.

Finally, the authors discuss implementation details (ADMM for QCQP, confidence‑bound calculations) and outline future directions such as unknown K, non‑linear parametric models, and hardware‑accelerated deployments.

In summary, the work introduces a novel average‑tracking principle for bandit‑based online clustering, provides rigorous asymptotic optimality guarantees for several distribution families, offers practical low‑complexity alternatives, and demonstrates substantial empirical gains over existing methods.

