Approximation Algorithms for Bregman Co-clustering and Tensor Clustering

Reading time: 6 minutes

📝 Original Info

  • Title: Approximation Algorithms for Bregman Co-clustering and Tensor Clustering
  • ArXiv ID: 0812.0389
  • Date: 2009-11-09
  • Authors: Stefanie Jegelka, Suvrit Sra, Arindam Banerjee

📝 Abstract

In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9,18], and tensor clustering [8,34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [2]. However, there seem to be no approximation algorithms for Bregman co- and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact.

📄 Full Content

Partitioning data points into clusters is a fundamentally hard problem. The well-known Euclidean k-means problem, which partitions the input data points (vectors in R^d) into K clusters while minimizing the sum of their squared distances to the corresponding cluster centroids, is NP-hard [19] (exponential in d). However, simple and frequently used procedures that rapidly obtain local minima have long existed [23,28].
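To make that baseline concrete, here is a minimal sketch of such a locally convergent procedure in the spirit of Lloyd's algorithm [23,28]. It is illustrative only, not taken from the paper; all names are ours.

```python
import numpy as np

def lloyd_kmeans(X, K, n_iters=100, seed=0):
    """Minimal Lloyd-style k-means heuristic: alternates assignment and
    centroid updates, converging to a local (not global) minimum of the
    sum of squared distances."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with K distinct input points.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if its cluster became empty).
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if (labels == k).any() else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break  # local minimum reached
        centroids = new_centroids
    return labels, centroids
```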

Because of its wide applicability and importance, the Euclidean k-means problem has been generalized in several directions. Specific examples relevant to this paper include:

• Bregman clustering [7], where instead of minimizing squared Euclidean distances one minimizes Bregman divergences (generalized distance functions; see (3.10) or [13] for details, and the sketch after this list),
• Bregman co-clustering [9] (which includes both Euclidean [16] and information-theoretic co-clustering [18] as special cases), where the set of input vectors is viewed as a matrix and one simultaneously clusters its rows and columns to obtain coherent submatrices (co-clusters), while minimizing a Bregman divergence, and
• Tensor clustering or multiway clustering [34], especially the version based on Bregman divergences [8], where one simultaneously clusters along the various dimensions of the input tensor.
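For readers unfamiliar with Bregman divergences, the following sketch (ours, not the paper's) evaluates the generic definition d_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩ for two standard generating functions: the squared norm, which recovers the squared Euclidean distance, and the negative entropy, which recovers the generalized KL divergence.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2 yields the squared Euclidean distance.
sq_norm = lambda x: np.dot(x, x)
sq_norm_grad = lambda x: 2 * x

# phi(x) = sum_i x_i log x_i (negative entropy) yields the
# generalized KL divergence.
neg_entropy = lambda x: np.sum(x * np.log(x))
neg_entropy_grad = lambda x: np.log(x) + 1

x, y = np.array([0.2, 0.8]), np.array([0.5, 0.5])
print(bregman_divergence(sq_norm, sq_norm_grad, x, y))          # ||x - y||^2
print(bregman_divergence(neg_entropy, neg_entropy_grad, x, y))  # KL(x || y)
```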

For these problems too, the commonly used heuristics perform well, but they do not provide theoretical guarantees (or at best assure local optimality). For k-means-type clustering problems, i.e., problems that group input vectors into clusters while minimizing “distance” to cluster centroids, there exist several algorithms that approximate a globally optimal solution. We refer the reader to [1,2,6,27], and the numerous references therein, for more details.

In stark contrast, approximation algorithms for tensor clustering are much less studied. We are aware of only two very recent attempts (both papers are from 2008) for the two-dimensional special case of co-clustering, namely [4] and [31], and both papers follow similar approaches to obtain their approximation guarantees. Both prove a 2α₁-approximation for Euclidean co-clustering; Puolamäki et al. [31] additionally prove a factor of (1 + √2) for binary matrices with an ℓ₁-norm objective, and Anagnostopoulos et al. [4] a factor of 3α₁ for co-clustering real matrices with ℓₚ norms. In all these factors, α₁ is an approximation guarantee for clustering either rows or columns alone. In this paper, we build upon [4] and obtain approximation algorithms for tensor clustering with Bregman divergences and with arbitrary separable metrics such as ℓₚ norms. The latter result is of particular interest for ℓ₁-norm based tensor clustering, which may be viewed as a generalization of k-medians to tensors. In the terminology of [7], we focus on the “block average” versions of co- and tensor clustering.

Additional discussion and relevant references for co-clustering can be found in [9], while for the lesser known problem of tensor clustering more background can be gained by referring to [3,8,10,21,29,34].

The main contribution of this paper is the analysis of an approximation algorithm for tensor clustering that achieves an approximation ratio of O(mα), where m is the order of the tensor and α is the approximation factor of a corresponding one-dimensional clustering algorithm. Our results apply to a fairly broad class of objective functions, including metrics such as ℓₚ norms or Hilbertian metrics [24,33], and divergence functions such as Bregman divergences [13] (under some assumptions). As corollaries, our results settle two open problems posed by [4], viz., whether their methods for Euclidean co-clustering can be extended to Bregman co-clustering, and whether the approximation guarantees can be extended to tensor clustering. Owing to the structure of the algorithm, our results also give insight into properties of the tensor clustering problem itself, namely a bound on the amount of information inherent in the joint consideration of several dimensions.
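As a rough illustration of where an O(mα) ratio can come from, the following schematic (our reading, not the paper's verbatim algorithm) clusters each mode of the tensor separately with an α-approximate one-dimensional routine and combines the per-mode partitions into co-clusters whose centers are block averages. The helper cluster_1d and all other names are our assumptions; the paper's actual algorithm and analysis are in the full text.

```python
import numpy as np
from itertools import product

def dimensionwise_tensor_clustering(T, ks, cluster_1d):
    """Schematic: cluster every mode of tensor T independently with a
    1D routine, then form co-clusters as Cartesian products of the
    per-mode clusters, each summarized by its block average."""
    m = T.ndim
    labels = []
    for mode in range(m):
        # Unfold along `mode`: each slice becomes one point to cluster.
        unfolded = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        labels.append(cluster_1d(unfolded, ks[mode]))
    # Each co-cluster is a product of per-mode clusters; its scalar
    # center is the mean of the corresponding sub-tensor (block).
    centers = {}
    for combo in product(*(range(k) for k in ks)):
        idx = np.ix_(*[np.where(labels[d] == combo[d])[0] for d in range(m)])
        block = T[idx]
        if block.size:
            centers[combo] = block.mean()
    return labels, centers
```

With the Lloyd-style routine sketched earlier, one could, for instance, pass cluster_1d = lambda X, k: lloyd_kmeans(X, k)[0].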

In addition, we provide extensive experimental validation of the theoretical claims, which forms an additional contribution of this paper.

Traditionally, “center”-based clustering algorithms seek partitions of the columns of an input matrix X = [x₁, ..., xₙ] into clusters C = {C₁, ..., C_K}, and find “centers” μ_k that minimize the objective

$$\sum_{k=1}^{K} \sum_{x_i \in C_k} d(x_i, \mu_k), \tag{2.1}$$

where the function d(x, y) measures cluster quality. The “center” μ_k of cluster C_k is given by the mean of the points in C_k when d(x, y) is a Bregman divergence [7]. Co-clustering extends (2.1) to seek simultaneous partitions (and centers μ_IJ) of the rows and columns of X, so that the objective function

$$\sum_{I, J} \sum_{i \in I,\, j \in J} d(x_{ij}, \mu_{IJ}) \tag{2.2}$$

is minimized, where the outer sum runs over all pairs of row and column clusters; μ_IJ denotes the (scalar) “center” of the co-cluster described by the row and column index sets I and J. Formulation (2.2) is easily generalized to tensors, as shown in Section 2.2 below. However, we first recall basic notation for tensors before formally presenting the tensor clustering problem.
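As a concrete reading of (2.2), the following sketch (ours, not the paper's) evaluates the block-average co-clustering objective for fixed row and column partitions under the squared loss d(x, y) = (x − y)², for which the optimal scalar center μ_IJ of a co-cluster is its block mean.

```python
import numpy as np

def coclustering_objective(X, row_labels, col_labels, K_rows, K_cols):
    """Block-average co-clustering objective (2.2) with squared loss:
    every co-cluster (I, J) is summarized by the scalar mean mu_IJ
    of the block X[I, J]."""
    total = 0.0
    for I in range(K_rows):
        rows = np.where(row_labels == I)[0]
        for J in range(K_cols):
            cols = np.where(col_labels == J)[0]
            block = X[np.ix_(rows, cols)]
            if block.size:
                mu_IJ = block.mean()  # optimal scalar center for squared loss
                total += ((block - mu_IJ) ** 2).sum()
    return total
```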

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
