An Approximation Ratio for Biclustering

Reading time: 5 minutes

📝 Original Info

  • Title: An Approximation Ratio for Biclustering
  • ArXiv ID: 0712.2682
  • Date: 2007-12-17
  • Authors: Researchers from original ArXiv paper

📝 Abstract

The problem of biclustering consists of the simultaneous clustering of rows and columns of a matrix such that each of the submatrices induced by a pair of row and column clusters is as uniform as possible. In this paper we approximate the optimal biclustering by applying one-way clustering algorithms independently on the rows and on the columns of the input matrix. We show that such a solution yields a worst-case approximation ratio of 1+sqrt(2) under L1-norm for 0-1 valued matrices, and of 2 under L2-norm for real valued matrices.


📄 Full Content

The standard clustering problem [8] consists of partitioning a set of input vectors such that the vectors in each partition (cluster) are close to one another according to some predefined distance function. This formulation is the objective of the popular K-means algorithm (see, for example, [9]), where K denotes the final number of clusters and the distance function is defined by the L2-norm. Another similar example of this formulation is the K-median algorithm (see, for example, [3]), where the distance function is given by the L1-norm. Clustering a set of input vectors is a well-known NP-hard problem even for K = 2 clusters [4]. Several approximation guarantees have been shown for this formulation of the standard clustering problem (see [3,9,2] and references therein).

[Figure 1, panel (c): Biclusters of the data matrix returned by our scheme, that is, using an optimal one-way clustering algorithm twice, once on the 4 row vectors and once on the 6 column vectors, with the L1-norm. The resulting clusterings are {R1, R2} = {{1, 3, 4}, {2}} for rows and {C1, C2, C3} = {{b, f}, {a, e}, {d, e}} for columns. For visual clarity, the rows and columns of the original matrix in (a) have been permuted in (b) and (c) by making the rows (and columns) of a single cluster adjacent.]
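The one-way clustering costs described above can be made concrete with a short sketch (assuming NumPy; the function name and the use of the mean/median as cluster representatives are illustrative, matching the K-means and K-median objectives):

```python
import numpy as np

def clustering_cost(X, labels, norm="l2"):
    """One-way clustering cost: each cluster is summarized by the point
    minimizing its within-cluster distance sum (the mean for squared
    L2-norm, as in K-means; the coordinate-wise median for L1-norm,
    as in K-median)."""
    cost = 0.0
    for k in np.unique(labels):
        cluster = X[labels == k]
        if norm == "l2":
            center = cluster.mean(axis=0)        # K-means representative
            cost += ((cluster - center) ** 2).sum()
        else:
            center = np.median(cluster, axis=0)  # K-median representative
            cost += np.abs(cluster - center).sum()
    return cost
```

For example, with four 2-D points split into two natural clusters, `clustering_cost(X, labels, "l2")` sums squared deviations from each cluster mean, while the `"l1"` variant sums absolute deviations from each cluster median.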

Intensive recent research has focused on the discovery of homogeneous substructures in large matrices. This is also one of the goals of the biclustering problem. Given an N × M input matrix X, a biclustering algorithm identifies subsets of rows exhibiting similar behavior across a subset of columns, or vice versa. Note that the optimal solution to this problem necessarily requires clustering the N rows and the M columns simultaneously, hence the name biclustering. Each submatrix of X induced by a pair of row and column clusters is typically referred to as a bicluster; see Figure 1 for a simple toy example. The main challenge of a biclustering algorithm lies in the dependency between the row and column partitions, which makes it difficult to identify the optimal biclusters: a change in the row clustering affects the cost of the induced submatrices (biclusters), and as a consequence the column clustering may also need to change to improve the solution.
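The biclustering objective can be sketched as follows (assuming NumPy; the function name is illustrative, and each bicluster is summarized by a single value, its median under L1 or its mean under squared L2, as a plausible reading of "as uniform as possible"):

```python
import numpy as np

def biclustering_cost(X, row_labels, col_labels, norm="l1"):
    """Sum, over all (row cluster, column cluster) pairs, of how far the
    induced submatrix (bicluster) is from being uniform."""
    cost = 0.0
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            block = X[np.ix_(row_labels == r, col_labels == c)]
            if norm == "l1":
                cost += np.abs(block - np.median(block)).sum()
            else:
                cost += ((block - block.mean()) ** 2).sum()
    return cost
```

On a 0-1 matrix whose row and column partitions carve out constant blocks, the cost is 0; any coarser partition that mixes 0s and 1s inside a block pays for every entry deviating from the block's representative value.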

Finding an optimal solution for the biclustering problem is NP-hard.

This observation follows directly from a reduction of the standard clustering problem (known to be NP-hard) to the biclustering problem, obtained by fixing the number of column clusters to M. To the best of our knowledge, no algorithm exists that can efficiently approximate biclustering with a proven approximation ratio. The goal of this paper is to provide such an approximation guarantee by means of a very simple scheme. Our approach relaxes the requirement of clustering rows and columns simultaneously and instead performs the two clusterings independently. In other words, our final biclusters correspond to the submatrices of X induced by pairs of row and column clusters found independently with a standard clustering algorithm, which we sometimes refer to as one-way clustering. The simplicity of this solution frees us from the inconvenient dependency between rows and columns. More importantly, the solution obtained with this approach, despite not being optimal, allows for the study of approximation guarantees on the obtained biclusters.
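The scheme can be sketched end to end (assuming NumPy; since the paper's analysis assumes an optimal one-way clusterer, this sketch stands one in via exhaustive search over label assignments, which is feasible only for tiny matrices, and the function names are illustrative):

```python
import itertools
import numpy as np

def optimal_one_way(X, K, norm="l1"):
    """Exhaustively search all assignments of the rows of X into at most K
    clusters; a stand-in for the assumed optimal one-way clustering
    algorithm (feasible only for tiny inputs)."""
    best_cost, best_labels = np.inf, None
    for labels in itertools.product(range(K), repeat=len(X)):
        labels = np.array(labels)
        cost = 0.0
        for k in range(K):
            cluster = X[labels == k]
            if len(cluster) == 0:
                continue
            if norm == "l1":
                cost += np.abs(cluster - np.median(cluster, axis=0)).sum()
            else:
                cost += ((cluster - cluster.mean(axis=0)) ** 2).sum()
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels

def independent_biclustering(X, K_rows, K_cols, norm="l1"):
    """The paper's scheme: cluster rows and columns independently; the
    biclusters are the submatrices induced by the two partitions."""
    row_labels = optimal_one_way(X, K_rows, norm)
    col_labels = optimal_one_way(X.T, K_cols, norm)
    return row_labels, col_labels
```

Note that the two calls never interact: the column clustering is computed without reference to the row partition, which is exactly the dependency the scheme drops in exchange for a provable approximation ratio.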

Here we prove that our solution achieves a worst-case approximation ratio of 1 + √2 under the L1-norm for 0-1 valued matrices, and of 2 under the L2-norm for real-valued matrices.

Finally, note that our final solution is built on top of a standard clustering algorithm (applied twice, once on the row vectors and once on the column vectors); therefore, our ratio must be multiplied by the approximation ratio achieved by the standard clustering algorithm used (such as [3,9]). For clarity, in the following proofs we will lift this restriction by assuming that the applied one-way clustering algorithm directly provides an optimal solution to the standard clustering problem.

This basic algorithmic problem and several variations were initially presented in [6] under the name direct clustering. The same problem and its variations have also been referred to as two-way clustering, co-clustering, or subspace clustering. In practice, finding highly homogeneous biclusters has important applications in biological data analysis (see [10] for a review and references), where a bicluster may, for example, correspond to an activation pattern common to a group of genes only under specific experimental conditions.

An alternative definition of the basic biclustering problem described in the introduction consists of finding the maximal bicluster in a given matrix. A well-known connection of this alternative formulation is its reduction to the problem of finding a biclique in a bipartite graph [7]. Algorithms for detecting bicliques en

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
