Transductive-Inductive Cluster Approximation Via Multivariate Chebyshev Inequality
Authors: Shriprakash Sinha
TU Delft, Dept. of Mediamatics, Faculty of EEMCS, Mekelweg 4, 2628 CD Delft, The Netherlands. {Shriprakash.Sinha@gmail.com}

Abstract. Approximating the adequate number of clusters in multidimensional data is an open area of research, given a level of compromise made on the quality of acceptable results. The manuscript addresses the issue by formulating a transductive-inductive learning algorithm which uses the multivariate Chebyshev inequality. Considering the clustering problem in imaging, theoretical proofs for a particular level of compromise are derived to show the convergence of the reconstruction error to a finite value with increasing (a) number of unseen examples and (b) number of clusters, respectively. Upper bounds for these error rates are also proved. Non-parametric estimates of these errors from a random sample of sequences empirically point to a stable number of clusters. Lastly, the generalized algorithm can be applied to multidimensional data sets from different fields.

Keywords: Transductive-Inductive Learning, Multivariate Chebyshev Inequality

1 Introduction

The estimation of clusters has been approached either via a batch framework, where the entire data set is presented and different initializations of seed points or prototypes are tested to find a model of clusters that fits the data, as in k-means [9] and fuzzy C-means [2], or via an online strategy, where clusters are approximated as new examples are presented one at a time, using variational Dirichlet processes [7] or incremental clustering based on randomized algorithms [3]. It is widely known that approximating the adequate number of clusters in a multidimensional data set is an open problem, and a variety of solutions have been proposed, using Monte Carlo studies [5], a Bayesian-Kullback learning scheme in the mean-squared-error setting of Gaussian mixtures [19], model-based approaches [6] and information theory [17], to cite a few.

This work deviates from the general strategy of defining the number of clusters a priori. It defines a level of compromise, tolerance or confidence in the quality of clustering, which gives an upper bound on the number of clusters generated. Note that this is not at all the same as defining the number of clusters: it only indicates the level of confidence in the result, and the requirement still is to estimate the adequate number of clusters, which may be way below the bound. The current work focuses on approximating the number of clusters in an online paradigm once the confidence level has been specified. In certain aspects it finds similarity with recent work on conformal learning theory [18] and presents a novel way of approximating the number of clusters with a degree of confidence.

Conformal learning theory [15], which has its foundations in the transductive-inductive paradigm, deals with estimating the quality of predictions made on an unlabeled example based on the already processed data. Mathematically, given a set of already processed examples (x_1, y_1), (x_2, y_2), ..., (x_{i-1}, y_{i-1}), the conformal predictors give a point prediction ŷ for the unseen example x_i with a confidence level of Γ^ε.
Thus it estimates the confidence in the quality of the prediction using the original label y_i after the prediction has been made and before moving on to the next unlabeled example. These predictions are made on the basis of a nonconformity measure, which checks how much the new example differs from a bag of already seen examples. A bag is considered to be a finite sequence Z = (z_1, z_2, ..., z_{i-1}) of examples, where z_i = (x_i, y_i). Then, using the idea of exchangeability, it is known from [18], under a weaker assumption, that for every positive integer i, every permutation π of {1, 2, ..., i}, and every measurable set E ⊂ Z^i,

$$P\{(z_1, z_2, \ldots) \in Z^{\infty} : (z_1, z_2, \ldots, z_i) \in E\} = P\{(z_1, z_2, \ldots) \in Z^{\infty} : (z_{\pi(1)}, z_{\pi(2)}, \ldots, z_{\pi(i)}) \in E\}.$$

A prediction for the new example x_i is made if and only if the frequency (p-value) of exchanging the new example with another example in the bag is above a certain value.

This manuscript finds its motivation in the foregoing theory of online prediction using the transductive-inductive paradigm. The research work applies the concept of coupling the creation of new clusters via transduction with the aggregation of examples into these clusters via induction. It finds similarity with [15] in utilizing the idea of a prediction region defined by a certain level of confidence. It presents a simple algorithm that differs significantly from conformal learning in the following aspects: (1) Instead of working with sequences of data that contain labels, it works on unlabeled sequences. (2) Due to the first formulation, it becomes imperative to estimate the number of clusters, which is not known a priori, and the proposed algorithm comes to the rescue by employing a Chebyshev inequality. The inequality provides an upper bound on the number of clusters that can be generated on a random sample of sequences. (3) The quality of the prediction in conformal learning is checked based on p-values generated online. The current algorithm relaxes this restriction of checking the quality online and simply estimates the clusters as the data is presented. (4) The foregoing step makes the algorithm a weak learner, as it is sequence dependent. To take stock of the problem, a global solution to the adequate number of clusters is approximated by computing kernel density estimates on a sample of random sequences of the data. Finally, the level of compromise, captured by a parameter in the inequality, gives an upper bound on the number of clusters generated. In the case of clustering in static images, for a particular parameter value, theoretical proofs show that the reconstruction error converges to a finite value with increasing (a) number of unseen examples and (b) number of clusters. Empirical kernel density estimates of the reconstruction error over a random sample of sequences on toy examples indicate the number of clusters that have high probability of low reconstruction error. It is not necessary that labeled data are always present to compute the reconstruction error. In that case the proposed algorithm stops short at density estimation of the approximated number of clusters from a random sequence of examples, with a certain degree of confidence.
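For readers unfamiliar with the exchangeability-based p-value described above, the following is a minimal sketch of the mechanism. It is an editorial illustration, not part of the proposed algorithm: the nonconformity measure `distance_to_bag` (mean Euclidean distance to the bag) is an illustrative stand-in for whatever measure a conformal predictor might use.

```python
import numpy as np

def distance_to_bag(example, bag):
    """Illustrative nonconformity measure: mean Euclidean distance
    of an example from the examples already in the bag."""
    return np.mean([np.linalg.norm(example - z) for z in bag])

def conformal_p_value(bag, new_example):
    """p-value under exchangeability: the fraction of examples in the
    extended bag that are at least as nonconforming as the new one."""
    extended = bag + [new_example]
    scores = [distance_to_bag(z, [w for w in extended if w is not z])
              for z in extended]
    alpha_new = scores[-1]              # score of the new example
    return np.mean([s >= alpha_new for s in scores])

# A prediction is made only if the p-value exceeds a chosen threshold.
bag = [np.array([0.10, 0.20]), np.array([0.15, 0.25]), np.array([0.12, 0.22])]
print(conformal_p_value(bag, np.array([0.11, 0.21])))   # conforming: high p-value
print(conformal_p_value(bag, np.array([5.0, 5.0])))     # nonconforming: low p-value
```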
Another dimension of the proposed work is the use of a generalized multivariate formulation of the Chebyshev inequality [1], [10], [11]. It is known that the Chebyshev inequality helps in proving the convergence of random sequences of different data. Also, the multivariate formulation of the Chebyshev inequality facilitates bounds for multidimensional data, which is often afflicted by the curse of dimensionality, making it difficult to compute multivariate probabilities. One of the generalizations that exists for the multivariate Chebyshev inequality considers the probability content of a multivariate normal random vector lying in a Euclidean n-dimensional ball [14], [13]. This work employs a more conservative approach: the Euclidean n-dimensional ellipsoid, which restricts the spread of the probability content [4]. Work by [9] and [16] provides the motivation for employing the multivariate Chebyshev inequality.

Efficient implementation and analysis of k-means clustering using the multivariate Chebyshev inequality has been shown in [9]. The current work differs from k-means in (1) providing an online setting for the problem of clustering, (2) estimating the number of clusters for a particular sequence representation of the same data via convergence through the ellipsoidal multivariate Chebyshev inequality, given the level of confidence, compromise or tolerance in the quality of results, (3) generating global approximations of the number of clusters from non-parametric estimates of reconstruction error rates over a sample of random sequences representing the same data, and (4) not fixing the cluster number a priori. It must be noted that in k-means the solutions may differ across initializations for a particular value of k, but the value of k as such remains fixed. In the proposed work, with high probability, an estimate is made of the minimum number of clusters that can represent the data with low reconstruction error. This outlook broadens the perspective to finding multiple solutions which are upper bounded, as well as approximating a particular number of clusters which have similar solutions. This similarity in solutions for a particular number of clusters is attributed to the constraint imposed by the Chebyshev inequality. A key point to be noted is that, using increasing levels of compromise or confidence as in conformal learners, the proposed work generates a nested set of solutions. Low confidence or compromise levels generate tight solutions and vice versa. Thus the proposed weak learner provides a set of solutions which are robust over a sample.

This manuscript also extends the work of [16] on the employment of the multivariate Chebyshev inequality for image representation. The work in [16] presents a hybrid model based on the Hilbert space-filling curve [8] to traverse the image. Since this curve preserves the local information in the neighbourhood of a pixel, it reduces the burden of good image representation in lower dimensions. On the other hand, it acts as a constraint on processing the image in a particular fashion. The current work removes this restriction of processing images via space-filling curves by considering any pixel sequence that represents the image under consideration. Again, a single sequence may not be adequate for the learner to synthesize the image to a recognizable level.
This can be attributed to the fact that in an unsupervised paradigm the number of clusters is not known a priori and the learner would be sequence dependent. To reiterate, the proposed work addresses the issues of

• recognizability, by defining a level of compromise that a user is willing to make via the Chebyshev parameter C_p, and
• sequence-specific solution, by taking random samples of pixel sequences from the same image.

The latter helps in estimating a population-dependent solution, which would be a robust and stable synthesis. Regularization of these errors over the approximated number of clusters for different levels of compromise leads to an adequate number of clusters that synthesize the image with minimal deviation from the original image.

Thus the current work provides a new perspective on the approximation of the cluster number at a particular confidence level. To test the propositions made, the problem of clustering in images is taken into account. Generalizations of the algorithm can be made and applied to different fields involving multidimensional data sets in an online setting. Let I be an RGB image. A pixel in I is an example x_i with N dimensions (here N = 3). It is assumed that examples appear randomly without repetition for the proposed unsupervised learner. Note that when the sample space (here the image I) has a finite number of examples in it (here M pixels), the total number of unique sequences is M!. When M is large, M! → ∞. Currently, the algorithm works on a subset of unique sequences sampled from the M! sequences. Each sequence is equally likely to occur (in this case with probability 1/M!). RGB images from the Berkeley Segmentation Benchmark (BSB) [12] have been taken into consideration for the current study.

2 Transductive-Inductive Learning Algorithm

Given that the examples (z_i = x_i) in a sequence appear randomly, the challenge is to (1) learn the association of a particular example to existing clusters or (2) create a new cluster, based on the information provided by already processed examples. The current algorithm handles the two issues via (1) evaluation of a nonconformity measure defined by the multivariate Chebyshev inequality formulation and (2) 1-Nearest-Neighbour (NN) transductive learning, respectively. The multivariate formulation of the generalized Chebyshev inequality [4] is applied to a new example using a single Chebyshev parameter. This inequality tests the deviation of the new example from the mean of a cluster of examples and gives a lower probabilistic bound on whether the example belongs to the cluster under investigation. If the new random example passes the test, it is associated with the cluster, and the mean and covariance matrix of the cluster are recomputed. In case more than one cluster qualifies for association, the cluster with the lowest deviation from the new example is picked for association. It is also possible to assign the new example to a randomly chosen cluster from the selected clusters to induce noise and then check the approximations of the number of clusters; this has not been considered in the current work. In case of failure to find any association, the algorithm employs the 1-NN transductive algorithm to find the closest neighbour of the example under processing. This neighbour, together with the current example, forms a new cluster.

Algorithm 1 Unsupervised Learner
1: procedure UnsupervisedLearner(img, C_p)
2:   [nrows, ncols] ← size(img)
3:   M ← nrows × ncols                          ▷ total no. of unseen examples
4:   pt_cntr ← 0                                ▷ number of examples encountered
5:   pt_idx ← {1, 2, ..., M}                    ▷ indices of unseen examples
6:   cluster_cntr ← 0                           ▷ number of clusters
7:   CumErrval ← 0                              ▷ cumulative error value
8:   Err1 ← []                                  ▷ error rate as the no. of examples increases
9:   Err2 ← []                                  ▷ error rate as the no. of clusters increases
10:  while Card(pt_idx) examples remain unprocessed do     ▷ pt_idx ⊂ {1, 2, ..., M}
11:    choose a random example x_i s.t. i ∈ pt_idx
12:    pt_cntr ← pt_cntr + 1
13:    update pt_idx, i.e. pt_idx ← pt_idx − {i}
14:    CRITERION ← []
15:    Errval ← 0
16:    for all q clusters, where q ∈ {0, 1, 2, ..., cluster_cntr}:
17:      Errval ← Errval + Σ_{k=1}^{ℓ} (x_k − E_q(x))²     ▷ x_k ranges over all examples in cluster q
18:      CumErrval ← CumErrval + Errval
19:      compute D_q ← (x_i − E_q(x))^T Σ_q^{−1} (x_i − E_q(x))
20:      if D_q < C_p then                      ▷ C_p is the Chebyshev parameter
21:        CRITERION ← [CRITERION; D_q, q]
22:    if one or more clusters associate with x_i, i.e. length(CRITERION) ≥ 1:
23:      associate x_i to the selected cluster q with minimum D_q
24:      Err1 ← [Err1, Errval/pt_cntr]
25:    if x_i is not associated with any cluster, i.e. length(CRITERION) == 0:
26:      Err2 ← [Err2, Errval/pt_cntr]
27:      cluster_cntr ← cluster_cntr + 1
28:      using 1-NN, find x_j closest to x_i s.t. j ∈ pt_idx
29:      update pt_idx, i.e. pt_idx ← pt_idx − {j}
30:      form a new cluster {x_i, x_j}
31:  end while
32: end procedure

Several important implications arise from the usage of a probabilistic inequality measure as a nonconformity measure; these are elucidated in detail in the later sections. An important point to consider here is the usage of the 1-NN algorithm to create a new cluster. Even though it is known that 1-NN suffers from the curse of dimensionality, for problems with small dimensions it can be employed for transductive learning. The aim of the proposed work is not to address the curse-of-dimensionality issue. Also, note that in the general supervised conformal learning algorithm, a prediction has to be made before the next random example is processed. This is not the case in the current unsupervised framework of the conformal learning algorithm. In case the current random example fails to associate with any of the existing clusters, under the constraint yielded by the Chebyshev parameter, the NN helps in finding the closest example (in feature space) from the remaining unprocessed sample data set, to form a new cluster. Thus the formation of a new cluster depends on the strictness of the Chebyshev parameter C_p. The procedure for unsupervised conformal learning is presented in Algorithm 1. It does not strictly follow the idea of finding confidence on the prediction, as labels are not present to be tested against. The goal here is to reconstruct the clusters from a single pixel sequence such that they represent the image. The quality of the reconstruction is taken up later, when a random sample of pixel sequences is used to estimate the probability density of the reconstruction error rates. Note that in the algorithm, E_q(x) represents the mean of the examples x in the q-th cluster and Σ_q is the covariance matrix of the N-dimensional feature examples of the q-th cluster.
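A compact Python/NumPy rendering of Algorithm 1 may help fix ideas. This is a hedged sketch, not the author's implementation: the ridge term added to the covariance (to keep Σ_q invertible while a cluster is small) is a simplifying assumption, and the Err1/Err2 bookkeeping is omitted for brevity.

```python
import numpy as np

def unsupervised_learner(pixels, C_p, rng=None):
    """Sketch of Algorithm 1. `pixels` is an (M, N) array of examples
    (for an RGB image, N = 3); C_p is the Chebyshev parameter."""
    rng = np.random.default_rng() if rng is None else rng
    M, N = pixels.shape
    unseen = list(rng.permutation(M))    # one random sequence of the M examples
    clusters = []                        # each cluster is a list of member indices
    labels = np.full(M, -1)
    ridge = 1e-6 * np.eye(N)             # assumption: keeps Sigma_q invertible

    while unseen:
        i = unseen.pop(0)
        x = pixels[i]
        candidates = []                  # (D_q, q) pairs passing the test
        for q, members in enumerate(clusters):
            mean = pixels[members].mean(axis=0)
            cov = np.cov(pixels[members].T) + ridge
            diff = x - mean
            D = diff @ np.linalg.inv(cov) @ diff   # Chebyshev statistic
            if D < C_p:                  # inequality-based nonconformity test
                candidates.append((D, q))
        if candidates:                   # induction: join closest qualifying cluster
            _, q = min(candidates)
            clusters[q].append(i)
            labels[i] = q
        else:                            # transduction: 1-NN seeds a new cluster
            new = [i]
            if unseen:
                dists = np.linalg.norm(pixels[unseen] - x, axis=1)
                new.append(unseen.pop(int(np.argmin(dists))))
            clusters.append(new)
            for k in new:
                labels[k] = len(clusters) - 1
    return labels, clusters
```

For an image, `pixels` would be something like `img.reshape(-1, 3).astype(float)`. Since the sequence is random, repeated calls generally return different cluster counts, which is exactly the sequence dependence discussed above.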
3 Theoretical Perspective

Application of the multivariate Chebyshev inequality, which yields a probabilistic bound, enforces certain important implications with regard to the clusters that are generated. For the purpose of elucidating the algorithm, the starfish image is taken from [12].

3.1 Multivariate Chebyshev Inequality

Let X be a stochastic variable in N dimensions with mean E[X]. Further, let Σ be the covariance matrix of all observations, each containing N features, and let C_p ∈ R. Then the multivariate Chebyshev inequality in [4] states that

$$P\{(X - E[X])^T \Sigma^{-1} (X - E[X]) \ge C_p\} \le \frac{N}{C_p}, \qquad P\{(X - E[X])^T \Sigma^{-1} (X - E[X]) < C_p\} \ge 1 - \frac{N}{C_p}, \tag{1}$$

i.e. the probability of the spread of the value of X around the sample mean E[X] being greater than C_p is less than N/C_p. There is a minor variation for the univariate case, stating that the probability of the spread of the value of x around the mean μ being greater than C_p σ is less than 1/C_p². Apart from this minor difference, both formulations convey the same message about the probabilistic bound imposed when a random vector or number X lies outside the mean of the sample by a value of C_p.

Fig. 1. A random sequence of the starfish image segmented via the unsupervised conformal learning algorithm. (C_p, NoClust, TRErr) denotes the tuple containing the Chebyshev parameter (C_p), the number of clusters generated (NoClust) while using C_p, and the total reconstruction error of the generated image with respect to the original image (TRErr): (a) (3, 1034, 17.746), (b) (5, 271, 36.32), (c) (7, 159, 54.71), (d) (9, 45, 40.591), (e) (11, 31, 62.606), (f) (13, 29, 66.061), (g) (15, 33, 65.424), (h) (17, 24, 64.98).

3.2 Association to Clusters

Once a cluster is initialized (say with x_i and x_j), the size of the cluster depends on the number of examples getting associated with it. The multivariate formalism of the Chebyshev inequality controls the degree of uniformity of the feature values of the examples that constitute the cluster. The association of an example to a cluster happens as follows. Let a new random example (say x_t) be considered for association to a cluster. If the spread of the example x_t from E_q(x) (the mean of the q-th cluster {x_i, x_j}), factored by the covariance matrix Σ_q, is below C_p, then x_t is considered part of the cluster. Using the Chebyshev inequality, this boils down to

$$P\{(x_t - E_q[x_i, x_j])^T \Sigma_q^{-1} (x_t - E_q[x_i, x_j]) \ge C_p\} \le \frac{N}{C_p}, \qquad P\{(x_t - E_q[x_i, x_j])^T \Sigma_q^{-1} (x_t - E_q[x_i, x_j]) < C_p\} \ge 1 - \frac{N}{C_p}. \tag{2}$$

Satisfaction of this criterion suggests a possible cluster to which x_t could be associated. This test is conducted for all the existing clusters. If there is more than one cluster to which x_t can be associated, then the cluster which shows the minimum deviation from the new random point is chosen. Once the cluster is chosen, its size is extended by one more example, i.e. x_t; the cluster now constitutes {x_i, x_j, x_t}. If no association is found at all, a new cluster is initialized and the process repeats until all unseen examples have been processed. The satisfaction of the inequality gives a lower probabilistic bound on the size of a cluster with value 1 − (N/C_p), if the second version of the Chebyshev formula is under consideration.
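To make this lower bound concrete, here is a worked instance (an editorial example, using the RGB dimensionality N = 3 from the introduction and C_p values that appear in Figure 1):

$$1 - \frac{N}{C_p} = \begin{cases} 1 - \tfrac{3}{3} = 0 & (C_p = 3,\ \text{vacuous guarantee})\\ 1 - \tfrac{3}{7} \approx 0.571 & (C_p = 7)\\ 1 - \tfrac{3}{15} = 0.8 & (C_p = 15) \end{cases}$$

This anticipates Lemma 3 below: as C_p → N the guarantee vanishes and many small decompositions form, while a larger C_p loosens the association test and admits fewer, larger clusters.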
Thus the sizes of the clusters grow, under a probabilistic constraint, in a homogeneous manner. For a highly inhomogeneous image, a cluster may be very restricted or small due to large deviations of pixel intensities from the cluster it is being tested against. Once the pixels have been assigned to their respective decompositions, all pixels in a single decomposition are assigned the average value of the intensities of the pixels that constitute the decomposition. This is done under the assumption that decomposed clusters are homogeneous in nature, with the degree of homogeneity controlled by C_p. Figure 1 shows the results of clustering for varying values of C_p for the starfish image from [12].

3.3 Implications

In [16], various implications have been proposed for using the multivariate Chebyshev inequality for image representation using a space-filling curve. In order to extend their work, a few implications are reiterated for further development. The inequality being a criterion, the probability associated with it gives a belief-based bound on the satisfaction of the criterion. In order to proceed, a definition of decomposition is first needed.

Definition 1. Let D be a decomposition which contains a set of points x with a mean of E_q(x). The set expands by testing a new point x_t via the Chebyshev inequality P{(x_t − E_q(x))^T Σ_q^{−1} (x_t − E_q(x)) < C_p} ≥ 1 − N/C_p.

The decomposition may include the point x_t depending on the outcome of the criterion. A point to be noted is that, if the new point x_t belongs to D, then D can be represented by (x_t − E_q(x))^T Σ_q^{−1} (x_t − E_q(x)).

Lemma 1. Decompositions D are bounded by a lower probability bound of 1 − (N/C_p).

Lemma 2. The value of C_p reduces the size of the sample from M to an upper bound of M/C_p probabilistically, with a lower bound of 1 − (N/C_p). Here M is the number of examples in the image.

Lemma 3. As C_p → N, the lower probability bound drops to zero, implying that a large number of small decompositions D can be achieved. Vice versa for C_p → ∞.

It was stated that the image can be reconstructed from pixel sequences at a certain level of compromise. From Lemma 2, it can be seen that C_p reduces the sample size while inducing a certain amount of error due to loss of information via averaging. This reduction in sample size indicates the level of compromise at which the image is to be processed. This reduction in sample size, or level of compromise, is also directly related to the construction of probabilistically bounded decompositions. Since the decompositions are generated via the usage of C_p in equation 1, the belief in their existence in terms of a lower probability bound (from Lemma 1) suggests a confidence in the amount of error incurred in the reconstruction of the image. For a particular pixel, this reconstruction error can be computed by squaring the difference between the value of the intensity in the original image and the intensity value assigned after clustering. Since a somewhat homogeneous decomposition is bounded probabilistically, the reconstruction errors of the pixels that constitute it are also bounded probabilistically. Thus, for all decompositions, the summation of the reconstruction errors over all pixels is bounded. The bound indicates the confidence in the generated reconstruction error.
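The per-pixel reconstruction error just described is simple to state in code. A minimal sketch follows (array names are illustrative): each pixel is replaced by the mean of its decomposition, and squared deviations are summed.

```python
import numpy as np

def reconstruction_error(pixels, labels):
    """Total reconstruction error: each pixel is replaced by the mean
    of its decomposition, and squared deviations are summed."""
    reconstructed = np.empty_like(pixels, dtype=float)
    for q in np.unique(labels):
        members = labels == q
        reconstructed[members] = pixels[members].mean(axis=0)
    return np.sum((pixels - reconstructed) ** 2), reconstructed
```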
Also, by Lemma 2, since the number of decompositions or clusters is upper bounded, the total reconstruction error is also upper bounded. It now remains to be proven that, for a particular level of compromise, the error rates converge as the number of processed examples and the number of clusters increase.

In Algorithm 1, the error rates are computed as the random sequence of examples gets processed. For each original pixel x_i ∈ R^N in the image, let x_i^R be the intensity value assigned after clustering. Then the reconstruction error for pixel x_i is the squared 2-norm ||x_i − x_i^R||². Since a pixel is assigned to a particular decomposition D_q, it gets the value of the mean of all pixels that constitute the decomposition D_q. Thus the reconstruction error for a pixel turns out to be ||x_i − E_q(x)||². For each cluster q, the reconstruction error is Err_{D_q} = Σ_{i=1}^{n} ||x_i − E_q(x)||². Note that the error also indicates how much the examples deviate from the mean of their respective cluster. As new examples are processed based on the information present from the previous examples, the total error computed after processing the first pt_cntr examples in a random sequence is Errval = Σ_{q=1}^{cluster_cntr} Err_{D_q}. The error rate for these pt_cntr examples is Err1 = Errval/pt_cntr. Finally, an error rate is computed that captures how the deviation of the examples from their respective cluster means behaves after the formation of a new cluster. This error is denoted by Err2. The formula for Err2 is the same as for Err1, but with a minute change in conception: the Errval is divided by the total number of points processed after the formation of every new cluster.

Fig. 2. Error rate Err1 for a particular sequence with increasing number of examples, with C_p = 7.

Theorem 1. Let Z_i be a random sequence that represents the entire image I. If Z_i is decomposed into clusters via the Chebyshev inequality using the unsupervised learner, then the reconstruction error rate Err1 converges asymptotically with a probabilistic lower bound, or confidence level, of 1 − N/C_p or greater.

Proof. It is known that the total reconstruction error after pt_cntr examples have been processed is Errval = Σ_{q=1}^{cluster_cntr} Err_{D_q}, and the error rate is Err1 = Errval/pt_cntr. It is also known from equation 1 that an example is associated to a particular decomposition D_q if it satisfies the constraint (x_t − E_q(x))^T Σ_q^{−1} (x_t − E_q(x)) < C_p. Since C_p defines the level of compromise on the image via Lemma 2 and the decompositions D_q are almost homogeneous, all examples that constitute a decomposition have similar attribute values. Due to this similarity between the attribute values, the non-diagonal elements of the covariance matrix in the inequality above approach zero. Thus Σ_q^{−1} ≈ det|Σ_q^{−1}| I, where I is the identity matrix.
The inequality then equates to:

$$(x_t - E_q(x))^T \det|\Sigma_q^{-1}|\, I\, (x_t - E_q(x)) < C_p$$
$$\det|\Sigma_q^{-1}|\,(x_t - E_q(x))^T I (x_t - E_q(x)) < C_p$$
$$(x_t - E_q(x))^T I (x_t - E_q(x)) < \frac{C_p}{\det|\Sigma_q^{-1}|}$$
$$||x_t - E_q(x)||^2 < \frac{C_p}{\det|\Sigma_q^{-1}|} \tag{3}$$

Thus, if x_i = x_t was the last example to be associated to a decomposition, the reconstruction error ||x_i − E_q(x)||² for that example would be upper bounded by C_p/det|Σ_q^{−1}|. Consequently, the total error after processing pt_cntr examples is also upper bounded, i.e.

$$Errval = \sum_{q=1}^{cluster\_cntr} Err_{D_q} = \sum_{q=1}^{cluster\_cntr} \sum_{i=1}^{n} ||x_i - E_q(x)||^2 < \sum_{q=1}^{cluster\_cntr} \sum_{i=1}^{n} \frac{C_p}{\det|\Sigma_q^{-1}|}. \tag{4}$$

Thus the error rate Err1 = Errval/pt_cntr is also upper bounded. Different decompositions may have different Σ_q^{−1}, but in the worst-case scenario, if the decomposition with the lowest covariance is substituted for every other decomposition, then the upper bound on the error rate is

$$\frac{C_p}{\det|\Sigma_{lowest}^{-1}| \cdot pt\_cntr} \sum_{q=1}^{cluster\_cntr} \sum_{i=1}^{n} 1,$$

which equates to C_p/det|Σ_{lowest}^{−1}|. □

It is important to note that this error rate converges to a finite value asymptotically as the number of processed examples increases. This is because, initially, when the learner has not seen enough examples to learn and solidify the knowledge in terms of stable means and variances of the decompositions, the error rate Err1 increases as new examples are presented. This is attributed to the fact that new clusters are formed more often in the initial stages, due to the lack of prior knowledge. After a certain time, when a large number of examples have been encountered, helping to solidify the knowledge or stabilize the decompositions, the addition of further examples does not increment the error. This stability of the clusters is checked via the multivariate formulation of the Chebyshev inequality in equation 2. The stability also causes the error rate Err1 to stabilize and thus indicates its convergence in a bounded manner with a probabilistic confidence level. Thus for any value of pt_cntr, there exists an upper bound on the reconstruction error, which stabilizes as pt_cntr increases.

For C_p = 7, image (c) in Figure 1 shows the clustered image that is generated using the unsupervised conformal learning algorithm. Pixels in a cluster of the generated image have the mean of the cluster as their intensity value, or label. This holds for all the clusters in the generated image. The total number of clusters generated for this particular random sequence was 159. The error rate Err1 is depicted in Figure 2.

Fig. 3. Error rate Err2 for a particular sequence with increasing number of clusters, with C_p = 7.

Theorem 2. Let Z_i be a random sequence that represents the entire image I. If Z_i is decomposed into clusters via the Chebyshev inequality using the unsupervised learner, then the reconstruction error rate Err2 converges asymptotically with a probabilistic lower bound, or confidence level, of 1 − N/C_p or greater.

Proof. The error rate Err2 is the computation of the error after each new cluster is formed. The upper bound on Err2 as the number of clusters or decompositions increases follows a proof similar to the one presented in Theorem 1. □
Again, for the same C_p = 7 and image (c) in Figure 1, the error rate Err2 is depicted in Figure 3. Intuitively, it can be seen that both reconstruction error rates converge to approximately similar values.

The theoretical proofs and the lemmas suggest that, for a given level of compromise C_p, there exists an upper bound on the reconstruction error as well as on the number of clusters. But this reconstruction error and number of clusters depend on the pixel sequence presented to the learner. Does this mean that, for a particular level of compromise, one may find values of the reconstruction error and the number of clusters that never converge to a finite value when a random sample of pixel sequences representing an image is processed by the learner? Or, put more simply, is it possible to find a reconstruction error and a number of clusters, at a particular level of compromise, that best represent the image? This points to the problem of whether an image can be reconstructed at a particular level of compromise where there is a high probability of finding a low reconstruction error and number of clusters from a sample of sequences.

Fig. 4. The probability density estimates for (a) Err1, (b) Err2 and (c) the number of clusters obtained via the unsupervised conformal learner, generated over 1000 random sequences representing the same image with C_p = 10.

The existence of such a probability value would require knowledge of the probability distribution of the reconstruction error over increasing (1) number of examples and (2) number of clusters generated. In this work, kernel density estimation (KDE) is used to estimate the probability distribution of the reconstruction errors Err1 and Err2. To investigate the quality of the solution obtained, the error rates were generated for different random sequences and a KDE was evaluated on the observations. The density estimates empirically point to the least error rates with high probability. It was found that the error rates Err1, Err2 and the number of clusters all converge to a particular value for a given image.

For C_p = 10, the probability density estimates were generated using the density estimates of the error rates and the number of clusters obtained on 1000 random sequences of the same image. It was found that the error rates Err1, Err2 and the number of clusters converge to 33.1762, 35.9339 and 38, respectively. Figure 4 shows the graphs for the same. It can be seen from graphs (a) and (b) in Figure 4 that both Err1 and Err2 converge to nearly similar values. It can be noted that with an increasing value of the parameter C_p, the bound on the decomposition expands, which further leads to the generation of a lower number of clusters required to reconstruct the image. Thus it can be expected that at lower levels of compromise the reconstruction error (via KDE) is low but the number of clusters (via KDE) is very high, and vice versa. Figure 5 shows the behaviour of the reconstruction error and the number of clusters generated as the level of compromise increases.

Fig. 5. Behaviour of the reconstruction error (via KDE) and the number of clusters or decompositions (via KDE) for increasing values of C_p.
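Continuing the sketches above (and assuming `pixels` holds the image as an (M, 3) float array), the density-estimation step over a sample of random sequences might look as follows. `scipy.stats.gaussian_kde` is an illustrative stand-in for whatever estimator was actually used; the sample size and C_p follow the experiment described above.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Collect cluster counts and error rates over a sample of random sequences.
n_clusters, err_rates = [], []
for seed in range(1000):                     # 1000 random sequences, as in the text
    rng = np.random.default_rng(seed)
    labels, clusters = unsupervised_learner(pixels, C_p=10, rng=rng)
    total_err, _ = reconstruction_error(pixels, labels)
    n_clusters.append(len(clusters))
    err_rates.append(total_err / len(pixels))

# KDE over the observed cluster counts; its mode is the empirically
# stable estimate of the adequate number of clusters.
kde = gaussian_kde(n_clusters)
grid = np.linspace(min(n_clusters), max(n_clusters), 200)
print("most probable number of clusters:", grid[np.argmax(kde(grid))])
```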
High reconstruction error does not necessarily mean that the representation of the image is bad. It only indicates the granularity of the reconstruction obtained. Thus the reconstruction of the image can yield finer details at a low level of compromise and point to segmentations at a high level of compromise. Regularization over the level of compromise and the number of clusters would lead to a reconstruction which has low reconstruction error as well as an adequate number of decompositions that represent the image properly.

There are a few points that need to be remembered when applying such an online learning paradigm. The reconstructed results come near to the original image only at a level of imposed compromise. As the size of the data set or the image increases, the time consumed and the number of computations involved in processing also increase. To start with, the learner would perform better on clean images than on noisy images; adaptations need to be made for processing noisy images, or pre-processing would be a necessary step before application of such an algorithm. Other inequalities can also be taken into account for multivariate information online. It would be tough to compare the algorithm with other powerful clustering algorithms, as the proposed work presents a weak learner and provides a general solution with no tight bounds on the quality of clustering. Nevertheless, the current work contributes to the estimation of the cluster number in an unsupervised paradigm using a transductive-inductive learning strategy. It can be said that, for a fixed Chebyshev parameter, in a bootstrapped sequence-sampling environment without replacement, the unsupervised learner converges to a finite error rate along with a finite number of clusters. The result, in terms of the clustering and the error rates, may not be the most optimal (where the meaning depends on the goal of optimization), but it does give an affirmative clue that the image decomposition is robust and convergent.

4 Conclusion

A simple transductive-inductive learning strategy for the unsupervised learning paradigm is presented, using the multivariate Chebyshev inequality. Theoretical proofs of convergence in the number of clusters for a particular level of compromise show (1) stability of the result over a sequence and (2) robustness of the probabilistically estimated approximation of the cluster number over a random sample of sequences representing the same multidimensional data. Lastly, the upper bounds generated on the number of clusters point to a limited search space.

References

1. P.O. Berge. A note on a form of Tchebycheff's theorem for two variables. Biometrika, 29(3-4):405, 1938.
2. J.C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic Publishers, 1981.
3. M. Charikar, C. Chekuri, T. Feder, and R. Motwani. Incremental clustering and dynamic information retrieval. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 626-635. ACM, 1997.
4. X. Chen. A new generalization of Chebyshev inequality for random vectors. arXiv:math.ST, 0707.0805v1:1-5, 2007.
5. R.C. Dubes. How many clusters are best? An experiment. Pattern Recognition, 20(6):645-663, 1987.
6. C. Fraley and A.E. Raftery. How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal, 41(8):578, 1998.
7. R. Gomes, M. Welling, and P. Perona. Incremental learning of nonparametric Bayesian mixture models. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-8. IEEE, 2008.
8. D. Hilbert. Über die stetige Abbildung einer Linie auf ein Flächenstück. Math. Ann., 38:459-460, 1891.
9. T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, and A.Y. Wu. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 881-892, 2002.
10. D.N. Lal. A note on a form of Tchebycheff's inequality for two or more variables. Sankhyā: The Indian Journal of Statistics (1933-1960), 15(3):317-320, 1955.
11. A.W. Marshall and I. Olkin. Multivariate Chebyshev inequalities. The Annals of Mathematical Statistics, pages 1001-1014, 1960.
12. D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Proc. 8th Int'l Conf. Computer Vision, 2:416-423, July 2001.
13. D. Monhor. A Chebyshev inequality for multivariate normal distribution. Probability in the Engineering and Informational Sciences, 21(02):289-300, 2007.
14. D. Monhor and S. Takemoto. Understanding the concept of outlier and its relevance to the assessment of data quality: Probabilistic background theory. Earth, Planets, and Space, 57(11):1009-1018, 2005.
15. G. Shafer and V. Vovk. A tutorial on conformal prediction. The Journal of Machine Learning Research, 9:371-421, 2008.
16. S. Sinha and G. Horst. Bounded multivariate surfaces on monovariate internal functions. IEEE International Conference on Image Processing, (18):1037-1040, 2011.
17. S. Still and W. Bialek. How many clusters? An information-theoretic perspective. Neural Computation, 16(12):2483-2506, 2004.
18. V. Vovk, A. Gammerman, and G. Shafer. Algorithmic Learning in a Random World. Springer Verlag, 2005.
19. L. Xu. How many clusters?: A ying-yang machine based theory for a classical open problem in pattern recognition. In Neural Networks, 1996. IEEE International Conference on, volume 3, pages 1546-1551. IEEE, 1996.