On Constrained Spectral Clustering and Its Applications


Authors: Xiang Wang, Buyue Qian, Ian Davidson

Abstract Constrained clustering has been well-studied for algorithms such as K-means and hierarchical clustering. However, how to satisfy many constraints in these algorithmic settings has been shown to be intractable. One alternative to encode many constraints is to use spectral clustering, which remains a developing area. In this paper, we propose a flexible framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link and Cannot-Link constraints by modifying the graph Laplacian or constraining the underlying eigenspace, we present a more natural and principled formulation, which explicitly encodes the constraints as part of a constrained optimization problem. Our method offers several practical advantages: it can encode the degree of belief in Must-Link and Cannot-Link constraints; it guarantees to lower-bound how well the given constraints are satisfied using a user-specified threshold; and it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and encoding the constraints explicitly, much of the existing analysis of unconstrained spectral clustering techniques remains valid for our formulation. We validate the effectiveness of our approach by empirical results on both artificial and real datasets. We also demonstrate an innovative use of encoding large numbers of constraints: transfer learning via constraints.

Keywords Spectral clustering · Constrained clustering · Transfer learning · Graph partition

X. Wang, Department of Computer Science, University of California, Davis. Davis, CA 95616, USA. E-mail: xiang@ucdavis.edu
B. Qian, Department of Computer Science, University of California, Davis. Davis, CA 95616, USA. E-mail: byqian@ucdavis.edu
I. Davidson, Department of Computer Science, University of California, Davis. Davis, CA 95616, USA. E-mail: davidson@cs.ucdavis.edu

Fig. 1 A motivating example for constrained spectral clustering: (a) K-means, (b) spectral clustering, (c) spectral clustering, (d) constrained spectral clustering.

1 Introduction

1.1 Background and Motivation

Spectral clustering is an important clustering technique that has been extensively studied in the image processing, data mining, and machine learning communities (Shi and Malik (2000); von Luxburg (2007); Ng et al (2001)). It is considered superior to traditional clustering algorithms like K-means in terms of having a deterministic polynomial-time solution, the ability to model arbitrarily shaped clusters, and its equivalence to certain graph cut problems. For example, spectral clustering is able to capture the underlying moon-shaped clusters as shown in Fig. 1(b), whereas K-means would fail (Fig. 1(a)).
The advantage of spectral clustering has also been validated by many real-world applications, such as image segmentation (Shi and Malik (2000)) and mining social networks (White and Smyth (2005)).

Spectral clustering was originally proposed to address an unsupervised learning problem: the data instances are unlabeled, and all available information is encoded in the graph Laplacian. However, there are cases where unsupervised spectral clustering becomes insufficient. Using the same toy data, as shown in Fig. 1(c), when the two moons are under-sampled, the clusters become so sparse that separating them becomes difficult. To help spectral clustering recover from an undesirable partition, we can introduce side information in various forms, in either small or large amounts. For example:

1. Pairwise constraints: Domain experts may explicitly assign constraints that state a pair of instances must be in the same cluster (Must-Link, ML for short) or that a pair of instances cannot be in the same cluster (Cannot-Link, CL for short). For instance, as shown in Fig. 1(d), we assigned several ML (solid lines) and CL (dashed lines) constraints and then applied our constrained spectral clustering algorithm, which we describe later. As a result, the two moons were successfully recovered.
2. Partial labeling: There can be labels on some of the instances, which are neither complete nor exhaustive. We demonstrate in Fig. 9 that even small amounts of labeled information can greatly improve clustering results when compared against the ground truth partition, as inferred from the labels.
3. Alternative weak distance metrics: In some situations there may be more than one distance metric available. For example, in Table 3 and the accompanying paragraphs we describe clustering documents using distance functions based on different languages (features).
4. Transfer of knowledge: In the context of transfer learning (Pan and Yang (2010)), if we treat the graph Laplacian as the target domain, we can transfer knowledge from a different but related graph, which can be viewed as the source domain. We discuss this direction in Sections 6.3 and 7.5.

All the aforementioned side information can be transformed into pairwise ML and CL constraints, which can be either hard (binary) or soft (degrees of belief). For example, if the side information comes from a source graph, we can construct pairwise constraints by assuming that the more similar two instances are in the source graph, the more likely they belong to the same cluster in the target graph. Consequently, the constraints should naturally be represented by degrees of belief rather than binary assertions. How to make use of this side information to improve clustering falls into the area of constrained clustering (Basu et al (2008)). In general, constrained clustering is a category of techniques that try to incorporate ML and CL constraints into existing clustering schemes. It has been well studied for algorithms such as K-means clustering, mixture models, hierarchical clustering, and density-based clustering.
Previous studies showed that satisfying all constraints at once (Davidson and Ravi (2007a)), incrementally (Davidson et al (2007)), or even pruning constraints (Davidson and Ravi (2007b)) is intractable. Furthermore, it was shown that algorithms that build set partitions incrementally (such as K-means and EM) are prone to being over-constrained (Davidson and Ravi (2006)). In contrast, incorporating constraints into spectral clustering is a promising direction since, unlike existing algorithms, all data instances are assigned simultaneously to clusters, even if the given constraints are inconsistent.

Constrained spectral clustering is still a developing area. Previous work on this topic can be divided into two categories, based on how they enforce the constraints. The first category (Kamvar et al (2003); Xu et al (2005); Lu and Carreira-Perpiñán (2008); Wang et al (2009); Ji and Xu (2006)) directly manipulates the graph Laplacian (or, equivalently, the affinity matrix) according to the given constraints; then unconstrained spectral clustering is applied to the modified graph Laplacian. The second category uses constraints to restrict the feasible solution space (De Bie et al (2004); Coleman et al (2008); Li et al (2009); Yu and Shi (2001, 2004)). Existing methods in both categories share several limitations:

– They are designed to handle only binary constraints. However, as we have stated above, in many real-world applications, constraints are made available in the form of real-valued degrees of belief rather than yes-or-no assertions.
– They aim to satisfy as many constraints as possible, which can lead to inflexibility in practice. For example, the given set of constraints could be noisy, and satisfying some of the constraints could actually hurt the overall performance. Also, it is reasonable to ignore a small portion of constraints in exchange for a clustering with much lower cost.
– They do not offer any natural interpretation of either the way that constraints are encoded or the implication of enforcing them.

1.2 Our Contributions

In this paper, we study how to incorporate large amounts of pairwise constraints into spectral clustering, in a flexible manner that addresses the limitations of previous work. Then we show the practical benefits of our approach, including new applications previously not possible.

We extend beyond binary ML/CL constraints and propose a more flexible framework to accommodate general-type side information. We allow the binary constraints to be relaxed to real-valued degrees of belief that two data instances belong to the same cluster or to two different clusters. Moreover, instead of trying to satisfy each and every constraint that has been given, we use a user-specified threshold to lower-bound how well the given constraints must be satisfied. Therefore, our method provides maximum flexibility in terms of both representing constraints and satisfying them. This, in addition to handling large amounts of constraints, allows the encoding of new styles of information such as entire graphs and alternative distance metrics in their raw form, without considering issues such as constraint inconsistencies and over-constraining.
Our contributions are:

– We propose a principled framework for constrained spectral clustering that can incorporate large amounts of both hard and soft constraints.
– We show how to enforce constraints in a flexible way: a user-specified threshold is introduced so that a limited amount of constraints can be ignored in exchange for lower clustering cost. This allows incorporating side information in its raw form without considering issues such as inconsistency and over-constraining.
– We extend the objective function of unconstrained spectral clustering by encoding constraints explicitly and creating a novel constrained optimization problem. Thus our formulation naturally covers unconstrained spectral clustering as a special case.
– We show that our objective function can be turned into a generalized eigenvalue problem, which can be solved deterministically in polynomial time. This is a major advantage over constrained K-means clustering, which produces non-deterministic solutions while being intractable even for K = 2 (Drineas et al (2004); Davidson and Ravi (2007b)).
– We interpret our formulation from both the graph cut perspective and the Laplacian embedding perspective.
– We validate the effectiveness of our approach and its advantage over existing methods using standard benchmarks and new innovative applications such as transfer learning.

This paper is an extension of our previous work (Wang and Davidson (2010)) with the following additions: 1) we extend our algorithm from 2-way partition to K-way partition (Section 6.2); 2) we add a new geometric interpretation of our algorithm (Section 5.2); 3) we show how to apply our algorithm to a novel application (Section 6.3), namely transfer learning, and test it with a real-world fMRI dataset (Section 7.5); 4) we present a much more comprehensive experiment section with more tasks conducted on more datasets (Sections 7.2 and 7.4).

The rest of the paper is organized as follows: In Section 2 we briefly survey previous work on constrained spectral clustering; Section 3 provides preliminaries for spectral clustering; in Section 4 we formally introduce our formulation for constrained spectral clustering and show how to solve it efficiently; in Section 5 we interpret our objective from two different perspectives; in Section 6 we discuss the implementation of our algorithm and possible extensions; we empirically evaluate our approach in Section 7; Section 8 concludes the paper.

2 Related Work

Constrained clustering is a category of methods that extend clustering from the unsupervised setting to the semi-supervised setting, where side information is available in the form of, or can be converted into, pairwise constraints. A number of algorithms have been proposed on how to incorporate constraints into spectral clustering, which can be grouped into two categories.

The first category manipulates the graph Laplacian directly. Kamvar et al (2003) proposed the spectral learning algorithm, which sets the (i, j)-th entry of the affinity matrix to 1 if there is a ML between nodes i and j, and to 0 for CL. A new graph Laplacian is then computed based on the modified affinity matrix.
In Xu et al (2005), the constraints are encoded in the same way, but a random walk matrix is used instead of the normalized Laplacian. Kulis et al (2005) proposed to add both positive (for ML) and negative (for CL) penalties to the affinity matrix (they then used kernel K-means, instead of spectral clustering, to find the partition based on the new kernel). Lu and Carreira-Perpiñán (2008) proposed to propagate the constraints in the affinity matrix. In Ji and Xu (2006) and Wang et al (2009), the graph Laplacian is modified by combining the constraint matrix as a regularizer. The limitation of these approaches is that there is no principled way to decide the weights of the constraints, and there is no guarantee on how well the given constraints will be satisfied.

The second category manipulates the eigenspace directly. For example, the subspace trick introduced by De Bie et al (2004) alters the eigenspace onto which the cluster indicator vector is projected, based on the given constraints. This technique was later extended in Coleman et al (2008) to accommodate inconsistent constraints. Yu and Shi (2001, 2004) encoded partial grouping information as a subspace projection. Li et al (2009) enforced constraints by regularizing the spectral embedding. These approaches usually enforce the given constraints strictly. As a result, the solutions are often over-constrained, which makes the algorithms sensitive to noise and inconsistencies in the constraint set. Moreover, it is non-trivial to extend these approaches to incorporate soft constraints.

In addition, Gu et al (2011) proposed a spectral kernel design that combines multiple clustering tasks. The learned kernel is constrained in such a way that the data distributions of any two tasks are as close as possible. Their problem setting differs from ours because we aim to perform single-task clustering by using two (disagreeing) data sources. Wang et al (2008) showed how to incorporate pairwise constraints into a penalized matrix factorization framework. Their matrix approximation objective function, which is different from our normalized min-cut objective, is solved by an EM-like algorithm.

We would like to stress that the pros and cons of spectral clustering as compared to other clustering schemes, such as K-means clustering, hierarchical clustering, etc., have been thoroughly studied and well established. We do not claim that constrained spectral clustering is universally superior to other constrained clustering schemes. The goal of this work is to provide a way to incorporate constraints into spectral clustering that is more flexible and principled as compared with existing constrained spectral clustering techniques.

Table 1 Table of notations

Symbol            Meaning
$G$               An undirected (weighted) graph
$A$               The affinity matrix
$D$               The degree matrix
$I$               The identity matrix
$L$ / $\bar{L}$   The unnormalized/normalized graph Laplacian
$Q$ / $\bar{Q}$   The unnormalized/normalized constraint matrix
$vol$             The volume of graph $G$

3 Background and Preliminaries

In this paper we follow the standard graph model that is commonly used in the spectral clustering literature.
We reiterate some of the definitions and properties in this section, such as the graph Laplacian, normalized min-cut, eigendecomposition, and so forth, to make this paper self-contained. Readers who are familiar with these materials can skip to our formulation in Section 4. Important notations used throughout the rest of the paper are listed in Table 1.

A collection of $N$ data instances is modeled by an undirected, weighted graph $G(V, E, A)$, where each data instance corresponds to a vertex (node) in $V$; $E$ is the edge set and $A$ is the associated affinity matrix. $A$ is symmetric and non-negative. The diagonal matrix $D = \mathrm{diag}(D_{11}, \ldots, D_{NN})$ is called the degree matrix of graph $G$, where
$$D_{ii} = \sum_{j=1}^{N} A_{ij}.$$
Then $L = D - A$ is called the unnormalized graph Laplacian of $G$. Assuming $G$ is connected (i.e. any node is reachable from any other node), $L$ has the following properties:

Property 1 (Properties of graph Laplacian (von Luxburg (2007))) Let $L$ be the graph Laplacian of a connected graph, then:
1. $L$ is symmetric and positive semi-definite.
2. $L$ has one and only one eigenvalue equal to 0, and $N-1$ positive eigenvalues: $0 = \lambda_0 < \lambda_1 \leq \ldots \leq \lambda_{N-1}$.
3. $\mathbf{1}$ is an eigenvector of $L$ with eigenvalue 0 ($\mathbf{1}$ is a constant vector whose entries are all 1).

Shi and Malik (2000) showed that the eigenvectors of the graph Laplacian can be related to the normalized min-cut (Ncut) of $G$. The objective function can be written as:
$$\operatorname*{argmin}_{v \in \mathbb{R}^N} v^T \bar{L} v, \quad \text{s.t. } v^T v = vol, \; v \perp D^{1/2}\mathbf{1}. \tag{1}$$
Here $\bar{L} = D^{-1/2} L D^{-1/2}$ is called the normalized graph Laplacian (von Luxburg (2007)); $vol = \sum_{i=1}^{N} D_{ii}$ is the volume of $G$; the first constraint $v^T v = vol$ normalizes $v$; the second constraint $v \perp D^{1/2}\mathbf{1}$ rules out the principal eigenvector of $\bar{L}$ as a trivial solution, because it does not define a meaningful cut on the graph. The relaxed cluster indicator $u$ can be recovered from $v$ as $u = D^{-1/2} v$.

Note that the result of spectral clustering is solely decided by the affinity structure of graph $G$ as encoded in the matrix $A$ (and thus the graph Laplacian $L$). We will then describe our extensions on how to incorporate side information so that the result of clustering will reflect both the affinity structure of the graph and the structure of the side information.

4 A Flexible Framework for Constrained Spectral Clustering

In this section, we show how to incorporate side information into spectral clustering as pairwise constraints. Our formulation allows both hard and soft constraints. We propose a new constrained optimization formulation for constrained spectral clustering. Then we show how to solve the objective function by converting it into a generalized eigenvalue system.

4.1 The Objective Function

We encode side information with an $N \times N$ constraint matrix $Q$. Traditionally, constrained clustering only accommodates binary constraints, namely Must-Link and Cannot-Link:
$$Q_{ij} = Q_{ji} = \begin{cases} +1 & \text{if } ML(i,j) \\ -1 & \text{if } CL(i,j) \\ 0 & \text{no side information available.} \end{cases}$$
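The constructions above are straightforward to set up. Below is a minimal sketch in Python/NumPy (the paper's released implementation is in MATLAB; the function names here are illustrative, not the authors' API) that builds the degree matrix, the graph volume, the normalized Laplacian $\bar{L} = I - D^{-1/2} A D^{-1/2}$ used in Eq.(1), and a binary constraint matrix Q from lists of ML and CL pairs.

```python
import numpy as np

def normalized_laplacian(A):
    """Compute the degrees, vol(G), and the normalized Laplacian L_bar = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)                       # degrees D_ii = sum_j A_ij
    vol = d.sum()                           # volume of the graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_bar = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt
    return d, vol, L_bar

def constraint_matrix(N, ml_pairs, cl_pairs):
    """Binary encoding of Must-Link / Cannot-Link pairs into the constraint matrix Q."""
    Q = np.zeros((N, N))
    for i, j in ml_pairs:
        Q[i, j] = Q[j, i] = +1              # ML(i, j)
    for i, j in cl_pairs:
        Q[i, j] = Q[j, i] = -1              # CL(i, j)
    return Q
```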
Let $u \in \{-1, +1\}^N$ be a cluster indicator vector, where $u_i = +1$ if node $i$ belongs to cluster $+$ and $u_i = -1$ if node $i$ belongs to cluster $-$. Then
$$u^T Q u = \sum_{i=1}^{N} \sum_{j=1}^{N} u_i u_j Q_{ij}$$
is a measure of how well the constraints in $Q$ are satisfied by the assignment $u$: the measure increases by 1 if $Q_{ij} = 1$ and nodes $i$ and $j$ have the same sign in $u$; it decreases by 1 if $Q_{ij} = 1$ but nodes $i$ and $j$ have different signs in $u$, or $Q_{ij} = -1$ but nodes $i$ and $j$ have the same sign in $u$.

We extend the above encoding scheme to accommodate soft constraints by relaxing the cluster indicator vector $u$ as well as the constraint matrix $Q$ such that
$$u \in \mathbb{R}^N, \quad Q \in \mathbb{R}^{N \times N}.$$
$Q_{ij}$ is positive if we believe nodes $i$ and $j$ belong to the same cluster; $Q_{ij}$ is negative if we believe nodes $i$ and $j$ belong to different clusters; the magnitude of $Q_{ij}$ indicates how strong the belief is. Consequently, $u^T Q u$ becomes a real-valued measure of how well the constraints in $Q$ are satisfied in the relaxed sense. For example, $Q_{ij} < 0$ means we believe nodes $i$ and $j$ belong to different clusters; in order to improve $u^T Q u$, we should assign $u_i$ and $u_j$ values of different signs. Similarly, $Q_{ij} > 0$ means nodes $i$ and $j$ are believed to belong to the same cluster; we should assign $u_i$ and $u_j$ values of the same sign. The larger $u^T Q u$ is, the better the cluster assignment $u$ conforms to the given constraints in $Q$.

Now, given this real-valued measure, rather than trying to satisfy all the constraints in $Q$ individually, we can lower-bound this measure with a constant $\alpha \in \mathbb{R}$:
$$u^T Q u \geq \alpha.$$
Following the notation in Eq.(1), we substitute $u$ with $D^{-1/2} v$, and the above inequality becomes
$$v^T \bar{Q} v \geq \alpha,$$
where $\bar{Q} = D^{-1/2} Q D^{-1/2}$ is the normalized constraint matrix.

We append this lower-bound constraint to the objective function of unconstrained spectral clustering in Eq.(1) and we have:

Problem 1 (Constrained Spectral Clustering) Given a normalized graph Laplacian $\bar{L}$, a normalized constraint matrix $\bar{Q}$, and a threshold $\alpha$, we want to optimize the following objective function:
$$\operatorname*{argmin}_{v \in \mathbb{R}^N} v^T \bar{L} v, \quad \text{s.t. } v^T \bar{Q} v \geq \alpha, \; v^T v = vol, \; v \neq D^{1/2}\mathbf{1}. \tag{2}$$
Here $v^T \bar{L} v$ is the cost of the cut we want to minimize; the first constraint $v^T \bar{Q} v \geq \alpha$ lower-bounds how well the constraints in $Q$ are satisfied; the second constraint $v^T v = vol$ normalizes $v$; the third constraint $v \neq D^{1/2}\mathbf{1}$ rules out the trivial solution $D^{1/2}\mathbf{1}$. Suppose $v^*$ is the optimal solution to Eq.(2); then $u^* = D^{-1/2} v^*$ is the optimal cluster indicator vector.

It is easy to see that the unconstrained spectral clustering in Eq.(1) is covered as a special case of Eq.(2) where $\bar{Q} = I$ (no constraints are encoded) and $\alpha = vol$ ($v^T \bar{Q} v \geq vol$ is trivially satisfied given $\bar{Q} = I$ and $v^T v = vol$).

4.2 Solving the Objective Function

To solve the constrained optimization problem, we follow the Karush-Kuhn-Tucker Theorem (Kuhn and Tucker (1982)) to derive the necessary conditions for the optimal solution. We can find a set of candidates, or feasible solutions, that satisfy all the necessary conditions.
Then we choose the optimal solution from among the feasible solutions by brute force, since the size of the feasible set is finite and small.

For our objective function in Eq.(2), we introduce the Lagrangian as follows:
$$\Lambda(v, \lambda, \mu) = v^T \bar{L} v - \lambda (v^T \bar{Q} v - \alpha) - \mu (v^T v - vol). \tag{3}$$
Then, according to the KKT Theorem, any feasible solution to Eq.(2) must satisfy the following conditions:
$$\text{(Stationarity)} \quad \bar{L} v - \lambda \bar{Q} v - \mu v = 0, \tag{4}$$
$$\text{(Primal feasibility)} \quad v^T \bar{Q} v \geq \alpha, \; v^T v = vol, \tag{5}$$
$$\text{(Dual feasibility)} \quad \lambda \geq 0, \tag{6}$$
$$\text{(Complementary slackness)} \quad \lambda (v^T \bar{Q} v - \alpha) = 0. \tag{7}$$
Note that Eq.(4) comes from taking the derivative of Eq.(3) with respect to $v$. Also note that we dismiss the constraint $v \neq D^{1/2}\mathbf{1}$ at this moment, because it can be checked independently after we find the feasible solutions.

To solve Eqs.(4)-(7), we start by looking at the complementary slackness requirement in Eq.(7), which implies either $\lambda = 0$ or $v^T \bar{Q} v - \alpha = 0$. If $\lambda = 0$, we have a trivial problem because the second term in Eq.(4) is eliminated and the problem reduces to unconstrained spectral clustering. Therefore, in the following we focus on the case where $\lambda \neq 0$. In this case, for Eq.(7) to hold, $v^T \bar{Q} v - \alpha$ must be 0. Consequently, the KKT conditions become:
$$\bar{L} v - \lambda \bar{Q} v - \mu v = 0, \tag{8}$$
$$v^T v = vol, \tag{9}$$
$$v^T \bar{Q} v = \alpha, \tag{10}$$
$$\lambda > 0. \tag{11}$$
Under the assumption that $\alpha$ is arbitrarily assigned by the user and $\lambda$ and $\mu$ are independent variables, Eqs.(8)-(11) cannot be solved explicitly, and they may produce an infinite number of feasible solutions, if any exist. As a workaround, we temporarily drop the assumption that $\alpha$ is an arbitrary value assigned by the user. Instead, we assume $\alpha \triangleq v^T \bar{Q} v$, i.e. $\alpha$ is defined such that Eq.(10) holds. Then we introduce an auxiliary variable $\beta$, defined in terms of the ratio between $\mu$ and $\lambda$:
$$\beta \triangleq -\frac{\mu}{\lambda} vol. \tag{12}$$
Now we substitute Eq.(12) into Eq.(8) and obtain:
$$\bar{L} v - \lambda \bar{Q} v + \frac{\lambda \beta}{vol} v = 0,$$
or equivalently:
$$\bar{L} v = \lambda \left( \bar{Q} - \frac{\beta}{vol} I \right) v. \tag{13}$$
We immediately notice that Eq.(13) is a generalized eigenvalue problem for a given $\beta$. Next we show that the substitution of $\alpha$ with $\beta$ does not compromise our original intention of lower-bounding $v^T \bar{Q} v$ in Eq.(2).

Lemma 1 $\beta < v^T \bar{Q} v$.

Proof Let $\gamma \triangleq v^T \bar{L} v$. Left-multiplying both sides of Eq.(13) by $v^T$, we have
$$v^T \bar{L} v = \lambda v^T \left( \bar{Q} - \frac{\beta}{vol} I \right) v.$$
Then, incorporating Eq.(9) and $\alpha \triangleq v^T \bar{Q} v$, we have
$$\gamma = \lambda (\alpha - \beta).$$
Now recall that $L$ is positive semi-definite (Property 1), and so is $\bar{L}$, which means
$$\gamma = v^T \bar{L} v > 0, \quad \forall v \neq D^{1/2}\mathbf{1}.$$
Consequently, we have
$$\alpha - \beta = \frac{\gamma}{\lambda} > 0 \;\Rightarrow\; v^T \bar{Q} v = \alpha > \beta.$$

In summary, our algorithm works as follows (the exact implementation is shown in Algorithm 1):
1. Generating candidates: The user specifies a value for $\beta$, and we solve the generalized eigenvalue system given in Eq.(13). Note that both $\bar{L}$ and $\bar{Q} - \frac{\beta}{vol} I$ are Hermitian matrices, thus the generalized eigenvalues are guaranteed to be real numbers.
2. Finding the feasible set: We remove the generalized eigenvectors associated with non-positive eigenvalues, and normalize the rest such that $v^T v = vol$.
Note that the trivial solution $D^{1/2}\mathbf{1}$ is automatically removed in this step because, if it is a generalized eigenvector of Eq.(13), its associated eigenvalue would be 0. Since we have at most $N$ generalized eigenvectors, the number of feasible eigenvectors is at most $N$.
3. Choosing the optimal solution: We choose from the feasible solutions the one that minimizes $v^T \bar{L} v$, say $v^*$. According to Lemma 1, $v^*$ is the optimal solution to the objective function in Eq.(2) for any given $\beta$, and $\beta < \alpha = v^{*T} \bar{Q} v^*$.

Algorithm 1: Constrained Spectral Clustering
Input: Affinity matrix $A$, constraint matrix $Q$, $\beta$;
Output: The optimal (relaxed) cluster indicator $u^*$;
1  $vol \leftarrow \sum_{i=1}^{N}\sum_{j=1}^{N} A_{ij}$, $D \leftarrow \mathrm{diag}(\sum_{j=1}^{N} A_{ij})$;
2  $\bar{L} \leftarrow I - D^{-1/2} A D^{-1/2}$, $\bar{Q} \leftarrow D^{-1/2} Q D^{-1/2}$;
3  $\lambda_{\max}(\bar{Q}) \leftarrow$ the largest eigenvalue of $\bar{Q}$;
4  if $\beta \geq \lambda_{\max}(\bar{Q}) \cdot vol$ then
5      return $u^* = \emptyset$;
6  end
7  else
8      Solve the generalized eigenvalue system in Eq.(13);
9      Remove eigenvectors associated with non-positive eigenvalues and normalize the rest by $v \leftarrow \frac{v}{\|v\|}\sqrt{vol}$;
10     $v^* \leftarrow \operatorname{argmin}_{v} v^T \bar{L} v$, where $v$ is among the feasible eigenvectors generated in the previous step;
11     return $u^* \leftarrow D^{-1/2} v^*$;
12 end

4.3 A Sufficient Condition for the Existence of Solutions

On one hand, our method described above is guaranteed to generate a finite number of feasible solutions. On the other hand, we need to set $\beta$ appropriately so that the generalized eigenvalue system in Eq.(13), combined with the KKT conditions in Eqs.(8)-(11), gives rise to at least one feasible solution. In this section, we discuss such a sufficient condition:
$$\beta < \lambda_{\max}(\bar{Q}) \cdot vol,$$
where $\lambda_{\max}(\bar{Q})$ is the largest eigenvalue of $\bar{Q}$. In this case, the matrix on the right-hand side of Eq.(13), namely $\bar{Q} - (\beta/vol) \cdot I$, has at least one positive eigenvalue. Consequently, the generalized eigenvalue system in Eq.(13) has at least one positive eigenvalue. Moreover, the number of feasible eigenvectors increases if we make $\beta$ smaller. For example, if we set
$$\beta < \lambda_{\min}(\bar{Q}) \cdot vol,$$
where $\lambda_{\min}(\bar{Q})$ is the smallest eigenvalue of $\bar{Q}$, then $\bar{Q} - (\beta/vol) \cdot I$ becomes positive definite, and the generalized eigenvalue system in Eq.(13) generates $N-1$ feasible eigenvectors (the trivial solution $D^{1/2}\mathbf{1}$ with eigenvalue 0 is dropped). In practice, we normally choose the value of $\beta$ within the range $(\lambda_{\min}(\bar{Q}) \cdot vol, \; \lambda_{\max}(\bar{Q}) \cdot vol)$. In that range, the greater $\beta$ is, the more the solution is biased towards satisfying $\bar{Q}$. Again, note that whenever we have $\beta < \lambda_{\max}(\bar{Q}) \cdot vol$, the value of $\alpha$ is always bounded by $\beta < \alpha \leq \lambda_{\max} vol$. Therefore we do not need to take care of $\alpha$ explicitly.
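To make the three steps above and the feasibility condition on $\beta$ concrete, here is a minimal sketch of Algorithm 1 in Python/NumPy/SciPy. It is only a sketch: the function name, the use of a dense generalized eigensolver, and the handling of degenerate cases are illustrative assumptions, not taken from the paper's MATLAB code.

```python
import numpy as np
from scipy.linalg import eig

def constrained_spectral_clustering(A, Q, beta):
    """Sketch of Algorithm 1 (2-way constrained spectral clustering).
    Returns the relaxed cluster indicator u*, or None if beta is infeasible."""
    N = A.shape[0]
    d = A.sum(axis=1)
    vol = d.sum()
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_bar = np.eye(N) - D_inv_sqrt @ A @ D_inv_sqrt        # normalized Laplacian
    Q_bar = D_inv_sqrt @ Q @ D_inv_sqrt                     # normalized constraint matrix

    # Sufficient condition of Section 4.3: beta must be below lambda_max(Q_bar) * vol
    if beta >= np.max(np.linalg.eigvalsh(Q_bar)) * vol:
        return None

    # Step 1: generate candidates by solving the generalized eigenproblem of Eq.(13)
    lam, V = eig(L_bar, Q_bar - (beta / vol) * np.eye(N))
    lam, V = lam.real, V.real

    # Step 2: keep eigenvectors with positive eigenvalues, rescaled so that v^T v = vol
    feasible = [V[:, i] * np.sqrt(vol) / np.linalg.norm(V[:, i])
                for i in range(N) if np.isfinite(lam[i]) and lam[i] > 0]
    if not feasible:
        return None

    # Step 3: among the feasible cuts, pick the one with the smallest cost v^T L_bar v
    v_star = min(feasible, key=lambda v: v @ L_bar @ v)
    return D_inv_sqrt @ v_star                              # u* = D^{-1/2} v*
```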
Fig. 2 An illustrative example: the affinity structure says {1, 2, 3} and {4, 5, 6} while the node labeling (coloring) says {1, 2, 3, 4} and {5, 6}.

4.4 An Illustrative Example

To illustrate how our algorithm works, we present a toy example as follows. In Fig. 2, we have a graph associated with the following affinity matrix:
$$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 & 0 \end{pmatrix}$$
Unconstrained spectral clustering will cut the graph at edge (3, 4) and split it into two symmetric parts, {1, 2, 3} and {4, 5, 6} (Fig. 3(a)). Then we introduce constraints as encoded in the following constraint matrix:
$$Q = \begin{pmatrix} +1 & +1 & +1 & +1 & -1 & -1 \\ +1 & +1 & +1 & +1 & -1 & -1 \\ +1 & +1 & +1 & +1 & -1 & -1 \\ +1 & +1 & +1 & +1 & -1 & -1 \\ -1 & -1 & -1 & -1 & +1 & +1 \\ -1 & -1 & -1 & -1 & +1 & +1 \end{pmatrix}.$$
Q is essentially saying that we want to group nodes {1, 2, 3, 4} into one cluster and {5, 6} into the other. Although this kind of "complete information" constraint matrix does not occur in practice, we use it here only to make the result more explicit and intuitive.

$\bar{Q}$ has two distinct eigenvalues: 0 and 2.6667. As analyzed above, $\beta$ must be smaller than $2.6667\,vol$ to guarantee the existence of a feasible solution, and a larger $\beta$ means we want more constraints in Q to be satisfied (in a relaxed sense). Thus we set $\beta$ to $vol$ and $2\,vol$ respectively, and see how it affects the resultant constrained cuts. We solve the generalized eigenvalue system in Eq.(13), and plot the cluster indicator vector $u^*$ in Fig. 3(b) and 3(c), respectively. We can see that as $\beta$ increases, node 4 is dragged from the group of nodes {5, 6} to the group of nodes {1, 2, 3}, which conforms to our expectation that a greater $\beta$ value implies better constraint satisfaction.

Fig. 3 The solutions to the illustrative example in Fig. 2 with different $\beta$: (a) unconstrained, (b) constrained, $\beta = vol$, (c) constrained, $\beta = 2\,vol$. The x-axis is the indices of the instances and the y-axis is the corresponding entry values in the optimal (relaxed) cluster indicator $u^*$. Notice that node 4 is biased toward nodes {1, 2, 3} as $\beta$ increases.
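For reference, the toy example can be run through the constrained_spectral_clustering sketch from Section 4.2 (assuming that function is in scope). The matrices below are exactly those given above; the printout is simply the sign pattern of $u^*$ for the two settings of $\beta$, not an authoritative reproduction of Fig. 3.

```python
import numpy as np

# Affinity of the 6-node graph in Fig. 2 and the "complete information" constraint matrix Q
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
Q = np.array([[+1, +1, +1, +1, -1, -1],
              [+1, +1, +1, +1, -1, -1],
              [+1, +1, +1, +1, -1, -1],
              [+1, +1, +1, +1, -1, -1],
              [-1, -1, -1, -1, +1, +1],
              [-1, -1, -1, -1, +1, +1]], dtype=float)

vol = A.sum()
for beta in (vol, 2 * vol):                                  # the settings used in Fig. 3(b) and 3(c)
    u_star = constrained_spectral_clustering(A, Q, beta)     # sketch from Section 4.2
    print(beta / vol, np.sign(u_star))                       # 2-way partition read off the signs
```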
5 Interpretations of Our Formulation

5.1 A Graph Cut Interpretation

Unconstrained spectral clustering can be interpreted as finding the Ncut of an unlabeled graph. Similarly, our formulation of constrained spectral clustering in Eq.(2) can be interpreted as finding the Ncut of a labeled/colored graph. Specifically, suppose we have an undirected weighted graph. The nodes of the graph are colored in such a way that nodes of the same color are advised to be assigned to the same cluster while nodes of different colors are advised to be assigned to different clusters (e.g. Fig. 2). Let $v^*$ be the solution to the constrained optimization problem in Eq.(2). We cut the graph into two parts based on the values of the entries of $u^* = D^{-1/2} v^*$. Then $v^{*T} \bar{L} v^*$ can be interpreted as the cost of the cut (in a relaxed sense), which we minimize. On the other hand, $\alpha = v^{*T} \bar{Q} v^* = u^{*T} Q u^*$ can be interpreted as the purity of the cut (also in a relaxed sense), according to the colors of the nodes on the respective sides. For example, if $Q \in \{-1, 0, 1\}^{N \times N}$ and $u^* \in \{-1, 1\}^N$, then $\alpha$ equals the number of constraints in Q that are satisfied by $u^*$ minus the number of constraints violated. More generally, if $Q_{ij}$ is a positive number, then $u^*_i$ and $u^*_j$ having the same sign contributes positively to the purity of the cut, whereas different signs contribute negatively to the purity of the cut. It is not difficult to see that the purity is maximized when there is no pair of nodes with different colors assigned to the same side of the cut (0 violations), which is the case where all the constraints in Q are satisfied.

5.2 A Geometric Interpretation

We can also interpret our formulation as constraining the joint numerical range (Horn and Johnson (1990)) of the graph Laplacian and the constraint matrix. Specifically, we consider the joint numerical range:
$$J(\bar{L}, \bar{Q}) \triangleq \{ (v^T \bar{L} v, \; v^T \bar{Q} v) : v^T v = 1 \}. \tag{14}$$
$J(\bar{L}, \bar{Q})$ essentially maps all possible cuts $v$ onto a 2-D plane, where the x-coordinate corresponds to the cost of the cut, and the y-coordinate corresponds to the constraint satisfaction of the cut. According to our objective in Eq.(2), we want to minimize the first term while lower-bounding the second term. Therefore, we are looking for the leftmost point among those above the horizontal line $y = \alpha$. In Fig. 4(c), we visualize $J(\bar{L}, \bar{Q})$ by plotting all the unconstrained cuts given by spectral clustering and all the constrained cuts given by our algorithm in the joint numerical range, based on the graph Laplacian of a Two-Moon dataset with a randomly generated constraint matrix. The horizontal line and the arrow indicate the constrained area from which we can select feasible solutions. We can see that most of the unconstrained cuts proposed by spectral clustering are far below the threshold, which suggests that spectral clustering cannot lead to the ground truth partition (as shown in Fig. 4(b)) without the help of constraints.

Fig. 4 The joint numerical range of the normalized graph Laplacian $\bar{L}$ and the normalized constraint matrix $\bar{Q}$, as well as the optimal solutions to unconstrained/constrained spectral clustering: (a) the unconstrained Ncut, (b) the constrained Ncut, (c) $J(\bar{L}, \bar{Q})$.

6 Implementation and Extensions

In this section, we discuss some implementation issues of our method. Then we show how to generalize it to K-way partition where $K \geq 2$.

6.1 Constrained Spectral Clustering for 2-Way Partition

The routine of our method is similar to that of unconstrained spectral clustering. The input of the algorithm is an affinity matrix $A$, the constraint matrix $Q$, and a threshold $\beta$. Then we solve the generalized eigenvalue problem in Eq.(13) and find all the feasible generalized eigenvectors. The output is the optimal (relaxed) cluster assignment indicator $u^*$. In practice, a partition is often derived from $u^*$ by assigning nodes corresponding to the positive entries in $u^*$ to one side of the cut, and nodes corresponding to the negative entries to the other side. Our algorithm is summarized in Algorithm 1.
Since our model encodes soft constraints as degrees of belief, inconsistent constraints in Q will not corrupt our algorithm. Instead, they are enforced implicitly by maximizing $u^T Q u$. Note that if the constraint matrix Q is generated from a partial labeling, then the constraints in Q will always be consistent.

Runtime analysis: The runtime of our algorithm is dominated by that of the generalized eigendecomposition. In other words, the complexity of our algorithm is on a par with that of unconstrained spectral clustering in big-O notation, which is $O(kN^2)$, where $N$ is the number of data instances and $k$ is the number of eigenpairs we need to compute. Here $k$ is a number large enough to guarantee the existence of feasible solutions. In practice we normally have $2 < k \ll N$.

6.2 Extension to K-Way Partition

Our algorithm can be naturally extended to K-way partition for $K > 2$, following what we usually do for unconstrained spectral clustering (von Luxburg (2007)): instead of only using the optimal feasible eigenvector $u^*$, we preserve the top $K-1$ eigenvectors associated with positive eigenvalues, and perform the K-means algorithm based on that embedding.

Specifically, the constraint matrix Q follows the same encoding scheme: $Q_{ij} > 0$ if nodes $i$ and $j$ are believed to belong to the same cluster, $Q_{ij} < 0$ otherwise. To guarantee that we can find $K-1$ feasible eigenvectors, we set the threshold $\beta$ such that
$$\beta < \lambda_{K-1} \, vol,$$
where $\lambda_{K-1}$ is the $(K-1)$-th largest eigenvalue of $\bar{Q}$. Given all the feasible eigenvectors, we pick the top $K-1$ in terms of minimizing $v^T \bar{L} v$ (here we assume the trivial solution, the eigenvector with all 1's, has been excluded). Let the $K-1$ eigenvectors form the columns of $V \in \mathbb{R}^{N \times (K-1)}$. We perform K-means clustering on the rows of $V$ and get the final clustering. Algorithm 2 shows the complete routine, as sketched after this section.

Algorithm 2: Constrained Spectral Clustering for K-way Partition
Input: Affinity matrix $A$, constraint matrix $Q$, $\beta$, $K$;
Output: The cluster assignment indicator $u^*$;
1  $vol \leftarrow \sum_{i=1}^{N}\sum_{j=1}^{N} A_{ij}$, $D \leftarrow \mathrm{diag}(\sum_{j=1}^{N} A_{ij})$;
2  $\bar{L} \leftarrow I - D^{-1/2} A D^{-1/2}$, $\bar{Q} \leftarrow D^{-1/2} Q D^{-1/2}$;
3  $\lambda_{K-1} \leftarrow$ the $(K-1)$-th largest eigenvalue of $\bar{Q}$;
4  if $\beta \geq \lambda_{K-1} \cdot vol$ then
5      return $u^* = \emptyset$;
6  end
7  else
8      Solve the generalized eigenvalue system in Eq.(13);
9      Remove eigenvectors associated with non-positive eigenvalues and normalize the rest by $v \leftarrow \frac{v}{\|v\|}\sqrt{vol}$;
10     $V^* \leftarrow \operatorname{argmin}_{V \in \mathbb{R}^{N \times (K-1)}} \mathrm{trace}(V^T \bar{L} V)$, where the columns of $V$ are a subset of the feasible eigenvectors generated in the previous step;
11     return $u^* \leftarrow \mathrm{kmeans}(D^{-1/2} V^*, K)$;
12 end

Note that K-means is only one of many possible discretization techniques that can derive a K-way partition from the relaxed indicator matrix $D^{-1/2} V^*$. Due to the orthogonality of the eigenvectors, they can be independently discretized first, e.g. we can replace Step 11 of Algorithm 2 with:
$$u^* \leftarrow \mathrm{kmeans}(\mathrm{sign}(D^{-1/2} V^*), K). \tag{15}$$
This additional step can help alleviate the influence of possible outliers on the K-means step in some cases. Moreover, notice that the feasible eigenvectors, which are the columns of $V^*$, are treated equally in Eq.(15). This may not be ideal in practice because these candidate cuts are not equally favored by graph $G$, i.e. some of them have higher costs than the others. Therefore, we can weight the columns of $V^*$ by the inverse of their respective costs:
$$u^* \leftarrow \mathrm{kmeans}(\mathrm{sign}(D^{-1/2} V^* (V^{*T} \bar{L} V^*)^{-1}), K). \tag{16}$$
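A corresponding sketch of Algorithm 2 is given below (Python/NumPy/SciPy; the names and numerical choices are illustrative, and SciPy's kmeans2 stands in for the K-means step of Step 11 rather than the sign-based variants of Eqs.(15)-(16)).

```python
import numpy as np
from scipy.linalg import eig
from scipy.cluster.vq import kmeans2

def constrained_spectral_clustering_kway(A, Q, beta, K):
    """Sketch of Algorithm 2 (K-way constrained spectral clustering)."""
    N = A.shape[0]
    d = A.sum(axis=1)
    vol = d.sum()
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_bar = np.eye(N) - D_inv_sqrt @ A @ D_inv_sqrt
    Q_bar = D_inv_sqrt @ Q @ D_inv_sqrt

    # beta must lie below the (K-1)-th largest eigenvalue of Q_bar times vol
    lam_Q = np.sort(np.linalg.eigvalsh(Q_bar))[::-1]
    if beta >= lam_Q[K - 2] * vol:
        return None

    # generalized eigenproblem of Eq.(13); keep feasible (positive-eigenvalue) eigenvectors
    lam, V = eig(L_bar, Q_bar - (beta / vol) * np.eye(N))
    lam, V = lam.real, V.real
    idx = [i for i in range(N) if np.isfinite(lam[i]) and lam[i] > 0]
    V = V[:, idx]
    V = V / np.linalg.norm(V, axis=0) * np.sqrt(vol)         # each column satisfies v^T v = vol

    # keep the K-1 feasible eigenvectors with the smallest cost v^T L_bar v
    costs = np.array([V[:, i] @ L_bar @ V[:, i] for i in range(V.shape[1])])
    V_star = V[:, np.argsort(costs)[:K - 1]]

    # discretize by K-means on the rows of D^{-1/2} V*
    _, labels = kmeans2(D_inv_sqrt @ V_star, K, minit='++')
    return labels
```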
6.3 Using Constrained Spectral Clustering for Transfer Learning

The constrained spectral clustering framework naturally fits the scenario of transfer learning between two graphs. Assume we have two graphs, a source graph and a target graph, which share the same set of nodes but have different sets of edges (or edge weights). The goal is to transfer knowledge from the source graph so that we can improve the cut on the target graph. The knowledge to transfer is derived from the source graph in the form of soft constraints.

Specifically, let $G_S(V, E_S)$ be the source graph and $G_T(V, E_T)$ the target graph, and let $A_S$ and $A_T$ be their respective affinity matrices. Then $A_S$ can be considered as a constraint matrix with only ML constraints. It carries the structural knowledge from the source graph, and we can transfer it to the target graph using our constrained spectral clustering formulation:
$$\operatorname*{argmin}_{v \in \mathbb{R}^N} v^T \bar{L}_T v, \quad \text{s.t. } v^T \bar{A}_S v \geq \alpha, \; v^T v = vol, \; v \neq D_T^{1/2}\mathbf{1}. \tag{17}$$
$\alpha$ is now the lower bound on how much knowledge from the source graph must be enforced on the target graph. The solution is similar to Eq.(13):
$$\bar{L}_T v = \lambda \left( \bar{A}_S - \frac{\beta}{vol(G_T)} I \right) v. \tag{18}$$
Note that since the largest eigenvalue of $\bar{A}_S$ corresponds to a trivial cut, in practice we should set the threshold such that $\beta < \lambda_1 vol$, where $\lambda_1$ is the second largest eigenvalue of $\bar{A}_S$. This guarantees a feasible eigenvector that is non-trivial.
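A short sketch of this transfer setting is given below, reusing the constrained_spectral_clustering sketch from Section 4.2. Here the source affinity is normalized by the target degree matrix, which is how Algorithm 1 normalizes its constraint matrix; that reading, the function names, and the scale factor for $\beta$ are assumptions for illustration only.

```python
import numpy as np

def transfer_cut(A_S, A_T, scale=0.5):
    """Sketch of Eqs.(17)-(18): transfer knowledge from a source graph (affinity A_S)
    to a target graph (affinity A_T) by treating A_S as an all-ML soft constraint matrix."""
    d_T = A_T.sum(axis=1)
    vol_T = d_T.sum()
    D_T_inv_sqrt = np.diag(1.0 / np.sqrt(d_T))
    A_S_bar = D_T_inv_sqrt @ A_S @ D_T_inv_sqrt      # source affinity normalized like Q_bar

    # any beta below the *second* largest eigenvalue of A_S_bar times vol(G_T) is feasible,
    # since the largest eigenvalue corresponds to a trivial cut (Section 6.3)
    lam = np.sort(np.linalg.eigvalsh(A_S_bar))[::-1]
    beta = scale * lam[1] * vol_T

    return constrained_spectral_clustering(A_T, A_S, beta)   # sketch from Section 4.2
```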
7 Testing and Innovative Uses of Our Work

We begin with three sets of experiments to test our approach on standard spectral clustering datasets. We then show that, since our approach can handle large amounts of soft constraints in a flexible fashion, it opens up two innovative uses of our work: encoding multiple metrics for translated document clustering and transfer learning for fMRI analysis. We aim to answer the following questions with the empirical study:

– Can our algorithm effectively incorporate side information and generate semantically meaningful partitions?
– Does our algorithm converge to the underlying ground truth partition as more constraints are provided?
– How does our algorithm perform on real-world datasets, as evaluated against ground truth labeling, in comparison to existing techniques?
– How well does our algorithm handle soft constraints?
– How well does our algorithm handle large amounts of constraints?

Recall that in Section 1 we listed four different types of side information: explicit pairwise constraints, partial labeling, alternative metrics, and transfer of knowledge. The empirical results presented in this section are arranged accordingly. All but one of the datasets used in our experiments (the fMRI scans) are publicly available online. We implemented our algorithm in MATLAB, which is available online at http://bayou.cs.ucdavis.edu/ or by contacting the authors.

7.1 Explicit Pairwise Constraints: Image Segmentation

We demonstrate the effectiveness of our algorithm for image segmentation using explicit pairwise constraints assigned by users. We chose the image segmentation application for several reasons: 1) it is one of the applications where spectral clustering significantly outperforms other clustering techniques, e.g. K-means; 2) the results of image segmentation can easily be interpreted and evaluated by a human; 3) instead of generating random constraints, we can provide semantically meaningful constraints to see if the constrained partition conforms to our expectation.

The images we used were chosen from the Berkeley Segmentation Dataset and Benchmark (Martin et al (2001)). The original images are 480 × 320 grayscale images in jpeg format. For efficiency considerations, we compressed them to 10% of the original size, i.e. 48 × 32 pixels, as shown in Fig. 5(a) and 6(a).

Fig. 5 Segmentation of the elephant image: (a) original image, (b) no constraints, (c) Constraint Set 1, (d) Constraint Set 2. The images are reconstructed based on the relaxed cluster indicator $u^*$. Pixels that are closer to the red end of the spectrum belong to one segment and blue the other. The labeled pixels are bounded by the black and white rectangles.

Fig. 6 Segmentation of the face image: (a) original image, (b) no constraints, (c) Constraint Set 1, (d) Constraint Set 2. The images are reconstructed based on the relaxed cluster indicator $u^*$. Pixels that are closer to the red end of the spectrum belong to one segment and blue the other. The labeled pixels are bounded by the black and white rectangles.

Then the affinity matrix of the image was computed using the RBF kernel, based on both the positions and the grayscale values of the pixels. As a baseline, we used unconstrained spectral clustering (Shi and Malik (2000)) to generate a 2-segmentation of the image. Then we introduced different sets of constraints to see if they can generate the expected segmentation. Note that the results of image segmentation vary with the number of segments. To save us from the complications of parameter tuning, which is irrelevant to the contribution of this work, we always set the number of segments to be 2.
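One plausible reading of the affinity construction just described ("RBF kernel based on both the positions and the grayscale values of the pixels") is sketched below; the bandwidths sigma_pos and sigma_val are illustrative placeholders, not the values used in the experiments.

```python
import numpy as np

def image_affinity(img, sigma_pos=4.0, sigma_val=0.1):
    """RBF affinity over the pixels of a small grayscale image,
    combining spatial distance and grayscale-value distance."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)   # pixel coordinates
    val = img.ravel().astype(float)[:, None]                        # grayscale values
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)      # squared spatial distances
    d_val = (val - val.T) ** 2                                      # squared intensity distances
    A = np.exp(-d_pos / (2 * sigma_pos ** 2) - d_val / (2 * sigma_val ** 2))
    np.fill_diagonal(A, 0.0)                                        # no self-affinity
    return A
```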
The results are shown in Fig. 5 and 6. To visualize the resultant segmentation, we reconstructed the image using the entry values in the relaxed cluster indicator vector $u^*$. In Fig. 5(b), unconstrained spectral clustering partitioned the elephant image into two parts: the sky (red pixels) and the two elephants together with the ground (blue pixels). This is not satisfying in the sense that it failed to isolate the elephants from the background (the sky and the ground). To correct this, we introduced constraints by labeling two 5 × 5 blocks to be 1 (as bounded by the black rectangles in Fig. 5(c)): one at the upper-right corner of the image (the sky) and the other at the lower-right corner (the ground); we also labeled two 5 × 5 blocks on the heads of the two elephants to be −1 (as bounded by the white rectangles in Fig. 5(c)). To generate the constraint matrix Q, a ML was added between every pair of pixels with the same label and a CL was added between every pair of pixels with different labels. The parameter $\beta$ was set to
$$\beta = \lambda_{\max} \times vol \times \left(0.5 + 0.4 \times \frac{\#\text{ of constraints}}{N^2}\right), \tag{19}$$
where $\lambda_{\max}$ is the largest eigenvalue of $\bar{Q}$. In this way, $\beta$ is always between $0.5\,\lambda_{\max} vol$ and $0.9\,\lambda_{\max} vol$, and it gradually increases as the number of constraints increases. From Fig. 5(c) we can see that with the help of user supervision, our method successfully isolated the two elephants (blue) from the background, which is the sky and the ground (red). Note that our method achieved this with very simple labeling: four squared blocks.

To show the flexibility of our method, we tried a different set of constraints on the same elephant image with the same parameter settings. This time we aimed to separate the two elephants from each other, which is impossible in the unconstrained case because the two elephants are not only similar in color (grayscale value) but also adjacent in space. Again we used two 5 × 5 blocks (as bounded by the black and white rectangles in Fig. 5(d)), one on the head of the elephant on the left, labeled 1, and the other on the body of the elephant on the right, labeled −1. As shown in Fig. 5(d), our method cut the image into two parts with one elephant on the left (blue) and the other on the right (red), just as a human user would do.

Similarly, we applied our method to a human face image as shown in Fig. 6(a). Unconstrained spectral clustering failed to isolate the human face from the background (Fig. 6(b)). This is because the tall hat breaks the spatial continuity between the left side of the background and the right side. Then we labeled two 5 × 3 blocks to be in the same class, one on each side of the background. As we intended, our method assigned the background on both sides to the same cluster and thus isolated the human face with his tall hat from the background (Fig. 6(c)). Again, this was achieved simply by labeling two blocks in the image, which covered about 3% of all pixels. Alternatively, if we labeled a 5 × 5 block in the hat to be 1, and a 5 × 5 block in the face to be −1, the resultant clustering isolates the hat from the rest of the image (Fig. 6(d)).

7.2 Explicit Pairwise Constraints: The Double Moon Dataset

We further examine the behavior of our algorithm on a synthetic dataset using explicit constraints that are derived from the underlying ground truth.

We claim that our formulation is a natural extension to spectral clustering. The question to ask then is whether the output of our algorithm converges to that of spectral clustering.
More specifically, consider the ground truth partition defined by performing spectral clustering on an ideal distribution. We draw an imperfect sample from the distribution, on which spectral clustering is unable to find the ground truth partition. Then we perform our algorithm on this imperfect sample. As more and more constraints are provided, we want to know whether or not the partition found by our algorithm converges to the ground truth partition.

To answer the question, we used the Double Moon distribution. As shown in Fig. 1, spectral clustering is able to find the two moons when the sample is dense enough. In Fig. 7(a), we generated an under-sampled instance of the distribution with 100 data points, on which unconstrained spectral clustering could no longer find the ground truth partition. Then we performed our algorithm on this imperfect sample, and compared the partition found by our algorithm to the ground truth partition in terms of the adjusted Rand index (ARI, Hubert and Arabie (1985)). ARI indicates how well a given partition conforms to the ground truth: 0 means the given partition is no better than a random assignment; 1 means the given partition matches the ground truth exactly. For each random sample, we generated 50 random sets of constraints and recorded the average ARI. We repeated the process on 10 different random samples of the same size and report the results in Fig. 7(b). We can see that our algorithm consistently converges to the ground truth result as more constraints are provided. Notice that there is a performance drop when an extremely small number of constraints are provided (fewer than 10), which is expected because such a small number of constraints is insufficient to hint at a better partition and consequently leads to random perturbations of the results. As more constraints were provided, the results quickly stabilized.

Fig. 7 The convergence of our algorithm on 10 random samples of the Double Moon distribution: (a) a Double Moon sample and its Ncut, (b) the convergence of our algorithm (adjusted Rand index vs. number of constraints).

To illustrate the robustness of our approach, we created a Double Moon sample with uniform background noise, as shown in Fig. 8. Although the sample is dense enough (600 data instances in total), spectral clustering fails to correctly identify the two moons, due to the influence of the background noise (100 data instances). However, with 20 constraints, our algorithm is able to recover the two moons in spite of the background noise.

Fig. 8 The partition of a noisy Double Moon sample: (a) spectral clustering, (b) constrained spectral clustering.

Table 2 The UCI benchmarks

Identifier    #Instances    #Attributes
Hepatitis     80            19
Iris          100           4
Wine          130           13
Glass         214           9
Ionosphere    351           34
WDBC          569           30

7.3 Constraints from Partial Labeling: Clustering the UCI Benchmarks

Next we evaluate the performance of our algorithm by clustering the UCI benchmark datasets (Asuncion and Newman (2007)) with constraints derived from ground truth labeling. We chose six different datasets with class label information, namely Hepatitis, Iris, Wine, Glass, Ionosphere, and Breast Cancer Wisconsin (Diagnostic).
We performed 2-way clustering simply by partitioning the optimal cluster indicator according to the sign: positive entries to one cluster and negative entries to the other. We removed the setosa class from the Iris dataset, which is the class that is known to be well-separated from the other two. For the same reason we removed Class 3 from the Wine dataset, which is well-separated from the other two. We also removed data instances with missing values. The statistics of the datasets after preprocessing are listed in Table 2.

For each dataset, we computed the affinity matrix using the RBF kernel. To generate constraints, we randomly selected pairs of nodes that unconstrained spectral clustering wrongly partitioned, and filled in the correct relation in Q according to the ground truth labels. The quality of the clustering results was measured by the adjusted Rand index. Since the constraints are guaranteed to be correct, we set the threshold $\beta$ such that there will be only one feasible eigenvector, i.e. the one that best conforms to the constraint matrix Q.
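The constraint-generation step can be sketched as follows (a hypothetical helper: it takes the ground-truth labels and the 2-way partition produced by unconstrained spectral clustering, and fills Q only for randomly chosen pairs that the unconstrained partition got wrong).

```python
import numpy as np

def constraints_from_labels(labels, baseline_partition, num_constraints, rng=np.random):
    """Fill a constraint matrix Q with ML(+1)/CL(-1) entries for pairs that the
    unconstrained (baseline) partition assigned incorrectly relative to the labels."""
    N = len(labels)
    Q = np.zeros((N, N))
    # candidate pairs: those the baseline partition gets wrong w.r.t. the ground truth
    wrong = [(i, j) for i in range(N) for j in range(i + 1, N)
             if (labels[i] == labels[j]) != (baseline_partition[i] == baseline_partition[j])]
    picked = rng.permutation(len(wrong))[:num_constraints]
    for k in picked:
        i, j = wrong[k]
        Q[i, j] = Q[j, i] = +1 if labels[i] == labels[j] else -1
    return Q
```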
The observations are:

– Across all six datasets, our algorithm is able to effectively utilize the constraints and improve over unconstrained spectral clustering (Baseline). On the one hand, our algorithm can quickly improve the results with a small number of constraints. On the other hand, as more constraints are provided, the performance of our algorithm consistently increases and converges to the ground truth partition (Fig. 9).
– Our algorithm outperforms the competitors by a large margin in terms of ARI (Fig. 9). Since we have control over the lower-bounding threshold α, our algorithm is able to satisfy almost all of the given constraints (Fig. 10).
– The performance of our algorithm has significantly smaller variance over different random constraint sets than its competitors (Figs. 9 and 10), and the variance quickly diminishes as more constraints are provided. This suggests that our algorithm would perform more consistently in practice.
– Our algorithm performs especially well on sparse graphs, i.e. Fig. 9(e) and (f), where the competitors suffer. The reason is that when the graph is too sparse, it admits many "free" cuts that are equally good to unconstrained spectral clustering. Even after introducing a small number of constraints, the modified graph remains too sparse for SL and SSKK, which are unable to identify the ground truth partition. In contrast, these free cuts are not equivalent when judged by the constraint matrix $\bar{Q}$ of our algorithm, which can easily identify the one cut that best conforms to the constraints in terms of $v^T \bar{Q} v$. As a result, our algorithm outperforms SL and SSKK significantly in this scenario.

Fig. 9 The performance of our algorithm (CSP) on the six UCI datasets ((a) Hepatitis, (b) Iris, (c) Wine, (d) Glass, (e) Ionosphere, (f) Breast Cancer), with comparison to unconstrained spectral clustering (Baseline) and the Spectral Learning algorithm (SL). The adjusted Rand index over 100 random trials is reported (mean, min, and max).

Fig. 10 The ratio of constraints that are actually satisfied, for the same six datasets.

7.4 Constraints from Alternative Metrics: The Reuters Multilingual Dataset

We test our algorithm using soft constraints derived from alternative metrics of the same set of data instances. We used the Reuters Multilingual dataset, first introduced by Amini et al (2009). Each time we randomly sampled 1000 documents which were originally written in one language and then translated into four others, respectively. The statistics of the dataset are listed in Table 3. These documents come with ground truth labels that categorize them into six topics (K = 6). We constructed one graph based on the original language and another graph based on the translation. The affinity matrix was the cosine similarity between the tf-idf vectors of two documents. Then we used one of the two graphs as the constraint matrix Q, whose entries can be viewed as soft ML constraints. We enforce this constraint matrix on the other graph to see whether it can help improve the clustering.
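A minimal sketch of this construction, assuming scikit-learn's TfidfVectorizer and cosine_similarity are used to build the tf-idf vectors and their pairwise similarities (the paper does not specify an implementation, so these choices are ours):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine_affinity(documents):
    """Affinity matrix: cosine similarity between the tf-idf vectors
    of every pair of documents."""
    tfidf = TfidfVectorizer().fit_transform(documents)
    return cosine_similarity(tfidf)

# Hypothetical usage: docs_original and docs_translated hold the same 1000
# documents in the original language and in one translation, respectively.
# A = cosine_affinity(docs_translated)   # graph to be clustered
# Q = cosine_affinity(docs_original)     # alternative metric, used as soft
#                                        # Must-Link constraints
```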
We did not compare our algorithm to existing techniques because they are unable to incorporate soft constraints.

As shown in Fig. 11, unconstrained spectral clustering performs better on the original version than on the translated versions, which is not surprising. If we use the original version as the constraints and enforce them onto a translated version using our algorithm, we obtain a constrained clustering that is not only better than the unconstrained clustering on the translated version, but even better than the unconstrained clustering on the original version. This indicates that our algorithm is not merely a tradeoff between the original graph and the given constraints. Instead, it is able to integrate the knowledge from the constraints into the original graph and achieve a better partition.

Table 3 The Reuters Multilingual dataset

Language   #Documents   #Words
English          2000   21,531
French           2000   24,893
German           2000   34,279
Italian          2000   15,506
Spanish          2000   11,547

Fig. 11 The performance of our algorithm on the Reuters Multilingual dataset (adjusted Rand index): (a) English documents and their translations; (b) French documents and their translations.

7.5 Transfer of Knowledge: Resting-State fMRI Analysis

Finally, we apply our algorithm to transfer learning on resting-state fMRI data. An fMRI scan of a person consists of a sequence of 3D images over time. We can construct a graph from a given scan such that a node in the graph corresponds to a voxel in the image, and the edge weight between two nodes is (the absolute value of) the correlation between the two time sequences associated with the two voxels. Previous work has shown that by applying spectral clustering to resting-state fMRI we can find substructures in the brain that are periodically and simultaneously activated over time in the resting state, which may indicate a network associated with certain functions (van den Heuvel et al (2008)).
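A minimal sketch of this graph construction (our own illustration; the voxels-by-timepoints array layout and the removal of self-loops are assumptions of the sketch):

```python
import numpy as np

def fmri_affinity(time_series):
    """Build the voxel graph for one resting-state fMRI scan.

    time_series: array of shape (n_voxels, n_timepoints), one time course
    per voxel. The edge weight between two voxels is the absolute value of
    the Pearson correlation of their time courses.
    """
    A = np.abs(np.corrcoef(time_series))  # (n_voxels, n_voxels)
    np.fill_diagonal(A, 0.0)              # drop self-loops (our choice)
    return A
```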
One of the challenges of resting-state fMRI analysis is instability. Noise can easily be introduced into the scan: for example, the subject may have moved his/her head during the scan, or the subject may not have been at rest (actively thinking about things during the scan). Consequently, the result of spectral clustering becomes unstable. If we apply spectral clustering to two fMRI scans of the same person taken on two different days, the normalized min-cuts on the two scans are so different that they provide little insight into the brain activity of the subject (Fig. 12(a) and (b)).

To overcome this problem, we use our formulation to transfer knowledge from Scan 1 to Scan 2 and obtain a constrained cut, as shown in Fig. 12(c). This cut represents what the two scans agree on. The pattern captured in Fig. 12(c) is actually the default mode network (DMN), the network that is periodically activated at resting state (Fig. 12(d) shows the idealized DMN as specified by domain experts).

Fig. 12 Transfer learning on fMRI scans: (a) Ncut of Scan 1; (b) Ncut of Scan 2; (c) constrained cut obtained by transferring Scan 1 to Scan 2; (d) an idealized default mode network.

To further illustrate the practicality of our approach, we transfer the idealized DMN in Fig. 12(d) to a set of fMRI scans of elderly subjects. The dataset was collected and processed within the research program of the University of California at Davis Alzheimer's Disease Center (UCD ADC). The subjects were categorized into two groups: those diagnosed with cognitive syndrome (20 individuals) and those without cognitive syndrome (11 individuals).

For each individual scan, we encode the idealized DMN into a constraint matrix (using the RBF kernel) and enforce the constraints on the original fMRI scan. We then compute the cost of the constrained cut that is most similar to the DMN. If the cost of the constrained cut is high, it means there is great disagreement between the original graph and the given constraints (the idealized DMN), and vice versa. In other words, the cost of the constrained cut can be interpreted as the cost of transferring the DMN to that particular fMRI scan.
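The per-scan transfer cost can be sketched as follows. This is our own illustration rather than the authors' code: the RBF encoding of the DMN map and the use of $v^T \bar{L} v$ with a unit-length indicator as the cut cost are assumptions based on our reading of the text, and v_constrained is assumed to come from the constrained spectral clustering step, which is not reimplemented here.

```python
import numpy as np

def dmn_constraint_matrix(dmn_map, sigma=1.0):
    """Encode an idealized DMN map (one value per voxel, e.g. 1 inside the
    network and 0 outside) as a soft constraint matrix via an RBF kernel:
    voxels with similar DMN values receive a strong Must-Link weight."""
    d = np.asarray(dmn_map, dtype=float).reshape(-1, 1)
    sq_dist = (d - d.T) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def transfer_cost(v, A):
    """Cost of a (constrained) cut given by indicator vector v on the scan
    graph with affinity A, computed as v^T L_bar v with the normalized
    graph Laplacian L_bar and v scaled to unit length (the exact
    normalization in the paper may differ)."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_bar = np.eye(len(deg)) - D_inv_sqrt @ A @ D_inv_sqrt
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return float(v @ L_bar @ v)

# Hypothetical usage:
# Q = dmn_constraint_matrix(idealized_dmn)
# cost = transfer_cost(v_constrained, fmri_affinity(time_series))
```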
In Fig. 13, we plot the costs of transferring the DMN to both subject groups. We can clearly see that the costs of transferring the DMN to people without cognitive syndrome tend to be lower than the costs for people with cognitive syndrome. This conforms well to the observation made in a recent study that the DMN is often disrupted in people with Alzheimer's disease (Buckner et al (2008)).

Fig. 13 The costs of transferring the idealized default mode network to the fMRI scans of two groups of elderly individuals (with and without cognitive syndrome).

8 Conclusion

In this work we proposed a principled and flexible framework for constrained spectral clustering that can incorporate large amounts of both hard and soft constraints. The flexibility of our framework lends itself to the use of all types of side information: pairwise constraints, partial labeling, alternative metrics, and transfer learning. Our formulation is a natural extension of unconstrained spectral clustering and can be solved efficiently using generalized eigendecomposition. We demonstrated the effectiveness of our approach on a variety of datasets: the synthetic Double Moon data, image segmentation, the UCI benchmarks, the multilingual Reuters documents, and resting-state fMRI scans. The comparison to existing techniques validated the advantage of our approach.

9 Acknowledgments

We gratefully acknowledge support of this research via ONR grants N00014-09-1-0712 Automated Discovery and Explanation of Event Behavior and N00014-11-1-0108 Guided Learning in Dynamic Environments, and NSF grant IIS-0801528 Knowledge Enhanced Clustering.

References

Amini MR, Usunier N, Goutte C (2009) Learning from multiple partially observed views - an application to multilingual text categorization. In: Advances in Neural Information Processing Systems (NIPS 2009), pp 28–36
Asuncion A, Newman D (2007) UCI machine learning repository. URL http://www.ics.uci.edu/~mlearn/MLRepository.html
Basu S, Davidson I, Wagstaff K (eds) (2008) Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC
Buckner RL, Andrews-Hanna JR, Schacter DL (2008) The brain's default network. Annals of the New York Academy of Sciences 1124(1):1–38
Coleman T, Saunderson J, Wirth A (2008) Spectral clustering with inconsistent advice. In: Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp 152–159
Davidson I, Ravi SS (2006) Identifying and generating easy sets of constraints for clustering. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006), pp 336–341
Davidson I, Ravi SS (2007a) The complexity of non-hierarchical clustering with instance and cluster level constraints. Data Min Knowl Discov 14(1):25–61
Davidson I, Ravi SS (2007b) Intractability and clustering with constraints. In: Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pp 201–208
Davidson I, Ravi SS, Ester M (2007) Efficient incremental constrained clustering. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), pp 240–249
De Bie T, Suykens JAK, De Moor B (2004) Learning from general label constraints. In: Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshops (SSPR/SPR 2004), pp 671–679
Drineas P, Frieze AM, Kannan R, Vempala S, Vinay V (2004) Clustering large graphs via the singular value decomposition. Machine Learning 56(1-3):9–33
Gu Q, Li Z, Han J (2011) Learning a kernel for multi-task clustering. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI 2011)
van den Heuvel M, Mandl R, Hulshoff Pol H (2008) Normalized cut group clustering of resting-state fMRI data. PLoS ONE 3(4):e2001
Horn R, Johnson C (1990) Matrix Analysis. Cambridge University Press
Hubert L, Arabie P (1985) Comparing partitions. Journal of Classification 2:193–218
Ji X, Xu W (2006) Document clustering with prior knowledge. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), pp 405–412
Kamvar SD, Klein D, Manning CD (2003) Spectral learning. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI 2003), pp 561–566
Kuhn H, Tucker A (1982) Nonlinear programming. ACM SIGMAP Bulletin, pp 6–18
Kulis B, Basu S, Dhillon IS, Mooney RJ (2005) Semi-supervised graph clustering: a kernel approach. In: Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), pp 457–464
Li Z, Liu J, Tang X (2009) Constrained clustering via spectral regularization. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp 421–428
Lu Z, Carreira-Perpiñán MÁ (2008) Constrained spectral clustering through affinity propagation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008)
von Luxburg U (2007) A tutorial on spectral clustering. Statistics and Computing 17(4):395–416
Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings of the 8th International Conference on Computer Vision (ICCV 2001), vol 2, pp 416–423
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: Analysis and an algorithm. In: Advances in Neural Information Processing Systems (NIPS 2001), pp 849–856
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Wang F, Li T, Zhang C (2008) Semi-supervised clustering via matrix factorization. In: Proceedings of the 8th SIAM International Conference on Data Mining (SDM 2008), pp 1–12
Wang F, Ding CHQ, Li T (2009) Integrated KL (K-means - Laplacian) clustering: A new clustering approach by combining attribute data and pairwise relations. In: Proceedings of the 9th SIAM International Conference on Data Mining (SDM 2009), pp 38–48
Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), pp 563–572
White S, Smyth P (2005) A spectral clustering approach to finding communities in graphs. In: Proceedings of the 5th SIAM International Conference on Data Mining (SDM 2005), pp 76–84
Xu Q, desJardins M, Wagstaff K (2005) Constrained spectral clustering under a local proximity structure assumption. In: Proceedings of the 18th International Florida Artificial Intelligence Research Society Conference, pp 866–867
Yu SX, Shi J (2001) Grouping with bias. In: Advances in Neural Information Processing Systems (NIPS 2001), pp 1327–1334
Yu SX, Shi J (2004) Segmentation given partial grouping constraints. IEEE Trans Pattern Anal Mach Intell 26(2):173–183
