On Constrained Spectral Clustering and Its Applications


Authors: Xiang Wang, Buyue Qian, Ian Davidson

Abstract Constrained clustering has been well-studied for algorithms such as K-means and hierarchical clustering. However, how to satisfy many constraints in these algorithmic settings has been shown to be intractable. One alternative to encode many constraints is to use spectral clustering, which remains a developing area. In this paper, we propose a flexible framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link and Cannot-Link constraints by modifying the graph Laplacian or constraining the underlying eigenspace, we present a more natural and principled formulation, which explicitly encodes the constraints as part of a constrained optimization problem. Our method offers several practical advantages: it can encode the degree of belief in Must-Link and Cannot-Link constraints; it guarantees to lower-bound how well the given constraints are satisfied using a user-specified threshold; and it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and encoding the constraints explicitly, much of the existing analysis of unconstrained spectral clustering techniques remains valid for our formulation. We validate the effectiveness of our approach by empirical results on both artificial and real datasets. We also demonstrate an innovative use of encoding large numbers of constraints: transfer learning via constraints.

Keywords Spectral clustering · Constrained clustering · Transfer learning · Graph partition

X. Wang, Department of Computer Science, University of California, Davis. Davis, CA 95616, USA. E-mail: xiang@ucdavis.edu
B. Qian, Department of Computer Science, University of California, Davis. Davis, CA 95616, USA. E-mail: byqian@ucdavis.edu
I. Davidson, Department of Computer Science, University of California, Davis. Davis, CA 95616, USA. E-mail: davidson@cs.ucdavis.edu

Fig. 1 A motivating example for constrained spectral clustering: (a) K-means, (b) spectral clustering, (c) spectral clustering, (d) constrained spectral clustering.

1 Introduction

1.1 Background and Motivation

Spectral clustering is an important clustering technique that has been extensively studied in the image processing, data mining, and machine learning communities (Shi and Malik (2000); von Luxburg (2007); Ng et al (2001)). It is considered superior to traditional clustering algorithms like K-means in terms of having a deterministic polynomial-time solution, the ability to model arbitrarily shaped clusters, and its equivalence to certain graph cut problems. For example, spectral clustering is able to capture the underlying moon-shaped clusters as shown in Fig. 1(b), whereas K-means would fail (Fig. 1(a)).
The advantage of spectral clustering has also been validated by many real-world applications, such as image segmentation (Shi and Malik (2000)) and mining social networks (White and Smyth (2005)).

Spectral clustering was originally proposed to address an unsupervised learning problem: the data instances are unlabeled, and all available information is encoded in the graph Laplacian. However, there are cases where unsupervised spectral clustering becomes insufficient. Using the same toy data, as shown in Fig. 1(c), when the two moons are under-sampled, the clusters become so sparse that separating them becomes difficult. To help spectral clustering recover from an undesirable partition, we can introduce side information in various forms, in either small or large amounts. For example:

1. Pairwise constraints: Domain experts may explicitly assign constraints that state a pair of instances must be in the same cluster (Must-Link, ML for short) or that a pair of instances cannot be in the same cluster (Cannot-Link, CL for short). For instance, as shown in Fig. 1(d), we assigned several ML (solid lines) and CL (dashed lines) constraints and then applied our constrained spectral clustering algorithm, which we describe later. As a result, the two moons were successfully recovered.
2. Partial labeling: There can be labels on some of the instances, which are neither complete nor exhaustive. We demonstrate in Fig. 9 that even small amounts of labeled information can greatly improve clustering results when compared against the ground truth partition, as inferred from the labels.
3. Alternative weak distance metrics: In some situations there may be more than one distance metric available. For example, in Table 3 and the accompanying paragraphs we describe clustering documents using distance functions based on different languages (features).
4. Transfer of knowledge: In the context of transfer learning (Pan and Yang (2010)), if we treat the graph Laplacian as the target domain, we can transfer knowledge from a different but related graph, which can be viewed as the source domain. We discuss this direction in Sections 6.3 and 7.5.

All the aforementioned side information can be transformed into pairwise ML and CL constraints, which can be either hard (binary) or soft (degrees of belief). For example, if the side information comes from a source graph, we can construct pairwise constraints by assuming that the more similar two instances are in the source graph, the more likely they belong to the same cluster in the target graph. Consequently, the constraints should naturally be represented by degrees of belief rather than binary assertions. How to make use of this side information to improve clustering falls into the area of constrained clustering (Basu et al (2008)). In general, constrained clustering is a category of techniques that try to incorporate ML and CL constraints into existing clustering schemes. It has been well studied for algorithms such as K-means clustering, mixture models, hierarchical clustering, and density-based clustering.
Previous studies showed that satisfying all constraints at once (Davidson and Ravi (2007a)), incrementally (Davidson et al (2007)), or even pruning constraints (Davidson and Ravi (2007b)) is intractable. Furthermore, it was shown that algorithms that build set partitions incrementally (such as K-means and EM) are prone to being over-constrained (Davidson and Ravi (2006)). In contrast, incorporating constraints into spectral clustering is a promising direction since, unlike existing algorithms, all data instances are assigned simultaneously to clusters, even if the given constraints are inconsistent.

Constrained spectral clustering is still a developing area. Previous work on this topic can be divided into two categories, based on how they enforce the constraints. The first category (Kamvar et al (2003); Xu et al (2005); Lu and Carreira-Perpiñán (2008); Wang et al (2009); Ji and Xu (2006)) directly manipulates the graph Laplacian (or, equivalently, the affinity matrix) according to the given constraints; then unconstrained spectral clustering is applied to the modified graph Laplacian. The second category uses constraints to restrict the feasible solution space (De Bie et al (2004); Coleman et al (2008); Li et al (2009); Yu and Shi (2001, 2004)). Existing methods in both categories share several limitations:

– They are designed to handle only binary constraints. However, as we have stated above, in many real-world applications, constraints are made available in the form of real-valued degrees of belief rather than yes-or-no assertions.
– They aim to satisfy as many constraints as possible, which can lead to inflexibility in practice. For example, the given set of constraints could be noisy, and satisfying some of the constraints could actually hurt the overall performance. Also, it is reasonable to ignore a small portion of constraints in exchange for a clustering with much lower cost.
– They do not offer any natural interpretation of either the way that constraints are encoded or the implication of enforcing them.

1.2 Our Contributions

In this paper, we study how to incorporate large amounts of pairwise constraints into spectral clustering, in a flexible manner that addresses the limitations of previous work. Then we show the practical benefits of our approach, including new applications previously not possible.

We extend beyond binary ML/CL constraints and propose a more flexible framework to accommodate general-type side information. We allow the binary constraints to be relaxed to real-valued degrees of belief that two data instances belong to the same cluster or to two different clusters. Moreover, instead of trying to satisfy each and every constraint that has been given, we use a user-specified threshold to lower-bound how well the given constraints must be satisfied. Therefore, our method provides maximum flexibility in terms of both representing constraints and satisfying them. This, in addition to handling large amounts of constraints, allows the encoding of new styles of information such as entire graphs and alternative distance metrics in their raw form, without considering issues such as constraint inconsistencies and over-constraining.
Our contributions are:

– We propose a principled framework for constrained spectral clustering that can incorporate large amounts of both hard and soft constraints.
– We show how to enforce constraints in a flexible way: a user-specified threshold is introduced so that a limited amount of constraints can be ignored in exchange for lower clustering cost. This allows incorporating side information in its raw form without considering issues such as inconsistency and over-constraining.
– We extend the objective function of unconstrained spectral clustering by encoding constraints explicitly and creating a novel constrained optimization problem. Thus our formulation naturally covers unconstrained spectral clustering as a special case.
– We show that our objective function can be turned into a generalized eigenvalue problem, which can be solved deterministically in polynomial time. This is a major advantage over constrained K-means clustering, which produces non-deterministic solutions while being intractable even for K = 2 (Drineas et al (2004); Davidson and Ravi (2007b)).
– We interpret our formulation from both the graph cut perspective and the Laplacian embedding perspective.
– We validate the effectiveness of our approach and its advantage over existing methods using standard benchmarks and new innovative applications such as transfer learning.

This paper is an extension of our previous work (Wang and Davidson (2010)) with the following additions: 1) we extend our algorithm from 2-way partition to K-way partition (Section 6.2); 2) we add a new geometric interpretation of our algorithm (Section 5.2); 3) we show how to apply our algorithm to a novel application (Section 6.3), namely transfer learning, and test it with a real-world fMRI dataset (Section 7.5); 4) we present a much more comprehensive experiment section with more tasks conducted on more datasets (Sections 7.2 and 7.4).

The rest of the paper is organized as follows: In Section 2 we briefly survey previous work on constrained spectral clustering; Section 3 provides preliminaries for spectral clustering; in Section 4 we formally introduce our formulation for constrained spectral clustering and show how to solve it efficiently; in Section 5 we interpret our objective from two different perspectives; in Section 6 we discuss the implementation of our algorithm and possible extensions; we empirically evaluate our approach in Section 7; Section 8 concludes the paper.

2 Related Work

Constrained clustering is a category of methods that extend clustering from the unsupervised setting to the semi-supervised setting, where side information is available in the form of, or can be converted into, pairwise constraints. A number of algorithms have been proposed on how to incorporate constraints into spectral clustering, which can be grouped into two categories.

The first category manipulates the graph Laplacian directly. Kamvar et al (2003) proposed the spectral learning algorithm, which sets the (i, j)-th entry of the affinity matrix to 1 if there is a ML between nodes i and j, and to 0 for CL. A new graph Laplacian is then computed based on the modified affinity matrix.
In Xu et al (2005), the constraints are encoded in the same way, but a random walk matrix is used instead of the normalized Laplacian. Kulis et al (2005) proposed to add both positive (for ML) and negative (for CL) penalties to the affinity matrix (they then used kernel K-means, instead of spectral clustering, to find the partition based on the new kernel). Lu and Carreira-Perpiñán (2008) proposed to propagate the constraints in the affinity matrix. In Ji and Xu (2006) and Wang et al (2009), the graph Laplacian is modified by combining the constraint matrix as a regularizer. The limitation of these approaches is that there is no principled way to decide the weights of the constraints, and there is no guarantee on how well the given constraints will be satisfied.

The second category manipulates the eigenspace directly. For example, the subspace trick introduced by De Bie et al (2004) alters the eigenspace onto which the cluster indicator vector is projected, based on the given constraints. This technique was later extended in Coleman et al (2008) to accommodate inconsistent constraints. Yu and Shi (2001, 2004) encoded partial grouping information as a subspace projection. Li et al (2009) enforced constraints by regularizing the spectral embedding. These approaches usually enforce the given constraints strictly. As a result, the solutions are often over-constrained, which makes the algorithms sensitive to noise and inconsistencies in the constraint set. Moreover, it is non-trivial to extend these approaches to incorporate soft constraints.

In addition, Gu et al (2011) proposed a spectral kernel design that combines multiple clustering tasks. The learned kernel is constrained in such a way that the data distributions of any two tasks are as close as possible. Their problem setting differs from ours because we aim to perform single-task clustering by using two (disagreeing) data sources. Wang et al (2008) showed how to incorporate pairwise constraints into a penalized matrix factorization framework. Their matrix approximation objective function, which is different from our normalized min-cut objective, is solved by an EM-like algorithm.

We would like to stress that the pros and cons of spectral clustering as compared to other clustering schemes, such as K-means clustering, hierarchical clustering, etc., have been thoroughly studied and well established. We do not claim that constrained spectral clustering is universally superior to other constrained clustering schemes. The goal of this work is to provide a way to incorporate constraints into spectral clustering that is more flexible and principled as compared with existing constrained spectral clustering techniques.

Table 1 Table of notations

Symbol            Meaning
$G$               An undirected (weighted) graph
$A$               The affinity matrix
$D$               The degree matrix
$I$               The identity matrix
$L$ / $\bar{L}$   The unnormalized/normalized graph Laplacian
$Q$ / $\bar{Q}$   The unnormalized/normalized constraint matrix
$vol$             The volume of graph $G$

3 Background and Preliminaries

In this paper we follow the standard graph model that is commonly used in the spectral clustering literature.
We reiterate some of the definitions and properties in this section, such as the graph Laplacian, normalized min-cut, eigendecomposition, and so forth, to make this paper self-contained. Readers who are familiar with these materials can skip to our formulation in Section 4. Important notations used throughout the rest of the paper are listed in Table 1.

A collection of $N$ data instances is modeled by an undirected, weighted graph $G(V, E, A)$, where each data instance corresponds to a vertex (node) in $V$; $E$ is the edge set and $A$ is the associated affinity matrix. $A$ is symmetric and non-negative. The diagonal matrix $D = \mathrm{diag}(D_{11}, \ldots, D_{NN})$ is called the degree matrix of graph $G$, where
$$D_{ii} = \sum_{j=1}^{N} A_{ij}.$$
Then $L = D - A$ is called the unnormalized graph Laplacian of $G$. Assuming $G$ is connected (i.e. any node is reachable from any other node), $L$ has the following properties:

Property 1 (Properties of graph Laplacian (von Luxburg (2007))) Let $L$ be the graph Laplacian of a connected graph, then:
1. $L$ is symmetric and positive semi-definite.
2. $L$ has one and only one eigenvalue equal to 0, and $N-1$ positive eigenvalues: $0 = \lambda_0 < \lambda_1 \leq \ldots \leq \lambda_{N-1}$.
3. $\mathbf{1}$ is an eigenvector of $L$ with eigenvalue 0 ($\mathbf{1}$ is a constant vector whose entries are all 1).

Shi and Malik (2000) showed that the eigenvectors of the graph Laplacian can be related to the normalized min-cut (Ncut) of $G$. The objective function can be written as:
$$\operatorname*{argmin}_{v \in \mathbb{R}^N} v^T \bar{L} v, \quad \text{s.t. } v^T v = vol, \; v \perp D^{1/2}\mathbf{1}. \tag{1}$$
Here $\bar{L} = D^{-1/2} L D^{-1/2}$ is called the normalized graph Laplacian (von Luxburg (2007)); $vol = \sum_{i=1}^{N} D_{ii}$ is the volume of $G$; the first constraint $v^T v = vol$ normalizes $v$; the second constraint $v \perp D^{1/2}\mathbf{1}$ rules out the principal eigenvector of $\bar{L}$ as a trivial solution, because it does not define a meaningful cut on the graph. The relaxed cluster indicator $u$ can be recovered from $v$ as $u = D^{-1/2} v$.

Note that the result of spectral clustering is solely decided by the affinity structure of graph $G$ as encoded in the matrix $A$ (and thus the graph Laplacian $L$). We will then describe our extensions on how to incorporate side information so that the result of clustering will reflect both the affinity structure of the graph and the structure of the side information.

4 A Flexible Framework for Constrained Spectral Clustering

In this section, we show how to incorporate side information into spectral clustering as pairwise constraints. Our formulation allows both hard and soft constraints. We propose a new constrained optimization formulation for constrained spectral clustering. Then we show how to solve the objective function by converting it into a generalized eigenvalue system.

4.1 The Objective Function

We encode side information with an $N \times N$ constraint matrix $Q$. Traditionally, constrained clustering only accommodates binary constraints, namely Must-Link and Cannot-Link:
$$Q_{ij} = Q_{ji} = \begin{cases} +1 & \text{if } ML(i,j) \\ -1 & \text{if } CL(i,j) \\ 0 & \text{no side information available.} \end{cases}$$
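The constructions above are straightforward to set up. Below is a minimal sketch in Python/NumPy (the paper's released implementation is in MATLAB; the function names here are illustrative, not the authors' API) that builds the degree matrix, the graph volume, the normalized Laplacian $\bar{L} = I - D^{-1/2} A D^{-1/2}$ used in Eq.(1), and a binary constraint matrix Q from lists of ML and CL pairs.

```python
import numpy as np

def normalized_laplacian(A):
    """Compute the degrees, vol(G), and the normalized Laplacian L_bar = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)                       # degrees D_ii = sum_j A_ij
    vol = d.sum()                           # volume of the graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_bar = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt
    return d, vol, L_bar

def constraint_matrix(N, ml_pairs, cl_pairs):
    """Binary encoding of Must-Link / Cannot-Link pairs into the constraint matrix Q."""
    Q = np.zeros((N, N))
    for i, j in ml_pairs:
        Q[i, j] = Q[j, i] = +1              # ML(i, j)
    for i, j in cl_pairs:
        Q[i, j] = Q[j, i] = -1              # CL(i, j)
    return Q
```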
Let $u \in \{-1, +1\}^N$ be a cluster indicator vector, where $u_i = +1$ if node $i$ belongs to cluster $+$ and $u_i = -1$ if node $i$ belongs to cluster $-$. Then
$$u^T Q u = \sum_{i=1}^{N} \sum_{j=1}^{N} u_i u_j Q_{ij}$$
is a measure of how well the constraints in $Q$ are satisfied by the assignment $u$: the measure increases by 1 if $Q_{ij} = 1$ and nodes $i$ and $j$ have the same sign in $u$; it decreases by 1 if $Q_{ij} = 1$ but nodes $i$ and $j$ have different signs in $u$, or $Q_{ij} = -1$ but nodes $i$ and $j$ have the same sign in $u$.

We extend the above encoding scheme to accommodate soft constraints by relaxing the cluster indicator vector $u$ as well as the constraint matrix $Q$ such that
$$u \in \mathbb{R}^N, \quad Q \in \mathbb{R}^{N \times N}.$$
$Q_{ij}$ is positive if we believe nodes $i$ and $j$ belong to the same cluster; $Q_{ij}$ is negative if we believe nodes $i$ and $j$ belong to different clusters; the magnitude of $Q_{ij}$ indicates how strong the belief is. Consequently, $u^T Q u$ becomes a real-valued measure of how well the constraints in $Q$ are satisfied in the relaxed sense. For example, $Q_{ij} < 0$ means we believe nodes $i$ and $j$ belong to different clusters; in order to improve $u^T Q u$, we should assign $u_i$ and $u_j$ values of different signs. Similarly, $Q_{ij} > 0$ means nodes $i$ and $j$ are believed to belong to the same cluster; we should assign $u_i$ and $u_j$ values of the same sign. The larger $u^T Q u$ is, the better the cluster assignment $u$ conforms to the given constraints in $Q$.

Now, given this real-valued measure, rather than trying to satisfy all the constraints in $Q$ individually, we can lower-bound this measure with a constant $\alpha \in \mathbb{R}$:
$$u^T Q u \geq \alpha.$$
Following the notation in Eq.(1), we substitute $u$ with $D^{-1/2} v$, and the above inequality becomes
$$v^T \bar{Q} v \geq \alpha,$$
where $\bar{Q} = D^{-1/2} Q D^{-1/2}$ is the normalized constraint matrix.

We append this lower-bound constraint to the objective function of unconstrained spectral clustering in Eq.(1) and we have:

Problem 1 (Constrained Spectral Clustering) Given a normalized graph Laplacian $\bar{L}$, a normalized constraint matrix $\bar{Q}$, and a threshold $\alpha$, we want to optimize the following objective function:
$$\operatorname*{argmin}_{v \in \mathbb{R}^N} v^T \bar{L} v, \quad \text{s.t. } v^T \bar{Q} v \geq \alpha, \; v^T v = vol, \; v \neq D^{1/2}\mathbf{1}. \tag{2}$$
Here $v^T \bar{L} v$ is the cost of the cut we want to minimize; the first constraint $v^T \bar{Q} v \geq \alpha$ lower-bounds how well the constraints in $Q$ are satisfied; the second constraint $v^T v = vol$ normalizes $v$; the third constraint $v \neq D^{1/2}\mathbf{1}$ rules out the trivial solution $D^{1/2}\mathbf{1}$. Suppose $v^*$ is the optimal solution to Eq.(2); then $u^* = D^{-1/2} v^*$ is the optimal cluster indicator vector.

It is easy to see that the unconstrained spectral clustering in Eq.(1) is covered as a special case of Eq.(2) where $\bar{Q} = I$ (no constraints are encoded) and $\alpha = vol$ ($v^T \bar{Q} v \geq vol$ is trivially satisfied given $\bar{Q} = I$ and $v^T v = vol$).

4.2 Solving the Objective Function

To solve the constrained optimization problem, we follow the Karush-Kuhn-Tucker Theorem (Kuhn and Tucker (1982)) to derive the necessary conditions for the optimal solution. We can find a set of candidates, or feasible solutions, that satisfy all the necessary conditions.
Then we choose the optimal solution from among the feasible solutions by brute force, since the size of the feasible set is finite and small.

For our objective function in Eq.(2), we introduce the Lagrangian as follows:
$$\Lambda(v, \lambda, \mu) = v^T \bar{L} v - \lambda (v^T \bar{Q} v - \alpha) - \mu (v^T v - vol). \tag{3}$$
Then, according to the KKT Theorem, any feasible solution to Eq.(2) must satisfy the following conditions:
$$\text{(Stationarity)} \quad \bar{L} v - \lambda \bar{Q} v - \mu v = 0, \tag{4}$$
$$\text{(Primal feasibility)} \quad v^T \bar{Q} v \geq \alpha, \; v^T v = vol, \tag{5}$$
$$\text{(Dual feasibility)} \quad \lambda \geq 0, \tag{6}$$
$$\text{(Complementary slackness)} \quad \lambda (v^T \bar{Q} v - \alpha) = 0. \tag{7}$$
Note that Eq.(4) comes from taking the derivative of Eq.(3) with respect to $v$. Also note that we dismiss the constraint $v \neq D^{1/2}\mathbf{1}$ at this moment, because it can be checked independently after we find the feasible solutions.

To solve Eqs.(4)-(7), we start by looking at the complementary slackness requirement in Eq.(7), which implies either $\lambda = 0$ or $v^T \bar{Q} v - \alpha = 0$. If $\lambda = 0$, we have a trivial problem because the second term in Eq.(4) is eliminated and the problem reduces to unconstrained spectral clustering. Therefore, in the following we focus on the case where $\lambda \neq 0$. In this case, for Eq.(7) to hold, $v^T \bar{Q} v - \alpha$ must be 0. Consequently, the KKT conditions become:
$$\bar{L} v - \lambda \bar{Q} v - \mu v = 0, \tag{8}$$
$$v^T v = vol, \tag{9}$$
$$v^T \bar{Q} v = \alpha, \tag{10}$$
$$\lambda > 0. \tag{11}$$
Under the assumption that $\alpha$ is arbitrarily assigned by the user and $\lambda$ and $\mu$ are independent variables, Eqs.(8)-(11) cannot be solved explicitly, and they may produce an infinite number of feasible solutions, if any exist. As a workaround, we temporarily drop the assumption that $\alpha$ is an arbitrary value assigned by the user. Instead, we assume $\alpha \triangleq v^T \bar{Q} v$, i.e. $\alpha$ is defined such that Eq.(10) holds. Then we introduce an auxiliary variable $\beta$, defined in terms of the ratio between $\mu$ and $\lambda$:
$$\beta \triangleq -\frac{\mu}{\lambda} vol. \tag{12}$$
Now we substitute Eq.(12) into Eq.(8) and obtain:
$$\bar{L} v - \lambda \bar{Q} v + \frac{\lambda \beta}{vol} v = 0,$$
or equivalently:
$$\bar{L} v = \lambda \left( \bar{Q} - \frac{\beta}{vol} I \right) v. \tag{13}$$
We immediately notice that Eq.(13) is a generalized eigenvalue problem for a given $\beta$. Next we show that the substitution of $\alpha$ with $\beta$ does not compromise our original intention of lower-bounding $v^T \bar{Q} v$ in Eq.(2).

Lemma 1 $\beta < v^T \bar{Q} v$.

Proof Let $\gamma \triangleq v^T \bar{L} v$. Left-multiplying both sides of Eq.(13) by $v^T$, we have
$$v^T \bar{L} v = \lambda v^T \left( \bar{Q} - \frac{\beta}{vol} I \right) v.$$
Then, incorporating Eq.(9) and $\alpha \triangleq v^T \bar{Q} v$, we have
$$\gamma = \lambda (\alpha - \beta).$$
Now recall that $L$ is positive semi-definite (Property 1), and so is $\bar{L}$, which means
$$\gamma = v^T \bar{L} v > 0, \quad \forall v \neq D^{1/2}\mathbf{1}.$$
Consequently, we have
$$\alpha - \beta = \frac{\gamma}{\lambda} > 0 \;\Rightarrow\; v^T \bar{Q} v = \alpha > \beta.$$

In summary, our algorithm works as follows (the exact implementation is shown in Algorithm 1):
1. Generating candidates: The user specifies a value for $\beta$, and we solve the generalized eigenvalue system given in Eq.(13). Note that both $\bar{L}$ and $\bar{Q} - \frac{\beta}{vol} I$ are Hermitian matrices, thus the generalized eigenvalues are guaranteed to be real numbers.
2. Finding the feasible set: We remove the generalized eigenvectors associated with non-positive eigenvalues, and normalize the rest such that $v^T v = vol$.
Note that the trivial solution $D^{1/2}\mathbf{1}$ is automatically removed in this step because, if it is a generalized eigenvector of Eq.(13), its associated eigenvalue would be 0. Since we have at most $N$ generalized eigenvectors, the number of feasible eigenvectors is at most $N$.
3. Choosing the optimal solution: We choose from the feasible solutions the one that minimizes $v^T \bar{L} v$, say $v^*$. According to Lemma 1, $v^*$ is the optimal solution to the objective function in Eq.(2) for any given $\beta$, and $\beta < \alpha = v^{*T} \bar{Q} v^*$.

Algorithm 1: Constrained Spectral Clustering
Input: Affinity matrix $A$, constraint matrix $Q$, $\beta$;
Output: The optimal (relaxed) cluster indicator $u^*$;
1  $vol \leftarrow \sum_{i=1}^{N}\sum_{j=1}^{N} A_{ij}$, $D \leftarrow \mathrm{diag}(\sum_{j=1}^{N} A_{ij})$;
2  $\bar{L} \leftarrow I - D^{-1/2} A D^{-1/2}$, $\bar{Q} \leftarrow D^{-1/2} Q D^{-1/2}$;
3  $\lambda_{\max}(\bar{Q}) \leftarrow$ the largest eigenvalue of $\bar{Q}$;
4  if $\beta \geq \lambda_{\max}(\bar{Q}) \cdot vol$ then
5      return $u^* = \emptyset$;
6  end
7  else
8      Solve the generalized eigenvalue system in Eq.(13);
9      Remove eigenvectors associated with non-positive eigenvalues and normalize the rest by $v \leftarrow \frac{v}{\|v\|}\sqrt{vol}$;
10     $v^* \leftarrow \operatorname{argmin}_{v} v^T \bar{L} v$, where $v$ is among the feasible eigenvectors generated in the previous step;
11     return $u^* \leftarrow D^{-1/2} v^*$;
12 end

4.3 A Sufficient Condition for the Existence of Solutions

On one hand, our method described above is guaranteed to generate a finite number of feasible solutions. On the other hand, we need to set $\beta$ appropriately so that the generalized eigenvalue system in Eq.(13), combined with the KKT conditions in Eqs.(8)-(11), gives rise to at least one feasible solution. In this section, we discuss such a sufficient condition:
$$\beta < \lambda_{\max}(\bar{Q}) \cdot vol,$$
where $\lambda_{\max}(\bar{Q})$ is the largest eigenvalue of $\bar{Q}$. In this case, the matrix on the right-hand side of Eq.(13), namely $\bar{Q} - (\beta/vol) \cdot I$, has at least one positive eigenvalue. Consequently, the generalized eigenvalue system in Eq.(13) has at least one positive eigenvalue. Moreover, the number of feasible eigenvectors increases if we make $\beta$ smaller. For example, if we set
$$\beta < \lambda_{\min}(\bar{Q}) \cdot vol,$$
where $\lambda_{\min}(\bar{Q})$ is the smallest eigenvalue of $\bar{Q}$, then $\bar{Q} - (\beta/vol) \cdot I$ becomes positive definite, and the generalized eigenvalue system in Eq.(13) generates $N-1$ feasible eigenvectors (the trivial solution $D^{1/2}\mathbf{1}$ with eigenvalue 0 is dropped). In practice, we normally choose the value of $\beta$ within the range $(\lambda_{\min}(\bar{Q}) \cdot vol, \; \lambda_{\max}(\bar{Q}) \cdot vol)$. In that range, the greater $\beta$ is, the more the solution is biased towards satisfying $\bar{Q}$. Again, note that whenever we have $\beta < \lambda_{\max}(\bar{Q}) \cdot vol$, the value of $\alpha$ is always bounded by $\beta < \alpha \leq \lambda_{\max} vol$. Therefore we do not need to take care of $\alpha$ explicitly.
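To make the three steps above and the feasibility condition on $\beta$ concrete, here is a minimal sketch of Algorithm 1 in Python/NumPy/SciPy. It is only a sketch: the function name, the use of a dense generalized eigensolver, and the handling of degenerate cases are illustrative assumptions, not taken from the paper's MATLAB code.

```python
import numpy as np
from scipy.linalg import eig

def constrained_spectral_clustering(A, Q, beta):
    """Sketch of Algorithm 1 (2-way constrained spectral clustering).
    Returns the relaxed cluster indicator u*, or None if beta is infeasible."""
    N = A.shape[0]
    d = A.sum(axis=1)
    vol = d.sum()
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_bar = np.eye(N) - D_inv_sqrt @ A @ D_inv_sqrt        # normalized Laplacian
    Q_bar = D_inv_sqrt @ Q @ D_inv_sqrt                     # normalized constraint matrix

    # Sufficient condition of Section 4.3: beta must be below lambda_max(Q_bar) * vol
    if beta >= np.max(np.linalg.eigvalsh(Q_bar)) * vol:
        return None

    # Step 1: generate candidates by solving the generalized eigenproblem of Eq.(13)
    lam, V = eig(L_bar, Q_bar - (beta / vol) * np.eye(N))
    lam, V = lam.real, V.real

    # Step 2: keep eigenvectors with positive eigenvalues, rescaled so that v^T v = vol
    feasible = [V[:, i] * np.sqrt(vol) / np.linalg.norm(V[:, i])
                for i in range(N) if np.isfinite(lam[i]) and lam[i] > 0]
    if not feasible:
        return None

    # Step 3: among the feasible cuts, pick the one with the smallest cost v^T L_bar v
    v_star = min(feasible, key=lambda v: v @ L_bar @ v)
    return D_inv_sqrt @ v_star                              # u* = D^{-1/2} v*
```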
Fig. 2 An illustrative example: the affinity structure says {1, 2, 3} and {4, 5, 6} while the node labeling (coloring) says {1, 2, 3, 4} and {5, 6}.

4.4 An Illustrative Example

To illustrate how our algorithm works, we present a toy example as follows. In Fig. 2, we have a graph associated with the following affinity matrix:
$$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 & 0 \end{pmatrix}$$
Unconstrained spectral clustering will cut the graph at edge (3, 4) and split it into two symmetric parts, {1, 2, 3} and {4, 5, 6} (Fig. 3(a)). Then we introduce constraints as encoded in the following constraint matrix:
$$Q = \begin{pmatrix} +1 & +1 & +1 & +1 & -1 & -1 \\ +1 & +1 & +1 & +1 & -1 & -1 \\ +1 & +1 & +1 & +1 & -1 & -1 \\ +1 & +1 & +1 & +1 & -1 & -1 \\ -1 & -1 & -1 & -1 & +1 & +1 \\ -1 & -1 & -1 & -1 & +1 & +1 \end{pmatrix}.$$
Q is essentially saying that we want to group nodes {1, 2, 3, 4} into one cluster and {5, 6} into the other. Although this kind of "complete information" constraint matrix does not occur in practice, we use it here only to make the result more explicit and intuitive.

$\bar{Q}$ has two distinct eigenvalues: 0 and 2.6667. As analyzed above, $\beta$ must be smaller than $2.6667\,vol$ to guarantee the existence of a feasible solution, and a larger $\beta$ means we want more constraints in Q to be satisfied (in a relaxed sense). Thus we set $\beta$ to $vol$ and $2\,vol$ respectively, and see how it affects the resultant constrained cuts. We solve the generalized eigenvalue system in Eq.(13), and plot the cluster indicator vector $u^*$ in Fig. 3(b) and 3(c), respectively. We can see that as $\beta$ increases, node 4 is dragged from the group of nodes {5, 6} to the group of nodes {1, 2, 3}, which conforms to our expectation that a greater $\beta$ value implies better constraint satisfaction.

Fig. 3 The solutions to the illustrative example in Fig. 2 with different $\beta$: (a) unconstrained, (b) constrained, $\beta = vol$, (c) constrained, $\beta = 2\,vol$. The x-axis is the indices of the instances and the y-axis is the corresponding entry values in the optimal (relaxed) cluster indicator $u^*$. Notice that node 4 is biased toward nodes {1, 2, 3} as $\beta$ increases.
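For reference, the toy example can be run through the constrained_spectral_clustering sketch from Section 4.2 (assuming that function is in scope). The matrices below are exactly those given above; the printout is simply the sign pattern of $u^*$ for the two settings of $\beta$, not an authoritative reproduction of Fig. 3.

```python
import numpy as np

# Affinity of the 6-node graph in Fig. 2 and the "complete information" constraint matrix Q
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
Q = np.array([[+1, +1, +1, +1, -1, -1],
              [+1, +1, +1, +1, -1, -1],
              [+1, +1, +1, +1, -1, -1],
              [+1, +1, +1, +1, -1, -1],
              [-1, -1, -1, -1, +1, +1],
              [-1, -1, -1, -1, +1, +1]], dtype=float)

vol = A.sum()
for beta in (vol, 2 * vol):                                  # the settings used in Fig. 3(b) and 3(c)
    u_star = constrained_spectral_clustering(A, Q, beta)     # sketch from Section 4.2
    print(beta / vol, np.sign(u_star))                       # 2-way partition read off the signs
```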
5 Interpretations of Our Formulation

5.1 A Graph Cut Interpretation

Unconstrained spectral clustering can be interpreted as finding the Ncut of an unlabeled graph. Similarly, our formulation of constrained spectral clustering in Eq.(2) can be interpreted as finding the Ncut of a labeled/colored graph. Specifically, suppose we have an undirected weighted graph. The nodes of the graph are colored in such a way that nodes of the same color are advised to be assigned to the same cluster while nodes of different colors are advised to be assigned to different clusters (e.g. Fig. 2). Let $v^*$ be the solution to the constrained optimization problem in Eq.(2). We cut the graph into two parts based on the values of the entries of $u^* = D^{-1/2} v^*$. Then $v^{*T} \bar{L} v^*$ can be interpreted as the cost of the cut (in a relaxed sense), which we minimize. On the other hand, $\alpha = v^{*T} \bar{Q} v^* = u^{*T} Q u^*$ can be interpreted as the purity of the cut (also in a relaxed sense), according to the colors of the nodes on the respective sides. For example, if $Q \in \{-1, 0, 1\}^{N \times N}$ and $u^* \in \{-1, 1\}^N$, then $\alpha$ equals the number of constraints in Q that are satisfied by $u^*$ minus the number of constraints violated. More generally, if $Q_{ij}$ is a positive number, then $u^*_i$ and $u^*_j$ having the same sign contributes positively to the purity of the cut, whereas different signs contribute negatively to the purity of the cut. It is not difficult to see that the purity is maximized when there is no pair of nodes with different colors assigned to the same side of the cut (0 violations), which is the case where all the constraints in Q are satisfied.

5.2 A Geometric Interpretation

We can also interpret our formulation as constraining the joint numerical range (Horn and Johnson (1990)) of the graph Laplacian and the constraint matrix. Specifically, we consider the joint numerical range:
$$J(\bar{L}, \bar{Q}) \triangleq \{ (v^T \bar{L} v, \; v^T \bar{Q} v) : v^T v = 1 \}. \tag{14}$$
$J(\bar{L}, \bar{Q})$ essentially maps all possible cuts $v$ onto a 2-D plane, where the x-coordinate corresponds to the cost of the cut, and the y-coordinate corresponds to the constraint satisfaction of the cut. According to our objective in Eq.(2), we want to minimize the first term while lower-bounding the second term. Therefore, we are looking for the leftmost point among those above the horizontal line $y = \alpha$. In Fig. 4(c), we visualize $J(\bar{L}, \bar{Q})$ by plotting all the unconstrained cuts given by spectral clustering and all the constrained cuts given by our algorithm in the joint numerical range, based on the graph Laplacian of a Two-Moon dataset with a randomly generated constraint matrix. The horizontal line and the arrow indicate the constrained area from which we can select feasible solutions. We can see that most of the unconstrained cuts proposed by spectral clustering are far below the threshold, which suggests that spectral clustering cannot lead to the ground truth partition (as shown in Fig. 4(b)) without the help of constraints.

Fig. 4 The joint numerical range of the normalized graph Laplacian $\bar{L}$ and the normalized constraint matrix $\bar{Q}$, as well as the optimal solutions to unconstrained/constrained spectral clustering: (a) the unconstrained Ncut, (b) the constrained Ncut, (c) $J(\bar{L}, \bar{Q})$.

6 Implementation and Extensions

In this section, we discuss some implementation issues of our method. Then we show how to generalize it to K-way partition where $K \geq 2$.

6.1 Constrained Spectral Clustering for 2-Way Partition

The routine of our method is similar to that of unconstrained spectral clustering. The input of the algorithm is an affinity matrix $A$, the constraint matrix $Q$, and a threshold $\beta$. Then we solve the generalized eigenvalue problem in Eq.(13) and find all the feasible generalized eigenvectors. The output is the optimal (relaxed) cluster assignment indicator $u^*$. In practice, a partition is often derived from $u^*$ by assigning nodes corresponding to the positive entries in $u^*$ to one side of the cut, and nodes corresponding to the negative entries to the other side. Our algorithm is summarized in Algorithm 1.
Since our model encodes soft constraints as degrees of belief, inconsistent constraints in Q will not corrupt our algorithm. Instead, they are enforced implicitly by maximizing $u^T Q u$. Note that if the constraint matrix Q is generated from a partial labeling, then the constraints in Q will always be consistent.

Runtime analysis: The runtime of our algorithm is dominated by that of the generalized eigendecomposition. In other words, the complexity of our algorithm is on a par with that of unconstrained spectral clustering in big-O notation, which is $O(kN^2)$, where $N$ is the number of data instances and $k$ is the number of eigenpairs we need to compute. Here $k$ is a number large enough to guarantee the existence of feasible solutions. In practice we normally have $2 < k \ll N$.

6.2 Extension to K-Way Partition

Our algorithm can be naturally extended to K-way partition for $K > 2$, following what we usually do for unconstrained spectral clustering (von Luxburg (2007)): instead of only using the optimal feasible eigenvector $u^*$, we preserve the top $K-1$ eigenvectors associated with positive eigenvalues, and perform the K-means algorithm based on that embedding.

Specifically, the constraint matrix Q follows the same encoding scheme: $Q_{ij} > 0$ if nodes $i$ and $j$ are believed to belong to the same cluster, $Q_{ij} < 0$ otherwise. To guarantee that we can find $K-1$ feasible eigenvectors, we set the threshold $\beta$ such that
$$\beta < \lambda_{K-1} \, vol,$$
where $\lambda_{K-1}$ is the $(K-1)$-th largest eigenvalue of $\bar{Q}$. Given all the feasible eigenvectors, we pick the top $K-1$ in terms of minimizing $v^T \bar{L} v$ (here we assume the trivial solution, the eigenvector with all 1's, has been excluded). Let the $K-1$ eigenvectors form the columns of $V \in \mathbb{R}^{N \times (K-1)}$. We perform K-means clustering on the rows of $V$ and get the final clustering. Algorithm 2 shows the complete routine, as sketched after this section.

Algorithm 2: Constrained Spectral Clustering for K-way Partition
Input: Affinity matrix $A$, constraint matrix $Q$, $\beta$, $K$;
Output: The cluster assignment indicator $u^*$;
1  $vol \leftarrow \sum_{i=1}^{N}\sum_{j=1}^{N} A_{ij}$, $D \leftarrow \mathrm{diag}(\sum_{j=1}^{N} A_{ij})$;
2  $\bar{L} \leftarrow I - D^{-1/2} A D^{-1/2}$, $\bar{Q} \leftarrow D^{-1/2} Q D^{-1/2}$;
3  $\lambda_{K-1} \leftarrow$ the $(K-1)$-th largest eigenvalue of $\bar{Q}$;
4  if $\beta \geq \lambda_{K-1} \cdot vol$ then
5      return $u^* = \emptyset$;
6  end
7  else
8      Solve the generalized eigenvalue system in Eq.(13);
9      Remove eigenvectors associated with non-positive eigenvalues and normalize the rest by $v \leftarrow \frac{v}{\|v\|}\sqrt{vol}$;
10     $V^* \leftarrow \operatorname{argmin}_{V \in \mathbb{R}^{N \times (K-1)}} \mathrm{trace}(V^T \bar{L} V)$, where the columns of $V$ are a subset of the feasible eigenvectors generated in the previous step;
11     return $u^* \leftarrow \mathrm{kmeans}(D^{-1/2} V^*, K)$;
12 end

Note that K-means is only one of many possible discretization techniques that can derive a K-way partition from the relaxed indicator matrix $D^{-1/2} V^*$. Due to the orthogonality of the eigenvectors, they can be independently discretized first, e.g. we can replace Step 11 of Algorithm 2 with:
$$u^* \leftarrow \mathrm{kmeans}(\mathrm{sign}(D^{-1/2} V^*), K). \tag{15}$$
This additional step can help alleviate the influence of possible outliers on the K-means step in some cases. Moreover, notice that the feasible eigenvectors, which are the columns of $V^*$, are treated equally in Eq.(15). This may not be ideal in practice because these candidate cuts are not equally favored by graph $G$, i.e. some of them have higher costs than the others. Therefore, we can weight the columns of $V^*$ by the inverse of their respective costs:
$$u^* \leftarrow \mathrm{kmeans}(\mathrm{sign}(D^{-1/2} V^* (V^{*T} \bar{L} V^*)^{-1}), K). \tag{16}$$
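A corresponding sketch of Algorithm 2 is given below (Python/NumPy/SciPy; the names and numerical choices are illustrative, and SciPy's kmeans2 stands in for the K-means step of Step 11 rather than the sign-based variants of Eqs.(15)-(16)).

```python
import numpy as np
from scipy.linalg import eig
from scipy.cluster.vq import kmeans2

def constrained_spectral_clustering_kway(A, Q, beta, K):
    """Sketch of Algorithm 2 (K-way constrained spectral clustering)."""
    N = A.shape[0]
    d = A.sum(axis=1)
    vol = d.sum()
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_bar = np.eye(N) - D_inv_sqrt @ A @ D_inv_sqrt
    Q_bar = D_inv_sqrt @ Q @ D_inv_sqrt

    # beta must lie below the (K-1)-th largest eigenvalue of Q_bar times vol
    lam_Q = np.sort(np.linalg.eigvalsh(Q_bar))[::-1]
    if beta >= lam_Q[K - 2] * vol:
        return None

    # generalized eigenproblem of Eq.(13); keep feasible (positive-eigenvalue) eigenvectors
    lam, V = eig(L_bar, Q_bar - (beta / vol) * np.eye(N))
    lam, V = lam.real, V.real
    idx = [i for i in range(N) if np.isfinite(lam[i]) and lam[i] > 0]
    V = V[:, idx]
    V = V / np.linalg.norm(V, axis=0) * np.sqrt(vol)         # each column satisfies v^T v = vol

    # keep the K-1 feasible eigenvectors with the smallest cost v^T L_bar v
    costs = np.array([V[:, i] @ L_bar @ V[:, i] for i in range(V.shape[1])])
    V_star = V[:, np.argsort(costs)[:K - 1]]

    # discretize by K-means on the rows of D^{-1/2} V*
    _, labels = kmeans2(D_inv_sqrt @ V_star, K, minit='++')
    return labels
```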
6.3 Using Constrained Spectral Clustering for Transfer Learning

The constrained spectral clustering framework naturally fits the scenario of transfer learning between two graphs. Assume we have two graphs, a source graph and a target graph, which share the same set of nodes but have different sets of edges (or edge weights). The goal is to transfer knowledge from the source graph so that we can improve the cut on the target graph. The knowledge to transfer is derived from the source graph in the form of soft constraints.

Specifically, let $G_S(V, E_S)$ be the source graph and $G_T(V, E_T)$ the target graph, and let $A_S$ and $A_T$ be their respective affinity matrices. Then $A_S$ can be considered as a constraint matrix with only ML constraints. It carries the structural knowledge from the source graph, and we can transfer it to the target graph using our constrained spectral clustering formulation:
$$\operatorname*{argmin}_{v \in \mathbb{R}^N} v^T \bar{L}_T v, \quad \text{s.t. } v^T \bar{A}_S v \geq \alpha, \; v^T v = vol, \; v \neq D_T^{1/2}\mathbf{1}. \tag{17}$$
$\alpha$ is now the lower bound on how much knowledge from the source graph must be enforced on the target graph. The solution is similar to Eq.(13):
$$\bar{L}_T v = \lambda \left( \bar{A}_S - \frac{\beta}{vol(G_T)} I \right) v. \tag{18}$$
Note that since the largest eigenvalue of $\bar{A}_S$ corresponds to a trivial cut, in practice we should set the threshold such that $\beta < \lambda_1 vol$, where $\lambda_1$ is the second largest eigenvalue of $\bar{A}_S$. This guarantees a feasible eigenvector that is non-trivial.
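A short sketch of this transfer setting is given below, reusing the constrained_spectral_clustering sketch from Section 4.2. Here the source affinity is normalized by the target degree matrix, which is how Algorithm 1 normalizes its constraint matrix; that reading, the function names, and the scale factor for $\beta$ are assumptions for illustration only.

```python
import numpy as np

def transfer_cut(A_S, A_T, scale=0.5):
    """Sketch of Eqs.(17)-(18): transfer knowledge from a source graph (affinity A_S)
    to a target graph (affinity A_T) by treating A_S as an all-ML soft constraint matrix."""
    d_T = A_T.sum(axis=1)
    vol_T = d_T.sum()
    D_T_inv_sqrt = np.diag(1.0 / np.sqrt(d_T))
    A_S_bar = D_T_inv_sqrt @ A_S @ D_T_inv_sqrt      # source affinity normalized like Q_bar

    # any beta below the *second* largest eigenvalue of A_S_bar times vol(G_T) is feasible,
    # since the largest eigenvalue corresponds to a trivial cut (Section 6.3)
    lam = np.sort(np.linalg.eigvalsh(A_S_bar))[::-1]
    beta = scale * lam[1] * vol_T

    return constrained_spectral_clustering(A_T, A_S, beta)   # sketch from Section 4.2
```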
7 Testing and Innovative Uses of Our Work

We begin with three sets of experiments to test our approach on standard spectral clustering datasets. We then show that, since our approach can handle large amounts of soft constraints in a flexible fashion, it opens up two innovative uses of our work: encoding multiple metrics for translated document clustering and transfer learning for fMRI analysis. We aim to answer the following questions with the empirical study:

– Can our algorithm effectively incorporate side information and generate semantically meaningful partitions?
– Does our algorithm converge to the underlying ground truth partition as more constraints are provided?
– How does our algorithm perform on real-world datasets, as evaluated against ground truth labeling, in comparison to existing techniques?
– How well does our algorithm handle soft constraints?
– How well does our algorithm handle large amounts of constraints?

Recall that in Section 1 we listed four different types of side information: explicit pairwise constraints, partial labeling, alternative metrics, and transfer of knowledge. The empirical results presented in this section are arranged accordingly. All but one of the datasets used in our experiments (the fMRI scans) are publicly available online. We implemented our algorithm in MATLAB, which is available online at http://bayou.cs.ucdavis.edu/ or by contacting the authors.

7.1 Explicit Pairwise Constraints: Image Segmentation

We demonstrate the effectiveness of our algorithm for image segmentation using explicit pairwise constraints assigned by users. We chose the image segmentation application for several reasons: 1) it is one of the applications where spectral clustering significantly outperforms other clustering techniques, e.g. K-means; 2) the results of image segmentation can easily be interpreted and evaluated by a human; 3) instead of generating random constraints, we can provide semantically meaningful constraints to see if the constrained partition conforms to our expectation.

The images we used were chosen from the Berkeley Segmentation Dataset and Benchmark (Martin et al (2001)). The original images are 480 × 320 grayscale images in jpeg format. For efficiency considerations, we compressed them to 10% of the original size, i.e. 48 × 32 pixels, as shown in Fig. 5(a) and 6(a).

Fig. 5 Segmentation of the elephant image: (a) original image, (b) no constraints, (c) Constraint Set 1, (d) Constraint Set 2. The images are reconstructed based on the relaxed cluster indicator $u^*$. Pixels that are closer to the red end of the spectrum belong to one segment and blue the other. The labeled pixels are bounded by the black and white rectangles.

Fig. 6 Segmentation of the face image: (a) original image, (b) no constraints, (c) Constraint Set 1, (d) Constraint Set 2. The images are reconstructed based on the relaxed cluster indicator $u^*$. Pixels that are closer to the red end of the spectrum belong to one segment and blue the other. The labeled pixels are bounded by the black and white rectangles.

Then the affinity matrix of the image was computed using the RBF kernel, based on both the positions and the grayscale values of the pixels. As a baseline, we used unconstrained spectral clustering (Shi and Malik (2000)) to generate a 2-segmentation of the image. Then we introduced different sets of constraints to see if they can generate the expected segmentation. Note that the results of image segmentation vary with the number of segments. To save us from the complications of parameter tuning, which is irrelevant to the contribution of this work, we always set the number of segments to be 2.
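One plausible reading of the affinity construction just described ("RBF kernel based on both the positions and the grayscale values of the pixels") is sketched below; the bandwidths sigma_pos and sigma_val are illustrative placeholders, not the values used in the experiments.

```python
import numpy as np

def image_affinity(img, sigma_pos=4.0, sigma_val=0.1):
    """RBF affinity over the pixels of a small grayscale image,
    combining spatial distance and grayscale-value distance."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)   # pixel coordinates
    val = img.ravel().astype(float)[:, None]                        # grayscale values
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)      # squared spatial distances
    d_val = (val - val.T) ** 2                                      # squared intensity distances
    A = np.exp(-d_pos / (2 * sigma_pos ** 2) - d_val / (2 * sigma_val ** 2))
    np.fill_diagonal(A, 0.0)                                        # no self-affinity
    return A
```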
The results are shown in Fig. 5 and 6. To visualize the resultant segmentation, we reconstructed the image using the entry values in the relaxed cluster indicator vector $u^*$. In Fig. 5(b), unconstrained spectral clustering partitioned the elephant image into two parts: the sky (red pixels) and the two elephants together with the ground (blue pixels). This is not satisfying in the sense that it failed to isolate the elephants from the background (the sky and the ground). To correct this, we introduced constraints by labeling two 5 × 5 blocks to be 1 (as bounded by the black rectangles in Fig. 5(c)): one at the upper-right corner of the image (the sky) and the other at the lower-right corner (the ground); we also labeled two 5 × 5 blocks on the heads of the two elephants to be −1 (as bounded by the white rectangles in Fig. 5(c)). To generate the constraint matrix Q, a ML was added between every pair of pixels with the same label and a CL was added between every pair of pixels with different labels. The parameter $\beta$ was set to
$$\beta = \lambda_{\max} \times vol \times \left(0.5 + 0.4 \times \frac{\#\text{ of constraints}}{N^2}\right), \tag{19}$$
where $\lambda_{\max}$ is the largest eigenvalue of $\bar{Q}$. In this way, $\beta$ is always between $0.5\,\lambda_{\max} vol$ and $0.9\,\lambda_{\max} vol$, and it gradually increases as the number of constraints increases. From Fig. 5(c) we can see that with the help of user supervision, our method successfully isolated the two elephants (blue) from the background, which is the sky and the ground (red). Note that our method achieved this with very simple labeling: four squared blocks.

To show the flexibility of our method, we tried a different set of constraints on the same elephant image with the same parameter settings. This time we aimed to separate the two elephants from each other, which is impossible in the unconstrained case because the two elephants are not only similar in color (grayscale value) but also adjacent in space. Again we used two 5 × 5 blocks (as bounded by the black and white rectangles in Fig. 5(d)), one on the head of the elephant on the left, labeled 1, and the other on the body of the elephant on the right, labeled −1. As shown in Fig. 5(d), our method cut the image into two parts with one elephant on the left (blue) and the other on the right (red), just as a human user would do.

Similarly, we applied our method to a human face image as shown in Fig. 6(a). Unconstrained spectral clustering failed to isolate the human face from the background (Fig. 6(b)). This is because the tall hat breaks the spatial continuity between the left side of the background and the right side. Then we labeled two 5 × 3 blocks to be in the same class, one on each side of the background. As we intended, our method assigned the background on both sides to the same cluster and thus isolated the human face with his tall hat from the background (Fig. 6(c)). Again, this was achieved simply by labeling two blocks in the image, which covered about 3% of all pixels. Alternatively, if we labeled a 5 × 5 block in the hat to be 1, and a 5 × 5 block in the face to be −1, the resultant clustering isolates the hat from the rest of the image (Fig. 6(d)).

7.2 Explicit Pairwise Constraints: The Double Moon Dataset

We further examine the behavior of our algorithm on a synthetic dataset using explicit constraints that are derived from the underlying ground truth.

We claim that our formulation is a natural extension to spectral clustering. The question to ask then is whether the output of our algorithm converges to that of spectral clustering.
More specifically, consider the ground truth partition defined by performing spectral clustering on an ideal distribution. We draw an imperfect sample from the distribution, on which spectral clustering is unable to find the ground truth partition. Then we perform our algorithm on this imperfect sample. As more and more constraints are provided, we want to know whether or not the partition found by our algorithm converges to the ground truth partition.

To answer the question, we used the Double Moon distribution. As shown in Fig. 1, spectral clustering is able to find the two moons when the sample is dense enough. In Fig. 7(a), we generated an under-sampled instance of the distribution with 100 data points, on which unconstrained spectral clustering could no longer find the ground truth partition. Then we performed our algorithm on this imperfect sample, and compared the partition found by our algorithm to the ground truth partition in terms of the adjusted Rand index (ARI, Hubert and Arabie (1985)). ARI indicates how well a given partition conforms to the ground truth: 0 means the given partition is no better than a random assignment; 1 means the given partition matches the ground truth exactly. For each random sample, we generated 50 random sets of constraints and recorded the average ARI. We repeated the process on 10 different random samples of the same size and report the results in Fig. 7(b). We can see that our algorithm consistently converges to the ground truth result as more constraints are provided. Notice that there is a performance drop when an extremely small number of constraints are provided (fewer than 10), which is expected because such a small number of constraints is insufficient to hint at a better partition and consequently leads to random perturbations of the results. As more constraints were provided, the results quickly stabilized.

Fig. 7 The convergence of our algorithm on 10 random samples of the Double Moon distribution: (a) a Double Moon sample and its Ncut, (b) the convergence of our algorithm (adjusted Rand index vs. number of constraints).

To illustrate the robustness of our approach, we created a Double Moon sample with uniform background noise, as shown in Fig. 8. Although the sample is dense enough (600 data instances in total), spectral clustering fails to correctly identify the two moons, due to the influence of the background noise (100 data instances). However, with 20 constraints, our algorithm is able to recover the two moons in spite of the background noise.

Fig. 8 The partition of a noisy Double Moon sample: (a) spectral clustering, (b) constrained spectral clustering.

Table 2 The UCI benchmarks

Identifier    #Instances    #Attributes
Hepatitis     80            19
Iris          100           4
Wine          130           13
Glass         214           9
Ionosphere    351           34
WDBC          569           30

7.3 Constraints from Partial Labeling: Clustering the UCI Benchmarks

Next we evaluate the performance of our algorithm by clustering the UCI benchmark datasets (Asuncion and Newman (2007)) with constraints derived from ground truth labeling. We chose six different datasets with class label information, namely Hepatitis, Iris, Wine, Glass, Ionosphere, and Breast Cancer Wisconsin (Diagnostic).
We performed 2-way clustering simply by partitioning the optimal cluster indicator according to the sign: positive entries to one cluster and negative entries to the other. We removed the setosa class from the Iris dataset, which is the class that is known to be well-separated from the other two. For the same reason we removed Class 3 from the Wine dataset, which is well-separated from the other two. We also removed data instances with missing values. The statistics of the datasets after preprocessing are listed in Table 2.

For each dataset, we computed the affinity matrix using the RBF kernel. To generate constraints, we randomly selected pairs of nodes that unconstrained spectral clustering wrongly partitioned, and filled in the correct relation in Q according to the ground truth labels. The quality of the clustering results was measured by the adjusted Rand index. Since the constraints are guaranteed to be correct, we set the threshold $\beta$ such that there will be only one feasible eigenvector, i.e. the one that best conforms to the constraint matrix Q.
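The constraint-generation step can be sketched as follows (a hypothetical helper: it takes the ground-truth labels and the 2-way partition produced by unconstrained spectral clustering, and fills Q only for randomly chosen pairs that the unconstrained partition got wrong).

```python
import numpy as np

def constraints_from_labels(labels, baseline_partition, num_constraints, rng=np.random):
    """Fill a constraint matrix Q with ML(+1)/CL(-1) entries for pairs that the
    unconstrained (baseline) partition assigned incorrectly relative to the labels."""
    N = len(labels)
    Q = np.zeros((N, N))
    # candidate pairs: those the baseline partition gets wrong w.r.t. the ground truth
    wrong = [(i, j) for i in range(N) for j in range(i + 1, N)
             if (labels[i] == labels[j]) != (baseline_partition[i] == baseline_partition[j])]
    picked = rng.permutation(len(wrong))[:num_constraints]
    for k in picked:
        i, j = wrong[k]
        Q[i, j] = Q[j, i] = +1 if labels[i] == labels[j] else -1
    return Q
```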
The observations are:

– Across all six datasets, our algorithm is able to effectively utilize the constraints and improve over unconstrained spectral clustering (Baseline). On the one hand, our algorithm can quickly improve the results with a small number of constraints. On the other hand, as more constraints are provided, the performance of our algorithm consistently increases and converges to the ground truth partition (Fig. 9).
– Our algorithm outperforms the competitors by a large margin in terms of ARI (Fig. 9). Since we have control over the lower-bounding threshold α, our algorithm is able to satisfy almost all of the given constraints (Fig. 10).
– The performance of our algorithm has significantly smaller variance over different random constraint sets than its competitors (Figs. 9 and 10), and the variance quickly diminishes as more constraints are provided. This suggests that our algorithm would perform more consistently in practice.
– Our algorithm performs especially well on sparse graphs, i.e. Fig. 9(e) and (f), where the competitors suffer. The reason is that when the graph is too sparse, it admits many "free" cuts that are equally good to unconstrained spectral clustering. Even after introducing a small number of constraints, the modified graph remains too sparse for SL and SSKK, which are unable to identify the ground truth partition. In contrast, these free cuts are not equivalent when judged by the constraint matrix $\bar{Q}$ of our algorithm, which can easily identify the one cut that best conforms to the constraints in terms of $v^T \bar{Q} v$. As a result, our algorithm outperforms SL and SSKK significantly in this scenario.

Fig. 9 The performance of our algorithm (CSP) on the six UCI datasets ((a) Hepatitis, (b) Iris, (c) Wine, (d) Glass, (e) Ionosphere, (f) Breast Cancer), with comparison to unconstrained spectral clustering (Baseline) and the Spectral Learning algorithm (SL). The adjusted Rand index over 100 random trials is reported (mean, min, and max).

Fig. 10 The ratio of constraints that are actually satisfied, for the same six datasets.

7.4 Constraints from Alternative Metrics: The Reuters Multilingual Dataset

We test our algorithm using soft constraints derived from alternative metrics of the same set of data instances. We used the Reuters Multilingual dataset, first introduced by Amini et al (2009). Each time we randomly sampled 1000 documents which were originally written in one language and then translated into four others, respectively. The statistics of the dataset are listed in Table 3. These documents come with ground truth labels that categorize them into six topics (K = 6). We constructed one graph based on the original language and another graph based on the translation. The affinity matrix was the cosine similarity between the tf-idf vectors of two documents. Then we used one of the two graphs as the constraint matrix Q, whose entries can be viewed as soft ML constraints. We enforce this constraint matrix on the other graph to see whether it can help improve the clustering.
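A minimal sketch of this construction, assuming scikit-learn's TfidfVectorizer and cosine_similarity are used to build the tf-idf vectors and their pairwise similarities (the paper does not specify an implementation, so these choices are ours):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine_affinity(documents):
    """Affinity matrix: cosine similarity between the tf-idf vectors
    of every pair of documents."""
    tfidf = TfidfVectorizer().fit_transform(documents)
    return cosine_similarity(tfidf)

# Hypothetical usage: docs_original and docs_translated hold the same 1000
# documents in the original language and in one translation, respectively.
# A = cosine_affinity(docs_translated)   # graph to be clustered
# Q = cosine_affinity(docs_original)     # alternative metric, used as soft
#                                        # Must-Link constraints
```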
We did not compare our algorithm to existing techniques because they are unable to incorporate soft constraints.

As shown in Fig. 11, unconstrained spectral clustering performs better on the original version than on the translated versions, which is not surprising. If we use the original version as the constraints and enforce them onto a translated version using our algorithm, we obtain a constrained clustering that is not only better than the unconstrained clustering on the translated version, but even better than the unconstrained clustering on the original version. This indicates that our algorithm is not merely a tradeoff between the original graph and the given constraints. Instead, it is able to integrate the knowledge from the constraints into the original graph and achieve a better partition.

Table 3 The Reuters Multilingual dataset

Language   #Documents   #Words
English          2000   21,531
French           2000   24,893
German           2000   34,279
Italian          2000   15,506
Spanish          2000   11,547

Fig. 11 The performance of our algorithm on the Reuters Multilingual dataset (adjusted Rand index): (a) English documents and their translations; (b) French documents and their translations.

7.5 Transfer of Knowledge: Resting-State fMRI Analysis

Finally, we apply our algorithm to transfer learning on resting-state fMRI data. An fMRI scan of a person consists of a sequence of 3D images over time. We can construct a graph from a given scan such that a node in the graph corresponds to a voxel in the image, and the edge weight between two nodes is (the absolute value of) the correlation between the two time sequences associated with the two voxels. Previous work has shown that by applying spectral clustering to resting-state fMRI we can find substructures in the brain that are periodically and simultaneously activated over time in the resting state, which may indicate a network associated with certain functions (van den Heuvel et al (2008)).
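A minimal sketch of this graph construction (our own illustration; the voxels-by-timepoints array layout and the removal of self-loops are assumptions of the sketch):

```python
import numpy as np

def fmri_affinity(time_series):
    """Build the voxel graph for one resting-state fMRI scan.

    time_series: array of shape (n_voxels, n_timepoints), one time course
    per voxel. The edge weight between two voxels is the absolute value of
    the Pearson correlation of their time courses.
    """
    A = np.abs(np.corrcoef(time_series))  # (n_voxels, n_voxels)
    np.fill_diagonal(A, 0.0)              # drop self-loops (our choice)
    return A
```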
One of the challenges of resting-state fMRI analysis is instability. Noise can easily be introduced into the scan: for example, the subject may have moved his/her head during the scan, or the subject may not have been at rest (actively thinking about things during the scan). Consequently, the result of spectral clustering becomes unstable. If we apply spectral clustering to two fMRI scans of the same person taken on two different days, the normalized min-cuts on the two scans are so different that they provide little insight into the brain activity of the subject (Fig. 12(a) and (b)).

To overcome this problem, we use our formulation to transfer knowledge from Scan 1 to Scan 2 and obtain a constrained cut, as shown in Fig. 12(c). This cut represents what the two scans agree on. The pattern captured in Fig. 12(c) is actually the default mode network (DMN), the network that is periodically activated at resting state (Fig. 12(d) shows the idealized DMN as specified by domain experts).

Fig. 12 Transfer learning on fMRI scans: (a) Ncut of Scan 1; (b) Ncut of Scan 2; (c) constrained cut obtained by transferring Scan 1 to Scan 2; (d) an idealized default mode network.

To further illustrate the practicality of our approach, we transfer the idealized DMN in Fig. 12(d) to a set of fMRI scans of elderly subjects. The dataset was collected and processed within the research program of the University of California at Davis Alzheimer's Disease Center (UCD ADC). The subjects were categorized into two groups: those diagnosed with cognitive syndrome (20 individuals) and those without cognitive syndrome (11 individuals).

For each individual scan, we encode the idealized DMN into a constraint matrix (using the RBF kernel) and enforce the constraints on the original fMRI scan. We then compute the cost of the constrained cut that is most similar to the DMN. If the cost of the constrained cut is high, it means there is great disagreement between the original graph and the given constraints (the idealized DMN), and vice versa. In other words, the cost of the constrained cut can be interpreted as the cost of transferring the DMN to that particular fMRI scan.
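The per-scan transfer cost can be sketched as follows. This is our own illustration rather than the authors' code: the RBF encoding of the DMN map and the use of $v^T \bar{L} v$ with a unit-length indicator as the cut cost are assumptions based on our reading of the text, and v_constrained is assumed to come from the constrained spectral clustering step, which is not reimplemented here.

```python
import numpy as np

def dmn_constraint_matrix(dmn_map, sigma=1.0):
    """Encode an idealized DMN map (one value per voxel, e.g. 1 inside the
    network and 0 outside) as a soft constraint matrix via an RBF kernel:
    voxels with similar DMN values receive a strong Must-Link weight."""
    d = np.asarray(dmn_map, dtype=float).reshape(-1, 1)
    sq_dist = (d - d.T) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def transfer_cost(v, A):
    """Cost of a (constrained) cut given by indicator vector v on the scan
    graph with affinity A, computed as v^T L_bar v with the normalized
    graph Laplacian L_bar and v scaled to unit length (the exact
    normalization in the paper may differ)."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_bar = np.eye(len(deg)) - D_inv_sqrt @ A @ D_inv_sqrt
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return float(v @ L_bar @ v)

# Hypothetical usage:
# Q = dmn_constraint_matrix(idealized_dmn)
# cost = transfer_cost(v_constrained, fmri_affinity(time_series))
```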
In Fig. 13, we plot the costs of transferring the DMN to both subject groups. We can clearly see that the costs of transferring the DMN to people without cognitive syndrome tend to be lower than the costs for people with cognitive syndrome. This conforms well to the observation made in a recent study that the DMN is often disrupted in people with Alzheimer's disease (Buckner et al (2008)).

Fig. 13 The costs of transferring the idealized default mode network to the fMRI scans of two groups of elderly individuals (with and without cognitive syndrome).

8 Conclusion

In this work we proposed a principled and flexible framework for constrained spectral clustering that can incorporate large amounts of both hard and soft constraints. The flexibility of our framework lends itself to the use of all types of side information: pairwise constraints, partial labeling, alternative metrics, and transfer learning. Our formulation is a natural extension of unconstrained spectral clustering and can be solved efficiently using generalized eigendecomposition. We demonstrated the effectiveness of our approach on a variety of datasets: the synthetic Double Moon data, image segmentation, the UCI benchmarks, the multilingual Reuters documents, and resting-state fMRI scans. The comparison to existing techniques validated the advantage of our approach.

9 Acknowledgments

We gratefully acknowledge support of this research via ONR grants N00014-09-1-0712 Automated Discovery and Explanation of Event Behavior and N00014-11-1-0108 Guided Learning in Dynamic Environments, and NSF grant IIS-0801528 Knowledge Enhanced Clustering.

References

Amini MR, Usunier N, Goutte C (2009) Learning from multiple partially observed views - an application to multilingual text categorization. In: Advances in Neural Information Processing Systems (NIPS 2009), pp 28–36
Asuncion A, Newman D (2007) UCI machine learning repository. URL http://www.ics.uci.edu/~mlearn/MLRepository.html
Basu S, Davidson I, Wagstaff K (eds) (2008) Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC
Buckner RL, Andrews-Hanna JR, Schacter DL (2008) The brain's default network. Annals of the New York Academy of Sciences 1124(1):1–38
Coleman T, Saunderson J, Wirth A (2008) Spectral clustering with inconsistent advice. In: Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp 152–159
Davidson I, Ravi SS (2006) Identifying and generating easy sets of constraints for clustering. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006), pp 336–341
Davidson I, Ravi SS (2007a) The complexity of non-hierarchical clustering with instance and cluster level constraints. Data Min Knowl Discov 14(1):25–61
Davidson I, Ravi SS (2007b) Intractability and clustering with constraints. In: Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pp 201–208
Davidson I, Ravi SS, Ester M (2007) Efficient incremental constrained clustering. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), pp 240–249
De Bie T, Suykens JAK, De Moor B (2004) Learning from general label constraints. In: Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshops (SSPR/SPR 2004), pp 671–679
Drineas P, Frieze AM, Kannan R, Vempala S, Vinay V (2004) Clustering large graphs via the singular value decomposition. Machine Learning 56(1-3):9–33
Gu Q, Li Z, Han J (2011) Learning a kernel for multi-task clustering. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI 2011)
van den Heuvel M, Mandl R, Hulshoff Pol H (2008) Normalized cut group clustering of resting-state fMRI data. PLoS ONE 3(4):e2001
Horn R, Johnson C (1990) Matrix Analysis. Cambridge University Press
Hubert L, Arabie P (1985) Comparing partitions. Journal of Classification 2:193–218
Ji X, Xu W (2006) Document clustering with prior knowledge. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), pp 405–412
Kamvar SD, Klein D, Manning CD (2003) Spectral learning. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI 2003), pp 561–566
Kuhn H, Tucker A (1982) Nonlinear programming. ACM SIGMAP Bulletin, pp 6–18
Kulis B, Basu S, Dhillon IS, Mooney RJ (2005) Semi-supervised graph clustering: a kernel approach. In: Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), pp 457–464
Li Z, Liu J, Tang X (2009) Constrained clustering via spectral regularization. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp 421–428
Lu Z, Carreira-Perpiñán MÁ (2008) Constrained spectral clustering through affinity propagation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008)
von Luxburg U (2007) A tutorial on spectral clustering. Statistics and Computing 17(4):395–416
Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings of the 8th International Conference on Computer Vision (ICCV 2001), vol 2, pp 416–423
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: Analysis and an algorithm. In: Advances in Neural Information Processing Systems (NIPS 2001), pp 849–856
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Wang F, Li T, Zhang C (2008) Semi-supervised clustering via matrix factorization. In: Proceedings of the 8th SIAM International Conference on Data Mining (SDM 2008), pp 1–12
Wang F, Ding CHQ, Li T (2009) Integrated KL (K-means - Laplacian) clustering: A new clustering approach by combining attribute data and pairwise relations. In: Proceedings of the 9th SIAM International Conference on Data Mining (SDM 2009), pp 38–48
Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), pp 563–572
White S, Smyth P (2005) A spectral clustering approach to finding communities in graphs. In: Proceedings of the 5th SIAM International Conference on Data Mining (SDM 2005), pp 76–84
Xu Q, desJardins M, Wagstaff K (2005) Constrained spectral clustering under a local proximity structure assumption. In: Proceedings of the 18th International Florida Artificial Intelligence Research Society Conference, pp 866–867
Yu SX, Shi J (2001) Grouping with bias. In: Advances in Neural Information Processing Systems (NIPS 2001), pp 1327–1334
Yu SX, Shi J (2004) Segmentation given partial grouping constraints. IEEE Trans Pattern Anal Mach Intell 26(2):173–183
