Horseshoes in multidimensional scaling and local kernel methods
Authors: Persi Diaconis, Sharad Goel, Susan Holmes
The Annals of Applied Statistics, 2008, Vol. 2, No. 3, 777–807. DOI: 10.1214/08-AOAS165. © Institute of Mathematical Statistics, 2008.

Persi Diaconis (1), Sharad Goel (2) and Susan Holmes (3)
Stanford University, Yahoo! Research and Stanford University

Abstract. Classical multidimensional scaling (MDS) is a method for visualizing high-dimensional point clouds by mapping to low-dimensional Euclidean space. This mapping is defined in terms of eigenfunctions of a matrix of interpoint dissimilarities. In this paper we analyze in detail multidimensional scaling applied to a specific dataset: the 2005 United States House of Representatives roll call votes. Certain MDS and kernel projections output "horseshoes" that are characteristic of dimensionality reduction techniques. We show that, in general, a latent ordering of the data gives rise to these patterns when one only has local information, that is, when only the interpoint distances for nearby points are known accurately. Our results provide a rigorous analysis of, and insight into, manifold learning in the special case where the manifold is a curve.

1. Introduction. Classical multidimensional scaling is a widely used technique for dimensionality reduction in complex data sets, a central problem in pattern recognition and machine learning. In this paper we carefully analyze the output of MDS applied to the 2005 United States House of Representatives roll call votes [Office of the Clerk—U.S. House of Representatives (2005)]. The results we find seem stable over recent years. The resulting 3-dimensional mapping of legislators shows "horseshoes" that are characteristic of a number of dimensionality reduction techniques, including principal components analysis and correspondence analysis.
These patterns are heuristically attributed to a latent ordering of the data, for example, the ranking of politicians within a left-right spectrum. Our work lends insight into this heuristic, and we present a rigorous analysis of the "horseshoe phenomenon."

Received June 2007; revised January 2008.
(1) This work was part of a project funded by the French ANR under a Chaire d'Excellence at the University of Nice Sophia-Antipolis.
(2) Supported in part by DARPA Grant HR 0011-04-1-0025.
(3) Supported in part by NSF Grant DMS-02-41246.
Key words and phrases: Horseshoes, multidimensional scaling, dimensionality reduction, principal components analysis, kernel methods.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2008, Vol. 2, No. 3, 777–807. This reprint differs from the original in pagination and typographic detail.

Seriation in archaeology was the main motivation behind D. Kendall's discovery of this phenomenon [Kendall (1970)]. Ordination techniques are part of the ecologists' standard toolbox [ter Braak (1985, 1987), Wartenberg, Ferson and Rohlf (1987)]. There are hundreds of examples of horseshoes occurring in real statistical applications. For instance, Dufrene and Legendre (1991) found that when they analyzed the available potential ecological factors, scored in the form of presence/absence in 10 km side squares in Belgium, there was a strong underlying gradient in the data set which induced "an extraordinary horseshoe effect." This gradient followed closely the altitude component. Mike Palmer has a wonderful "ordination website" where he shows an example of a contingency table crossing species counts in different locations around Boomer Lake [Palmer (2008)].
He shows a horseshoe effect where the gradient is the distance to the water (Palmer). Psychologists encountered the same phenomenon and call it the Guttman effect, after Guttman (1968). Standard texts such as Mardia, Kent and Bibby (1979), page 412, claim horseshoes result from ordered data in which only local interpoint distances can be estimated accurately. The mathematical analysis we provide shows that, by using the exponential kernel, any distance can be downweighted for points that are far apart and also produce such horseshoes. Methods for accounting for gradients [ter Braak and Prentice (1988)], or removing them [Hill and Gauch (1980)], that is, detrending the axes, are standard in the analysis of MDS with chi-square distances, known as correspondence analysis. Some mathematical insights into the horseshoe phenomenon have been proposed [Podani and Miklos (2002), Iwatsubo (1984)].

The paper is structured as follows: In Section 1.1 we describe our data set and briefly discuss the output of MDS applied to these data. Section 1.2 describes the MDS method in detail. Section 2 states our main assumption, namely that legislators can be isometrically mapped into an interval, and presents a simple model for voting that is consistent with this metric requirement. In Section 3 we analyze the model and present the main results of the paper. Section 4 connects the model back to the data. The proofs of the theoretical results from Section 3 are presented in the Appendix.

1.1. The voting data. We apply multidimensional scaling to data generated by members of the 2005 United States House of Representatives, with similarity between legislators defined via roll call votes (Office of the Clerk—U.S. House of Representatives). A full House consists of 435 members, and in 2005 there were 671 roll calls.
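The data preparation described just below (vote coding, an activity filter, and a disagreement distance) can be sketched in code. The function names and the toy vote matrix here are our own illustrative choices, not the authors' software; votes are coded +1/2 ("yea"), −1/2 ("nay") and 0 ("not voting"), as in the text.

```python
import numpy as np

def empirical_distances(votes):
    """Pairwise disagreement distances between legislators.

    `votes` is an (n_legislators x n_rollcalls) array with entries
    +1/2, -1/2 or 0; the distance is the average of |v_ik - v_jk|
    over roll calls, as in equation (1.1) below.
    """
    n, m = votes.shape
    d = np.zeros((n, n))
    for i in range(n):
        d[i] = np.abs(votes - votes[i]).sum(axis=1) / m
    return d

def keep_active(votes, threshold=0.90):
    """Drop legislators who voted on fewer than `threshold` of the roll calls."""
    frac_voted = (votes != 0).mean(axis=1)
    return votes[frac_voted >= threshold]

# Tiny synthetic example: three legislators, four roll calls.
V = np.array([[ 0.5,  0.5, -0.5, 0.5],
              [ 0.5, -0.5, -0.5, 0.5],
              [-0.5, -0.5,  0.5, 0.0]])
D = empirical_distances(V)
P = 1.0 - np.exp(-D)   # the localized proximity used later in the paper
```

On this toy matrix, legislators 1 and 2 disagree on one of four bills, so their distance is 0.25; the third legislator, who skipped a roll call, would be dropped by a 90% activity filter.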
The first two roll calls were a call of the House by States and the election of the Speaker, and so were excluded from our analysis. Hence, the data can be organized into a 435 × 669 matrix D = (d_ij) with d_ij ∈ {1/2, −1/2, 0} indicating, respectively, a vote of "yea," "nay," or "not voting" by Representative i on roll call j. (Technically, a representative can vote "present," but for purposes of our analysis this was treated as equivalent to "not voting.") We further restricted our analysis to the 401 Representatives that voted on at least 90% of the roll calls (220 Republicans, 180 Democrats and 1 Independent), leading to a 401 × 669 matrix V of voting data. This step removed, for example, the Speaker of the House, Dennis Hastert (R-IL), who by custom votes only when his vote would be decisive, and Robert T. Matsui (D-CA), who passed away at the start of the term.

As a first step, we define an empirical distance between legislators as

\hat d(l_i, l_j) = \frac{1}{669} \sum_{k=1}^{669} |v_{ik} - v_{jk}|.   (1.1)

Roughly, \hat d(l_i, l_j) is the fraction of roll calls on which legislators l_i and l_j disagreed. This interpretation would be exact if not for the possibility of "not voting." In Section 2 we give some theoretical justification for this choice of distance, but it is nonetheless a natural metric on these data.

Now, it is reasonable that the empirical distance above captures the similarity of nearby legislators. To reflect the fact that \hat d is most meaningful at small scales, we define the proximity

P(i, j) = 1 - \exp(-\hat d(l_i, l_j)).

Then P(i, j) ≈ \hat d(l_i, l_j) for \hat d(l_i, l_j) ≪ 1, and P(i, j) is not as sensitive to noise around relatively large values of \hat d(l_i, l_j). This localization is a common feature of dimensionality reduction algorithms, for example, eigenmap [Niyogi (2003)], isomap [Tenenbaum, de Silva and Langford (2000)], local linear embedding [Roweis and Saul (2000)] and kernel PCA [Schölkopf, Smola and Müller (1998)].

Fig. 1. 3-dimensional MDS output of legislators based on the 2005 U.S. House roll call votes. Color has been added to indicate the party affiliation of each Representative.

We apply MDS by double centering the squared distances built from the dissimilarity matrix P and plotting the first three eigenfunctions weighted by their eigenvalues (see Section 1.2 for details). Figure 1 shows the results of the 3-dimensional MDS mapping. The most striking feature of the mapping is that the data separate into "twin horseshoes." We have added color to indicate the political party affiliation of each Representative (blue for Democrat, red for Republican and green for the lone independent, Rep. Bernie Sanders of Vermont). The output from MDS is qualitatively similar to that obtained from other dimensionality reduction techniques, such as principal components analysis applied directly to the voting matrix V.

In Sections 2 and 3 we build and analyze a model for the data in an effort to understand and interpret these pictures. Roughly, our theory predicts that the Democrats, for example, are ordered along the blue curve in correspondence to their political ideology, that is, how far they lean to the left. In Section 4 we discuss connections between the theory and the data. In particular, we explain why in the data legislators at the political extremes are not quite at the tips of the projected curves, but rather are positioned slightly toward the center.

1.2. Multidimensional scaling.
Multidimensional scaling (MDS) is a widely used technique for approximating the interpoint distances, or dissimilarities, of points in a high-dimensional space by actual distances between points in a low-dimensional Euclidean space. See Young and Householder (1938) and Torgerson (1952) for early, clear references, Shepard (1962) for extensions from distances to ranked similarities, and Mardia, Kent and Bibby (1979), Cox and Cox (2000) and Borg and Groenen (1997) for useful textbook accounts. In our setting, applying the usual centering operations of MDS to the proximities we use as data leads to surprising numerical coincidences: the eigenfunctions of the centered matrices are remarkably close to the eigenfunctions of the original proximity matrix. The development below unravels this finding and describes the multidimensional scaling procedure in detail.

Euclidean points: If x_1, x_2, ..., x_n ∈ R^p, let

d_{i,j} = \sqrt{(x_i^1 - x_j^1)^2 + \cdots + (x_i^p - x_j^p)^2}

be the interpoint distance matrix. Schoenberg [Schoenberg (1935)] characterized distance matrices and gave an algorithmic solution for finding the points given the distances (see below). Albouy (2004) discusses the history of this problem, tracing it back to Borchardt (1866). Of course, the points can only be reconstructed up to translation and rotation; thus, we assume \sum_{i=1}^n x_i^k = 0 for all k. To describe Schoenberg's procedure, first organize the unknown points into an n × p matrix X and consider the matrix of dot products S = XX^T, that is, S_{ij} = x_i x_j^T. Then the spectral theorem for symmetric matrices yields S = U \Lambda U^T for orthogonal U and diagonal \Lambda. Thus, a set of n vectors which yield S is given by \tilde X = U \Lambda^{1/2}. Of course, we can only retrieve X up to an orthogonal transformation.
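Schoenberg's procedure, completed in the derivation that follows, is short enough to state as code. The sketch below is our own minimal implementation, not the authors' software; it recovers a centered configuration from a Euclidean distance matrix and checks itself by round-tripping random points.

```python
import numpy as np

def classical_mds(D, k):
    """Classical (Torgerson) MDS: points in R^k from an n x n distance matrix D.

    The three steps of the text: double-center the squared distances,
    diagonalize, and scale the top-k eigenvectors by sqrt(eigenvalue).
    """
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix H = I - (1/n) 1 1^T
    S = -0.5 * H @ (D ** 2) @ H             # S = -(1/2) H D^2 H
    evals, evecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:k]       # indices of the top-k eigenvalues
    L = np.maximum(evals[idx], 0.0)         # clip tiny negatives from rounding
    return evecs[:, idx] * np.sqrt(L)       # X~ = U Lambda^(1/2)

# Round trip: distances of random points in R^3 are reproduced exactly
# (the configuration itself only up to translation/rotation/reflection).
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, 3)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

Since distances are invariant under orthogonal transformations, the recovered interpoint distances agree with the originals to machine precision, even though Y need not equal X.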
This reduces the problem to finding the dot product matrix S from the interpoint distances. For this, observe

d_{i,j}^2 = (x_i - x_j)(x_i - x_j)^T = x_i x_i^T + x_j x_j^T - 2 x_i x_j^T,

or

D^2 = s \mathbf{1}^T + \mathbf{1} s^T - 2S,   (1.2)

where D^2 is the n × n matrix of squared distances, s is the n × 1 vector of the diagonal entries of S, and \mathbf{1} is the n × 1 vector of ones. The matrix S can be obtained by double centering D^2:

S = -\tfrac{1}{2} H D^2 H, \qquad H = I - \tfrac{1}{n} \mathbf{1}\mathbf{1}^T.   (1.3)

To see this, first note that, for any matrix A, HAH centers the rows and columns to have mean 0. Consequently, H s \mathbf{1}^T H = H \mathbf{1} s^T H = 0, since the rows of s \mathbf{1}^T and the columns of \mathbf{1} s^T are constant. Pre- and post-multiplying (1.2) by H, we have H D^2 H = -2 H S H. Since the x's were chosen as centered, X^T \mathbf{1} = 0, the row sums of S satisfy

\sum_j x_i x_j^T = x_i \Big( \sum_j x_j \Big)^T = 0,

and so S = -\tfrac{1}{2} H D^2 H, as claimed. In summary, given an n × n matrix of interpoint distances, one can solve for points achieving these distances by the following:

1. Double centering the matrix of squared interpoint distances: S = -\tfrac{1}{2} H D^2 H.
2. Diagonalizing S: S = U \Lambda U^T.
3. Extracting \tilde X: \tilde X = U \Lambda^{1/2}.

Approximate distance matrices: The analysis above assumes that one starts with points x_1, x_2, ..., x_n in a p-dimensional Euclidean space. We may want to find an embedding x_i ⟹ y_i in a space of dimension k < p that preserves the interpoint distances as closely as possible. Assume that S = U \Lambda U^T is such that the diagonal entries of \Lambda are decreasing. Set Y_k to be the matrix obtained by taking the first k columns of U and scaling them so that their squared norms are equal to the eigenvalues \Lambda_k. In particular, this provides the first k columns of \tilde X above and solves the minimization problem

\min_{y_i \in R^k} \sum_{i,j} \big( \|x_i - x_j\|_2^2 - \|y_i - y_j\|_2^2 \big).   (1.4)

Young and Householder (1938) showed that this minimization can be realized as an eigenvalue problem; see the proof in this context in Mardia, Kent and Bibby (1979), page 407. In applications, an observed matrix D is often not based on Euclidean distances (but may represent "dissimilarities," or just the difference of ranks). Then, the MDS solution is a heuristic for finding points in a Euclidean space whose interpoint distances approximate the order of the dissimilarities D. This is called nonmetric MDS [Shepard (1962)].

Kernel methods: MDS converts similarities into inner products, whereas modern kernel methods [Schölkopf, Smola and Müller (1998)] start with a given matrix of inner products. Williams (2000) pointed out that kernel PCA [Schölkopf, Smola and Müller (1998)] is equivalent to metric MDS in feature space when the kernel function is chosen isotropic, that is, when the kernel K(x, y) only depends on the norm \|x - y\|. The kernels we focus on in this paper have that property. We will show a decomposition of the horseshoe phenomenon for one particular isotropic kernel, the one defined by the kernel function k(x_i, x_j) = \exp(-\theta (x_i - x_j)'(x_i - x_j)).

Relating the eigenfunctions of S to those of D^2: In practice, it is easier to think about the eigenfunctions of the squared distance matrix D^2 than those of the recentered matrix S = -\tfrac{1}{2} H D^2 H. Observe that if v is any vector such that \mathbf{1}^T v = 0 (i.e., the entries of v sum to 0), then H v = (I - \tfrac{1}{n} \mathbf{1}\mathbf{1}^T) v = v. Now, suppose w is an eigenfunction of D^2 with eigenvalue \lambda, and let

\bar w = \Big( \tfrac{1}{n} \sum_{i=1}^n w_i \Big) \mathbf{1}

be the constant vector whose entries are the mean of w. Then \mathbf{1}^T (w - \bar w) = 0 and

S(w - \bar w) = -\tfrac{1}{2} H D^2 H (w - \bar w)
             = -\tfrac{1}{2} H D^2 (w - \bar w)
             = -\tfrac{1}{2} H (\lambda w - \lambda \bar w + \lambda \bar w - D^2 \bar w)
             = -\tfrac{\lambda}{2} (w - \bar w) + \tfrac{1}{2} \Big( \tfrac{1}{n} \sum_{i=1}^n w_i \Big) \begin{pmatrix} r_1 - \bar r \\ \vdots \\ r_n - \bar r \end{pmatrix},

where r_i = \sum_{j=1}^n (D^2)_{ij} and \bar r = (1/n) \sum_{i=1}^n r_i. In short, if w is an eigenfunction of D^2 and \bar w = 0, then w is also an eigenfunction of S. By continuity, if \bar w ≈ 0 or r_i ≈ \bar r, then w - \bar w is an approximate eigenfunction of S. In our setting, it turns out that the matrix D^2 has approximately constant row sums (so r_i ≈ \bar r), and its eigenfunctions satisfy \bar w ≈ 0 (in fact, some satisfy \bar w = 0). Consequently, the eigenfunctions of the centered and uncentered matrices are approximately the same in our case.

2. A model for the data. We begin with a brief review of models for this type of data. In spatial models of roll call voting, legislators and policies are represented by points in a low-dimensional Euclidean space, with votes decided by maximizing a deterministic or stochastic utility function (each legislator choosing the policy maximizing their utility). For a precise description of these techniques, see de Leeuw (2005), where he treats the particular case of roll call data such as ours.

Since Coombs (1964), it has been understood that there is usually a natural left-right (i.e., unidimensional) model for political data. Recent comparisons [Burden, Caldeira and Groseclose (2000)] between the available left-right indices have shown that there is little difference, and that indices based on multidimensional scaling [Heckman and Snyder (1997)] perform well. Further, Heckman and Snyder (1997) conclude "standard roll call measures are good proxies of personal ideology and are still among the best measures available."

In empirical work it is often convenient to specify a parametric family of utility functions. In that context, the central problem is then to estimate those parameters and to find "ideal points" for both the legislators and the policies.
A robust Bayesian procedure for parameter estimation in spatial models of roll call data was introduced in Clinton, Jackman and Rivers (2004), and provides a statistical framework for testing models of legislative behavior. Our cut-point model is a bit different and is explained next.

Although the empirical distance (1.1) is arguably a natural one to use on our data, we further motivate this choice by considering a theoretical model in which legislators lie on a regular grid in a unidimensional policy space. In this idealized model it is natural to identify legislators l_i, 1 ≤ i ≤ n, with points in the interval I = [0, 1] in correspondence with their political ideologies. We define the distance between legislators to be d(l_i, l_j) = |l_i - l_j|. This assumption that legislators can be isometrically mapped into an interval is key to our analysis.

In the "cut-point model" for voting, each bill 1 ≤ k ≤ m on which the legislators vote is represented as a pair (C_k, P_k) ∈ [0, 1] × {0, 1}. We can think of P_k as indicating whether the bill is liberal (P_k = 0) or conservative (P_k = 1), and we can take C_k to be the cut-point between legislators that vote "yea" or "nay." Let V_{ik} ∈ {1/2, -1/2} indicate how legislator l_i votes on bill k. Then, in this model,

V_{ik} = \begin{cases} 1/2 - P_k, & l_i \le C_k, \\ P_k - 1/2, & l_i > C_k. \end{cases}

As described, the model has n + 2m parameters, one for each legislator and two for each bill. These parameters are not identifiable without further restrictions: adding the same ε to every l_i and C_k results in the same votes. Below we fix this problem by specifying values for the l_i and a distribution on {C_k}. We reduce the number of parameters by assuming that the cut-points are independent random variables uniform on I.
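A small simulation (our own sketch, with arbitrary choices of n and m) illustrates the key property of this model, stated as equation (2.1) below: the probability that two legislators disagree on a bill with a uniform cut-point equals their latent distance |l_i - l_j|, so the empirical disagreement fraction converges to the latent distance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 20000                        # legislators on a grid, many bills
l = np.arange(1, n + 1) / n             # legislator positions l_i = i/n
C = rng.uniform(size=m)                 # independent uniform cut-points
Pk = rng.integers(0, 2, size=m)         # bill polarities (do not affect distances)

# V[i, k] = 1/2 - P_k if l_i <= C_k, else P_k - 1/2
left = l[:, None] <= C[None, :]
V = np.where(left, 0.5 - Pk, Pk - 0.5)

# Empirical distance d-hat_m versus latent distance d
d_hat = np.abs(V[:, None, :] - V[None, :, :]).mean(axis=2)
d = np.abs(l[:, None] - l[None, :])
err = np.abs(d_hat - d).max()
```

With m = 20000 simulated bills, the worst pairwise deviation between the empirical and latent distances is on the order of 0.01, in line with the Hoeffding-based guarantee of Lemma 2.1 below.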
Then,

P(V_{ik} \ne V_{jk}) = d(l_i, l_j),   (2.1)

since legislators l_i and l_j take opposite sides on a given bill if and only if the cut-point C_k divides them. Observe that the parameters P_k do not affect the probability above. The empirical distance (1.1) between legislators l_i and l_j generalizes to

\hat d_m(l_i, l_j) = \frac{1}{m} \sum_{k=1}^m |V_{ik} - V_{jk}| = \frac{1}{m} \sum_{k=1}^m \mathbf{1}_{V_{ik} \ne V_{jk}}.

By (2.1), we can estimate the latent distance d between legislators by the empirical distance \hat d_m, which is computable from the voting record. In particular,

\lim_{m \to \infty} \hat d_m(l_i, l_j) = d(l_i, l_j) \quad a.s.,

since we assumed the cut-points are independent. More precisely, we have the following result:

Lemma 2.1. For m \ge \log(n/\sqrt{\varepsilon})/\varepsilon^2,

P\big( |\hat d_m(l_i, l_j) - d(l_i, l_j)| \le \varepsilon \ \forall\, 1 \le i, j \le n \big) \ge 1 - \varepsilon.

Proof. By the Hoeffding inequality, for fixed l_i and l_j,

P\big( |\hat d_m(l_i, l_j) - d(l_i, l_j)| > \varepsilon \big) \le 2 e^{-2m\varepsilon^2}.

Consequently,

P\Big( \bigcup_{1 \le i < j \le n} \{ |\hat d_m(l_i, l_j) - d(l_i, l_j)| > \varepsilon \} \Big) \le \sum_{1 \le i < j \le n} P\big( |\hat d_m(l_i, l_j) - d(l_i, l_j)| > \varepsilon \big) \le \frac{n^2}{2} \cdot 2 e^{-2m\varepsilon^2} \le \varepsilon

for m \ge \log(n/\sqrt{\varepsilon})/\varepsilon^2, and the result follows.

We identify legislators with points in the interval I = [0, 1] and define the distances between them to be d(l_i, l_j) = |l_i - l_j|. This general description seems to be reasonable not only for applications in political science, but also in a number of other settings. The points and the exact distance d are usually unknown; however, one can often estimate d from the data. For our work, we assume that one has access to an empirical distance that is locally accurate, that is, we assume one can estimate the distance between nearby points. To complete the description of the model, something must be said about the hypothetical legislator points l_i. In Section 3 we specify these so that d(l_i, l_j) = |i/n - j/n|.
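As a worked example of the lemma's bound (our own arithmetic, not from the paper): for the n = 401 legislators in the data and ε = 0.05, the lemma asks for m ≥ log(401/√0.05)/0.05² ≈ 2997 roll calls. Inverting the bound numerically shows what the 669 available roll calls guarantee.

```python
import math

def m_required(n, eps):
    """Smallest m in Lemma 2.1 guaranteeing all pairwise distance estimates
    are within eps, with probability at least 1 - eps."""
    return math.ceil(math.log(n / math.sqrt(eps)) / eps ** 2)

def eps_achievable(n, m, lo=1e-3, hi=0.999):
    """Invert the lemma by bisection: the smallest eps whose required m
    does not exceed the m roll calls actually available.  The bound
    log(n/sqrt(eps))/eps^2 is decreasing in eps, so bisection applies."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if math.log(n / math.sqrt(mid)) / mid ** 2 <= m:
            hi = mid
        else:
            lo = mid
    return hi

m_for_5pct = m_required(401, 0.05)   # roll calls needed for 5% uniform accuracy
eps_669 = eps_achievable(401, 669)   # accuracy guaranteed by 669 roll calls
```

So the lemma, which is only a sufficient condition, certifies uniform accuracy of roughly 0.10 for the 669 roll calls of the 2005 session; the observed horseshoes only require the distances to be locally accurate.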
Because of the uniformity assumption on the bill parameters and Lemma 2.1, aspects of this combination of assumptions can be empirically tested. A series of comparisons between model and data (along with scientific conclusions) are given in Section 4. These show rough but good accord; see, in particular, the comparison between Figures 3, 6, 7 and Figure 9 and the accompanying commentary.

Our model is a simple, natural set of assumptions which leads to a useful analysis of these data. The assumption of a uniform distribution of bills implies identifiability of the distances between legislators. Equal spacing is the mathematically simplest assumption matching the observed distances. In informal work we have tried varying these assumptions but did not find that these variations led to a better understanding of the data.

3. Analysis of the model.

3.1. Eigenfunctions and horseshoes. In this section we analyze multidimensional scaling applied to metric models satisfying d(x_i, x_j) = |i/n - j/n|. This corresponds to the case in which legislators are uniformly spaced in I: l_i = i/n. Now, if all the interpoint distances were known precisely, classical scaling would reconstruct the points exactly (up to a reversal of direction). In applications, it is often not possible to have globally accurate information. Rather, one can only reasonably approximate the interpoint distances for nearby points. To reflect this limited knowledge, we work with the dissimilarity

P(i, j) = 1 - \exp(-d(x_i, x_j)).

As a matrix,

P = \begin{pmatrix}
0 & 1 - e^{-1/n} & \cdots & 1 - e^{-(n-1)/n} \\
1 - e^{-1/n} & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 1 - e^{-1/n} \\
1 - e^{-(n-1)/n} & \cdots & 1 - e^{-1/n} & 0
\end{pmatrix}.

We are interested in finding eigenfunctions of the doubly centered matrix

S = -\tfrac{1}{2} H P H = -\tfrac{1}{2} (P - JP - PJ + JPJ),

where J = (1/n) \mathbf{1}\mathbf{1}^T.
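The "surprising numerical coincidence" noted in Section 1.2, that eigenfunctions of the centered matrix are close to those of the uncentered proximity matrix, can be observed directly on this grid model. The check below is our own (the choice n = 300 is arbitrary): the leading eigenvector of S has absolute cosine similarity essentially 1 with one eigenvector of the uncentered -P/2.

```python
import numpy as np

n = 300
x = np.arange(1, n + 1) / n
P = 1.0 - np.exp(-np.abs(x[:, None] - x[None, :]))   # grid proximity matrix
H = np.eye(n) - np.ones((n, n)) / n
S = -0.5 * H @ P @ H                                  # doubly centered
M = -0.5 * P                                          # uncentered

es, Us = np.linalg.eigh(S)    # ascending eigenvalues
em, Um = np.linalg.eigh(M)

v = Us[:, -1]                 # leading eigenvector of the centered matrix
overlaps = np.abs(Um.T @ v)   # |cosine| with every eigenvector of -P/2
best = overlaps.max()
```

The reason, developed in Section 1.2, is that the sine-type eigenfunctions here have nearly zero mean, so double centering barely moves them.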
To prove limiting results, we work with the scaled matrices S_n = (1/n) S. Approximate eigenfunctions for S_n are found by considering a limit K of the matrices S_n, and then solving the corresponding integral equation

\int_0^1 K(x, y) f(y)\, dy = \lambda f(x).

Standard matrix perturbation theory is then applied to recover approximate eigenfunctions of the original, discrete matrix. When we continuize the scaled matrices S_n, we get the kernel defined for (x, y) ∈ [0, 1] × [0, 1] by

K(x, y) = \tfrac{1}{2} \Big( e^{-|x-y|} - \int_0^1 e^{-|x-y|}\, dx - \int_0^1 e^{-|x-y|}\, dy + \int_0^1 \int_0^1 e^{-|x-y|}\, dx\, dy \Big)
        = \tfrac{1}{2} \big( e^{-|x-y|} + e^{-y} + e^{-(1-y)} + e^{-x} + e^{-(1-x)} \big) + e^{-1} - 2.

Recognizing this as a kernel similar to those in Fredholm equations of the second type suggests that there are trigonometric solutions, as we show in Theorem A.2 in the Appendix. The eigenfunctions we derive are in agreement with those arising from the voting data, lending considerable insight into our data analysis problem and, more importantly, the horseshoe phenomenon. The sequence of explicit diagonalizations and approximations developed in the Appendix leads to the main results of this section, giving closed-form approximations for the eigenvectors (Theorem 3.1) and eigenvalues (Theorem 3.2); the proofs of these are also in the Appendix.

Theorem 3.1. Consider the centered and scaled proximity matrix defined by

S_n(x_i, x_j) = \frac{1}{2n} \big( e^{-|i-j|/n} + e^{-i/n} + e^{-(1-i/n)} + e^{-j/n} + e^{-(1-j/n)} + 2e^{-1} - 4 \big)

for 1 ≤ i, j ≤ n.

1. Set f_{n,a}(x_i) = \cos(a(i/n - 1/2)) - (2/a)\sin(a/2), where a is a positive solution to \tan(a/2) = a/(2 + 3a^2). Then

S_n f_{n,a}(x_i) = \frac{1}{1 + a^2} f_{n,a}(x_i) + R_{f,n}, \quad where \quad |R_{f,n}| \le \frac{a + 4}{2n}.

2. Set g_{n,a}(x_i) = \sin(a(i/n - 1/2)), where a is a positive solution to a \cot(a/2) = -1.
Then

S_n g_{n,a}(x_i) = \frac{1}{1 + a^2} g_{n,a}(x_i) + R_{g,n}, \quad where \quad |R_{g,n}| \le \frac{a + 2}{2n}.

That is, f_{n,a} and g_{n,a} are approximate eigenfunctions of S_n.

Theorem 3.2. Consider the setting of Theorem 3.1 and let \lambda_1, ..., \lambda_n be the eigenvalues of S_n.

1. For positive solutions a of \tan(a/2) = a/(2 + 3a^2),

\min_{1 \le i \le n} \Big| \lambda_i - \frac{1}{1 + a^2} \Big| \le \frac{a + 4}{\sqrt{n}}.

2. For positive solutions a of a \cot(a/2) = -1,

\min_{1 \le i \le n} \Big| \lambda_i - \frac{1}{1 + a^2} \Big| \le \frac{a + 2}{\sqrt{n}}.

In the Appendix we prove an uncentered version of this theorem (Theorem A.3) that applies to uncentered matrices, which we will need for the double horseshoe case of the next section.

In the results above, we transformed distances into dissimilarities via the exponential transformation P(i, j) = 1 - \exp(-d(x_i, x_j)). If we worked with the distances directly, so that the dissimilarity matrix is given by P(i, j) = |l_i - l_j|, then much of what we develop here stays true. In particular, the operators are explicitly diagonalizable, with similar eigenfunctions. This has been independently studied by physicists in what they call the crystal configuration of a one-dimensional Anderson model, with spectral decomposition analyzed in Bogomolny, Bohigas and Schmit (2003).

Fig. 2. Approximate eigenfunctions f_1 and f_2.

3.1.1. Horseshoes and twin horseshoes. The 2-dimensional MDS mapping is built out of the first and second eigenfunctions of the centered proximity matrix. As shown above, we have the following approximate eigenfunctions:

• f_1(x_i) = \sin(3.67(i/n - 1/2)) with eigenvalue \lambda_1 ≈ 0.07,
• f_2(x_i) = \cos(6.39(i/n - 1/2)) with eigenvalue \lambda_2 ≈ 0.02,

where the eigenvalues are for the scaled matrix. Figure 2 shows a graph of these eigenfunctions.
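Theorem 3.1 can be checked numerically. The sketch below is our own (n = 400 and the bisection brackets are arbitrary choices): it solves the two transcendental equations, builds S_n from its closed form, and verifies that the stated trigonometric functions are approximate eigenfunctions with eigenvalues close to 1/(1 + a²).

```python
import numpy as np
from math import tan, pi

def bisect(h, lo, hi, iters=80):
    """Bisection for a root of h on [lo, hi]; h must change sign there."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Frequencies from the two transcendental equations of Theorem 3.1.
a_cos = bisect(lambda a: tan(a / 2) - a / (2 + 3 * a ** 2), 2 * pi + 0.01, 6.6)
a_sin = bisect(lambda a: a / tan(a / 2) + 1, 3.2, 4.5)   # a cot(a/2) = -1

n = 400
i = np.arange(1, n + 1)
Sn = (np.exp(-np.abs(i[:, None] - i[None, :]) / n)
      + np.exp(-i[:, None] / n) + np.exp(-(1 - i[:, None] / n))
      + np.exp(-i[None, :] / n) + np.exp(-(1 - i[None, :] / n))
      + 2 * np.exp(-1) - 4) / (2 * n)

f = np.cos(a_cos * (i / n - 0.5)) - (2 / a_cos) * np.sin(a_cos / 2)
g = np.sin(a_sin * (i / n - 0.5))
res_f = np.abs(Sn @ f - f / (1 + a_cos ** 2)).max()   # residual, part 1
res_g = np.abs(Sn @ g - g / (1 + a_sin ** 2)).max()   # residual, part 2
evs = np.linalg.eigvalsh(Sn)                          # for Theorem 3.2
```

The computed frequencies reproduce the values a ≈ 3.67 and a ≈ 6.39 quoted in Section 3.1.1, the residuals are O(1/n) as the theorem asserts, and the two largest numerical eigenvalues of S_n land near 1/(1 + a²) ≈ 0.069 and 0.024.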
Moreover, Figure 3 shows the horseshoe that results from plotting \Lambda: x_i \mapsto (\sqrt{\lambda_1} f_1(x_i), \sqrt{\lambda_2} f_2(x_i)). From \Lambda it is possible to deduce the relative order of the Representatives in the interval I. Since -f_1 is also an eigenfunction, it is not in general possible to determine the absolute order knowing only that \Lambda comes from the eigenfunctions. However, as can be seen in Figure 3, the relationship between the two eigenfunctions is a curve for which we have the parametrization given above, but which cannot be written in functional form; in particular, the second eigenvector is not a quadratic function of the first, as is sometimes claimed.

With the voting data, we see not one, but two horseshoes. To see how this can happen, consider the two-population state space X = \{x_1, ..., x_n, y_1, ..., y_n\} with proximity

d(x_i, x_j) = 1 - e^{-|i/n - j/n|}, \quad d(y_i, y_j) = 1 - e^{-|i/n - j/n|} \quad and \quad d(x_i, y_j) = 1.

This leads to the partitioned proximity matrix

\tilde P_{2n} = \begin{pmatrix} P_n & \mathbf{1} \\ \mathbf{1} & P_n \end{pmatrix},

where P_n(i, j) = 1 - e^{-|i/n - j/n|}.

Corollary 3.1. From Theorem A.3 we have the following approximate eigenfunctions and eigenvalues for -(1/2n) \tilde P_{2n}:

• f_1(i) = \cos(a_1(i/n - 1/2)) for 1 ≤ i ≤ n, f_1(j) = -\cos(a_1((j - n)/n - 1/2)) for n + 1 ≤ j ≤ 2n, where a_1 ≈ 1.3 and \lambda_1 ≈ 0.37.
• f_2(i) = \sin(a_2(i/n - 1/2)) for 1 ≤ i ≤ n, f_2(j) = 0 for n + 1 ≤ j ≤ 2n, where a_2 ≈ 3.67 and \lambda_2 ≈ 0.069.
• f_3(i) = 0 for 1 ≤ i ≤ n, f_3(j) = \sin(a_2((j - n)/n - 1/2)) for n + 1 ≤ j ≤ 2n, where a_2 ≈ 3.67 and \lambda_3 ≈ 0.069.

Proof.

-\frac{1}{2n} \tilde P_{2n} = \begin{pmatrix} A_n & 0 \\ 0 & A_n \end{pmatrix} - \frac{1}{2n} \mathbf{1}\mathbf{1}^T,

where A_n(i, j) = (1/2n) e^{-|i/n - j/n|}.
If u is an eigenvector of A_n, then the vector (u, -u) of length 2n is an eigenvector of -(1/2n) \tilde P_{2n}, since

\left[ \begin{pmatrix} A_n & 0 \\ 0 & A_n \end{pmatrix} - \frac{1}{2n} \mathbf{1}\mathbf{1}^T \right] \begin{pmatrix} u \\ -u \end{pmatrix} = \lambda \begin{pmatrix} u \\ -u \end{pmatrix} + 0.

If we additionally have that \mathbf{1}^T u = 0, then, similarly, (u, \vec 0) and (\vec 0, u) are also eigenfunctions of -(1/2n) \tilde P_{2n}.

Fig. 3. A horseshoe that results from plotting \Lambda: x_i \mapsto (\sqrt{\lambda_1} f_1(x_i), \sqrt{\lambda_2} f_2(x_i)).

Since the functions f_1, f_2 and f_3 of Corollary 3.1 are all orthogonal to constant functions, by the discussion in Section 1.2 they are also approximate eigenfunctions of the centered, scaled matrix -(1/2n) H \tilde P_{2n} H. These functions are graphed in Figure 4, and the twin horseshoes that result from the 3-dimensional mapping \Lambda: z \mapsto (\sqrt{\lambda_1} f_1(z), \sqrt{\lambda_2} f_2(z), \sqrt{\lambda_3} f_3(z)) are shown in Figure 5. The first eigenvector provides the separation into two groups; this is a well-known method for separating clusters, known today as spectral clustering [Shi and Malik (2000)]. For a nice survey and consistency results, see von Luxburg, Belkin and Bousquet (2008).

Remark. The matrices A_n and \tilde P_{2n} above are centrosymmetric [Weaver (1985)], that is, symmetric about the center of the matrix. Formally, if K is the matrix with 1's on the counter (or secondary) diagonal,

K = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 1 & 0 \\ \vdots & & \iddots & & \vdots \\ 0 & 1 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \end{pmatrix},

then a matrix B is centrosymmetric iff BK = KB. A very useful review by Weaver (1985) quotes I. J. Good (1970) on the connection between centrosymmetric matrices and kernels of integral equations: "Toeplitz matrices (which are examples of matrices which are both symmetric and centrosymmetric) arise as discrete approximations to kernels k(x, t) of integral equations when these kernels are functions of |x - t|." (Today we would call these isotropic kernels.) "Similarly if a kernel is an even function of its vector argument (x, t), that is, if k(x, t) = k(-x, -t), then it can be discretely approximated by a centrosymmetric matrix."

Fig. 4. Approximate eigenfunctions f_1, f_2 and f_3 for the centered proximity matrix arising from the two-population model.

Fig. 5. Twin horseshoes in the two-population model that result from plotting \Lambda: z \mapsto (\sqrt{\lambda_1} f_1(z), \sqrt{\lambda_2} f_2(z), \sqrt{\lambda_3} f_3(z)).

Centrosymmetric matrices have very neat eigenvector formulas [Cantoni and Butler (1976)]. In particular, if the order n of the matrix is even, then the first eigenvector is skew symmetric, and thus of the form (u_1, -u_1) and orthogonal to the constant vector. This explains the miracle that seems to occur in the simplification of the eigenvectors in the formulae above.

4. Connecting the model to the data. When we apply MDS to the voting data, the first three eigenvalues are as follows:

• 0.13192,
• 0.00764,
• 0.00634.

Observe that, as our two-population model suggests, the second and third eigenvalues are about equal and significantly smaller than the first. Figure 6 shows the first, second and third eigenfunctions f_1, f_2 and f_3 from the voting data. The 3-dimensional MDS plot in Figure 1(a) is the graph of \Lambda: x_i \mapsto (\sqrt{\lambda_1} f_1(x_i), \sqrt{\lambda_2} f_2(x_i), \sqrt{\lambda_3} f_3(x_i)). Since legislators are not a priori ordered, the eigenfunctions are difficult to interpret. However, our model suggests the following ordering: Split the legislators into two groups G_1 and G_2 based on the sign of f_1(x_i); then the norm of f_2 is larger on one group, say, G_1, so we sort G_1 based on increasing values of f_2, and similarly sort G_2 via f_3. Figure 7 shows the same data as does Figure 6, but with this judicious ordering of the legislators.
Figure 8 shows the ordered eigenfunctions obtained from MDS applied to the 2004 roll call data. The results appear to be in agreement with the theoretically derived functions in Figure 4. This agreement gives one validation of the modeling assumptions in Section 2.

Fig. 6. The first, second and third eigenfunctions output from MDS applied to the 2005 U.S. House of Representatives roll call votes.

Fig. 7. The re-indexed first, second and third eigenfunctions output from MDS applied to the 2005 U.S. House of Representatives roll call votes. Colors indicate political parties.

Fig. 8. The re-indexed first, second and third eigenfunctions output from MDS applied to the 2004 U.S. House of Representatives roll call votes. Colors indicate political parties.

The theoretical second and third eigenfunctions are part of a two-dimensional eigenspace. In the voting data it is reasonable to assume that noise eliminates symmetry and collapses the eigenspaces down to one dimension. Nonetheless, we would guess that the second and third eigenfunctions in the voting data lie in the two-dimensional predicted eigenspace, as is seen to be the case in Figures 7 and 8.

Our analysis in Section 3 suggests that if legislators are in fact isometrically embedded in the interval $I$ (relative to the roll call distance), then their MDS derived rank will be consistent with the order of the legislators in the interval. This appears to be the case in the data, as seen in Figure 9, which shows a graph of $\hat{d}(l_i, \cdot)$ for selected legislators $l_i$. For example, as we would predict, $\hat{d}(l_1, \cdot)$ is an increasing function and $\hat{d}(l_n, \cdot)$ is decreasing. Moreover, the data seem to be in rough agreement with the metric assumption of our two population model, namely, that the two groups are well separated and that the within-group distance is given by $d(l_i, l_j) = |i/n - j/n|$. This agreement is another validation of the modeling assumptions in Section 2.

Fig. 9. The empirical roll call derived distance function $\hat{d}(l_i, \cdot)$ for selected legislators $l_i = 1, 90, 181, 182, 290, 401$. The $x$-axis orders legislators according to their MDS rank.

Our voting model suggests that the MDS ordering of legislators should correspond to political ideology. To test this, we compared the MDS results to the assessment of legislators by Americans for Democratic Action [Americans for Democratic Action (2005)]. Each year ADA selects 20 votes it considers the most important during that session, for example, the Patriot Act reauthorization. Legislators are assigned a Liberal Quotient: the percentage of those 20 votes on which the Representative voted in accordance with what ADA considered to be the liberal position. For example, a representative who voted the liberal position on all 20 votes would receive an LQ of 100%. Figure 10 below shows a plot of LQ vs. MDS rank.

For the most part, the two measures are consistent. However, MDS separates two groups of relatively liberal Republicans. To see why this is the case, consider the two legislators Mary Bono (R-CA), with MDS rank 248, and Gil Gutknecht (R-MN), with rank 373. Both Representatives received an ADA rating of 15%, yet they had considerably different voting records. On the 20 ADA bills, both Bono and Gutknecht supported the liberal position 3 times, but never simultaneously. Consequently, the empirical roll call distance between them is relatively large considering that they are both Republicans. Since MDS attempts to preserve local distances, Bono and Gutknecht are consequently separated by the algorithm.
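The Bono/Gutknecht situation can be mimicked on a toy example. The vote vectors below are hypothetical (they are not the real 2005 records), constructed only to reproduce the pattern just described: identical Liberal Quotients but disjoint liberal votes. The disagreement fraction here is a stand-in assumption for the empirical roll call distance of Section 2.

```python
import numpy as np

# Hypothetical 20-bill records: +1 = liberal position, -1 = conservative.
liberal = np.ones(20)                          # ADA's liberal position on each bill
bono = -np.ones(20); bono[:3] = 1              # liberal on bills 0-2 only
gutknecht = -np.ones(20); gutknecht[3:6] = 1   # liberal on bills 3-5 only

lq = lambda v: np.mean(v == liberal)           # Liberal Quotient
dist = lambda v, w: np.mean(v != w)            # disagreement-fraction distance

assert lq(bono) == lq(gutknecht) == 0.15       # same ADA rating (15%)
assert dist(bono, gutknecht) == 0.3            # yet they disagree on 6 of 20 bills
```

A one-dimensional ideology score such as LQ collapses these two records to the same point, while a distance-preserving method such as MDS keeps them apart.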
In this case, distance is directly related to the propensity of legislators to vote the same way on any given bill. Figure 10 results because this notion of proximity, although related, does not correspond directly to political ideology. The MDS and ADA rankings complement one another in the sense that together they facilitate identification of two distinct, yet relatively liberal, groups of Republicans. That is, although these two groups are relatively liberal, they do not share the same political positions.

Fig. 10. Comparison of the MDS derived rank for Representatives with the Liberal Quotient as defined by Americans for Democratic Action.

Like ADA, the National Journal ranks Representatives each year based on their voting record. In 2005, the Journal chose 41 votes on economic issues, 42 on social issues and 24 dealing with foreign policy. Based on these 107 votes, legislators were assigned a rating between 0 and 100, with lower numbers indicating a more liberal political ideology. Figure 11 is a plot of the National Journal vs. MDS rankings, and shows results similar to the ADA comparison. As in the ADA case, we see that relatively liberal Republicans receive quite different MDS ranks. Interestingly, this phenomenon does not appear for Democrats under either the ADA or the National Journal ranking system.

Summary. Our work began with an empirical finding: multidimensional scaling applied to voting data from the U.S. House of Representatives shows a clean double horseshoe pattern (Figure 1). These patterns happen often enough in data reduction techniques that it is natural to seek a theoretical understanding. Our main results give a limiting closed form explanation for data matrices that are double-centered versions of
$$P(i,j) = 1 - e^{-\theta |i/n - j/n|}, \qquad 1 \le i, j \le n.$$
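This limiting explanation can be probed numerically. The sketch below (with the arbitrary choices $\theta = 1$ and $n = 400$) double-centers $P$ exactly as classical MDS does and checks that the top eigenvalue, scaled by $1/n$, is close to the continuum prediction $\lambda = 1/(1+a^2)$ of Theorem A.2, where $a$ is the first positive root of $a \cot(a/2) = -1$; plotting the top two eigenvectors against each other traces the horseshoe.

```python
import numpy as np
from scipy.optimize import brentq

# Double-center P(i, j) = 1 - exp(-theta*|i/n - j/n|); theta = 1, n = 400
# are arbitrary choices for illustration.
n, theta = 400, 1.0
t = np.arange(1, n + 1) / n
P = 1.0 - np.exp(-theta * np.abs(t[:, None] - t[None, :]))

H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * H @ P @ H                  # double-centered matrix used by classical MDS

top = np.linalg.eigvalsh(B)[-1]       # largest eigenvalue

# First positive root of a*cot(a/2) = -1, i.e., a*cos(a/2) + sin(a/2) = 0,
# which Lemma A.2 locates in (pi, pi + 2/pi).
a = brentq(lambda x: x * np.cos(x / 2) + np.sin(x / 2), np.pi + 1e-9, np.pi + 2 / np.pi)

# Discrete top eigenvalue, scaled by 1/n, approaches 1/(1 + a^2).
assert abs(top / n - 1 / (1 + a**2)) < 1e-2
```

With this scaling the predicted value is $1/(1+a^2) \approx 0.069$; the agreement tightens as $n$ grows, consistent with the $O(1/n)$ quadrature error bounds in the Appendix.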
We further show how voting data arising from a cut-point model developed in Section 3 give rise to a model of this form.

Fig. 11. Comparison of the eigendecomposition derived rank for Representatives with the National Journal's liberal score.

In a followup to this paper, de Leeuw (2007) has shown that some of our results can be derived directly, without passing to a continuous kernel. A useful byproduct of his results and conversations with colleagues and students is this: the matrix $P_{i,j}$ above is totally positive. Standard theory shows that the first eigenvector can be taken increasing and the second as unimodal. Plotting these eigenvectors versus each other will always result in a horseshoe shape. Perhaps this explains the ubiquity of horseshoes.

APPENDIX: THEOREMS AND PROOFS FOR SECTION 3

We state first a classical perturbation result that relates two different notions of an approximate eigenfunction. A proof is included here to aid the reader. For more refined estimates, see Parlett (1980), Chapter 4, page 69. Two lemmas provide trigonometric identities that are useful for finding the eigenfunctions for the continuous kernel. Theorem A.2 states specific solutions to this integral equation. We then provide a proof for Theorem 3.1. The version of this theorem for uncentered matrices (Theorem A.3) follows and is used in the two horseshoe case.

Theorem A.1. Consider an $n \times n$ symmetric matrix $A$ with eigenvalues $\lambda_1 \le \cdots \le \lambda_n$. If for $\varepsilon > 0$
$$\|Af - \lambda f\|_2 \le \varepsilon$$
for some $f$, $\lambda$ with $\|f\|_2 = 1$, then $A$ has an eigenvalue $\lambda_k$ such that $|\lambda_k - \lambda| \le \varepsilon$. If we further assume that
$$s = \min_{i : \lambda_i \ne \lambda_k} |\lambda_i - \lambda_k| > \varepsilon,$$
then $A$ has an eigenfunction $f_k$ such that $A f_k = \lambda_k f_k$ and
$$\|f - f_k\|_2 \le \frac{\varepsilon}{s - \varepsilon}.$$

Proof. First we show that $\min_i |\lambda_i - \lambda| \le \varepsilon$.
If $\min_i |\lambda_i - \lambda| = 0$, we are done; otherwise $A - \lambda I$ is invertible. Then,
$$\|f\|_2 \le \|(A - \lambda I)^{-1}\| \cdot \|(A - \lambda I) f\|_2 \le \varepsilon \|(A - \lambda I)^{-1}\|.$$
Since the eigenvalues of $(A - \lambda I)^{-1}$ are $1/(\lambda_1 - \lambda), \ldots, 1/(\lambda_n - \lambda)$, by symmetry,
$$\|(A - \lambda I)^{-1}\| = \frac{1}{\min_i |\lambda_i - \lambda|}.$$
The result now follows since $\|f\|_2 = 1$.

Set $\lambda_k = \operatorname{argmin}_{\lambda_i} |\lambda_i - \lambda|$, and consider an orthonormal basis $g_1, \ldots, g_m$ of the associated eigenspace $E_{\lambda_k}$. Define $f_k$ to be the projection of $f$ onto $E_{\lambda_k}$:
$$f_k = \langle f, g_1 \rangle g_1 + \cdots + \langle f, g_m \rangle g_m.$$
Then $f_k$ is an eigenfunction with eigenvalue $\lambda_k$. Writing $f = f_k + (f - f_k)$, we have
$$(A - \lambda I) f = (A - \lambda I) f_k + (A - \lambda I)(f - f_k) = (\lambda_k - \lambda) f_k + (A - \lambda I)(f - f_k).$$
Since $f - f_k \in E_{\lambda_k}^{\perp}$, by symmetry, we have
$$\langle f_k, A(f - f_k) \rangle = \langle A f_k, f - f_k \rangle = \langle \lambda_k f_k, f - f_k \rangle = 0.$$
Consequently, $\langle f_k, (A - \lambda I)(f - f_k) \rangle = 0$ and, by Pythagoras,
$$\|Af - \lambda f\|_2^2 = (\lambda_k - \lambda)^2 \|f_k\|_2^2 + \|(A - \lambda I)(f - f_k)\|_2^2.$$
In particular,
$$\varepsilon \ge \|Af - \lambda f\|_2 \ge \|(A - \lambda I)(f - f_k)\|_2.$$
For $\lambda_i \ne \lambda_k$, $|\lambda_i - \lambda| \ge s - \varepsilon$. The result now follows since for $h \in E_{\lambda_k}^{\perp}$,
$$\|(A - \lambda I) h\|_2 \ge (s - \varepsilon) \|h\|_2.$$

Remark A.1. The second statement of the theorem allows nonsimple eigenvalues, but requires that the eigenvalues corresponding to distinct eigenspaces be well separated.

Remark A.2. The eigenfunction bound of the theorem is asymptotically tight in $\varepsilon$, as the following example illustrates: Consider the matrix
$$A = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda + s \end{pmatrix}$$
with $s > 0$. For $\varepsilon < s$, define the function
$$f = \begin{pmatrix} \sqrt{1 - \varepsilon^2/s^2} \\ \varepsilon/s \end{pmatrix}.$$
Then $\|f\|_2 = 1$ and $\|Af - \lambda f\|_2 = \varepsilon$. The theorem guarantees that there is an eigenfunction $f_k$ with eigenvalue $\lambda_k$ such that $|\lambda - \lambda_k| \le \varepsilon$. Since the eigenvalues of $A$ are $\lambda$ and $\lambda + s$, and since $s > \varepsilon$, we must have $\lambda_k = \lambda$.
Let $V_k = \{f_k : A f_k = \lambda_k f_k\} = \{c e_1 : c \in \mathbb{R}\}$, where $e_1$ is the first standard basis vector. Then
$$\min_{f_k \in V_k} \|f - f_k\|_2 = \|f - (f \cdot e_1) e_1\|_2 = \varepsilon/s.$$
The bound of the theorem, $\varepsilon/(s - \varepsilon)$, is only slightly larger.

We establish an integral identity in order to find trigonometric solutions to $Kf = \lambda f$, where $K$ is the continuized kernel of the centered exponential proximity matrix.

Lemma A.1. For constants $a \in \mathbb{R}$ and $c \in [0, 1]$,
$$\int_0^1 e^{-|x-c|} \cos[a(x - 1/2)]\,dx = \frac{2\cos[a(c - 1/2)]}{1 + a^2} + \frac{(e^{-c} + e^{c-1})(a \sin(a/2) - \cos(a/2))}{1 + a^2}$$
and
$$\int_0^1 e^{-|x-c|} \sin[a(x - 1/2)]\,dx = \frac{2\sin[a(c - 1/2)]}{1 + a^2} + \frac{(e^{-c} - e^{c-1})(a \cos(a/2) + \sin(a/2))}{1 + a^2}.$$

Proof. The lemma follows from a straightforward integration. First split the integral into two pieces:
$$\int_0^1 e^{-|x-c|} \cos[a(x - 1/2)]\,dx = \int_0^c e^{x-c} \cos[a(x - 1/2)]\,dx + \int_c^1 e^{c-x} \cos[a(x - 1/2)]\,dx.$$
By integration by parts applied twice,
$$\int e^{x-c} \cos[a(x - 1/2)]\,dx = \frac{a e^{x-c} \sin(a(x - 1/2)) + e^{x-c} \cos(a(x - 1/2))}{1 + a^2}$$
and
$$\int e^{c-x} \cos[a(x - 1/2)]\,dx = \frac{a e^{c-x} \sin(a(x - 1/2)) - e^{c-x} \cos(a(x - 1/2))}{1 + a^2}.$$
Evaluating these expressions at the appropriate limits of integration gives the first statement of the lemma. The computation of $\int_0^1 e^{-|x-c|} \sin[a(x - 1/2)]\,dx$ is analogous, and so is omitted here.

We now derive eigenfunctions for the continuous kernel.

Theorem A.2. For the kernel
$$K(x, y) = \tfrac{1}{2}(e^{-|x-y|} + e^{-y} + e^{-(1-y)} + e^{-x} + e^{-(1-x)}) + e^{-1} - 2$$
defined on $[0,1] \times [0,1]$, the corresponding integral equation
$$\int_0^1 K(x, y) f(y)\,dy = \lambda f(x)$$
has solutions
$$f(x) = \sin(a(x - 1/2)), \qquad a \cot(a/2) = -1,$$
and
$$f(x) = \cos(a(x - 1/2)) - \frac{2}{a}\sin(a/2), \qquad \tan(a/2) = \frac{a}{2 + 3a^2}.$$
In both cases, $\lambda = 1/(1 + a^2)$.

Proof.
First note that both classes of functions in the statement of the theorem satisfy $\int_0^1 f(x)\,dx = 0$. Consequently, the integral simplifies to
$$\int_0^1 K(x, y) f(y)\,dy = \frac{1}{2}\int_0^1 (e^{-|x-y|} + e^{-y} + e^{-(1-y)}) f(y)\,dy.$$
Furthermore, since $e^{-y} + e^{-(1-y)}$ is symmetric about $1/2$ and $\sin(a(y - 1/2))$ is skew-symmetric about $1/2$, Lemma A.1 shows that
$$\int_0^1 K(x, y) \sin(a(y - 1/2))\,dy = \frac{1}{2}\int_0^1 e^{-|x-y|} \sin(a(y - 1/2))\,dy = \frac{\sin[a(x - 1/2)]}{1 + a^2} + \frac{(e^{-x} - e^{x-1})(a \cos(a/2) + \sin(a/2))}{2(1 + a^2)}.$$
This establishes the first statement of the theorem. We examine the second. Since $\int_0^1 K(x, y)\,dy = 0$,
$$\int_0^1 (e^{-|x-y|} + e^{-y} + e^{-(1-y)})\,dy = 4 - 2e^{-1} - e^{-x} - e^{-(1-x)},$$
and also, by straightforward integration by parts,
$$\int_0^1 e^{-y} \cos(a(y - 1/2))\,dy = \int_0^1 e^{-(1-y)} \cos(a(y - 1/2))\,dy = \frac{a \sin(a/2)(1 + e^{-1})}{1 + a^2} + \frac{\cos(a/2)(1 - e^{-1})}{1 + a^2}.$$
Using the result of Lemma A.1, we have
$$\frac{1}{2}\int_0^1 [e^{-|x-y|} + e^{-y} + e^{-(1-y)}]\Big[\cos(a(y - 1/2)) - \frac{2}{a}\sin(a/2)\Big]dy$$
$$= \frac{\cos[a(x - 1/2)]}{1 + a^2} + \frac{(e^{-x} + e^{x-1})(a \sin(a/2) - \cos(a/2))}{2(1 + a^2)} + \frac{a \sin(a/2)(1 + e^{-1})}{1 + a^2} + \frac{\cos(a/2)(1 - e^{-1})}{1 + a^2} - \frac{1}{a}\sin(a/2)(4 - 2e^{-1} - e^{-x} - e^{-(1-x)})$$
$$= \frac{\cos[a(x - 1/2)]}{1 + a^2} - \frac{2\sin(a/2)}{a(1 + a^2)} + \frac{\phi(x)}{a(1 + a^2)},$$
where
$$\phi(x) = 2\sin(a/2) + a(e^{-x} + e^{x-1})(a \sin(a/2) - \cos(a/2))/2 + a^2 \sin(a/2)(1 + e^{-1}) + a \cos(a/2)(1 - e^{-1}) - (1 + a^2)\sin(a/2)(4 - 2e^{-1} - e^{-x} - e^{-(1-x)}).$$
The result follows by grouping the terms of $\phi(x)$ so that we see
$$\phi(x) = [2 - 4 + 2e^{-1} + e^{-x} + e^{-(1-x)}]\sin(a/2) + [e^{-x}/2 + e^{x-1}/2 + 1 + e^{-1} - 4 + 2e^{-1} + e^{-x} + e^{-(1-x)}] a^2 \sin(a/2) + [-e^{-x}/2 - e^{x-1}/2 + 1 - e^{-1}] a \cos(a/2)$$
$$= [-e^{-x}/2 - e^{x-1}/2 + 1 - e^{-1}] \times [a \cos(a/2) - 2\sin(a/2) - 3a^2 \sin(a/2)].$$

Theorem A.2 states specific solutions to our integral equation. Now we show that in fact these are all the solutions with positive eigenvalues. To start, observe that for $0 \le x, y \le 1$,
$$e^{-1} \le e^{-|x-y|} \le 1 \quad\text{and}\quad e^{-1} + 1 \le e^{-x} + e^{-(1-x)} \le 2e^{-1/2}.$$
Consequently,
$$-1 < \tfrac{3}{2}e^{-1} + 1 + e^{-1} - 2 \le K(x, y) \le \tfrac{1}{2} + 2e^{-1/2} + e^{-1} - 2 < 1,$$
and so $\|K\|_{\infty} < 1$. In particular, if $\lambda$ is an eigenvalue of $K$, then $|\lambda| < 1$. Now suppose $f$ is an eigenfunction of $K$, that is,
$$\lambda f(x) = \int_0^1 [\tfrac{1}{2}(e^{-|x-y|} + e^{-x} + e^{-(1-x)} + e^{-y} + e^{-(1-y)}) + e^{-1} - 2] f(y)\,dy.$$
Taking the derivative with respect to $x$, we see that $f$ satisfies
$$\lambda f'(x) = \frac{1}{2}\int_0^1 (-e^{-|x-y|} H_y(x) - e^{-x} + e^{-(1-x)}) f(y)\,dy, \tag{A-1}$$
where $H_y(x)$ is the Heaviside function, that is, $H_y(x) = 1$ for $x \ge y$ and $H_y(x) = -1$ for $x < y$. Taking the derivative again, we get
$$\lambda f''(x) = -f(x) + \frac{1}{2}\int_0^1 (e^{-|x-y|} + e^{-x} + e^{-(1-x)}) f(y)\,dy. \tag{A-2}$$
Now, substituting back into the integral equation, we see
$$\lambda f(x) = \lambda f''(x) + f(x) + \int_0^1 [\tfrac{1}{2}(e^{-y} + e^{-(1-y)}) + e^{-1} - 2] f(y)\,dy.$$
Taking one final derivative with respect to $x$, and setting $g(x) = f'(x)$, we see
$$g''(x) = \frac{\lambda - 1}{\lambda} g(x). \tag{A-3}$$
For $0 < \lambda < 1$, all the solutions to (A-3) can be written in the form
$$g(x) = A \sin(a(x - 1/2)) + B \cos(a(x - 1/2))$$
with $\lambda = 1/(1 + a^2)$. Consequently, $f(x)$ takes the form
$$f(x) = A \sin(a(x - 1/2)) + B \cos(a(x - 1/2)) + C.$$
Note that since $\int_0^1 K(x, y)\,dy = 0$, the constant function $c(x) \equiv 1$ is an eigenfunction of $K$ with eigenvalue 0. Since $K$ is symmetric, any eigenfunction $f$ with nonzero eigenvalue is orthogonal to $c$ in $L^2(dx)$, that is, $\int_0^1 f(x)\,dx = 0$. In particular, for $0 < \lambda < 1$, without loss we assume
$$f(x) = A \sin(a(x - 1/2)) + B\Big[\cos(a(x - 1/2)) - \frac{2}{a}\sin(a/2)\Big].$$
We solve for $a$, $A$ and $B$. First assume $B \ne 0$, and divide $f$ through by $B$. Then $f(1/2) = 1 - (2/a)\sin(a/2)$. Since $K(x, \cdot)$ is symmetric about $1/2$ and $\sin(a(x - 1/2))$ is skew-symmetric about $1/2$, we have
$$\lambda f(1/2) = \frac{1 - (2/a)\sin(a/2)}{1 + a^2} = \int_0^1 \Big[\tfrac{1}{2}(e^{-|y - 1/2|} + e^{-y} + e^{-(1-y)}) + e^{-1/2} + e^{-1} - 2\Big] f(y)\,dy$$
$$= \frac{1}{2}\int_0^1 (e^{-|y - 1/2|} + e^{-y} + e^{-(1-y)}) \cos(a(y - 1/2))\,dy + \frac{2}{a}\sin(a/2)(e^{-1/2} + e^{-1} - 2)$$
$$= \frac{1}{1 + a^2} + \frac{e^{-1/2}(a \sin(a/2) - \cos(a/2))}{1 + a^2} + \frac{a \sin(a/2)(1 + e^{-1})}{1 + a^2} + \frac{\cos(a/2)(1 - e^{-1})}{1 + a^2} + \frac{2}{a}\sin(a/2)(e^{-1/2} + e^{-1} - 2).$$
The last equality follows from Lemma A.1. Equating the sides, $a$ satisfies
$$0 = 2\sin(a/2) + e^{-1/2} a (a \sin(a/2) - \cos(a/2)) + a^2 \sin(a/2)(1 + e^{-1}) + a \cos(a/2)(1 - e^{-1}) + 2(1 + a^2)\sin(a/2)(e^{-1/2} + e^{-1} - 2)$$
$$= (1 - e^{-1/2} - e^{-1})(a \cos(a/2) - 2\sin(a/2) - 3a^2 \sin(a/2)).$$
From this it is immediate that $\tan(a/2) = a/(2 + 3a^2)$.

Now we suppose $A \ne 0$ and divide $f$ through by $A$. Then $f'(1/2) = a$ and, from (A-1),
$$\lambda f'(1/2) = \frac{a}{1 + a^2} = -\frac{1}{2}\int_0^1 e^{-|y - 1/2|} H_y(1/2) f(y)\,dy = -\frac{1}{2}\int_0^1 e^{-|y - 1/2|} H_y(1/2) \sin(a(y - 1/2))\,dy$$
$$= -\frac{e^{-1/2}}{1 + a^2}(a \cos(a/2) + \sin(a/2)) + \frac{a}{1 + a^2}.$$
In particular, $a \cot(a/2) = -1$.

The solutions of $\tan(a/2) = a/(2 + 3a^2)$ are approximately $2k\pi$ for integers $k$, and the solutions of $a \cot(a/2) = -1$ are approximately $(2k + 1)\pi$.
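These approximate root locations can be confirmed numerically. The sketch below brackets the first few roots of each transcendental equation inside the intervals given in Lemma A.2; the bracketing endpoints are sign changes, so each interval contains a root.

```python
import numpy as np
from scipy.optimize import brentq

# Roots of tan(a/2) = a/(2 + 3a^2) lie in (2k*pi, 2k*pi + 1/(3k*pi)), k >= 1.
f = lambda t: np.tan(t / 2) - t / (2 + 3 * t**2)
for k in range(1, 6):
    lo, hi = 2 * k * np.pi, 2 * k * np.pi + 1 / (3 * k * np.pi)
    assert f(lo + 1e-12) < 0 < f(hi)          # sign change brackets a root
    root = brentq(f, lo + 1e-12, hi)
    assert lo < root < hi

# Roots of a*cot(a/2) = -1 lie in ((2k+1)*pi, (2k+1)*pi + 1/(k*pi + pi/2)), k >= 0.
h = lambda t: t / np.tan(t / 2) + 1           # a*cot(a/2) + 1
for k in range(0, 5):
    lo, hi = (2 * k + 1) * np.pi, (2 * k + 1) * np.pi + 1 / (k * np.pi + np.pi / 2)
    assert h(lo) > 0 > h(hi)
    root = brentq(h, lo, hi)
    assert lo < root < hi
```

The intervals shrink like $1/k$, which matches the claim that the roots approach $2k\pi$ and $(2k+1)\pi$, respectively.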
Lemma A.2 makes this precise. Since the two equations do not have any common solutions, $A = 0$ if and only if $B \ne 0$. This completes the argument that Theorem A.2 lists all the eigenfunctions of $K$ with positive eigenvalues.

Lemma A.2.
1. The positive solutions of $\tan(a/2) = a/(2 + 3a^2)$ lie in the set
$$\bigcup_{k=1}^{\infty} (2k\pi,\ 2k\pi + 1/(3k\pi)),$$
with exactly one solution per interval. Furthermore, $a$ is a solution if and only if $-a$ is a solution.
2. The positive solutions of $a \cot(a/2) = -1$ lie in the set
$$\bigcup_{k=0}^{\infty} ((2k+1)\pi,\ (2k+1)\pi + 1/(k\pi + \pi/2)),$$
with exactly one solution per interval. Furthermore, $a$ is a solution if and only if $-a$ is a solution.

Proof. Let $f(\theta) = \tan(\theta/2) - \theta/(2 + 3\theta^2)$. Then $f$ is an odd function, so $a$ is a solution to $f(\theta) = 0$ if and only if $-a$ is a solution. Now,
$$f'(\theta) = \frac{1}{2}\sec^2(\theta/2) + \frac{3\theta^2 - 2}{(3\theta^2 + 2)^2},$$
and so $f(\theta)$ is increasing for $\theta \ge \sqrt{2/3}$. Recall the power series expansion of $\tan\theta$ for $|\theta| < \pi/2$:
$$\tan\theta = \theta + \theta^3/3 + 2\theta^5/15 + 17\theta^7/315 + \cdots.$$
In particular, for $0 \le \theta < \pi/2$, $\tan\theta \ge \theta$. Consequently, for $\theta \in (0, \pi/2)$,
$$f(\theta) \ge \frac{\theta}{2} - \frac{\theta}{2 + 3\theta^2} > 0.$$
So $f$ has no roots in $(0, \pi/2)$, and is increasing in the domain in which we are interested. Furthermore, for $k \ge 1$,
$$f(2k\pi) < 0 < +\infty = \lim_{\theta \to (2k+1)\pi^-} f(\theta).$$
The third and fourth quadrants have no solutions since $f(x) < 0$ in those regions. This shows that the solutions to $f(\theta) = 0$ lie in the intervals
$$\bigcup_{k=1}^{\infty} (2k\pi,\ 2k\pi + \pi),$$
with exactly one solution per interval. Finally, for $k \in \mathbb{Z}_{\ge 1}$,
$$f(2k\pi + 1/(3k\pi)) \ge \tan(k\pi + 1/(6k\pi)) - \frac{1}{6k\pi} = \tan(1/(6k\pi)) - \frac{1}{6k\pi} \ge 0,$$
which gives the result.

To prove the second statement of the lemma, set $g(\theta) = \theta \cot(\theta/2)$. Then $g$ is even, so $g(a) = -1$ if and only if $g(-a) = -1$.
Since $g'(\theta) = \cot(\theta/2) - (\theta/2)\csc^2(\theta/2)$, $g(\theta)$ is negative and decreasing in the third and fourth quadrants (assuming $\theta \ge 0$) and, furthermore,
$$g((2k+1)\pi) = 0 > -1 > -\infty = \lim_{\theta \to 2(k+1)\pi^-} g(\theta).$$
The first and second quadrants have no solutions since $g(x) \ge 0$ in those regions. This shows that the solutions to $g(x) = -1$ lie in the intervals
$$\bigcup_{k=0}^{\infty} ((2k+1)\pi,\ (2k+1)\pi + \pi),$$
with exactly one solution per interval. Finally, for $k \in \mathbb{Z}_{\ge 0}$,
$$g((2k+1)\pi + 1/(k\pi + \pi/2)) = ((2k+1)\pi + 1/(k\pi + \pi/2)) \cot(k\pi + \pi/2 + 1/(2k\pi + \pi))$$
$$= ((2k+1)\pi + 1/(k\pi + \pi/2)) \cot(\pi/2 + 1/(2k\pi + \pi))$$
$$= -((2k+1)\pi + 1/(k\pi + \pi/2)) \tan(1/(2k\pi + \pi)) < -1,$$
which completes the proof.

The exact eigenfunctions for the continuous kernel yield approximate eigenfunctions and eigenvalues for the discrete case. Here we give the proof of Theorem 3.1.

Proof of Theorem 3.1. That $f$ and $g$ are approximate eigenfunctions for the discrete matrix follows directly from Theorem A.2. Suppose $K$ is the continuous kernel. Then,
$$S_n f_{n,a}(x_i) = \sum_{j=1}^n S_n(x_i, x_j)[\cos(a(j/n - 1/2)) - (2/a)\sin(a/2)]$$
$$= \int_0^1 K(x_i, y)[\cos(a(y - 1/2)) - (2/a)\sin(a/2)]\,dy + R_{f,n} = \frac{1}{1 + a^2} f_{n,a}(x_i) + R_{f,n},$$
where the error term satisfies $|R_{f,n}| \le \frac{M}{2n}$ for
$$M \ge \sup_{0 \le y \le 1} \left|\frac{d}{dy}\, K(x_i, y)[\cos(a(y - 1/2)) - (2/a)\sin(a/2)]\right|,$$
by the standard right-hand rule error bound. In particular, we can take $M = a + 4$, independent of $j$, from which the result for $f_{n,a}$ follows. The case of $g_{n,a}$ is analogous.

The version of this theorem for uncentered matrices is as follows:

Theorem A.3. For $1 \le i, j \le n$, consider the matrices defined by
$$A_n(i, j) = \frac{1}{2n} e^{-|i-j|/n} \quad\text{and}\quad S_n = A_n - \frac{1}{2n}\mathbf{1}\mathbf{1}^T.$$
1. Set $f_{n,a}(x_i) = \cos(a(i/n - 1/2))$, where $a$ is a positive solution to $a \tan(a/2) = 1$. Then
$$A_n f_{n,a}(x_i) = \frac{1}{1 + a^2} f_{n,a}(x_i) + R_{f,n}, \quad\text{where } |R_{f,n}| \le \frac{a + 1}{2n}.$$
2. Set $g_{n,a}(x_i) = \sin(a(i/n - 1/2))$, where $a$ is a positive solution to $a \cot(a/2) = -1$. Then
$$S_n g_{n,a}(x_i) = \frac{1}{1 + a^2} g_{n,a}(x_i) + R_{g,n}, \quad\text{where } |R_{g,n}| \le \frac{a + 1}{2n}.$$
That is, $f_{n,a}$ and $g_{n,a}$ are approximate eigenfunctions of $A_n$ and $S_n$. The proof of Theorem A.3 is analogous to that of Theorem 3.1, by way of Lemma A.1, and so is omitted here.

Proof of Theorem 3.2. Let $\tilde{f}_{n,a} = f_{n,a}/\|f_{n,a}\|_2$. Then, by Theorem 3.1,
$$\left| K_n \tilde{f}_{n,a}(x_i) - \frac{1}{1 + a^2}\tilde{f}_{n,a}(x_i) \right| \le \frac{a + 4}{2n\|f_{n,a}\|_2}$$
and, consequently,
$$\left\| K_n \tilde{f}_{n,a} - \frac{1}{1 + a^2}\tilde{f}_{n,a} \right\|_2 \le \frac{a + 4}{2\sqrt{n}\,\|f_{n,a}\|_2}.$$
By Lemma A.2, $a$ lies in one of the intervals $(2k\pi, 2k\pi + 1/(3k\pi))$ for $k \ge 1$. Then
$$|f_{n,a}(x_n)| = |\cos(a/2) - (2/a)\sin(a/2)| \ge \cos(1/(6\pi)) - 1/\pi \ge 1/2.$$
Consequently, $\|f_{n,a}\|_2 \ge |f_{n,a}(x_n)| \ge 1/2$, and so the first statement of the result follows from Theorem A.1. The second statement is analogous.

Acknowledgments. We thank Harold Widom, Richard Montgomery, Beresford Parlett, Jan de Leeuw and Doug Rivers for bibliographical pointers and helpful conversations. Cajo ter Braak did a wonderful job educating us, as well as pointing out typos and mistakes in an earlier draft.

SUPPLEMENTARY MATERIAL

Supplementary files for "Horseshoes in multidimensional scaling and local kernel methods" (DOI: 10.1214/08-AOAS165SUPP; .tar). This directory [Diaconis, Goel and Holmes (2008)] contains both the matlab (mds analysis.m) and R files (mdsanalysis.r), the original data (voting record2005.txt, voting record description.txt, house members description.txt, house members2005.txt, house party2005.txt), as well as the transformed data (reduced voting record2005.txt, reduced house party2005.txt).

REFERENCES

Albouy, A. (2004). Mutual distances in celestial mechanics. Lectures at Nankai Institute, Tianjin, China. Available at http://www.imcce.fr/fr/presentation/equipes/ASD/preprints/prep.2004/Albouy_Nankai09_2004.pdf.

Americans for Democratic Action (2005). ADA Congressional voting record: U.S. House of Representatives. Available at http://www.adaction.org.

Bogomolny, E., Bohigas, O. and Schmit, C. (2003). Spectral properties of distance matrices. J. Phys. A: Math. Gen. 36 3595–3616. Available at http://www.citebase.org/abstract?id=oai:arXiv.org:nlin/0301044. MR1986436

Borchardt, C. W. (1866). Ueber die Aufgabe des Maximum, welche der Bestimmung des Tetraeders von grösstem Volumen bei gegebenem Flächeninhalt der Seitenflächen für mehr als drei Dimensionen entspricht. Math. Abhand. Akad. Wiss. Berlin 121–155.

Borg, I. and Groenen, P. (1997). Modern Multidimensional Scaling: Theory and Applications. Springer, New York. MR1424243

Burden, B. C., Caldeira, G. A. and Groseclose, T. (2000). Measuring the ideologies of U.S. senators: The song remains the same. Legislative Studies Quarterly 25 237–258.

Cantoni, A. and Butler, P. (1976). Eigenvalues and eigenvectors of symmetric centrosymmetric matrices. Linear Algebra Appl. 13 275–288. MR0396614

Clinton, J., Jackman, S. and Rivers, D. (2004). The statistical analysis of roll call data. American Political Science Review 355–370.

Coombs, C. H. (1964). A Theory of Data. Wiley, New York.

Cox, T. F. and Cox, M. A. A. (2000). Multidimensional Scaling. Chapman and Hall, London. MR1335449

Diaconis, P., Goel, S. and Holmes, S. (2008). Supplement to "Horseshoes in multidimensional scaling and local kernel methods." DOI: 10.1214/08-AOAS165SUPP.

de Leeuw, J. (2005).
Multidimensional unfolding. In Encyclopedia of Statistics in Behavioral Science. Wiley, New York.

de Leeuw, J. (2007). A horseshoe for multidimensional scaling. Technical report, UCLA.

Dufrene, M. and Legendre, P. (1991). Geographic structure and potential ecological factors in Belgium. J. Biogeography. Available at http://links.jstor.org/sici?sici=0305-0270(199105)18%253A3%253C257%253A%GSAPEF%253E2.0.CO%253B2-F.

Good, I. J. (1970). The inverse of a centrosymmetric matrix. Technometrics 12 925–928. MR0297780

Guttman, L. (1968). A general nonmetric technique for finding the smallest coordinate space for a configuration of … . Psychometrika. Available at http://www.springerlink.com/index/AG2018142W42704L.pdf.

Heckman, J. J. and Snyder, J. M. (1997). Linear probability models of the demand for attributes with an empirical application to estimating the preferences of legislators. RAND J. Economics 28 S142–S189.

Hill, M. O. and Gauch, H. G. (1980). Detrended correspondence analysis, an improved ordination technique. Vegetatio 42 47–58.

Iwatsubo, S. (1984). The analytical solutions of an eigenvalue problem in the case of applying optimal scoring method to some types of data. In Data Analysis and Informatics III 31–40. North-Holland, Amsterdam. MR0787633

Kendall, D. G. (1970). A mathematical approach to seriation. Phil. Trans. Roy. Soc. London 269 125–135.

Mardia, K., Kent, J. and Bibby, J. (1979). Multivariate Analysis. Academic Press, New York.

Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 1373–1396. Available at http://www.mitpressjournals.org/doi/abs/10.1162/089976603321780317.

Office of the Clerk, U.S. House of Representatives (2005). U.S. House of Representatives roll call votes, 109th Congress, 1st session. Available at http://clerk.house.gov.

Palmer, M. (2008). Ordination methods for ecologists.
Available at http://ordination.okstate.edu/.

Parlett, B. N. (1980). The Symmetric Eigenvalue Problem. Prentice Hall, Englewood Cliffs, NJ. MR0570116

Podani, J. and Miklos, I. (2002). Resemblance coefficients and the horseshoe effect in principal coordinates analysis. Ecology 3331–3343.

Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 2323–2326.

Schoenberg, I. J. (1935). Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espace distanciés vectoriellement applicable sur l'espace de Hilbert." Ann. of Math. (2) 36 724–732. MR1503248

Schölkopf, B., Smola, A. and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10 1299–1319. Available at http://www.mitpressjournals.org/doi/abs/10.1162/089976698300017467.

Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function. I. Psychometrika 27 125–140. MR0140376

Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence 22 888–905. Available at citeseer.ist.psu.edu/shi97normalized.html.

Tenenbaum, J. B., de Silva, V. and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science 2319–2323.

ter Braak, C. (1985). Correspondence analysis of incidence and abundance data: Properties in terms of a unimodal response … . Biometrics. Available at http://links.jstor.org/sici?sici=0006-341X(198512)41%253A4%253C859%253A%CAOIAA%253E2.0.CO%253B2-S.

ter Braak, C. J. F. (1987). Ordination. In Data Analysis in Community and Landscape Ecology 81–173. Center for Agricultural Publishing and Documentation, Wageningen, The Netherlands.

ter Braak, C. and Prentice, I. (1988). A theory of gradient analysis.
Advances in Ecological Research. Available at http://cat.inist.fr/?aModele=afficheN&cpsidt=7248779.

Torgerson, W. S. (1952). Multidimensional scaling. I. Theory and method. Psychometrika 17 401–419. MR0054219

von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Statist. 36 555–586. MR2396807

Wartenberg, D., Ferson, S. and Rohlf, F. (1987). Putting things in order: A critique of detrended correspondence analysis. The American Naturalist. Available at http://links.jstor.org/sici?sici=0003-0147(198703)129%253A3%253C434%253%APTIOAC%253E2.0.CO%253B2-3.

Weaver, J. R. (1985). Centrosymmetric (cross-symmetric) matrices, their basic properties, eigenvalues, and eigenvectors. Amer. Math. Monthly 92 711–717. MR0820054

Williams, C. K. (2000). On a connection between kernel PCA and metric multidimensional scaling. In NIPS 675–681.

Young, G. and Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika 3 19–22.

S. Holmes, P. Diaconis
Department of Statistics
Stanford University
Stanford, California 94305
USA
URL: http://www-stat.stanford.edu/~susan/
E-mail: susan@stat.stanford.edu

S. Goel
Yahoo! Research
111 W. 40th Street, 17th Floor
New York, New York 10025
USA
E-mail: goel@yahoo-inc.com
URL: http://www-rcf.usc.edu/~sharadg/