Taking all positive eigenvectors is suboptimal in classical multidimensional scaling
Authors: Jeffrey Tsang, Rajesh Pereira
jeffrey.tsang@ieee.org, pereirar@uoguelph.ca

Abstract

It is hard to overstate the importance of multidimensional scaling as an analysis technique in the broad sciences. Classical, or Torgerson, multidimensional scaling is one of the main variants, with the advantage that it has a closed-form analytic solution. However, this solution is exact if and only if the distances are Euclidean. Conversely, there has been comparatively little discussion of what to do in the presence of negative eigenvalues: the intuitive solution, prima facie justifiable in least-squares terms, is to take every positive eigenvector as a dimension. We show that this, minimizing least squares to the centred distances instead of the true distances, is suboptimal: throwing away positive eigenvectors can decrease the error even as we project to fewer dimensions. We provide provably better methods for handling this common case.

1 Introduction

Multidimensional scaling is a fundamental analysis technique that takes as input a matrix of distances or dissimilarities between items and returns a configuration of points in Euclidean space such that the inter-point distances approximate the input. From the original definition and usage in psychology [12, 22], applications have spread throughout all the sciences, for example in ecology [10], genetics [13], palynology [17], neuroscience [7], medicine [3], education [21], management [19], music [8], physical chemistry [25], and electrical engineering [9]. Multiple textbooks have been written on the subject; see for example [1, 4].

Classical multidimensional scaling was the first version, and it operates by matrix eigendecomposition. This gives it the advantage of having a closed-form solution; in the case that no exact solution exists, an intuitive solution with a seemingly straightforward error analysis can be found. Unfortunately, that analysis is heavily marred by the simple fact that it relies on minimizing least squares to the centred distances instead of the true distances. We show that the naïve solution is therefore suboptimal and provide better methods for handling this common case.

2 Preliminaries

Unless otherwise noted, all matrices are of size $n \times n$. We freely switch between capital letters and entry form $[a_{ij}]$; that is, the $i,j$th entry of the matrix is $a_{ij}$, as a function of $i$ and $j$. Let $\delta(i=j)$ be the Kronecker delta function, which is 1 if $i = j$ and 0 otherwise. Let $I = [\delta(i=j)]$ denote the identity matrix, $J = [1]$ denote the matrix of all ones, and $\Phi$ be the linear operator that takes a matrix and zeros out all entries off the main diagonal; that is, $\Phi([a_{ij}]) = [\delta(i=j)\,a_{ij}]$.

We follow the exposition in [11]. Let $D = [d_{ij}^2]$ be the input data matrix of squared distances: we presume that it is symmetric, has a zero diagonal and otherwise nonnegative entries, and that the (non-squared) entries satisfy the triangle inequality. To prepare for classical MDS, we double-centre the matrix and scale it by $-1/2$; call this $B$:

$$B = [b_{ij}] = -\tfrac{1}{2}\left(d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2\right) = -\tfrac{1}{2}\left(I - \tfrac{1}{n}J\right) D \left(I - \tfrac{1}{n}J\right),$$

where $d_{i\cdot}^2$ is the mean of the $i$th row of the matrix, $d_{\cdot j}^2$ the mean of the $j$th column, and $d_{\cdot\cdot}^2$ the mean of all $n^2$ entries of the matrix.
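As a concrete illustration, the double-centring step is a one-liner in numpy. This is a minimal sketch; the helper name `double_centre` is ours, not from the paper:

```python
import numpy as np

def double_centre(D):
    """Return B = -1/2 (I - J/n) D (I - J/n) for an n x n matrix D
    of squared distances."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n   # the centring matrix I - J/n
    return -0.5 * C @ D @ C
```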
To see the point of this transformation, assume for now that the matrix comes from a set of $n$ vectors in $\mathbb{R}^n$, say $\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$. Since distances are invariant under translation, further suppose the set of vectors has mean $\bar{x} = \frac{1}{n}\sum_{k=1}^n \vec{x}_k = \vec{0}$. So

$$d_{ij}^2 = \|\vec{x}_i - \vec{x}_j\|^2 = \|\vec{x}_i\|^2 - 2\langle \vec{x}_i, \vec{x}_j\rangle + \|\vec{x}_j\|^2.$$

Thus,

$$\begin{aligned}
-2b_{ij} &= \|\vec{x}_i - \vec{x}_j\|^2 - \frac{1}{n}\sum_{k=1}^n \|\vec{x}_i - \vec{x}_k\|^2 - \frac{1}{n}\sum_{k=1}^n \|\vec{x}_j - \vec{x}_k\|^2 + \frac{1}{n^2}\sum_{k,l=1}^{n,n} \|\vec{x}_k - \vec{x}_l\|^2 \\
&= \|\vec{x}_i\|^2 - 2\langle \vec{x}_i, \vec{x}_j\rangle + \|\vec{x}_j\|^2 - \left(\|\vec{x}_i\|^2 - 2\langle \vec{x}_i, \bar{x}\rangle + \frac{1}{n}\sum_{k=1}^n \|\vec{x}_k\|^2\right) \\
&\qquad - \left(\|\vec{x}_j\|^2 - 2\langle \vec{x}_j, \bar{x}\rangle + \frac{1}{n}\sum_{k=1}^n \|\vec{x}_k\|^2\right) + \left(\frac{2}{n}\sum_{k=1}^n \|\vec{x}_k\|^2 - 2\langle \bar{x}, \bar{x}\rangle\right) \\
&= -2\langle \vec{x}_i, \vec{x}_j\rangle.
\end{aligned}$$

Since $b_{ij} = \langle \vec{x}_i, \vec{x}_j\rangle$, $B$ is a Gram matrix of inner products, formed by $XX^T$, where $X$ is the matrix whose $i$th row is $\vec{x}_i^{\,T}$ (so that $(XX^T)_{ij} = \langle \vec{x}_i, \vec{x}_j\rangle$).

We now attempt to take the square root of the matrix to recover the vectors. To do so, we diagonalize the matrix:

$$B = V\Lambda V^T = V\Lambda^{\frac{1}{2}}\Lambda^{\frac{1}{2}} V^T = \left(V\Lambda^{\frac{1}{2}}\right)\left(V\Lambda^{\frac{1}{2}}\right)^T,$$

where the eigenvectors and singular vectors coincide as the matrix is symmetric. To have a real solution, we therefore need all eigenvalues of $B$ to be nonnegative. Let us list the eigenvalues of $B$ in descending order as $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$.

Note that we could write a linear transformation to turn $D$ into $B$; we write out its inverse here. Since by symmetry $d_{i\cdot}^2 = d_{\cdot i}^2$,

$$b_{ii} + b_{jj} - 2b_{ij} = (d_{ii}^2 - d_{i\cdot}^2 - d_{\cdot i}^2 + d_{\cdot\cdot}^2) + (d_{jj}^2 - d_{j\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2) - 2(d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2) = d_{ij}^2.$$

We have $\Phi(B)J = [\sum_{k=1}^n (\delta(i=k)\, b_{ik})(1)] = [b_{ii}]$ and similarly $J\Phi(B) = [b_{jj}]$. Thus the equation $D = \Phi(B)J + J\Phi(B) - 2B$, completing the preliminaries.

3 Least-squares error

Consider the problem of having negative eigenvalues in $B$. A celebrated theorem [20] proves that there cannot be any exact solution to MDS in this case, so we need to define a measure of the error to optimize. Naturally, some form of least squares is indicated, for example

$$L(X) = \sum_{i,j=1}^{n,n} \left(d_{ij}^2 - \|\vec{x}_i - \vec{x}_j\|^2\right)^2$$

on the squared distances, which defines classical MDS. We will not go into detail regarding the competing variants (for example, defining least squares on the non-squared distances leads to a different objective; see [5, 6]).

From now on, let $X$ be an approximate solution to MDS. For now, consider the ur-error

$$L_0(X) = \sum_{i,j=1}^{n,n} \left(b_{ij} - (XX^T)_{ij}\right)^2,$$

which is the sum of the squares of each entry of $B - XX^T$, also known as the square of the Frobenius norm, or $\|B - XX^T\|_F^2 = \mathrm{tr}\!\left((B - XX^T)(B - XX^T)^T\right)$. The trace of a matrix is the sum of its diagonal, $\mathrm{tr}([a_{ij}]) = \sum_{i=1}^n a_{ii}$.

Standard results in matrix analysis state that the squared Frobenius norm is the sum of squares of the singular values (equal to the sum of squares of the eigenvalues for a symmetric matrix). Thus the intuitive solution to the negative-eigenvalue problem above: pick $XX^T = V\Lambda_+ V^T$, that is, let $XX^T$ and $B$ have the same eigenvectors, with all negative eigenvalues clamped to 0 [14]. The minimized "error" in this case is simply the sum of squares of the negative eigenvalues. Unfortunately, the error is phrased in terms of the entries of $D$, not $B$.
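This intuitive recipe translates directly into code. A minimal sketch of the standard solution, reusing `double_centre` from above (the function name `classical_mds` is ours):

```python
def classical_mds(D, dims=None):
    """Standard classical MDS: eigendecompose B, clamp negative
    eigenvalues to zero [14], optionally keeping only the top `dims`.
    Rows of the returned matrix are the embedded points."""
    B = double_centre(D)
    lam, V = np.linalg.eigh(B)        # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]    # re-sort to descending
    if dims is not None:
        lam, V = lam[:dims], V[:, :dims]
    return V * np.sqrt(np.clip(lam, 0, None))   # X = V Lambda_+^(1/2)
```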
Since $D = \Phi(B)J + J\Phi(B) - 2B$, in the same way, the approximated distances are given by $\Phi(XX^T)J + J\Phi(XX^T) - 2XX^T$. Thus the true error is

$$\begin{aligned}
L(X) &= \left\|\left(\Phi(XX^T)J + J\Phi(XX^T) - 2XX^T\right) - \left(\Phi(B)J + J\Phi(B) - 2B\right)\right\|_F^2 \\
&= \left\|\left(\Phi(B - XX^T)J + J\Phi(B - XX^T)\right) - 2(B - XX^T)\right\|_F^2
\end{aligned}$$

by linearity of $\Phi$. Write as shorthand $R = [r_{ij}] = B - XX^T$. Then

$$L(X) = \|\Phi(R)J + J\Phi(R)\|_F^2 - 2\,\mathrm{tr}\!\left((2R)(\Phi(R)J + J\Phi(R))^T\right) + \|2R\|_F^2$$

from the trace definition, where the trace is invariant under transposition. Furthermore, it is invariant under switching the order of a product, so

$$L(X) = \|\Phi(R)J + J\Phi(R)\|_F^2 - 4\,\mathrm{tr}\left(RJ\Phi(R) + \Phi(R)JR\right) + 4\|R\|_F^2.$$

Note that both $B$, and $XX^T$ by extension, have zero column and row sums, due to the centring process. Since multiplying by $J$ is summing rows or columns, $JR = RJ = [0]$. Thus the cross-term cancels:

$$L(X) = \|\Phi(R)J + J\Phi(R)\|_F^2 + 4\|R\|_F^2.$$

The first term above can now be written as

$$\sum_{i,j=1}^{n,n} (r_{ii} + r_{jj})^2 = n\sum_{i=1}^n r_{ii}^2 + 2\left(\sum_{i=1}^n r_{ii}\right)\left(\sum_{j=1}^n r_{jj}\right) + n\sum_{j=1}^n r_{jj}^2 = 2n\sum_{i=1}^n r_{ii}^2 + 2(\mathrm{tr}(R))^2.$$

With that, the error simplifies to

$$L(X) = 2n\sum_{i=1}^n r_{ii}^2 + 2(\mathrm{tr}(R))^2 + 4\|R\|_F^2.$$

4 Error bounds

Although the first term cannot be easily simplified, we can bound it. First note that we can find $\mathrm{tr}(R)$ by summing the diagonal entries. To minimize the sum of squares, we can set each diagonal entry equal, to $\mathrm{tr}(R)/n$, whence the sum is bounded below by $n \times (\mathrm{tr}(R))^2/n^2$, and the first term by $2(\mathrm{tr}(R))^2$.

To maximize, consider that $\|R\|_F^2$ is the sum of squares of all entries, clearly not less than the sum of squares of the diagonal. We also know that $R$ has row and column sums zero. Thus to maximize the ratio between $\sum_{i=1}^n r_{ii}^2$ and $\|R\|_F^2$, let us set each diagonal entry equal (to say $c$), and each off-diagonal entry equal (to $-\frac{c}{n-1}$). The sum we wish to bound is $nc^2$, and the upper bound is $nc^2 + n(n-1)\frac{c^2}{(n-1)^2} = nc^2\left(1 + \frac{1}{n-1}\right)$, a factor of $\frac{n}{n-1}$. Therefore the first term is bounded above by $2(n-1)\|R\|_F^2$. In conclusion, we have

$$4(\mathrm{tr}(R))^2 + 4\|R\|_F^2 \;\leq\; L(X) \;\leq\; 2(\mathrm{tr}(R))^2 + (2 + 2n)\|R\|_F^2.$$

Our bounds are tight, since the equal-diagonal, equal-off-diagonal matrix is realizable by the regular simplex, that is, setting all distances to 1 ($D = J - I$). After centring, ignore $X$ (set every MDS point to $\vec{0}$) and let $B = R$; then the lower and upper bounds coincide.

Note that $(\mathrm{tr}(R))^2$ is not the same as $\|R\|_F^2$: the first is the square of the sum of eigenvalues, the second is the sum of the squares. In fact, let us analyze from that viewpoint. The latter term is the ur-error minimized by $V\Lambda_+ V^T$. The former term squares the sum of the residual eigenvalues, which means that to minimize it we have to leave enough positive eigenvalues unmatched to balance out the sum (as we cannot duplicate negative eigenvalues). Therefore, to minimize a weighted sum of both terms, we cannot simply match every single positive eigenvalue.

However, we would be seriously remiss if we failed to mention that even if a function can be bounded both above and below, the minimum of the function need not lie between the minima of the upper and lower bounds. An example is depicted in Figure 1.

Figure 1: Graphical demonstration that $f(x) \leq g(x) \leq h(x)\ \forall x$ does not imply $\mathrm{argmin}_x f(x) \leq \mathrm{argmin}_x g(x) \leq \mathrm{argmin}_x h(x)$.
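Before turning to algorithms, the simplified error formula from Section 3 can be checked numerically against the direct definition. A small sketch continuing the numpy code above (both helper names are ours; `distances_from_gram` implements the inverse map $\Phi(G)J + J\Phi(G) - 2G$, and the equality holds because the embedding is centred):

```python
def distances_from_gram(G):
    """Recover squared distances from a Gram matrix: Phi(G)J + JPhi(G) - 2G."""
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2 * G

def true_error(B, X):
    """L(X) = 2n sum_i r_ii^2 + 2 tr(R)^2 + 4 ||R||_F^2 with R = B - X X^T;
    for a centred X this equals the Frobenius error between the input and
    reconstructed squared-distance matrices."""
    n = B.shape[0]
    R = B - X @ X.T
    return (2 * n * np.sum(np.diag(R) ** 2)
            + 2 * np.trace(R) ** 2
            + 4 * np.sum(R ** 2))

# sanity check, both routes should agree:
# np.isclose(true_error(B, X),
#            np.sum((distances_from_gram(X @ X.T) - distances_from_gram(B)) ** 2))
```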
5 Improved algorithms

Leaving that caveat aside for now, we can consider simple algorithms to minimize these bounds. Let us treat the problem as two-dimensional: we can trivially compute $(\mathrm{tr}(R))^2$ and $\|R\|_F^2$ in terms of the eigenvalues of $X = V\Lambda'^{\frac{1}{2}}$, where $V = [v_{ij}]$ is the matrix of eigenvectors of $B$ and $\Lambda'$ is the diagonal matrix of reconstructed eigenvalues. Let its entries be $\lambda'_1, \lambda'_2, \ldots, \lambda'_n \geq 0$. Then $(\mathrm{tr}(R))^2 = \left(\sum_{i=1}^n \lambda_i - \lambda'_i\right)^2$ and $\|R\|_F^2 = \sum_{i=1}^n (\lambda_i - \lambda'_i)^2$, which is entirely in terms of the $\lambda'_i$s.

We solve this with a marginalization procedure. Suppose we fix $(\mathrm{tr}(R))^2$, or equivalently fix $\sum_{i=1}^n \lambda'_i$. How would we minimize $\|R\|_F^2$? Remember that we are summing up the squares of the unmatched eigenvalues: thus the minimum is attained when we equalize the maximal unmatched terms. That is, fix some $c \in \mathbb{R}$ and pick

$$\lambda'_i = \begin{cases} \lambda_i - c, & \lambda_i \geq c, \\ 0, & \lambda_i < c. \end{cases}$$

This solution essentially caps all eigenvalues of $R$ at $c$, leaving lesser ones unchanged; by monotonicity there exists a unique $c$ for each value of $\sum_i \lambda'_i$. Now note that each bound on $L(X)$ can be written as a piecewise quadratic function of $c$ (as $\mathrm{tr}(R)$ changes linearly in $c$ between consecutive $\lambda_i$), and each piece is easily minimized. Given a candidate solution $X$, we can compute the true value of $L(X)$ in quadratic time: testing the $O(n)$ candidates and picking the best one takes cubic time, which is the same complexity as diagonalizing $B$ in the first place. Hence the algorithm takes no longer asymptotically than the usual one. Furthermore, as the space of solutions considered includes the original ($c = 0$ matches all positive eigenvalues), this solution clearly dominates the original.
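A minimal sketch of this cutoff procedure, reusing the helpers above. For brevity it only tests the knots of the piecewise quadratic ($c = 0$ and each eigenvalue) rather than solving within each piece, and it evaluates $L(X)$ by the straightforward route rather than the faster eigenvalue form, so it is illustrative rather than asymptotically optimal:

```python
def cutoff_mds(D):
    """Pick lambda'_i = max(lambda_i - c, 0) for the best cutoff c
    among the knot candidates, judged by the true error L(X)."""
    B = double_centre(D)
    lam, V = np.linalg.eigh(B)
    lam, V = lam[::-1], V[:, ::-1]
    best_X, best_err = None, np.inf
    # candidates include the negative eigenvalues, so negative cutoffs
    # (intentional overscaling) are considered as well
    for c in np.concatenate(([0.0], lam)):
        lam_prime = np.clip(lam - c, 0, None)
        X = V * np.sqrt(lam_prime)
        err = true_error(B, X)
        if err < best_err:
            best_X, best_err = X, err
    return best_X, best_err
```

Since $c = 0$ is among the candidates, the result can never be worse than the standard solution, mirroring the dominance argument above.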
We outline a second algorithm that avoids the complications illustrated in Figure 1. We can directly write out $\sum_{i=1}^n r_{ii}^2$ as a function of the $\lambda'_i$s:

$$r_{ii} = (B - V\Lambda' V^T)_{ii} = b_{ii} - \sum_{k=1}^n \lambda'_k v_{ik}^2,$$

where the $b_{ii}$, $v_{ik}^2$ are constants found from diagonalizing $B$. Hence $\sum_{i=1}^n r_{ii}^2$ is a quadratic function of all the $\lambda'_i$s, along with $(\mathrm{tr}(R))^2$ and $\|R\|_F^2$, which we have treated above. Rephrasing, we wish to minimize $L(X(\lambda'_1, \lambda'_2, \ldots, \lambda'_n))$, a quadratic function, under the constraint that $\lambda'_i \geq 0$. This is a quadratic program. Not only that, it is relatively easy to show that $L(X)$, as a sum of squares, is a convex quadratic program, whence a cubic algorithm exists to solve it [16]. Again, the space of feasible solutions includes the original, and in cubic (that is, asymptotically equal) time, we have an algorithm that solves classical MDS and dominates the original in least-squares error.

To facilitate implementation, let us directly expand $L(X)$. Typically, the function is written in the form $\frac{1}{2}\vec{x}^T Q\vec{x} + \vec{c}\cdot\vec{x} + a$, where $Q$ is symmetric, and the constant $a$ is immaterial to the optimization and usually left out.

$$\begin{aligned}
L(X) &= 2n\sum_{i=1}^n r_{ii}^2 + 2(\mathrm{tr}(R))^2 + 4\|R\|_F^2 \\
&= 2n\sum_{i=1}^n \left(b_{ii} - \sum_{k=1}^n \lambda'_k v_{ik}^2\right)^2 + 2\left(\sum_{i=1}^n \lambda_i - \sum_{i=1}^n \lambda'_i\right)^2 + 4\sum_{i=1}^n (\lambda_i - \lambda'_i)^2 \\
&= 2n\sum_{i,j=1}^{n,n}\left(\sum_{k=1}^n v_{ki}^2 v_{kj}^2\right)\lambda'_i\lambda'_j - 4n\sum_{i=1}^n\left(\sum_{k=1}^n b_{kk} v_{ki}^2\right)\lambda'_i + 2n\sum_{i=1}^n b_{ii}^2 \\
&\qquad + 2\left(\sum_{i=1}^n \lambda'_i\right)^2 - 4\sum_{i=1}^n\left(\sum_{k=1}^n \lambda_k\right)\lambda'_i + 2\left(\sum_{i=1}^n \lambda_i\right)^2 \\
&\qquad + 4\sum_{i=1}^n \lambda_i'^2 - 8\sum_{i=1}^n \lambda_i\lambda'_i + 4\sum_{i=1}^n \lambda_i^2 \\
&= 2n(\vec{\lambda}')^T W^T W\vec{\lambda}' - 4n\,W^T\mathrm{diag}(B)\cdot\vec{\lambda}' + 2n\sum_{i=1}^n b_{ii}^2 \\
&\qquad + 2(\vec{\lambda}')^T J\vec{\lambda}' - 4\,\mathrm{tr}(B)\,\vec{1}\cdot\vec{\lambda}' + 2(\mathrm{tr}(B))^2 \\
&\qquad + 4(\vec{\lambda}')^T I\vec{\lambda}' - 8\,\mathrm{diag}(\Lambda)\cdot\vec{\lambda}' + 4\|B\|_F^2,
\end{aligned}$$

where $W = [v_{ij}^2]$ is the entry-wise square of the eigenvector matrix $V$, $\mathrm{diag}(B)$ is the vectorized diagonal of $B$, $\vec{1}$ is the vector of all ones, $\mathrm{diag}(\Lambda)$ is the vector of eigenvalues of $B$ in order, and $\vec{\lambda}'$ are our reconstructed eigenvalues (the parameters of the solution). Therefore, we have

$$\begin{aligned}
Q &= 4nW^T W + 4J + 8I, \\
\vec{c} &= -4nW^T\mathrm{diag}(B) - 4\,\mathrm{tr}(B)\,\vec{1} - 8\,\mathrm{diag}(\Lambda), \\
a &= 2n\sum_{i=1}^n b_{ii}^2 + 2(\mathrm{tr}(B))^2 + 4\|B\|_F^2,
\end{aligned}$$

and after solving for $\vec{\lambda}'$, our $\Lambda'$ is the diagonal matrix of $\vec{\lambda}'$, and the MDS projected points are given by $V\Lambda'^{\frac{1}{2}}$.

The only remaining issue is that this presumes the solution should have the same eigenvectors as $B$, which is no longer necessarily true as it was for $L_0(X)$. We leave this as an open question for future work.
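A sketch of assembling and solving this program, reusing the helpers above. The paper does not prescribe a solver; as a stand-in we hand the convex objective, its gradient, and the nonnegativity bounds to scipy's L-BFGS-B (any nonnegativity-constrained convex QP solver would serve):

```python
from scipy.optimize import minimize

def qp_mds(D):
    """Minimize 0.5 l^T Q l + c . l over l >= 0, where l is the vector of
    reconstructed eigenvalues lambda'; embed as V diag(l)^(1/2)."""
    B = double_centre(D)
    n = B.shape[0]
    lam, V = np.linalg.eigh(B)
    lam, V = lam[::-1], V[:, ::-1]
    W = V ** 2                                    # W = [v_ij^2]
    Q = 4 * n * W.T @ W + 4 * np.ones((n, n)) + 8 * np.eye(n)
    c = (-4 * n * W.T @ np.diag(B)
         - 4 * np.trace(B) * np.ones(n)
         - 8 * lam)                               # -8 diag(Lambda)
    fun = lambda l: 0.5 * l @ Q @ l + c @ l       # constant a dropped
    jac = lambda l: Q @ l + c
    start = np.clip(lam, 0, None)                 # the standard solution
    res = minimize(fun, start, jac=jac, method='L-BFGS-B',
                   bounds=[(0, None)] * n)
    return V * np.sqrt(np.clip(res.x, 0, None))
```

Starting from the clamped eigenvalues means the solver begins at the standard solution, so the returned configuration can only match or improve it in $L(X)$.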
6 Example calculations

We provide an example from real-world usage to demonstrate that the preceding is no mere theoretical problem. We have a distance matrix of size 296 × 296, computed following [23], using the updated algorithm in [24]. Other than taking the dataset and noting that it is explicitly known to be non-Euclidean, we bear no connection to that work.

We go through the motions of running classical MDS on this dataset and tabulate a variety of statistics on the output in Table 1. Aside from the eigenvalues in decreasing order, we show the normalized least-squares error, which can be directly interpreted as the root mean square error on the distances themselves. For a sense of scale, the input distances are bounded above by 1, and have a mean of 0.288577 and an RMS of 0.323568.

All of the following algorithms are forced to consider only the top $n$ eigenvectors ($\lambda'_i \equiv 0$ for $i > n$): the standard algorithm copies the top $n$ positive eigenvalues; we can optimize the upper and lower bounds in terms of $c$, the eigenvalue cutoff, and take that as a solution; and we present the quadratic programming solution. For comparison, we also present the best SMACOF solution in $n$ dimensions over 100 random initial configurations. SMACOF is an iterative algorithm that minimizes the least-squares error on the non-squared distances [5].

The first major insight is that even though there are 103 positive eigenvalues, using any more than just the top 5 starts increasing the error even as more dimensions are used. In fact, using all 103 is worse than simply using 3! This extremely small number, compared to the number of positive eigenvalues, shows that the problem is of clear practical relevance.

Note that both the upper and lower bounds are uniformly better than the standard solution: the cutoff starts off negative because, when forced to use insufficient dimensions, it is gainful to intentionally overscale the eigenvectors to increase the average distances. The upper bound fares much worse, as the $2 + 2n$ factor on the $\|R\|_F^2$ term means it is hardly different from the original problem, and thus its error eventually boomerangs; however, it uses only the top 49 eigenvectors. The lower bound improves on the best standard solution by 14% and uses just 10 components. The quadratic programming solution beats them all, improving by 18%.

Peculiarly, it refuses to use the 11th, 13th, and 15th–18th eigenvectors, and none of the positive ones after the 20th. Even more strangely, it uses the 246th and 258th eigenvectors, and only those, even though their eigenvalues are negative! This is not a numerical artifact: their reconstructed eigenvalues are 0.003882 and 0.006092, and the error is clearly decreasing.

The SMACOF algorithm technically optimizes a different objective, and hence the comparison in $L(X)$ terms is not completely fair. Even then, it manages to outperform the quadratic program in 2–4 dimensions. When more dimensions are used, the combinatorial explosion causes the algorithm to get stuck in local minima, and improvement eventually vanishes. We do not attempt to run it for $n > 21$ dimensions.

Thus under the standard algorithm the data seems to be 5-dimensional; using the lower bound, an improved fit is found with 10 dimensions; and with the optimal quadratic programming method, 16 dimensions are used. These are maximal fits, in the sense that even if allowed extra dimensions, the methods will not use them.

To close, we note that two of the most widely used statistical software packages are unaware of this problem. MATLAB [15] acknowledges that in the presence of significant negative eigenvalues, its choice of using all positive eigenvectors may yield a poor solution. R [18] provides several options, including using a minimal additive constant from [2] to make the distance matrix Euclidean. However, it still follows the work of [14] and hence also uses all positive eigenvectors by default.

Supplementary Materials

D2ST.dist: a non-squared distance matrix of size 296 × 296, given in upper-triangular order, listing $d_{1,2}, \ldots, d_{1,296}, d_{2,3}, \ldots$ (43660 entries in total); the numbers are stored in IEEE 754 double-precision format. Available from arXiv → download → source.

Acknowledgements

Rajesh Pereira acknowledges the support of a Natural Sciences and Engineering Research Council of Canada Discovery Grant.
References

[1] Ingwer Borg and Patrick J. F. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer Series in Statistics. Springer, New York, NY, second edition, August 2005. ISBN 0-387-25150-2, OCN 318297114.

[2] Francis Cailliez. "The analytical solution of the additive constant problem". Psychometrika 48(2):305–308, June 1983. ISSN 0033-3123. doi:10.1007/BF02294026.

[3] Charles S. Cleeland, Yoshio Nakamura, Tito R. Mendoza, Katherine R. Edwards, Jeff Douglas, and Ronald C. Serlin. "Dimensions of the impact of cancer pain in a four country sample: new information from multidimensional scaling". Pain 67(2–3):267–273, October 1996. ISSN 0304-3959. doi:10.1016/0304-3959(96)03131-4.

[4] Trevor F. Cox and Michael A. A. Cox. Multidimensional Scaling, volume 88 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, Boca Raton, FL, second edition, September 2000. ISBN 1-58488-094-5, OCN 44728137.

[5] Jan de Leeuw. "Applications of convex analysis to multidimensional scaling". In Jean R. Barra, F. Brodeau, G. Romier, and B. van Cutsem (editors), Recent Developments in Statistics, pp. 133–146. North-Holland Press, Amsterdam, Netherlands, 1977. ISBN 0-7204-0751-6. URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.2432.

[6] Jan de Leeuw. "Convergence of the majorization method for multidimensional scaling". Journal of Classification 5(2):163–180, September 1988. ISSN 0176-4268. doi:10.1007/BF01897162.

[7] Karl J. Friston, Chris D. Frith, Paul Fletcher, Peter F. Liddle, and Richard S. J. Frackowiak. "Functional topography: multidimensional scaling and functional connectivity in the brain". Cerebral Cortex 6(2):156–164, 1996. ISSN 1047-3211. doi:10.1093/cercor/6.2.156.

[8] Christophe Hourdin, Gérard Charbonneau, and Tarek Moussa. "A multidimensional scaling analysis of musical instruments' time-varying spectra". Computer Music Journal 21(2):40–55, Summer 1997. ISSN 0148-9267. URL http://www.jstor.org/stable/3681107.

[9] Xiang Ji and Hongyuan Zha. "Sensor positioning in wireless ad-hoc sensor networks using multidimensional scaling". In INFOCOM 2004: 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 2652–2661, March 2004. ISBN 0-7803-8355-9. doi:10.1109/INFCOM.2004.1354684.

[10] Norm C. Kenkel and Laszlo Orlóci. "Applying metric and nonmetric multidimensional scaling to ecological studies: some new results". Ecology 67(4):919–928, August 1986. ISSN 0012-9658. doi:10.2307/1939814.

[11] Nathan Krislock and Henry Wolkowicz. "Euclidean distance matrices and applications". In Miguel F. Anjos and Jean B. Lasserre (editors), Handbook on Semidefinite, Conic and Polynomial Optimization, volume 166 of International Series in Operations Research & Management Science, pp. 879–914. Springer, New York, NY, 2012. ISBN 978-1-4614-0768-3. doi:10.1007/978-1-4614-0769-0_30.

[12] Joseph B. Kruskal. "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis". Psychometrika 29(1):1–27, March 1964. ISSN 0033-3123. doi:10.1007/BF02289565. URL http://repub.eur.nl/pub/1274/ei200415.pdf.

[13] Enrique P. Lessa. "Multidimensional analysis of geographic genetic structure". Systematic Biology 39(3):242–252, September 1990. ISSN 1063-5157. doi:10.2307/2992184.

[14] Kanti V. Mardia. "Some properties of classical multi-dimensional scaling". Communications in Statistics - Theory and Methods 7(13):1233–1241, July 1978. ISSN 0361-0926. doi:10.1080/03610927808827707. URL http://www1.maths.leeds.ac.uk/~sta6kvm/reprints/CommunStatTheo1978.pdf.

[15] MathWorks. "Classical multidimensional scaling - MATLAB cmdscale", September 2013. URL http://www.mathworks.com/help/stats/cmdscale.html.

[16] Renato D. C. Monteiro and Ilan Adler. "Interior path following primal-dual algorithms. Part II: convex quadratic programming". Mathematical Programming 44(1–3):43–66, May 1989. ISSN 0025-5610. doi:10.1007/BF01587076.

[17] I. Colin Prentice. "Multidimensional scaling as a research tool in quaternary palynology: a review of theory and methods". Review of Palaeobotany and Palynology 31:71–104, 1980. ISSN 0034-6667. doi:10.1016/0034-6667(80)90023-8.

[18] The R Core Team. "R: A Language and Environment for Statistical Computing, Reference Index", September 2013. URL http://cran.r-project.org/doc/manuals/r-release/fullrefman.pdf.

[19] Sandra L. Robinson and Rebecca J. Bennett. "A typology of deviant workplace behaviors: a multidimensional scaling study". Academy of Management Journal 38(2):555–572, April 1995. ISSN 0001-4273. doi:10.2307/256693.
[20] Isaac J. Schoenberg. "Remarks to Maurice Fréchet's article 'Sur la définition axiomatique d'une classe d'espace distanciés vectoriellement applicable sur l'espace de Hilbert'". Annals of Mathematics 36:724–732, July 1935. ISSN 0003-486X. URL http://www.jstor.org/stable/1968654.

[21] Michael J. Subkoviak. "The use of multidimensional scaling in educational research". Review of Educational Research 45(3):387–423, Summer 1975. ISSN 0034-6543. doi:10.3102/00346543045003387.

[22] Warren S. Torgerson. Theory and Methods of Scaling. Wiley, Oxford, UK, March 1958. ISBN 0-471-87945-2, OCN 190692.

[23] Jeffrey Tsang. "The parametrized probabilistic finite-state transducer probe game player fingerprint model". IEEE Transactions on Computational Intelligence and AI in Games 2(3):208–224, September 2010. ISSN 1943-068X. doi:10.1109/TCIAIG.2010.2062512. arXiv:1401.7406.

[24] Jeffrey Tsang. "The structure of a 3-state finite transducer representation for Prisoner's Dilemma". In 2013 IEEE Conference on Computational Intelligence in Games, pp. 307–313, August 2013. ISBN 978-1-4673-5308-3. doi:10.1109/CIG.2013.6633638.

[25] Mathura S. Venkatarajan and Werner Braun. "New quantitative descriptors of amino acids based on multidimensional scaling of a large number of physical-chemical properties". Journal of Molecular Modelling 7(12):445–453, December 2001. ISSN 0949-183X. doi:10.1007/s00894-001-0058-5.
Table 1: Root mean square error on $d_{ij}^2$, $100\sqrt{L(X)/n(n-1)}$, for classical MDS performed using various methods, restricted to using only the top $n$ eigenvectors. Standard: copy all positive eigenvalues; Upper: minimize the upper bound; Cutoff: the $c$ value used; Lower: minimize the lower bound; CQP: optimize the least-squares quadratic program; SMACOF: the iterative stress majorization algorithm in $n$ dimensions. (same) means the solution is exactly the same as for $n - 1$ dimensions.

  #   Eigenvalue   Standard     Upper (Cutoff)         Lower (Cutoff)         CQP          SMACOF
  1   12.870775    2.20519329   2.20178706 (-0.00863)  2.14970170 (-1.28598)  2.05734225   2.21289091
  2    1.037833    1.45592702   1.45178684 (-0.00513)  1.31937723 (-0.51137)  1.29080402   1.26708425
  3    0.841390    0.89622191   0.89409993 (-0.00231)  0.83180580 (-0.17318)  0.82418493   0.78480269
  4    0.476908    0.65701404   0.65680429 (-0.00072)  0.65854993 (-0.04316)  0.65147324   0.62506041
  5    0.383134    0.55749502   0.55692006 (0.000554)  0.53853137 (0.027886)  0.52684668   0.53871108
  6    0.240600    0.56075204   0.55774720 (0.001346)  0.48723979 (0.058274)  0.47279706   0.49638691
  7    0.124567    0.60165224   0.59623265 (0.001751)  0.48338972 (0.066560)  0.46840905   0.49234572
  8    0.095117    0.63479317   0.62693049 (0.002058)  0.48102608 (0.069733)  0.46517676   0.49090814
  9    0.090882    0.67155736   0.66077926 (0.002348)  0.47911194 (0.071848)  0.46186658   0.49012857
 10    0.083647    0.70978791   0.69587863 (0.002613)  0.47876356 (0.072921)  0.46080720   0.48968884
 11    0.066470    0.74715742   0.72995180 (0.002820)  (same)                 (same)       0.48925221
 12    0.060986    0.77286918   0.75259874 (0.003008)  (same)                 0.45690971   0.48943608
 13    0.054237    0.80406322   0.78039058 (0.003174)  (same)                 (same)       0.48955872
 14    0.045480    0.82707314   0.80020239 (0.003310)  (same)                 0.45622298   0.48961516
 15    0.042569    0.85124591   0.82100872 (0.003435)  (same)                 (same)       0.48968960
 16    0.039933    0.87451480   0.84080891 (0.003552)  (same)                 (same)       0.48972780
 17    0.036081    0.89475692   0.85769606 (0.003656)  (same)                 (same)       0.48975081
 18    0.035816    0.91614215   0.87538331 (0.003758)  (same)                 (same)       0.48973887
 19    0.028725    0.93216642   0.88808096 (0.003837)  (same)                 0.45594512   0.48978879
 20    0.025852    0.94636821   0.89905662 (0.003906)  (same)                 0.45580692   0.48982703
 21    0.024536    0.96135228   0.91063699 (0.003971)  (same)                 (same)       0.48983074
  ...
 49    0.004537    1.14058061   1.00324282 (0.004497)  (same)                 (same)       N/A
 50    0.004050    1.14299177   (same)                 (same)                 (same)       N/A
  ...
103    0.000018    1.18422460   (same)                 (same)                 (same)       N/A
104    0.000000    (same)       (same)                 (same)                 (same)       N/A
105   -0.000035    (same)       (same)                 (same)                 (same)       N/A
  ...
245   -0.004660    (same)       (same)                 (same)                 (same)       N/A
246   -0.004816    (same)       (same)                 (same)                 0.45578827   N/A
247   -0.004953    (same)       (same)                 (same)                 (same)       N/A
  ...
257   -0.006288    (same)       (same)                 (same)                 (same)       N/A
258   -0.006408    (same)       (same)                 (same)                 0.45572899   N/A
259   -0.006525    (same)       (same)                 (same)                 (same)       N/A
  ...
296   -0.387203    (same)       (same)                 (same)                 (same)       N/A