Empirical Evaluation of Four Tensor Decomposition Algorithms



Peter D. Turney
Institute for Information Technology
National Research Council of Canada
M-50 Montreal Road, Ottawa, Ontario, Canada K1A 0R6
peter.turney@nrc-cnrc.gc.ca

Technical Report ERB-1152, NRC-49877
November 12, 2007

Abstract

Higher-order tensor decompositions are analogous to the familiar Singular Value Decomposition (SVD), but they transcend the limitations of matrices (second-order tensors). SVD is a powerful tool that has achieved impressive results in information retrieval, collaborative filtering, computational linguistics, computational vision, and other fields. However, SVD is limited to two-dimensional arrays of data (two modes), and many potential applications have three or more modes, which require higher-order tensor decompositions. This paper evaluates four algorithms for higher-order tensor decomposition: Higher-Order Singular Value Decomposition (HO-SVD), Higher-Order Orthogonal Iteration (HOOI), Slice Projection (SP), and Multislice Projection (MP). We measure the time (elapsed run time), space (RAM and disk space requirements), and fit (tensor reconstruction accuracy) of the four algorithms, under a variety of conditions. We find that standard implementations of HO-SVD and HOOI do not scale up to larger tensors, due to increasing RAM requirements. We recommend HOOI for tensors that are small enough for the available RAM and MP for larger tensors.
1 Introduction

Singular Value Decomposition (SVD) is growing increasingly popular as a tool for the analysis of two-dimensional arrays of data, due to its success in a wide variety of applications, such as information retrieval (Deerwester et al., 1990), collaborative filtering (Billsus and Pazzani, 1998), computational linguistics (Schütze, 1998), computational vision (Brand, 2002), and genomics (Alter et al., 2000). SVD is limited to two-dimensional arrays (matrices or second-order tensors), but many applications require higher-dimensional arrays, known as higher-order tensors.

There are several higher-order tensor decompositions, analogous to SVD, that are able to capture higher-order structure that cannot be modeled with two dimensions (two modes). Higher-order generalizations of SVD include Higher-Order Singular Value Decomposition (HO-SVD) (De Lathauwer et al., 2000a), Tucker decomposition (Tucker, 1966), and PARAFAC (parallel factor analysis) (Harshman, 1970), which is also known as CANDECOMP (canonical decomposition) (Carroll and Chang, 1970).

Higher-order tensors quickly become unwieldy. The number of elements in a matrix increases quadratically, as the product of the number of rows and columns, but the number of elements in a third-order tensor increases cubically, as the product of the number of rows, columns, and tubes. Thus there is a need for tensor decomposition algorithms that can handle large tensors.

In this paper, we evaluate four algorithms for higher-order tensor decomposition: Higher-Order Singular Value Decomposition (HO-SVD) (De Lathauwer et al., 2000a), Higher-Order Orthogonal Iteration (HOOI) (De Lathauwer et al., 2000b), Slice Projection (SP) (Wang and Ahuja, 2005), and Multislice Projection (MP) (introduced here). Our main concern is the ability of the four algorithms to scale up to large tensors.
In Section 2, we motivate this work by listing some of the applications for higher-order tensors. In any field where SVD has been useful, there is likely to be a third or fourth mode that has been ignored, because SVD only handles two modes.

The tensor notation we use in this paper is presented in Section 3. We follow the notational conventions of Kolda (2006).

Section 4 presents the four algorithms, HO-SVD, HOOI, SP, and MP. For HO-SVD and HOOI, we used the implementations given in the MATLAB Tensor Toolbox (Bader and Kolda, 2007a; Bader and Kolda, 2007b). For SP and MP, we created our own MATLAB implementations. Our implementation of MP for third-order tensors is given in the Appendix.

Section 5 presents our empirical evaluation of the four tensor decomposition algorithms. In the experiments, we measure the time (elapsed run time), space (RAM and disk space requirements), and fit (tensor reconstruction accuracy) of the four algorithms, under a variety of conditions.

The first group of experiments looks at how the algorithms scale as the input tensors grow increasingly larger. We test the algorithms with random sparse third-order tensors as input. HO-SVD and HOOI exceed the available RAM when given larger tensors as input, but SP and MP are able to process large tensors with low RAM usage and good speed. HOOI provides the best fit, followed by MP, then SP, and lastly HO-SVD.

The second group of experiments examines the sensitivity of the fit to the balance in the ratios of the core sizes (defined in Section 3). The algorithms are tested with random sparse third-order tensors as input. In general, the fit of the four algorithms follows the same pattern as in the first group of experiments (HOOI gives the best fit, then MP, SP, and HO-SVD), but we observe that SP is particularly sensitive to unbalanced ratios of the core sizes.
The third group explores the fit with varying ratios between the size of the input tensor and the size of the core tensor. For this group, we move from third-order tensors to fourth-order tensors. The algorithms are tested with random fourth-order tensors, with the input tensor size fixed while the core sizes vary. The fit of the algorithms follows the same pattern as in the previous two groups of experiments, in spite of the move to fourth-order tensors.

The final group measures the performance with a real (nonrandom) tensor that was generated for a task in computational linguistics. The fit follows the same pattern as in the previous three groups of experiments. Furthermore, the differences in fit are reflected in the performance on the given task. This experiment validates the use of random tensors in the previous three groups of experiments.

We conclude in Section 6. There are tradeoffs in time, space, and fit for the four algorithms, such that there is no absolute winner among them. The choice will depend on the time, space, and fit requirements of the given application. If good fit is the primary concern, we recommend HOOI for smaller tensors that can fit in the available RAM, and MP for larger tensors.

2 Applications

A good survey of applications of tensor decompositions for data analysis is Acar and Yener (2007), which lists several applications, including electroencephalogram (EEG) data analysis, spectral analysis of chemical mixtures, computer vision, and social network analysis. Kolda (2006) also lists various applications, such as psychometrics, image analysis, graph analysis, and signal processing.

We believe that a natural place to look for applications for tensor decompositions is wherever SVD has proven useful.
We have grown accustomed to thinking of data in terms of two-dimensional tables and matrices; in terms of what we can handle with SVD. However, real applications often have many more modes, which we have been ignoring.

In information retrieval (Deerwester et al., 1990), SVD is typically applied to a term × document matrix, where each row represents a word and each column represents a document in the collection. An element in the matrix is a weight that represents the importance of the given word in the given document. SVD smooths the weights, so that a document d will have a nonzero weight for a word w if d is similar to other documents that contain the word w, even if d does not actually contain w. Thus a search for w will return the document d, thanks to the smoothing effect of SVD.

To extend the term-document matrix to a third-order tensor, it would be natural to add information such as author, date of publication, citations, and venue (e.g., the name of the conference or journal). For example, Dunlavy et al. (2006) used a tensor to combine information from abstracts, titles, keywords, authors, and citations. Chew et al. (2007) applied a tensor decomposition to a term × document × language tensor, for cross-language information retrieval. Sun et al. (2006) analyzed an author × keyword × date tensor.

In collaborative filtering (Billsus and Pazzani, 1998), SVD is usually applied to a user × item matrix, in which each row represents a person and each column represents an item, such as a movie or a book. An element in the matrix is a rating by the given user for the given item. Most of the elements in the matrix are missing, because each user has only rated a few items. When a zero element represents a missing rating, SVD can be used to guess the missing ratings, based on the nonzero elements.
The user-item matrix could be extended to a third-order tensor by adding a variety of information, such as the words used to describe an item, the words used to describe the interests of a user, the price of an item, the geographical location of a user, and the age of a user. For example, Mahoney et al. (2006) and Xu et al. (2006) applied tensor decompositions to collaborative filtering.

In computational linguistics, SVD is often applied in semantic space models of word meaning. For example, Landauer and Dumais (1997) applied SVD to a word × document matrix, achieving human-level scores on multiple-choice synonym questions from the TOEFL test. Turney (2006) applied SVD to a word-pair × pattern matrix, reaching human-level scores on multiple-choice analogy questions from the SAT test.

In our recent work, we have begun exploring tensor decompositions for semantic space models. We are currently developing a word × pattern × word tensor that can be used for both synonyms and analogies. The experiments in Section 5.4 evaluate the four tensor decomposition algorithms using this tensor to answer multiple-choice TOEFL questions.

In computational vision, SVD is often applied to image analysis (Brand, 2002). To work within the two-mode constraint of SVD, an image, which is naturally two-dimensional, is mapped to a vector. For example, in face recognition, SVD is applied to a face × image-vector matrix, in which each row is a vector that encodes an image of a person's face. Wang and Ahuja (2005) pointed out that this two-mode approach to image analysis ignores essential higher-mode structure in the data. The experiments in Wang and Ahuja (2005) demonstrate that higher-order tensor decompositions can be much more effective than SVD.

In summary, wherever SVD has been useful, we expect there are higher-order modes that have been ignored.
With algorithms that can decompose large tensors, it is no longer necessary to ignore these modes.

3 Notation

This paper follows the notational conventions of Kolda (2006). Tensors of order three or higher are represented by bold script letters, X. Matrices (second-order tensors) are denoted by bold capital letters, A. Vectors (first-order tensors) are denoted by bold lowercase letters, b. Scalars (zero-order tensors) are represented by lowercase italic letters, i.

The i-th element in a vector b is indicated by b_i. The i-th row in a matrix A is denoted by a_{i:}, the j-th column is given by a_{:j}, and the element in row i and column j is represented by a_{ij}.

A third-order tensor X has rows, columns, and tubes. The element in row i, column j, and tube k is represented by x_{ijk}. The row vector that contains x_{ijk} is denoted by x_{i:k}, the column vector is x_{:jk}, and the tube vector is x_{ij:}. In general, the vectors in a tensor (e.g., the rows, columns, and tubes in a third-order tensor) are called fibers. There are no special names (beyond rows, columns, and tubes) for fibers in tensors of order four and higher.

A third-order tensor X contains matrices, called slices. The horizontal, lateral, and frontal slices of X are represented by X_{i::}, X_{:j:}, and X_{::k}, respectively. The concept of slices also applies to tensors of order four and higher.

An index i ranges from 1 to I; that is, the upper bound on the range of an index is given by the uppercase form of the index letter. Thus the size of a tensor X is denoted by uppercase scalars, I_1 × I_2 × I_3.

There are several kinds of tensor products, but we only need the n-mode product in this paper. The n-mode product of a tensor X and a matrix A is written as X ×_n A. Let X be of size I_1 × I_2 × I_3 and let A be of size J_1 × J_2.
The n-mode product X ×_n A multiplies fibers in mode n of X with row vectors in A. Therefore n-mode multiplication requires that I_n = J_2. The result of X ×_n A is a tensor with the same order (same number of modes) as X, but with the size I_n replaced by J_1. For example, the result of X ×_3 A is of size I_1 × I_2 × J_1, assuming I_3 = J_2.

Let X be an N-th order tensor of size I_1 × ... × I_N and let A be a matrix of size J × I_n. Suppose that Y = X ×_n A. Thus Y is of size I_1 × ... × I_{n-1} × J × I_{n+1} × ... × I_N. The elements of Y are defined as follows:

    y_{i_1 ... i_{n-1} j i_{n+1} ... i_N} = \sum_{i_n = 1}^{I_n} x_{i_1 ... i_N} a_{j i_n}    (1)

The transpose of a matrix A is written as A^T. We may think of the classical matrix product AB as a special case of the n-mode product:

    AB = A ×_2 B^T = B ×_1 A    (2)

Fibers in mode two of A (row vectors) are multiplied with row vectors in B^T, which are column vectors (mode one) in B.

A tensor X can be unfolded into a matrix, which is called matricization. The n-mode matricization of X is written X_{(n)}, and is formed by taking the mode-n fibers of X and making them column vectors in X_{(n)}. Let X be a tensor of size I_1 × I_2 × I_3. The one-mode matricization X_{(1)} is of size I_1 × (I_2 I_3):

    X_{(1)} = [ x_{:11}  x_{:21}  ...  x_{:I_2 I_3} ]    (3)
            = [ X_{::1}  X_{::2}  ...  X_{::I_3} ]    (4)

Similarly, the two-mode matricization X_{(2)} is of size I_2 × (I_1 I_3):

    X_{(2)} = [ x_{1:1}  x_{2:1}  ...  x_{I_1 : I_3} ]    (5)
            = [ X_{::1}^T  X_{::2}^T  ...  X_{::I_3}^T ]    (6)

Note that Y = X ×_n A if and only if Y_{(n)} = A X_{(n)}. Thus n-mode matricization relates the classical matrix product to the n-mode tensor product. In the special case of second-order tensors, C_{(1)} = C and C_{(2)} = C^T, hence C = B ×_1 A if and only if C = AB. Likewise, C = B ×_2 A if and only if C^T = A B^T.

Let G be a tensor of size J_1 × ... × J_N. Let A^{(1)}, ..., A^{(N)} be matrices such that A^{(n)} is of size I_n × J_n. The Tucker operator is defined as follows (Kolda, 2006):

    ⟦G; A^{(1)}, A^{(2)}, ..., A^{(N)}⟧ ≡ G ×_1 A^{(1)} ×_2 A^{(2)} ... ×_N A^{(N)}    (7)

The resulting tensor is of size I_1 × ... × I_N.

Let X be a tensor of size I_1 × ... × I_N. The Tucker decomposition of X has the following form (Tucker, 1966; Kolda, 2006):

    X ≈ ⟦G; A^{(1)}, ..., A^{(N)}⟧    (8)

The tensor G is called the core of the decomposition. Let G be of size J_1 × ... × J_N. Each matrix A^{(n)} is of size I_n × J_n.

The n-rank of a tensor X is the rank of the matrix X_{(n)}. For a second-order tensor, the one-rank necessarily equals the two-rank, but this is not true for higher-order tensors. If J_n is equal to the n-rank of X for each n, then it is possible for the Tucker decomposition to exactly equal X. In general, we want J_n less than the n-rank of X for each n, yielding a core G that has lower n-ranks than X, analogous to a truncated (thin) SVD. In the special case of a second-order tensor, the Tucker decomposition X ≈ ⟦S; U, V⟧ is equivalent to the thin SVD, X ≈ USV^T.

Suppose we have a tensor X and its Tucker decomposition X̂ = ⟦G; A^{(1)}, ..., A^{(N)}⟧, such that X ≈ X̂. In the experiments in Section 5, we measure the fit of the decomposition X̂ to the original X as follows:

    fit(X, X̂) = 1 − ‖X − X̂‖_F / ‖X‖_F    (9)

The Frobenius norm of a tensor X, ‖X‖_F, is the square root of the sum of the absolute squares of its elements. The fit is a normalized measure of the error in reconstructing X from its Tucker decomposition X̂. When X = X̂, the fit is 1; otherwise, it is less than 1, and it may be negative when the fit is particularly poor.
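As a concrete illustration of these definitions, the n-mode matricization, the n-mode product, the Tucker operator (Equation 7), and the fit measure (Equation 9) can be sketched in NumPy. This is an illustrative sketch, not the paper's MATLAB code; the function names are our own, and modes are numbered from zero rather than one:

```python
import numpy as np

def unfold(X, n):
    # n-mode matricization X_(n): mode-n fibers become columns
    # (column ordering may differ from Kolda's convention, but the
    # identity Y = X x_n A <=> Y_(n) = A X_(n) still holds below)
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    # inverse of unfold, for a tensor of the given target shape
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

def nmode_product(X, A, n):
    # Y = X x_n A: multiply mode-n fibers of X by the rows of A,
    # replacing size I_n with J (Equation 1)
    shape = list(X.shape)
    shape[n] = A.shape[0]
    return fold(A @ unfold(X, n), n, shape)

def tucker(G, factors):
    # Tucker operator [[G; A1, ..., AN]] (Equation 7)
    X = G
    for n, A in enumerate(factors):
        X = nmode_product(X, A, n)
    return X

def fit(X, Xhat):
    # fit(X, Xhat) = 1 - ||X - Xhat||_F / ||X||_F (Equation 9)
    return 1.0 - np.linalg.norm(X - Xhat) / np.linalg.norm(X)
```

For example, with X of size I_1 × I_2 × I_3 and A of size J × I_3, `nmode_product(X, A, 2)` returns a tensor of size I_1 × I_2 × J, matching the text's description of X ×_3 A.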
The equivalence between the n-mode tensor product and the classical matrix product with n-mode matricization suggests that tensors might be merely a new notation; that there may be no advantage to using the Tucker decomposition with tensors instead of using SVD with unfolded (matricized) tensors. Perhaps the different layers (slices) of the tensor do not actually interact with each other in any interesting way. This criticism would be appropriate if the Tucker decomposition used only one mode, but the decomposition uses all N modes of X. Because all modes are used, the layers of the tensor are thoroughly mixed together. For example, suppose X ≈ ⟦G; A, B, C⟧. Let X_{i::} be a slice of X. There is no slice of G, say G_{j::}, such that we can reconstruct X_{i::} from G_{j::} using A, B, and C. We need all of G in order to reconstruct X_{i::}.

All four of the algorithms that we examine in this paper perform the Tucker decomposition. One reason for our focus on the Tucker decomposition is that Bro and Andersson (1998) showed that the Tucker decomposition can be combined with other tensor decompositions, such as PARAFAC (Harshman, 1970; Carroll and Chang, 1970). In general, algorithms for the Tucker decomposition scale to large tensors better than most other tensor decomposition algorithms; therefore it is possible to improve the speed of other algorithms by first compressing the tensor with the Tucker decomposition. The slower algorithm (such as PARAFAC) is then applied to the (relatively small) Tucker core, instead of the whole (large) input tensor (Bro and Andersson, 1998). Thus an algorithm that can perform the Tucker decomposition with large tensors makes it possible for other kinds of tensor decompositions to be applied to large tensors.

4 Algorithms

This section introduces the four tensor decomposition algorithms.
All four algorithms take as input an arbitrary tensor X and a desired core size J_1 × ... × J_N and generate as output a Tucker decomposition X̂ = ⟦G; A^{(1)}, ..., A^{(N)}⟧, in which the matrices A^{(n)} are orthonormal.

For HO-SVD (Higher-Order Singular Value Decomposition) and HOOI (Higher-Order Orthogonal Iteration), we show the algorithms specialized for third-order tensors and generalized for arbitrary tensors. For SP (Slice Projection) and MP (Multislice Projection), we present the algorithms for third-order and fourth-order tensors and leave the generalization for arbitrary tensors as an exercise for the reader. (There is a need for a better notation, to write the generalization of SP and MP to arbitrary tensors.)

4.1 Higher-Order SVD

Figure 1 presents the HO-SVD algorithm for third-order tensors. Figure 2 gives the generalization of HO-SVD for tensors of arbitrary order (De Lathauwer et al., 2000a; Kolda, 2006). In the following experiments, we used the implementation of HO-SVD in the MATLAB Tensor Toolbox (Bader and Kolda, 2007b). HO-SVD is not a distinct function in the Toolbox, but it is easily extracted from the Tucker Alternating Least Squares function, where it is a component.

HO-SVD does not attempt to optimize the fit, fit(X, X̂) (Kolda, 2006). That is, HO-SVD does not produce an optimal rank-(J_1, ..., J_N) approximation to X, because it optimizes for each mode separately, without considering interactions among the modes. However, we will see in Section 5 that HO-SVD often produces a reasonable approximation, and it is relatively fast. For more information about HO-SVD, see De Lathauwer et al. (2000a).

4.2 Higher-Order Orthogonal Iteration

Figure 3 presents the HOOI algorithm for third-order tensors. Figure 4 gives the generalization of HOOI for tensors of arbitrary order (De Lathauwer et al., 2000b; Kolda, 2006).
HOOI is implemented in the MATLAB Tensor Toolbox (Bader and Kolda, 2007b), in the Tucker Alternating Least Squares function.

HOOI uses HO-SVD to initialize the matrices, before entering the main loop. The implementation in the MATLAB Tensor Toolbox gives the option of using a random initialization, but initialization with HO-SVD usually results in a better fit.

In the main loop, each matrix is optimized individually, while the other matrices are held fixed. This general method is called Alternating Least Squares (ALS). HOOI, SP, and MP all use ALS. The main loop terminates when the change in fit drops below a threshold or when the number of iterations reaches a maximum, whichever comes first. To calculate the fit for each iteration, HOOI first calculates the core G using ⟦X; A^{(1)T}, ..., A^{(N)T}⟧, and then calculates X̂ from ⟦G; A^{(1)}, ..., A^{(N)}⟧. The change in fit is the fit of the Tucker decomposition after the t-th iteration of the main loop minus the fit from the previous iteration:

    Δfit(t) = fit(X, X̂^{(t)}) − fit(X, X̂^{(t−1)})    (10)

In the experiments, we set the threshold for Δfit(t) at 10^{-4} and we set the maximum number of iterations at 50. (These are the default values in the MATLAB Tensor Toolbox.) The main loop usually terminated after half a dozen iterations or fewer, with Δfit(t) less than 10^{-4}.

As implemented in the MATLAB Tensor Toolbox, calculating the HO-SVD initialization, the intermediate tensor Z, and the change in fit, Δfit(t), requires bringing the entire input tensor X into RAM. Although sparse representations are used, this requirement limits the size of the tensors that we can process, as we see in Section 5.1. For more information about HOOI, see De Lathauwer et al. (2000b) and Kolda (2006).
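To make the structure of HO-SVD and HOOI concrete, here is an illustrative NumPy sketch of both (Figures 1–4 give the actual algorithms; this is not the MATLAB Tensor Toolbox implementation, and the helper names are our own). It takes the J_n leading left singular vectors of each unfolding, which are the leading eigenvectors of X_{(n)} X_{(n)}^T; modes are numbered from zero:

```python
import numpy as np

def unfold(X, n):
    # n-mode matricization: mode-n fibers become columns
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def nmode_product(X, A, n):
    # Y = X x_n A, so that Y_(n) = A X_(n)
    shape = list(X.shape)
    shape[n] = A.shape[0]
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis((A @ unfold(X, n)).reshape([shape[n]] + rest), 0, n)

def multi_product(X, factors, skip=None):
    # [[X; A1, ..., AN]], optionally skipping one mode
    for n, A in enumerate(factors):
        if n != skip:
            X = nmode_product(X, A, n)
    return X

def hosvd(X, ranks):
    # A(n) <- J_n leading eigenvectors of X_(n) X_(n)^T, i.e. the
    # J_n leading left singular vectors of X_(n) (Figures 1 and 2)
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :J]
               for n, J in enumerate(ranks)]
    G = multi_product(X, [A.T for A in factors])
    return G, factors

def hooi(X, ranks, max_iter=50, tol=1e-4):
    # initialize with HO-SVD, then Alternating Least Squares:
    # optimize one factor at a time with the others held fixed
    _, factors = hosvd(X, ranks)
    prev_fit = -np.inf
    for _ in range(max_iter):
        for n, J in enumerate(ranks):
            Z = multi_product(X, [A.T for A in factors], skip=n)
            factors[n] = np.linalg.svd(unfold(Z, n),
                                       full_matrices=False)[0][:, :J]
        G = multi_product(X, [A.T for A in factors])
        Xhat = multi_product(G, factors)
        f = 1.0 - np.linalg.norm(X - Xhat) / np.linalg.norm(X)
        if f - prev_fit < tol:        # Delta fit(t), Equation (10)
            break
        prev_fit = f
    return G, factors
```

Because each ALS update is optimal for its mode with the others fixed, the fit is non-decreasing, which is why HOOI started from HO-SVD can only improve on the HO-SVD fit.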
in: Tensor X of size I_1 × I_2 × I_3.
in: Desired rank of core: J_1 × J_2 × J_3.
A ← J_1 leading eigenvectors of X_{(1)} X_{(1)}^T    – X_{(1)} is the unfolding of X on mode 1
B ← J_2 leading eigenvectors of X_{(2)} X_{(2)}^T
C ← J_3 leading eigenvectors of X_{(3)} X_{(3)}^T
G ← ⟦X; A^T, B^T, C^T⟧
out: G of size J_1 × J_2 × J_3 and orthonormal matrices A of size I_1 × J_1, B of size I_2 × J_2, and C of size I_3 × J_3, such that X ≈ ⟦G; A, B, C⟧.

Figure 1: Higher-Order Singular Value Decomposition for third-order tensors (De Lathauwer et al., 2000a).

in: Tensor X of size I_1 × I_2 × ... × I_N.
in: Desired rank of core: J_1 × J_2 × ... × J_N.
for n = 1, ..., N do
    A^{(n)} ← J_n leading eigenvectors of X_{(n)} X_{(n)}^T    – X_{(n)} is the unfolding of X on mode n
end for
G ← ⟦X; A^{(1)T}, ..., A^{(N)T}⟧
out: G of size J_1 × J_2 × ... × J_N and orthonormal matrices A^{(n)} of size I_n × J_n such that X ≈ ⟦G; A^{(1)}, ..., A^{(N)}⟧.

Figure 2: Higher-Order Singular Value Decomposition for tensors of arbitrary order (De Lathauwer et al., 2000a).

in: Tensor X of size I_1 × I_2 × I_3.
in: Desired rank of core: J_1 × J_2 × J_3.
B ← J_2 leading eigenvectors of X_{(2)} X_{(2)}^T    – initialization via HO-SVD
C ← J_3 leading eigenvectors of X_{(3)} X_{(3)}^T
while not converged do    – main loop
    U ← ⟦X; I_1, B^T, C^T⟧
    A ← J_1 leading eigenvectors of U_{(1)} U_{(1)}^T
    V ← ⟦X; A^T, I_2, C^T⟧
    B ← J_2 leading eigenvectors of V_{(2)} V_{(2)}^T
    W ← ⟦X; A^T, B^T, I_3⟧
    C ← J_3 leading eigenvectors of W_{(3)} W_{(3)}^T
end while
G ← ⟦X; A^T, B^T, C^T⟧
out: G of size J_1 × J_2 × J_3 and orthonormal matrices A of size I_1 × J_1, B of size I_2 × J_2, and C of size I_3 × J_3, such that X ≈ ⟦G; A, B, C⟧.

Figure 3: Higher-Order Orthogonal Iteration for third-order tensors (De Lathauwer et al., 2000b; Kolda, 2006). Note that it is not necessary to initialize A, since the while loop sets A using B and C. I_i is the identity matrix of size I_i × I_i.

in: Tensor X of size I_1 × I_2 × ... × I_N.
in: Desired rank of core: J_1 × J_2 × ... × J_N.
for n = 2, ..., N do    – initialization via HO-SVD
    A^{(n)} ← J_n leading eigenvectors of X_{(n)} X_{(n)}^T
end for
while not converged do    – main loop
    for n = 1, ..., N do
        Z ← ⟦X; A^{(1)T}, ..., A^{(n−1)T}, I_n, A^{(n+1)T}, ..., A^{(N)T}⟧
        A^{(n)} ← J_n leading eigenvectors of Z_{(n)} Z_{(n)}^T
    end for
end while
G ← ⟦X; A^{(1)T}, ..., A^{(N)T}⟧
out: G of size J_1 × J_2 × ... × J_N and orthonormal matrices A^{(n)} of size I_n × J_n such that X ≈ ⟦G; A^{(1)}, ..., A^{(N)}⟧.

Figure 4: Higher-Order Orthogonal Iteration for tensors of arbitrary order (De Lathauwer et al., 2000b; Kolda, 2006). I_n is the identity matrix of size I_n × I_n.

4.3 Slice Projection

Figure 5 presents the SP algorithm for third-order tensors (Wang and Ahuja, 2005). Although Wang and Ahuja (2005) do not discuss tensors beyond the third order, the SP algorithm generalizes to tensors of arbitrary order. For example, Figure 6 shows SP for fourth-order tensors.

Instead of using HO-SVD, Wang and Ahuja (2005) initialize SP randomly, to avoid bringing X into RAM. In Figure 5, the matrix C is filled with random numbers from the uniform distribution over [0, 1] and then the columns are normalized.

in: Tensor X of size I_1 × I_2 × I_3.
in: Desired rank of core: J_1 × J_2 × J_3.
C ← random matrix of size I_3 × J_3    – normalize columns to unit length
while not converged do    – main loop
    M_13 ← Σ_{i=1}^{I_2} X_{:i:} C C^T X_{:i:}^T    – slices on mode 2
    A ← J_1 leading eigenvectors of M_13 M_13^T
    M_21 ← Σ_{i=1}^{I_3} X_{::i}^T A A^T X_{::i}    – slices on mode 3
    B ← J_2 leading eigenvectors of M_21 M_21^T
    M_32 ← Σ_{i=1}^{I_1} X_{i::}^T B B^T X_{i::}    – slices on mode 1
    C ← J_3 leading eigenvectors of M_32 M_32^T
end while
G ← ⟦X; A^T, B^T, C^T⟧
out: G of size J_1 × J_2 × J_3 and orthonormal matrices A of size I_1 × J_1, B of size I_2 × J_2, and C of size I_3 × J_3, such that X ≈ ⟦G; A, B, C⟧.

Figure 5: Slice Projection for third-order tensors (Wang and Ahuja, 2005). Note that it is not necessary to initialize A and B.

in: Tensor X of size I_1 × I_2 × I_3 × I_4.
in: Desired rank of core: J_1 × J_2 × J_3 × J_4.
D ← random matrix of size I_4 × J_4    – normalize columns to unit length
while not converged do    – main loop
    M_14 ← Σ_{i=1}^{I_2} Σ_{j=1}^{I_3} X_{:ij:} D D^T X_{:ij:}^T    – slices on modes 2 and 3
    A ← J_1 leading eigenvectors of M_14 M_14^T
    M_21 ← Σ_{i=1}^{I_3} Σ_{j=1}^{I_4} X_{::ij}^T A A^T X_{::ij}    – slices on modes 3 and 4
    B ← J_2 leading eigenvectors of M_21 M_21^T
    M_32 ← Σ_{i=1}^{I_1} Σ_{j=1}^{I_4} X_{i::j}^T B B^T X_{i::j}    – slices on modes 1 and 4
    C ← J_3 leading eigenvectors of M_32 M_32^T
    M_43 ← Σ_{i=1}^{I_1} Σ_{j=1}^{I_2} X_{ij::}^T C C^T X_{ij::}    – slices on modes 1 and 2
    D ← J_4 leading eigenvectors of M_43 M_43^T
end while
G ← ⟦X; A^T, B^T, C^T, D^T⟧
out: G of size J_1 × J_2 × J_3 × J_4 and orthonormal matrices A of size I_1 × J_1, B of size I_2 × J_2, C of size I_3 × J_3, and D of size I_4 × J_4, such that X ≈ ⟦G; A, B, C, D⟧.

Figure 6: Slice Projection for fourth-order tensors.
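An in-memory NumPy sketch of the SP main loop in Figure 5 may clarify the update pattern. This is illustrative only: the point of SP is to read slices from files rather than holding X in RAM as this sketch does, the function names are our own, and a fixed iteration count stands in for the ΔG convergence test:

```python
import numpy as np

def leading_eigvecs(M, J):
    # J leading eigenvectors of M M^T, via the left singular vectors of M
    return np.linalg.svd(M)[0][:, :J]

def sp3(X, ranks, n_iter=10, seed=0):
    # Slice Projection for a third-order tensor (Figure 5)
    I1, I2, I3 = X.shape
    J1, J2, J3 = ranks
    rng = np.random.default_rng(seed)
    C = rng.random((I3, J3))
    C /= np.linalg.norm(C, axis=0)          # normalize columns
    for _ in range(n_iter):
        M13 = sum(X[:, i, :] @ C @ C.T @ X[:, i, :].T for i in range(I2))
        A = leading_eigvecs(M13, J1)        # slices on mode 2
        M21 = sum(X[:, :, i].T @ A @ A.T @ X[:, :, i] for i in range(I3))
        B = leading_eigvecs(M21, J2)        # slices on mode 3
        M32 = sum(X[i, :, :].T @ B @ B.T @ X[i, :, :] for i in range(I1))
        C = leading_eigvecs(M32, J3)        # slices on mode 1
    # G = [[X; A^T, B^T, C^T]]
    G = np.einsum('abc,ap,bq,cr->pqr', X, A, B, C)
    return G, A, B, C
```

Note how each update uses X and exactly one other matrix (C for A, A for B, B for C), which is the sense in which SP lies between HO-SVD and HOOI.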
Note that HO-SVD calculates each matrix from X alone, whereas HOOI calculates each matrix from X and all of the other matrices. SP lies between HO-SVD and HOOI, in that it calculates each matrix from X and one other matrix.

In the main loop, the input tensor X is processed one slice at a time, again to avoid bringing the whole tensor into RAM. Before entering the main loop, the first step is to calculate the slices and store each slice in a file. MP requires this same first step. The MATLAB source code for MP, given in the Appendix, shows how we calculate the slices of X without bringing all of X into RAM.

Our approach to constructing the slice files assumes that the input tensor is given in a sparse representation, in which each nonzero element of the tensor is described by one line in a file. The description consists of the indices that specify the location of the nonzero element, followed by the value of the nonzero element. For example, the element x_{ijk} of a third-order tensor X is described as ⟨i, j, k, x_{ijk}⟩. To calculate the n-mode slices, we first sort the input tensor file by mode n. For example, we generate two-mode slices by sorting on j, the second column of the input file. This puts all of the elements of an n-mode slice together consecutively in the file. After sorting on mode n, we can read the sorted file one slice at a time, writing each mode-n slice to its own unique file.

To sort the input file, we use the Unix sort command. This command allows the user to specify the amount of RAM used by the sort buffer. In the following experiments, we arbitrarily set the buffer to 4 GiB, half the available RAM. (For Windows, the Unix sort command is included in Cygwin.)

The main loop terminates after a maximum number of iterations or when the core stops growing, whichever comes first.
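The slice-file construction just described can be sketched as follows. This is illustrative Python standing in for the paper's Unix `sort` pipeline and MATLAB code; the file naming scheme is our own assumption:

```python
from itertools import groupby

def write_mode_slices(in_path, mode, out_prefix):
    """Split a sparse tensor file into one file per mode-n slice.

    Each input line is assumed to hold the indices of a nonzero
    element followed by its value, e.g. "i j k x_ijk".  Sorting on
    the chosen index column groups each slice's elements together,
    which is what the Unix sort command does out of core for files
    too large to fit in RAM.
    """
    with open(in_path) as f:
        entries = [line.split() for line in f if line.strip()]
    entries.sort(key=lambda e: int(e[mode]))        # sort on mode n
    paths = []
    for idx, group in groupby(entries, key=lambda e: int(e[mode])):
        path = f"{out_prefix}.mode{mode + 1}.slice{idx}"
        with open(path, "w") as out:
            out.writelines(" ".join(e) + "\n" for e in group)
        paths.append(path)
    return paths
```

Unlike this sketch, the paper's pipeline never loads the whole entry list into memory: the Unix sort command sorts the file out of core, and the sorted file is then read one slice at a time.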
The growth of the core is measured as follows:

    ΔG(t) = 1 − ‖G^{(t−1)}‖_F / ‖G^{(t)}‖_F    (11)

In this equation, G^{(t)} is the core after the t-th iteration. We set the threshold for ΔG(t) at 10^{-4} and we set the maximum number of iterations at 50. The main loop usually terminated after half a dozen iterations or fewer, with ΔG(t) less than 10^{-4}. SP uses ΔG(t) as a proxy for Δfit(t), to avoid bringing X into RAM. With each iteration, as the estimates for the matrices improve, the core captures more of the variation in X, resulting in growth of the core. It is not necessary to bring X into RAM in order to calculate G^{(t)}; we can calculate G^{(t)} one slice at a time, as given in the Appendix.

For more information about SP, see Wang and Ahuja (2005). Wang et al. (2005) introduced another low-RAM algorithm for higher-order tensors, based on blocks instead of slices.

4.4 Multislice Projection

Figure 7 presents the MP algorithm for third-order tensors. The MP algorithm generalizes to arbitrary order. Figure 8 shows MP for fourth-order tensors.

The basic structure of MP is taken from SP, but MP takes three ideas from HOOI: (1) use HO-SVD to initialize, instead of random initialization, (2) use fit to determine convergence, instead of using the growth of the core, and (3) use all of the other matrices to calculate a given matrix, instead of using only one other matrix.

Like SP, MP begins by calculating all of the slices of the input tensor and storing each slice in a file. See the Appendix for details.

We call the initialization pseudo HO-SVD initialization, because it is not exactly HO-SVD, as can be seen by comparing the initialization in Figure 3 with the initialization in Figure 7. Note that X_{(2)} in Figure 3 is of size I_2 × (I_1 I_3), whereas M_2 in Figure 7 is of size I_2 × I_2, which is usually much smaller.
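An in-memory NumPy sketch of Figure 7 (pseudo HO-SVD initialization plus the MP main loop) may help. Again this is illustrative only: the real implementation reads one slice file at a time rather than holding X in RAM, the function names are our own, and a fixed iteration count substitutes for the fit-based stopping rule:

```python
import numpy as np

def leading_eigvecs(M, J):
    # J leading eigenvectors of M M^T (left singular vectors of M)
    return np.linalg.svd(M)[0][:, :J]

def mp3(X, ranks, n_iter=10):
    # Multislice Projection for a third-order tensor (Figure 7)
    I1, I2, I3 = X.shape
    J1, J2, J3 = ranks
    # pseudo HO-SVD initialization: small I_2 x I_2 and I_3 x I_3
    # matrices accumulated slice by slice, instead of unfolding X
    M2 = sum(X[i, :, :] @ X[i, :, :].T for i in range(I1)) \
       + sum(X[:, :, i].T @ X[:, :, i] for i in range(I3))
    B = leading_eigvecs(M2, J2)
    M3 = sum(X[i, :, :].T @ X[i, :, :] for i in range(I1)) \
       + sum(X[:, i, :].T @ X[:, i, :] for i in range(I2))
    C = leading_eigvecs(M3, J3)
    for _ in range(n_iter):                 # main loop
        M1 = sum(X[:, :, i] @ B @ B.T @ X[:, :, i].T for i in range(I3)) \
           + sum(X[:, i, :] @ C @ C.T @ X[:, i, :].T for i in range(I2))
        A = leading_eigvecs(M1, J1)         # slices on modes 2 and 3
        M2 = sum(X[:, :, i].T @ A @ A.T @ X[:, :, i] for i in range(I3)) \
           + sum(X[i, :, :] @ C @ C.T @ X[i, :, :].T for i in range(I1))
        B = leading_eigvecs(M2, J2)         # slices on modes 1 and 3
        M3 = sum(X[:, i, :].T @ A @ A.T @ X[:, i, :] for i in range(I2)) \
           + sum(X[i, :, :].T @ B @ B.T @ X[i, :, :] for i in range(I1))
        C = leading_eigvecs(M3, J3)         # slices on modes 1 and 2
    # G = [[X; A^T, B^T, C^T]]
    G = np.einsum('abc,ap,bq,cr->pqr', X, A, B, C)
    return G, A, B, C
```

Each update sums over slices on two modes and uses both of the other matrices, which is the key difference from SP.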
HO-SVD brings the whole tensor into RAM, but pseudo HO-SVD processes one slice at a time.

The main loop terminates when the change in fit drops below a threshold or when the number of iterations reaches a maximum, whichever comes first. We calculate the fit one slice at a time, as given in the Appendix; it is not necessary to bring the whole input tensor into RAM in order to calculate the fit. We set the threshold for Δfit(t) at 10^−4 and we set the maximum number of iterations at 50. The main loop usually terminated after half a dozen iterations or fewer, with Δfit(t) less than 10^−4.

The most significant difference between SP and MP is that MP uses all of the other matrices to calculate a given matrix. For example, M13 in Figure 5 is based on X and C, whereas the corresponding M1 in Figure 7 is based on X, B, and C. In this respect, MP is like HOOI, as we can see with the corresponding U in Figure 3. By slicing on two modes, instead of only one, we improve the fit of the tensor, as we shall see in the next section.

in: Tensor X of size I1 × I2 × I3.
in: Desired rank of core: J1 × J2 × J3.
– pseudo HO-SVD initialization
M2 ← Σ_{i=1..I1} X_{i::} X_{i::}^T + Σ_{i=1..I3} X_{::i}^T X_{::i}
B ← J2 leading eigenvectors of M2 M2^T
M3 ← Σ_{i=1..I1} X_{i::}^T X_{i::} + Σ_{i=1..I2} X_{:i:}^T X_{:i:}
C ← J3 leading eigenvectors of M3 M3^T
while not converged do  – main loop
  M1 ← Σ_{i=1..I3} X_{::i} B B^T X_{::i}^T + Σ_{i=1..I2} X_{:i:} C C^T X_{:i:}^T  – slices on modes 2 and 3
  A ← J1 leading eigenvectors of M1 M1^T
  M2 ← Σ_{i=1..I3} X_{::i}^T A A^T X_{::i} + Σ_{i=1..I1} X_{i::} C C^T X_{i::}^T  – slices on modes 1 and 3
  B ← J2 leading eigenvectors of M2 M2^T
  M3 ← Σ_{i=1..I2} X_{:i:}^T A A^T X_{:i:} + Σ_{i=1..I1} X_{i::}^T B B^T X_{i::}  – slices on modes 1 and 2
  C ← J3 leading eigenvectors of M3 M3^T
end while
G ← ⟦X; A^T, B^T, C^T⟧
out: G of size J1 × J2 × J3 and orthonormal matrices A of size I1 × J1, B of size I2 × J2, and C of size I3 × J3, such that X ≈ ⟦G; A, B, C⟧.

Figure 7: Multislice Projection for third-order tensors. MATLAB source code for this algorithm is provided in the Appendix.

in: Tensor X of size I1 × I2 × I3 × I4.
in: Desired rank of core: J1 × J2 × J3 × J4.
M2 ← Σ_{i=1..I3} Σ_{j=1..I4} X_{::ij}^T X_{::ij} + Σ_{i=1..I1} Σ_{j=1..I4} X_{i::j} X_{i::j}^T + Σ_{i=1..I1} Σ_{j=1..I3} X_{i:j:} X_{i:j:}^T
B ← J2 leading eigenvectors of M2 M2^T
M3 ← Σ_{i=1..I2} Σ_{j=1..I4} X_{:i:j}^T X_{:i:j} + Σ_{i=1..I1} Σ_{j=1..I4} X_{i::j}^T X_{i::j} + Σ_{i=1..I1} Σ_{j=1..I2} X_{ij::} X_{ij::}^T
C ← J3 leading eigenvectors of M3 M3^T
M4 ← Σ_{i=1..I2} Σ_{j=1..I3} X_{:ij:}^T X_{:ij:} + Σ_{i=1..I1} Σ_{j=1..I3} X_{i:j:}^T X_{i:j:} + Σ_{i=1..I1} Σ_{j=1..I2} X_{ij::}^T X_{ij::}
D ← J4 leading eigenvectors of M4 M4^T
while not converged do
  M1 ← Σ_{i=1..I3} Σ_{j=1..I4} X_{::ij} B B^T X_{::ij}^T + Σ_{i=1..I2} Σ_{j=1..I4} X_{:i:j} C C^T X_{:i:j}^T + Σ_{i=1..I2} Σ_{j=1..I3} X_{:ij:} D D^T X_{:ij:}^T
  A ← J1 leading eigenvectors of M1 M1^T
  M2 ← Σ_{i=1..I3} Σ_{j=1..I4} X_{::ij}^T A A^T X_{::ij} + Σ_{i=1..I1} Σ_{j=1..I4} X_{i::j} C C^T X_{i::j}^T + Σ_{i=1..I1} Σ_{j=1..I3} X_{i:j:} D D^T X_{i:j:}^T
  B ← J2 leading eigenvectors of M2 M2^T
  M3 ← Σ_{i=1..I2} Σ_{j=1..I4} X_{:i:j}^T A A^T X_{:i:j} + Σ_{i=1..I1} Σ_{j=1..I4} X_{i::j}^T B B^T X_{i::j} + Σ_{i=1..I1} Σ_{j=1..I2} X_{ij::} D D^T X_{ij::}^T
  C ← J3 leading eigenvectors of M3 M3^T
  M4 ← Σ_{i=1..I2} Σ_{j=1..I3} X_{:ij:}^T A A^T X_{:ij:} + Σ_{i=1..I1} Σ_{j=1..I3} X_{i:j:}^T B B^T X_{i:j:} + Σ_{i=1..I1} Σ_{j=1..I2} X_{ij::}^T C C^T X_{ij::}
  D ← J4 leading eigenvectors of M4 M4^T
end while
G ← ⟦X; A^T, B^T, C^T, D^T⟧
out: G of size J1 × J2 × J3 × J4 and orthonormal matrices A of size I1 × J1, B of size I2 × J2, C of size I3 × J3, and D of size I4 × J4, such that X ≈ ⟦G; A, B, C, D⟧.

Figure 8: Multislice Projection for fourth-order tensors.

5 Experiments

This section presents the four groups of experiments. The hardware for these experiments was a computer with two dual-core AMD Opteron 64 processors, 8 GiB of RAM, and a 16 GiB swap file. The software was 64-bit SUSE Linux 10.0, MATLAB R2007a, and MATLAB Tensor Toolbox Version 2.2 (Bader and Kolda, 2007b).
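The slice-wise update at the heart of Figure 7 can be sketched in NumPy (illustrative only; the experiments themselves used the MATLAB code in the Appendix). The ranks, dimensions, and use of `numpy.linalg.eigh` for the leading-eigenvector step are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, I3 = 6, 5, 4
J1, J2, J3 = 2, 2, 2
X = rng.random((I1, I2, I3))

def leading_eigenvectors(M, r):
    """Columns are the r leading eigenvectors of the symmetric PSD
    matrix M @ M.T, in descending eigenvalue order."""
    w, V = np.linalg.eigh(M @ M.T)
    return V[:, np.argsort(w)[::-1][:r]]

# Stand-ins for B and C from the pseudo HO-SVD initialization.
B = np.linalg.qr(rng.random((I2, J2)))[0]
C = np.linalg.qr(rng.random((I3, J3)))[0]

# M1 from Figure 7: mode-3 slices projected through B plus mode-2
# slices projected through C, accumulated one slice at a time.
M1 = np.zeros((I1, I1))
for i in range(I3):
    S = X[:, :, i]                 # mode-3 slice, I1 x I2
    M1 += (S @ B) @ (B.T @ S.T)
for i in range(I2):
    S = X[:, i, :]                 # mode-2 slice, I1 x I3
    M1 += (S @ C) @ (C.T @ S.T)
A = leading_eigenvectors(M1, J1)
print(A.shape)                          # -> (6, 2)
print(np.allclose(A.T @ A, np.eye(J1)))  # -> True: orthonormal columns
```

The same pattern, with two fixed indices per slice and three projected sums per mode, gives the fourth-order updates of Figure 8.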
The algorithms only used one of the four cores; we did not attempt to perform parallel processing, although SP and MP could be parallelized readily.

The input files are plain text files with one line for each nonzero value in the tensor. Each line consists of integers that give the location of the nonzero value in the tensor, followed by a single real number that gives the nonzero value itself. The input files are in text format, rather than binary format, in order to facilitate sorting the files.

The output files are binary MATLAB files, containing the tensor decompositions of the input files. The four algorithms generate tensor decompositions that are numerically different but structurally identical. That is, the numerical values are different, but, for a given input tensor, the four algorithms generate core tensors and matrices of the same size. Therefore the output file size does not depend on which algorithm was used.

5.1 Varying Tensor Sizes

The goal of this group of experiments was to evaluate the four algorithms on increasingly larger tensors, to discover how their performance scales with size. HO-SVD and HOOI assume that the input tensor fits in RAM, whereas SP and MP assume that the input tensor file must be read in blocks. We expected that HO-SVD and HOOI would eventually run out of RAM, but we could not predict precisely how the four algorithms would scale, in terms of fit, time, and space.

Table 1 summarizes the input test tensors for the first group of experiments. The test tensors are random sparse third-order tensors, varying in size from 250^3 to 2000^3. The number of nonzeros in the tensors varies from 1.6 million to 800 million. The nonzero values are random samples from a uniform distribution between zero and one.

Table 2 shows the results of the first group of experiments.
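A test tensor of the kind summarized in Table 1 can be generated with a sketch like the following (the function name is an assumption; the paper specifies only uniform nonzero values at a given density, in the one-entry-per-line text format):

```python
import numpy as np

def random_sparse_entries(shape, density, rng):
    """Generate lines of a random sparse third-order tensor in the
    paper's text format: <i, j, k, value>, with values drawn uniformly
    between zero and one and nonzeros at the requested density."""
    total = int(np.prod(shape))
    n_nonzero = int(total * density)
    positions = rng.choice(total, size=n_nonzero, replace=False)
    lines = []
    for p in positions:
        i, j, k = np.unravel_index(p, shape)
        value = rng.random()  # uniform between zero and one
        lines.append(f"{i + 1} {j + 1} {k + 1} {value:.6f}")
    return lines

rng = np.random.default_rng(0)
lines = random_sparse_entries((20, 20, 20), 0.10, rng)
print(len(lines))              # -> 800, i.e. 10% of 20^3
print(len(lines[0].split()))   # -> 4 fields: i, j, k, value
```

At the 10% density of Table 1, scaling this recipe from 250^3 up to 2000^3 yields the 1.6 million to 800 million nonzeros used in the experiments.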
HO-SVD and HOOI were only able to process the first four tensors, with sizes from 250^3 to 1000^3. The 1000^3 tensor required almost 16 GiB of RAM. The next tensor, 1250^3, required more RAM than was available (24 GiB; 8 GiB of actual RAM plus a 16 GiB swap file). On the other hand, SP and MP were able to process all eight tensors, up to 2000^3. Larger tensors are possible with SP and MP; the limiting factor becomes run time, rather than available RAM.

Figure 9 shows the fit of the four algorithms. HOOI has the best fit, followed by MP, then SP, and finally HO-SVD. The curves for HO-SVD and HOOI stop at 100 million nonzeros (the 1000^3 tensor), but it seems likely that the same trend would continue, if sufficient RAM were available to apply HO-SVD and HOOI to the larger tensors. The fit is somewhat low, at about 4%, due to the difficulty of fitting a random tensor with a core size that is 0.1% of the size of the input tensor. However, we are interested in the relative ranking of the four algorithms, rather than the absolute fit. The results in Section 5.4 show that the ranking we see here, in Figure 9, is predictive of the relative performance on a real (nonrandom) task.

Figure 10 shows the RAM use of the algorithms. As we can see in Table 2, there are two components to the RAM use of SP and MP: the RAM used by sort and the RAM used by MATLAB. We arbitrarily set the sorting buffer to 4 GiB, which sets an upper bound on the RAM used by sort. A machine with less RAM could use a smaller sorting buffer. We have not experimented with the buffer size, but we expect that the buffer could be made much smaller, with only a slight increase in run time. The growth of the MATLAB component of RAM use of SP and MP is slow, especially in comparison to HO-SVD and HOOI.

Figure 11 gives the run time.
For the smallest tensors, SP and MP take longer to run than HO-SVD and HOOI, because SP and MP make more use of files and less use of RAM. With a tensor size of 1000^3, both HO-SVD and HOOI use up the available hardware RAM (8 GiB) and need to use the virtual RAM (the 16 GiB swap file), which explains the sudden upward surge in Figure 11 at 100 million nonzeros. In general, the run time of SP and MP is competitive with HO-SVD and HOOI.

The results show that SP and MP can handle much larger tensors than HO-SVD and HOOI (800 million nonzeros versus 100 million nonzeros), with only a small penalty in run time for smaller tensors. However, HOOI yields a better fit than MP. If fit is important, we recommend HOOI for smaller tensors and MP for larger tensors. If speed is more important, we recommend HO-SVD for smaller tensors and SP for larger tensors.

Input tensor size (I1 × I2 × I3) | Core size (J1 × J2 × J3) | Density (% Nonzero) | Nonzeros (Millions) | Input file (GiB) | Output file (MiB)
250 × 250 × 250 | 25 × 25 × 25 | 10 | 1.6 | 0.03 | 0.3
500 × 500 × 500 | 50 × 50 × 50 | 10 | 12.5 | 0.24 | 1.5
750 × 750 × 750 | 75 × 75 × 75 | 10 | 42.2 | 0.81 | 4.3
1000 × 1000 × 1000 | 100 × 100 × 100 | 10 | 100.0 | 1.93 | 9.5
1250 × 1250 × 1250 | 125 × 125 × 125 | 10 | 195.3 | 3.88 | 17.7
1500 × 1500 × 1500 | 150 × 150 × 150 | 10 | 337.5 | 6.85 | 29.7
1750 × 1750 × 1750 | 175 × 175 × 175 | 10 | 535.9 | 11.03 | 46.0
2000 × 2000 × 2000 | 200 × 200 × 200 | 10 | 800.0 | 16.64 | 67.4

Table 1: Random sparse third-order tensors of varying size.
Algorithm | Tensor | Nonzeros (Millions) | Fit (%) | Run time (HH:MM:SS) | Matlab RAM (GiB) | Sort RAM (GiB) | Total RAM (GiB)
HO-SVD | 250^3 | 1.6 | 3.890 | 00:00:24 | 0.21 | 0.00 | 0.21
HO-SVD | 500^3 | 12.5 | 3.883 | 00:03:44 | 1.96 | 0.00 | 1.96
HO-SVD | 750^3 | 42.2 | 3.880 | 00:14:42 | 6.61 | 0.00 | 6.61
HO-SVD | 1000^3 | 100.0 | 3.880 | 01:10:13 | 15.66 | 0.00 | 15.66
HOOI | 250^3 | 1.6 | 4.053 | 00:01:06 | 0.26 | 0.00 | 0.26
HOOI | 500^3 | 12.5 | 3.982 | 00:09:52 | 1.98 | 0.00 | 1.98
HOOI | 750^3 | 42.2 | 3.955 | 00:42:45 | 6.65 | 0.00 | 6.65
HOOI | 1000^3 | 100.0 | 3.942 | 04:01:36 | 15.74 | 0.00 | 15.74
SP | 250^3 | 1.6 | 3.934 | 00:01:21 | 0.01 | 1.41 | 1.42
SP | 500^3 | 12.5 | 3.906 | 00:10:21 | 0.02 | 4.00 | 4.03
SP | 750^3 | 42.2 | 3.896 | 00:34:39 | 0.06 | 4.00 | 4.06
SP | 1000^3 | 100.0 | 3.893 | 01:43:20 | 0.11 | 4.00 | 4.12
SP | 1250^3 | 195.3 | 3.890 | 03:16:32 | 0.21 | 4.00 | 4.22
SP | 1500^3 | 337.5 | 3.888 | 06:01:47 | 0.33 | 4.00 | 4.33
SP | 1750^3 | 535.9 | 3.886 | 09:58:36 | 0.54 | 4.00 | 4.54
SP | 2000^3 | 800.0 | 3.885 | 15:35:21 | 0.78 | 4.00 | 4.79
MP | 250^3 | 1.6 | 3.979 | 00:01:45 | 0.01 | 1.41 | 1.42
MP | 500^3 | 12.5 | 3.930 | 00:13:55 | 0.03 | 4.00 | 4.03
MP | 750^3 | 42.2 | 3.914 | 00:51:33 | 0.06 | 4.00 | 4.07
MP | 1000^3 | 100.0 | 3.907 | 02:21:30 | 0.12 | 4.00 | 4.12
MP | 1250^3 | 195.3 | 3.902 | 05:05:11 | 0.22 | 4.00 | 4.23
MP | 1500^3 | 337.5 | 3.899 | 09:28:49 | 0.37 | 4.00 | 4.37
MP | 1750^3 | 535.9 | 3.896 | 16:14:01 | 0.56 | 4.00 | 4.56
MP | 2000^3 | 800.0 | 3.894 | 25:43:17 | 0.81 | 4.00 | 4.82

Table 2: Performance of the four algorithms with tensors of varying size.

Figure 9: The fit of the four algorithms as a function of the number of nonzeros.

Figure 10: The RAM use of the four algorithms as a function of the number of nonzeros. Note that the size of the sorting buffer for SP and MP was arbitrarily set to 4 GiB.
Figure 11: The run time of the four algorithms as a function of the number of nonzeros.

5.2 Varying Core Size Ratios

SP is somewhat different from the other three algorithms, in that it has a kind of asymmetry. Compare M13 in Figure 5 with M1 in Figure 7. We could have used B instead of C, to calculate A in Figure 5, but we arbitrarily chose C. We hypothesized that this asymmetry would make SP sensitive to variation in the ratios of the core sizes.

In this group of experiments, we vary the ratios between the sizes of the core in each mode, as listed in Table 3. The effect of the ratio on the performance is shown in Table 4. Figure 12 illustrates the effect of the ratio on the fit. It is clear from the figure that SP is asymmetrical, whereas HO-SVD, HOOI, and MP are symmetrical.

This asymmetry of SP might be viewed as a flaw, and thus a reason for preferring MP over SP, but it could also be seen as an advantage for SP. In the case where the ratio is 0.2, SP has a better fit than MP. This suggests that we might use SP instead of MP when the ratios between the sizes of the core in each mode are highly skewed; however, we must be careful to make sure that SP processes the matrices in the optimal order for the given core sizes.

Note that the relative ranking of the fit of the four algorithms is the same as in the previous group of experiments (best fit to worst: HOOI, MP, SP, HO-SVD), except in the case of extreme skew. Thus Figure 12 shows the robustness of the relative ranking.

5.3 Fourth-Order Tensors

This group of experiments demonstrates that the previous observations regarding the relative ranking of the fit also apply to fourth-order tensors. The experiments also investigate the effect of varying the size of the core, with a fixed input tensor size.
Table 5 lists the core sizes that we investigated. The effect of the core sizes on the performance is shown in Table 6. Figure 13 shows the impact of core size on fit.

The fit varies from about 4% with a core of 10^4 to about 44% with a core of 90^4. To make the differences among the algorithms clearer, we normalized the fit by using HO-SVD as a baseline. The fit relative to HO-SVD is defined as the percentage improvement in the fit of the given algorithm, compared to the fit of HO-SVD.

Figure 13 shows that the differences among the four algorithms are largest when the core is about 50^4; that is, the size of one mode of the core (50) is about half of the size of one mode of the input tensor

Input tensor size (I1 × I2 × I3) | Core size (J1 × J2 × J3) | Ratio (J1/J2 = J2/J3) | Density (%) | Nonzeros (Millions) | Input file (GiB) | Output file (MiB)
500 × 500 × 500 | 250 × 50 × 10 | 5.00 | 10 | 12.5 | 0.24 | 2.05
500 × 500 × 500 | 125 × 50 × 20 | 2.50 | 10 | 12.5 | 0.24 | 1.63
500 × 500 × 500 | 83 × 50 × 30 | 1.66 | 10 | 12.5 | 0.24 | 1.51
500 × 500 × 500 | 63 × 50 × 40 | 1.26 | 10 | 12.5 | 0.24 | 1.48
500 × 500 × 500 | 50 × 50 × 50 | 1.00 | 10 | 12.5 | 0.24 | 1.46
500 × 500 × 500 | 40 × 50 × 63 | 0.80 | 10 | 12.5 | 0.24 | 1.48
500 × 500 × 500 | 30 × 50 × 83 | 0.60 | 10 | 12.5 | 0.24 | 1.51
500 × 500 × 500 | 20 × 50 × 125 | 0.40 | 10 | 12.5 | 0.24 | 1.63
500 × 500 × 500 | 10 × 50 × 250 | 0.20 | 10 | 12.5 | 0.24 | 2.05

Table 3: Random sparse third-order tensors with varying ratios between the sizes of the core in each mode.

Figure 12: The fit of the four algorithms as a function of the ratios between the sizes of the core in each mode.
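The relative-fit normalization is a simple percentage improvement over the HO-SVD baseline. As a check, plugging the 50^4 fits from Table 6 into it reproduces the tabulated relative fit for HOOI:

```python
def relative_fit(fit, baseline_fit):
    """Percentage increase in fit relative to the HO-SVD baseline."""
    return 100.0 * (fit - baseline_fit) / baseline_fit

# 50^4 core, fits from Table 6: HO-SVD 6.988%, HOOI 7.135%.
print(round(relative_fit(7.135, 6.988), 2))  # -> 2.1, as tabulated (2.10)
```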
Algorithm | Ratio (J1/J2 = J2/J3) | Fit (%) | Run time (HH:MM:SS) | Matlab RAM (GiB) | Sort RAM (GiB) | Total RAM (GiB)
HO-SVD | 5.00 | 3.881 | 00:06:54 | 2.71 | 0.00 | 2.71
HO-SVD | 2.50 | 3.883 | 00:04:53 | 1.96 | 0.00 | 1.96
HO-SVD | 1.66 | 3.883 | 00:04:15 | 1.78 | 0.00 | 1.78
HO-SVD | 1.26 | 3.884 | 00:03:53 | 1.96 | 0.00 | 1.96
HO-SVD | 1.00 | 3.883 | 00:03:48 | 1.96 | 0.00 | 1.96
HO-SVD | 0.80 | 3.884 | 00:03:33 | 1.96 | 0.00 | 1.96
HO-SVD | 0.60 | 3.883 | 00:03:24 | 1.96 | 0.00 | 1.96
HO-SVD | 0.40 | 3.883 | 00:03:15 | 1.96 | 0.00 | 1.96
HO-SVD | 0.20 | 3.881 | 00:03:06 | 1.96 | 0.00 | 1.96
HOOI | 5.00 | 3.969 | 00:27:24 | 2.72 | 0.00 | 2.72
HOOI | 2.50 | 3.983 | 00:16:23 | 2.02 | 0.00 | 2.02
HOOI | 1.66 | 3.982 | 00:12:53 | 1.98 | 0.00 | 1.98
HOOI | 1.26 | 3.982 | 00:11:06 | 1.98 | 0.00 | 1.98
HOOI | 1.00 | 3.982 | 00:09:53 | 1.98 | 0.00 | 1.98
HOOI | 0.80 | 3.982 | 00:09:02 | 1.98 | 0.00 | 1.98
HOOI | 0.60 | 3.982 | 00:08:11 | 1.98 | 0.00 | 1.98
HOOI | 0.40 | 3.982 | 00:07:26 | 1.99 | 0.00 | 1.99
HOOI | 0.20 | 3.965 | 00:05:32 | 2.02 | 0.00 | 2.02
SP | 5.00 | 3.896 | 00:11:18 | 0.02 | 4.00 | 4.02
SP | 2.50 | 3.900 | 00:09:36 | 0.02 | 4.00 | 4.02
SP | 1.66 | 3.902 | 00:09:30 | 0.02 | 4.00 | 4.02
SP | 1.26 | 3.905 | 00:10:12 | 0.02 | 4.00 | 4.03
SP | 1.00 | 3.906 | 00:10:13 | 0.02 | 4.00 | 4.03
SP | 0.80 | 3.908 | 00:10:12 | 0.03 | 4.00 | 4.03
SP | 0.60 | 3.910 | 00:10:23 | 0.03 | 4.00 | 4.03
SP | 0.40 | 3.914 | 00:10:32 | 0.04 | 4.00 | 4.04
SP | 0.20 | 3.920 | 00:12:43 | 0.06 | 4.00 | 4.07
MP | 5.00 | 3.901 | 00:15:01 | 0.02 | 4.00 | 4.03
MP | 2.50 | 3.917 | 00:14:05 | 0.02 | 4.00 | 4.02
MP | 1.66 | 3.925 | 00:13:46 | 0.02 | 4.00 | 4.03
MP | 1.26 | 3.929 | 00:13:47 | 0.02 | 4.00 | 4.03
MP | 1.00 | 3.930 | 00:13:51 | 0.03 | 4.00 | 4.03
MP | 0.80 | 3.930 | 00:13:45 | 0.03 | 4.00 | 4.03
MP | 0.60 | 3.927 | 00:14:17 | 0.03 | 4.00 | 4.04
MP | 0.40 | 3.922 | 00:14:37 | 0.04 | 4.00 | 4.04
MP | 0.20 | 3.905 | 00:16:33 | 0.06 | 4.00 | 4.07

Table 4: Performance of the four algorithms with varying ratios between the sizes of the core in each mode.
Input tensor size (I1 × I2 × I3 × I4) | Core size (J1 × J2 × J3 × J4) | Density (%) | Nonzeros (Millions) | Input file (MiB) | Output file (MiB)
100^4 | 90^4 | 10 | 10 | 197.13 | 480.68
100^4 | 80^4 | 10 | 10 | 197.13 | 300.16
100^4 | 70^4 | 10 | 10 | 197.13 | 176.02
100^4 | 60^4 | 10 | 10 | 197.13 | 95.08
100^4 | 50^4 | 10 | 10 | 197.13 | 45.91
100^4 | 40^4 | 10 | 10 | 197.13 | 18.86
100^4 | 30^4 | 10 | 10 | 197.13 | 6.02
100^4 | 20^4 | 10 | 10 | 197.13 | 1.23
100^4 | 10^4 | 10 | 10 | 197.13 | 0.10

Table 5: Random sparse fourth-order tensors with varying core sizes.

Figure 13: The fit of the four algorithms as a function of the core sizes, given fourth-order tensors.

Algorithm | Core size | Fit (%) | Relative fit (%) | Run time (HH:MM:SS) | Matlab RAM (GiB) | Sort RAM (GiB) | Total RAM (GiB)
HO-SVD | 90^4 | 44.007 | 0.00 | 00:05:35 | 3.56 | 0.00 | 3.56
HO-SVD | 80^4 | 26.477 | 0.00 | 00:05:03 | 3.10 | 0.00 | 3.10
HO-SVD | 70^4 | 16.449 | 0.00 | 00:04:20 | 3.02 | 0.00 | 3.02
HO-SVD | 60^4 | 10.463 | 0.00 | 00:03:52 | 2.62 | 0.00 | 2.62
HO-SVD | 50^4 | 6.988 | 0.00 | 00:03:27 | 2.07 | 0.00 | 2.07
HO-SVD | 40^4 | 5.116 | 0.00 | 00:03:13 | 1.89 | 0.00 | 1.89
HO-SVD | 30^4 | 4.232 | 0.00 | 00:02:57 | 1.80 | 0.00 | 1.80
HO-SVD | 20^4 | 3.903 | 0.00 | 00:02:48 | 1.75 | 0.00 | 1.75
HO-SVD | 10^4 | 3.827 | 0.00 | 00:02:34 | 1.80 | 0.00 | 1.80
HOOI | 90^4 | 44.065 | 0.13 | 00:18:35 | 4.32 | 0.00 | 4.32
HOOI | 80^4 | 26.600 | 0.47 | 00:21:16 | 3.75 | 0.00 | 3.75
HOOI | 70^4 | 16.609 | 0.97 | 00:17:50 | 3.28 | 0.00 | 3.28
HOOI | 60^4 | 10.629 | 1.59 | 00:15:00 | 2.88 | 0.00 | 2.88
HOOI | 50^4 | 7.135 | 2.10 | 00:12:44 | 2.53 | 0.00 | 2.53
HOOI | 40^4 | 5.227 | 2.17 | 00:10:19 | 2.24 | 0.00 | 2.24
HOOI | 30^4 | 4.301 | 1.63 | 00:08:42 | 1.97 | 0.00 | 1.97
HOOI | 20^4 | 3.933 | 0.75 | 00:05:25 | 1.81 | 0.00 | 1.81
HOOI | 10^4 | 3.834 | 0.18 | 00:04:34 | 1.80 | 0.00 | 1.80
SP | 90^4 | 44.029 | 0.05 | 01:45:07 | 2.19 | 4.00 | 6.19
SP | 80^4 | 26.517 | 0.15 | 01:31:50 | 1.55 | 4.00 | 5.56
SP | 70^4 | 16.499 | 0.30 | 01:17:53 | 1.06 | 4.00 | 5.07
SP | 60^4 | 10.511 | 0.46 | 01:09:49 | 0.44 | 4.00 | 4.44
SP | 50^4 | 7.026 | 0.54 | 01:04:21 | 0.38 | 4.00 | 4.38
SP | 40^4 | 5.140 | 0.47 | 01:02:37 | 0.13 | 4.00 | 4.13
SP | 30^4 | 4.247 | 0.35 | 01:01:09 | 0.07 | 4.00 | 4.08
SP | 20^4 | 3.908 | 0.11 | 00:59:02 | 0.04 | 4.00 | 4.04
SP | 10^4 | 3.828 | 0.01 | 00:57:56 | 0.01 | 4.00 | 4.02
MP | 90^4 | 44.039 | 0.07 | 03:16:44 | 2.19 | 4.00 | 6.19
MP | 80^4 | 26.544 | 0.25 | 02:31:07 | 1.55 | 4.00 | 5.56
MP | 70^4 | 16.532 | 0.50 | 01:57:17 | 1.06 | 4.00 | 5.07
MP | 60^4 | 10.547 | 0.80 | 01:36:45 | 0.69 | 4.00 | 4.70
MP | 50^4 | 7.057 | 0.98 | 01:23:33 | 0.38 | 4.00 | 4.38
MP | 40^4 | 5.163 | 0.91 | 01:14:23 | 0.17 | 4.00 | 4.18
MP | 30^4 | 4.259 | 0.63 | 01:07:01 | 0.07 | 4.00 | 4.08
MP | 20^4 | 3.911 | 0.20 | 01:04:29 | 0.04 | 4.00 | 4.04
MP | 10^4 | 3.828 | 0.03 | 01:05:26 | 0.01 | 4.00 | 4.02

Table 6: Performance of the four algorithms with fourth-order tensors and varying core sizes. Relative fit is the percentage increase in fit relative to HO-SVD.

(100). When the core is very small or very large, compared to the input tensor, there is little difference in fit among the algorithms.

The fit follows the same trend here as in the previous two groups of experiments (best to worst: HOOI, MP, SP, HO-SVD), in spite of the switch from third-order tensors to fourth-order tensors. This further confirms the robustness of the results.

Table 6 shows that SP and MP are slow with fourth-order tensors, compared to HO-SVD and HOOI. This is a change from what we observed with third-order tensors, which did not yield such large differences in run time. This is because a fourth-order tensor has many more slices than a third-order tensor with the same number of elements, and each slice is smaller. There is a much larger overhead associated with opening and closing many small files, compared to a few large files. This could be ameliorated by storing several adjacent slices together in one file, instead of using a separate file for each slice.

Even with third-order tensors, grouping slices together in one file would improve the speed of SP and MP.
Ideally, the user would specify the maximum RAM available, and SP and MP would group as many slices together as would fit in the available RAM.

5.4 Performance with Real Data

So far, all our experiments have used random tensors. Our purpose with this last group of experiments is to show that the previous observations apply to nonrandom tensors. In particular, the differences in fit that we have seen so far are somewhat small. It seems possible that the differences might not matter in a real application of tensors. This group of experiments shows that the differences in fit result in differences in performance on a real task.

The task we examine here is answering multiple-choice synonym questions from the TOEFL test. This task was first investigated in Landauer and Dumais (1997). In ongoing work, we are exploring the application of third-order tensors to this task, combining ideas from Landauer and Dumais (1997) and Turney (2006).

Table 7 describes the input data and the output tensor decomposition. The first mode of the tensor consists of all of the 391 unique words that occur in the TOEFL questions. The second mode is a set of 849 words from Basic English, which is an artificial language that reduces English to a small, easily learned core vocabulary (Ogden, 1930). The third mode consists of 1020 patterns that join the words in the first two modes. These patterns were generated using the approach of Turney (2006). The value of an element in the tensor is derived from the frequency of the corresponding word pair and pattern in a large corpus.

A TOEFL question consists of a stem word (the target word) and four choice words. The task is to select the choice word that is most similar in meaning to the stem word. Our approach is to measure the similarity of two TOEFL words by the average similarity of their relations to the Basic English words.
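One plausible way to compare two TOEFL words, given their slices of the decomposed tensor, is cosine similarity over the flattened slices. This is an assumption of the following sketch (the report does not spell out the exact comparison); the dimensions match the second and third modes of Table 7, and the values are toy data:

```python
import numpy as np

def slice_cosine(slice_a, slice_b):
    """Cosine similarity between two flattened mode-1 slices, each of
    size (Basic English words) x (patterns)."""
    a, b = slice_a.ravel(), slice_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
Xi = rng.random((849, 1020))   # slice for one TOEFL word (toy values)
Xj = rng.random((849, 1020))   # slice for another TOEFL word
print(round(slice_cosine(Xi, Xi), 6))  # -> 1.0, a word matches itself
```

Under this sketch, answering a question means computing the similarity between the stem word's slice and each choice word's slice, and picking the choice with the highest score.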
Let X be our input tensor. Suppose we wish to measure the similarity of two TOEFL words. Let X_i:: and X_j:: be the slices of X that correspond to the two TOEFL words. Each slice gives the weights for all of the patterns that join the given TOEFL word to all of the Basic English words. Our measure of similarity between the TOEFL words is calculated by comparing the two slices.

Table 8 presents the performance of the four algorithms. We see that the fit follows the familiar pattern: HOOI has the best fit, then MP, next SP, and lastly HO-SVD. Note that MP and SP have similar fits. The final column of the table gives the TOEFL scores for the four algorithms. HOOI has the best TOEFL score, MP and SP have the same score, and HO-SVD has the lowest score. The bottom row of the table gives the TOEFL score for the raw input tensor, without the benefit of any smoothing from the Tucker decomposition. The results validate the previous experiments with random tensors and illustrate the value of the Tucker decomposition on a real task.

6 Conclusions

The Tucker decomposition has been with us since 1966, but it seems that it has only recently started to become popular. We believe that this is because only recently has computer hardware reached the point where large tensor decompositions are becoming feasible.

SVD started to attract interest in the field of information retrieval when it was applied to "problems of reasonable size (1000-2000 document abstracts; and 5000-7000 index terms)" (Deerwester et al., 1990). In collaborative filtering, SVD attracted interest when it achieved good results on the Netflix Prize, a dataset with a sparse matrix of 17,000 movies rated by 500,000 users. In realistic applications, size matters.
Input tensor size (I1 × I2 × I3): 391 × 849 × 1020
Core size (J1 × J2 × J3): 250 × 250 × 250
Input file (MiB): 345
Output file (MiB): 119
Density (% Nonzero): 5.27
Nonzeros (Millions): 18

Table 7: Description of the input data and the output decomposition.

Algorithm | Fit (%) | Relative fit (%) | Run time (HH:MM:SS) | Matlab RAM (GiB) | Sort RAM (GiB) | Total RAM (GiB) | TOEFL (%)
HO-SVD | 21.716 | 0.00 | 00:10:28 | 5.29 | 0.00 | 5.29 | 80.00
HOOI | 22.597 | 4.05 | 00:56:08 | 5.77 | 0.00 | 5.77 | 83.75
SP | 22.321 | 2.78 | 00:30:02 | 0.33 | 4.00 | 4.33 | 81.25
MP | 22.371 | 3.01 | 00:43:52 | 0.33 | 4.00 | 4.34 | 81.25
Raw tensor | - | - | - | - | - | - | 67.50

Table 8: Performance of the four algorithms with actual data. Relative fit is the percentage increase in fit relative to HO-SVD.

The MATLAB Tensor Toolbox (Bader and Kolda, 2007a; Bader and Kolda, 2007b) has done much to make tensor decompositions more accessible and easier to experiment with, but, as we have seen here, RAM requirements become problematic with tensors larger than 1000^3.

The aim of this paper has been to empirically evaluate four tensor decompositions, to study their fit and their time and space requirements. Our primary concern was the ability of the algorithms to scale up to large tensors. The implementations of HO-SVD and HOOI, taken from the MATLAB Tensor Toolbox, assumed that the input tensor could fit in RAM, which limited them to tensors of size 1000^3. On the other hand, SP and MP were able to process tensors of size 2000^3, with eight times more elements.

The experiments in Section 5.4 suggest that the differences in fit among the four algorithms correspond to differences in performance on real tasks. It seems likely that good fit will be important for many applications; therefore, we recommend HOOI for those tensors that can fit in the available RAM, and MP for larger tensors.
Acknowledgements

Thanks to Brandyn Webb, Tamara Kolda, and Hongcheng Wang for helpful comments. Thanks to Tamara Kolda and Brett Bader for the MATLAB Tensor Toolbox.

References

E. Acar and B. Yener. 2007. Unsupervised multiway data analysis: A literature survey. Technical report, Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY. http://www.cs.rpi.edu/~acare/Acar07Multiway.pdf.

O. Alter, P.O. Brown, and D. Botstein. 2000. Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97(18):10101-10106.

B.W. Bader and T.G. Kolda. 2007a. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing.

B.W. Bader and T.G. Kolda. 2007b. MATLAB Tensor Toolbox version 2.2. http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/.

D. Billsus and M.J. Pazzani. 1998. Learning collaborative information filters. Proceedings of the Fifteenth International Conference on Machine Learning, pages 46-54.

M. Brand. 2002. Incremental singular value decomposition of uncertain data with missing values. Proceedings of the 7th European Conference on Computer Vision, pages 707-720.

R. Bro and C.A. Andersson. 1998. Improving the speed of multiway algorithms - Part II: Compression. Chemometrics and Intelligent Laboratory Systems, 42(1):105-113.

J.D. Carroll and J.J. Chang. 1970. Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika, 35(3):283-319.

P.A. Chew, B.W. Bader, T.G. Kolda, and A. Abdelali. 2007. Cross-language information retrieval using PARAFAC2. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 143-152.

L. De Lathauwer, B. De Moor, and J. Vandewalle. 2000a. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21:1253-1278.

L. De Lathauwer, B. De Moor, and J. Vandewalle. 2000b. On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors. SIAM Journal on Matrix Analysis and Applications, 21:1324-1342.

S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407.

D.M. Dunlavy, T.G. Kolda, and W.P. Kegelmeyer. 2006. Multilinear algebra for analyzing data with multiple linkages. Technical Report SAND2006-2079, Sandia National Laboratories, Livermore, CA. http://csmr.ca.sandia.gov/~tgkolda/pubs/SAND2006-2079.pdf.

R.A. Harshman. 1970. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 16:1-84.

T.G. Kolda. 2006. Multilinear operators for higher-order decompositions. Technical Report SAND2006-2081, Sandia National Laboratories, Livermore, CA. http://csmr.ca.sandia.gov/~tgkolda/pubs/SAND2006-2081.pdf.

T.K. Landauer and S.T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240.

M.W. Mahoney, M. Maggioni, and P. Drineas. 2006. Tensor-CUR decompositions for tensor-based data. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 327-336.

C.K. Ogden. 1930. Basic English: A general introduction with rules and grammar. Kegan Paul, Trench, Trubner and Co., London. http://ogden.basic-english.org/.

H. Schütze. 1998. Automatic word sense discrimination. Computational Linguistics, 24(1):97-123.

J. Sun, D. Tao, and C. Faloutsos. 2006. Beyond streams and graphs: Dynamic tensor analysis. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 374-383.

L.R. Tucker. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279-311.

P.D. Turney. 2006. Similarity of semantic relations. Computational Linguistics, 32(3):379-416.

H. Wang and N. Ahuja. 2005. Rank-R approximation of tensors: Using image-as-matrix representation. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2:346-353.

H. Wang, Q. Wu, L. Shi, Y. Yu, and N. Ahuja. 2005. Out-of-core tensor approximation of multi-dimensional matrices of visual data. International Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 2005, 24:527-535.

Y. Xu, L. Zhang, and W. Liu. 2006. Cubic analysis of social bookmarking for personalized recommendation. Lecture Notes in Computer Science: Frontiers of WWW Research and Development - APWeb 2006, 3841:733-738.

Appendix: MATLAB Source for Multislice Projection

-------------------------------------------------------------------------------
function fit = multislice(data_dir,sparse_file,tucker_file,I,J)
%MULTISLICE is a low RAM Tucker decomposition
%
% Peter Turney
% October 26, 2007
%
% Copyright 2007, National Research Council of Canada
%
% This program is free software: you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation, either version 3 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with this program. If not, see <http://www.gnu.org/licenses/>.
%
%% set parameters
%
fprintf('MULTISLICE is running ...\n');
%
maxloops = 50;        % maximum number of iterations
eigopts.disp = 0;     % suppress messages from eigs()
minfitchange = 1e-4;  % minimum change in fit of tensor
%
%% make slices of input data file
%
fprintf('  preparing slices\n');
%
mode1_dir = 'slice1';
mode2_dir = 'slice2';
mode3_dir = 'slice3';
%
slice(data_dir,sparse_file,mode1_dir,1,I);
slice(data_dir,sparse_file,mode2_dir,2,I);
slice(data_dir,sparse_file,mode3_dir,3,I);
%
%% pseudo HO-SVD initialization
%
% initialize B
%
M2 = zeros(I(2),I(2));
for i = 1:I(3)
  X3_slice = load_slice(data_dir,mode3_dir,i);
  M2 = M2 + (X3_slice' * X3_slice);
end
for i = 1:I(1)
  X1_slice = load_slice(data_dir,mode1_dir,i);
  M2 = M2 + (X1_slice * X1_slice');
end
[B,D] = eigs(M2 * M2',J(2),'lm',eigopts);
%
% initialize C
%
M3 = zeros(I(3),I(3));
for i = 1:I(1)
  X1_slice = load_slice(data_dir,mode1_dir,i);
  M3 = M3 + (X1_slice' * X1_slice);
end
for i = 1:I(2)
  X2_slice = load_slice(data_dir,mode2_dir,i);
  M3 = M3 + (X2_slice' * X2_slice);
end
[C,D] = eigs(M3 * M3',J(3),'lm',eigopts);
%
%% main loop
%
old_fit = 0;
%
fprintf('  entering main loop of MULTISLICE\n');
%
for loop_num = 1:maxloops
  %
  % update A
  %
  M1 = zeros(I(1),I(1));
  for i = 1:I(2)
    X2_slice = load_slice(data_dir,mode2_dir,i);
    M1 = M1 + ((X2_slice * C) * (C' * X2_slice'));
  end
  for i = 1:I(3)
    X3_slice = load_slice(data_dir,mode3_dir,i);
    M1 = M1 + ((X3_slice * B) * (B' * X3_slice'));
  end
  [A,D] = eigs(M1 * M1',J(1),'lm',eigopts);
  %
  % update B
  %
  M2 = zeros(I(2),I(2));
  for i = 1:I(3)
    X3_slice = load_slice(data_dir,mode3_dir,i);
    M2 = M2 + ((X3_slice' * A) * (A' * X3_slice));
  end
  for i = 1:I(1)
    X1_slice = load_slice(data_dir,mode1_dir,i);
    M2 = M2 + ((X1_slice * C) * (C' * X1_slice'));
  end
  [B,D] = eigs(M2 * M2',J(2),'lm',eigopts);
  %
  % update C
  %
  M3 = zeros(I(3),I(3));
  for i = 1:I(1)
    X1_slice = load_slice(data_dir,mode1_dir,i);
    M3 = M3 + ((X1_slice' * B) * (B' * X1_slice));
  end
  for i = 1:I(2)
    X2_slice = load_slice(data_dir,mode2_dir,i);
    M3 = M3 + ((X2_slice' * A) * (A' * X2_slice));
  end
  [C,D] = eigs(M3 * M3',J(3),'lm',eigopts);
  %
  % build the core
  %
  G = zeros(I(1) * J(2) * J(3),1);
  G = reshape(G,[I(1) J(2) J(3)]);
  for i = 1:I(1)
    X1_slice = load_slice(data_dir,mode1_dir,i);
    G(i,:,:) = B' * X1_slice * C;
  end
  G = reshape(G,[I(1) (J(2) * J(3))]);
  G = A' * G;
  G = reshape(G,[J(1) J(2) J(3)]);
  %
  % measure fit
  %
  normX = 0;
  sqerr = 0;
  for i = 1:I(1)
    X1_slice = load_slice(data_dir,mode1_dir,i);
    X1_approx = reshape(G,[J(1) (J(2) * J(3))]);
    X1_approx = A(i,:) * X1_approx;
    X1_approx = reshape(X1_approx,[J(2) J(3)]);
    X1_approx = B * X1_approx * C';
    sqerr = sqerr + norm(X1_slice - X1_approx,'fro')^2;
    normX = normX + norm(X1_slice,'fro')^2;
  end
  fit = 1 - sqrt(sqerr) / sqrt(normX);
  %
  fprintf('  loop %d: fit = %f\n', loop_num, fit);
  %
  % stop if fit is not increasing fast enough
  %
  if ((fit - old_fit) < minfitchange)
    break;
  end
  %
  old_fit = fit;
  %
end
%
fprintf('  total loops = %d\n', loop_num);
%
%% save tensor
%
output_file = [data_dir, '/', tucker_file];
save(output_file,'G','A','B','C');
%
fprintf('  tucker tensor is in %s\n',tucker_file);
%
fprintf('MULTISLICE is done\n');
%
--------------------------------------------------------------------------------
function slice(data_dir,sparse_file,mode_slice_dir,mode,I)
%SLICE chops a tensor into slices along the given mode
%
% Peter Turney
% October 20, 2007
%
% Copyright 2007, National Research Council of Canada
%
%% initialize
%
% set the secondary modes
%
if (mode == 1)
  r_mode = 2;
  c_mode = 3;
elseif (mode == 2)
  r_mode = 1;
  c_mode = 3;
else
  r_mode = 1;
  c_mode = 2;
end
%
% get sizes
%
Ns = I(mode);    % number of slices
Nr = I(r_mode);  % number of rows in each slice
Nc = I(c_mode);  % number of columns in each slice
%
%% sort the index
%
fprintf('SLICE is running ...\n');
%
% file names
%
sub_dir = [data_dir, '/', mode_slice_dir];
sorted_file = [sub_dir, '/', 'sorted.txt'];
%
% make sure the directories exist
%
if (isdir(data_dir) == 0)
  mkdir(data_dir);
end
if (isdir(sub_dir) == 0)
  mkdir(sub_dir);
end
%
% sort
%
sort_index(data_dir,sparse_file,mode_slice_dir,mode);
%
%% count nonzeros in each slice
%
fprintf('  counting nonzeros in each slice for mode %d\n',mode);
%
% vector for storing nonzero count
%
nonzeros = zeros(Ns,1);
%
% read sorted file in blocks
%
% - read in blocks because file may be too big to fit in RAM
% - textscan will create one cell for each field
% - each cell will contain a column vector of the values in
%   the given field
% - the number of elements in each column vector is the number
%   of lines that were read
%
desired_lines = 100000;
actual_lines = desired_lines;
%
sorted_file_id = fopen(sorted_file, 'r');
while (actual_lines > 0)
  block = textscan(sorted_file_id,'%d %d %d %*f',desired_lines);
  mode_subs = block{mode};
  actual_lines = size(mode_subs,1);
  for i = 1:actual_lines
    nonzeros(mode_subs(i)) = nonzeros(mode_subs(i)) + 1;
  end
end
fclose(sorted_file_id);
%
%% make slices
%
fprintf('  saving slices for mode %d\n',mode);
%
sorted_file_id = fopen(sorted_file, 'r');
for i = 1:Ns
  slice_file = sprintf('%s/slice%d.mat', sub_dir, i);
  nonz = nonzeros(i);
  block = textscan(sorted_file_id,'%d %d %d %f',nonz);
  slice_rows = double(block{r_mode});
  slice_cols = double(block{c_mode});
  slice_vals = block{4};
  slice = sparse(slice_rows,slice_cols,slice_vals,Nr,Nc,nonz);
  save(slice_file,'slice');
end
fclose(sorted_file_id);
%
fprintf('SLICE is done\n');
%
--------------------------------------------------------------------------------
function sort_index(data_dir,sparse_file,mode_slice_dir,mode)
%SORT_INDEX sorts a sparse tensor index file along the given mode
%
% Peter Turney
% October 20, 2007
%
% Copyright 2007, National Research Council of Canada
%
%% sort the index
%
fprintf('SORT_INDEX is running ...\n');
%
% file names
%
input_file = [data_dir, '/', sparse_file];
sub_dir = [data_dir, '/', mode_slice_dir];
sorted_file = [sub_dir, '/', 'sorted.txt'];
%
% call Unix 'sort' command
%
% -n = numerical sorting
% -k = key to sort on
% -s = stable sorting
% -S = memory for sorting buffer
% -o = output file
%
% - the 'sort' command is a standard part of Unix and Linux
% - if you are running Windows, you can get 'sort' by
%   installing Cygwin
% - the sort buffer is set here to 1 GiB; you can set it
%   to some other value, based on how much RAM you have
%
command = sprintf('sort -n -s -S 1G -k %d,%d -o %s %s', ...
                  mode, mode, sorted_file, input_file);
%
fprintf('  calling Unix sort for mode %d\n', mode);
unix(command);
%
fprintf('SORT_INDEX is done\n');
%
--------------------------------------------------------------------------------
function slice = load_slice(data_dir,mode_dir,i)
%LOAD_SLICE loads a sparse slice file
%
% Peter Turney
% October 20, 2007
%
% Copyright 2007, National Research Council of Canada
%
% file name
%
slice_file = sprintf('%s/%s/slice%d.mat', data_dir, mode_dir, i);
%
% load the file
%
data = load(slice_file);
%
% return the slice
%
slice = data.slice;
%
--------------------------------------------------------------------------------
function test
%TEST illustrates how to use multislice.m
%
% Peter Turney
% October 26, 2007
%
% Copyright 2007, National Research Council of Canada
%
% test multislice.m
%
% set random seed for repeatable experiments
%
rand('seed',5678);
%
% set parameters
%
I = [100 110 120];          % input sparse tensor size
J = [10 11 12];             % desired core tensor size
density = 0.1;              % percent nonzero
%
data_dir = 'test';          % directory for storing tensor
sparse_file = 'spten.txt';  % file for storing raw data tensor
tucker_file = 'tucker.mat'; % file for storing Tucker tensor
%
% make a sparse random tensor and store it in a file
%
sparse_random_tensor(data_dir,sparse_file,I,density);
%
% call multislice
%
tic;
fit = multislice(data_dir,sparse_file,tucker_file,I,J);
time = toc;
%
% show results
%
fprintf('\n');
fprintf('Multislice:\n');
fprintf('I = [%d %d %d]\n', I(1), I(2), I(3));
fprintf('J = [%d %d %d]\n', J(1), J(2), J(3));
fprintf('density = %f\n', density);
fprintf('fit = %f\n', fit);
fprintf('time = %.1f\n', time);
fprintf('\n');
%
--------------------------------------------------------------------------------
function sparse_random_tensor(data_dir,sparse_file,I,density)
%SPARSE_RANDOM_TENSOR makes a sparse uniformly distributed random tensor
%
% Peter Turney
% October 20, 2007
%
% Copyright 2007, National Research Council of Canada
%
% assume a third-order tensor is desired
%
%% initialize
%
fprintf('SPARSE_RANDOM_TENSOR is running ...\n');
%
% make sure the directory exists
%
if (isdir(data_dir) == 0)
  mkdir(data_dir);
end
%
file_name = [data_dir, '/', sparse_file];
%
fprintf('  generating tensor of size %d x %d x %d with density %f\n', ...
        I(1), I(2), I(3), density);
%
%% main loop
%
file_id = fopen(file_name, 'w');
fprintf('  slice: ');
for i1 = 1:I(1)
  fprintf('%d ',i1);                      % show progress
  if ((mod(i1,10) == 0) && (i1 ~= I(1)))
    fprintf('\n ');                       % time for new line
  end
  for i2 = 1:I(2)
    for i3 = 1:I(3)
      if (rand < density)
        fprintf(file_id,'%d %d %d %f\n',i1,i2,i3,rand);
      end
    end
  end
end
fprintf('\n');
fclose(file_id);
%
fprintf('SPARSE_RANDOM_TENSOR is done\n');
%
--------------------------------------------------------------------------------
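For readers without MATLAB, the alternating updates and fit measure that multislice.m computes slice by slice can be sketched entirely in RAM. The following NumPy translation is illustrative only and is not part of the original distribution: the function name tucker_fit is invented here, and it replaces the out-of-core pseudo HO-SVD initialization with truncated SVDs of the mode unfoldings, which serve the same purpose for tensors that fit in memory.

```python
import numpy as np

def tucker_fit(X, J, maxloops=50, minfitchange=1e-4):
    """In-RAM sketch of the Multislice Projection iteration for a
    third-order tensor X with core size J. Returns (fit, G, A, B, C),
    where fit = 1 - ||X - approx||_F / ||X||_F, as in multislice.m."""
    I = X.shape
    # initialize B and C from truncated SVDs of the mode-2 and mode-3
    # unfoldings (simplification of the out-of-core pseudo HO-SVD step)
    B = np.linalg.svd(X.transpose(1, 0, 2).reshape(I[1], -1),
                      full_matrices=False)[0][:, :J[1]]
    C = np.linalg.svd(X.transpose(2, 0, 1).reshape(I[2], -1),
                      full_matrices=False)[0][:, :J[2]]
    old_fit = 0.0
    for _ in range(maxloops):
        # update A: project X onto B and C, keep the leading subspace
        Y = np.einsum('ijk,jb,kc->ibc', X, B, C).reshape(I[0], -1)
        A = np.linalg.svd(Y, full_matrices=False)[0][:, :J[0]]
        # update B
        Y = np.einsum('ijk,ia,kc->jac', X, A, C).reshape(I[1], -1)
        B = np.linalg.svd(Y, full_matrices=False)[0][:, :J[1]]
        # update C
        Y = np.einsum('ijk,ia,jb->kab', X, A, B).reshape(I[2], -1)
        C = np.linalg.svd(Y, full_matrices=False)[0][:, :J[2]]
        # build the core and measure the fit
        G = np.einsum('ijk,ia,jb,kc->abc', X, A, B, C)
        approx = np.einsum('abc,ia,jb,kc->ijk', G, A, B, C)
        fit = 1 - np.linalg.norm(X - approx) / np.linalg.norm(X)
        # stop if fit is not increasing fast enough
        if fit - old_fit < minfitchange:
            break
        old_fit = fit
    return fit, G, A, B, C
```

On a tensor that is exactly of Tucker rank (J(1), J(2), J(3)), this sketch should reach a fit near 1; on a dense random tensor it behaves like the fit values reported in the experiments, since the core can capture only a fraction of the variance.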
