Riemannian Tensor Completion with Side Information
Tengfei Zhou, Hui Qian*, Zebang Shen, Congfu Xu
Computer Science Institute, Zhejiang University
{zhoutengfei_zju, qianhui, shenzebang, xucongfu}@zju.edu.cn
November 21, 2021

* Corresponding author.

Abstract

By restricting the iterate to a nonlinear manifold, the recently proposed Riemannian optimization methods prove to be both efficient and effective in low rank tensor completion problems. However, existing methods fail to exploit the easily accessible side information due to a format mismatch, so there is still room for improvement. To fill the gap, in this paper a novel Riemannian model is proposed that organically integrates the original model and the side information by overcoming their inconsistency. For this particular model, an efficient Riemannian conjugate gradient descent solver is devised based on a new metric that captures the curvature of the objective. Numerical experiments suggest that our solver is more accurate than the state-of-the-art without compromising efficiency.

I. Introduction

The Low Rank Tensor Completion (LRTC) problem, which aims to recover a tensor from its linear measurements, arises naturally in many artificial intelligence applications. In hyperspectral image inpainting, LRTC is applied to interpolate the unknown pixels based on the partial observation Xu et al. (2015). In recommendation tasks, LRTC helps users find interesting items under specific contexts such as locations or time Liu et al. (2015). In computational phenotyping, one adopts LRTC to discover phenotypes in heterogeneous electronic health records Wang et al. (2015).

Euclidean Models: LRTC can be formulated by a variety of optimization models over the Euclidean space. Among them, convex models that encapsulate LRTC as a regression problem penalized by a tensor nuclear norm are the most popular and well understood Romera-Paredes and Pontil (2013), Zhang et al. (2014). Though most of them have sound theoretical guarantees Chen et al. (2013), Yuan and Zhang (2015), Zhang and Aeron (2016), their solvers are in general ill-suited for large tensors because these procedures usually involve a Singular Value Decomposition (SVD) of huge matrices per iteration Liu et al. (2013). Another class of Euclidean models is formulated as a decomposition problem that factorizes a low rank tensor into small factors Filipović and Jukić (2015), Jain and Oh (2014), Xu et al. (2015). Many solvers for such decomposition based models have been proposed to recover large tensors, with demonstrated low per-iteration computational cost Beutel et al. (2014), Liu et al. (2014), Smith et al. (2016).

Riemannian Models: LRTC can also be modeled by nonconvex optimization constrained on Riemannian manifolds Kasai and Mishra (2016), Kressner et al. (2014), which is easily handled by many manifold based solvers Absil et al. (2009). Empirical comparison has shown that Riemannian solvers use significantly less CPU time than Euclidean solvers to recover the underlying tensor Kasai and Mishra (2016). The main reason is that such solvers avoid the SVD of huge matrices by explicitly exploiting the geometric structure of LRTC, which makes them more suitable for massive problems.
Of all the Riemannian models, two search spaces are usually employed: the fixed multi-linear rank manifold Kressner et al. (2014) and the Tucker manifold Kasai and Mishra (2016). The former is a submanifold of Euclidean space, and the latter is a quotient manifold induced by the Tucker decomposition. Generally, quotient manifold based solvers have higher convergence rates because it is usually easier to design a preconditioner for them Kasai and Mishra (2016), Mishra and Sepulchre (2016).

Side Information: In the Euclidean models of LRTC, side information is helpful in improving accuracy Acar et al. (2011), Beutel et al. (2014), Narita et al. (2011). One common form of side information is the feature matrix, which measures the statistical properties of tensor modes Kolda and Bader (2009). For example, in Netflix tasks, a feature matrix can be built from the demography of users Bell and Koren (2007). Another form is the similarity matrix, which quantifies the resemblance between two entities of a tensor mode. For instance, a social network generates a similarity matrix from the correspondence between users Rai et al. (2015). In practice, these two matrices can be transformed into each other, and we only consider the feature matrix case throughout this paper. However, as far as we know, side information has not been incorporated into any Riemannian model. The first difficulty lies in the model design: fusing the side information into the Riemannian model inevitably compromises the integrity of the low rank tensor due to the compactness of the manifold. The second difficulty results from the solver design: incorporating the side information may aggravate the ill-conditioning of the LRTC problem and degrade convergence significantly.

Contributions: To address these difficulties, a novel Riemannian LRTC method is proposed from the perspective of both model and solver design. By exploring the relation between the subspace spanned by the tensor fibers and the column space of the feature matrix, we explicitly integrate the side information in a compact way. Meanwhile, a first order solver is devised under the manifold optimization framework. To ease the ill-conditioning, we design a novel metric based on an approximated Hessian of the cost function. The metric implicitly induces an adaptive preconditioner for our solver. Empirical studies illustrate that our method achieves much more accurate solutions within processing time comparable to the state-of-the-art.

II. Notations and Preliminaries

In this paper we focus on 3rd order tensors, but generalizing our method to higher order is straightforward. We use the notation $X \in \mathbb{R}^{n \times m}$ to denote a matrix, and $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ to denote a $d$-th order tensor. We denote by $\mathcal{X}(i_1, \cdots, i_d)$ the element in position $(i_1, \cdots, i_d)$ of $\mathcal{X}$. In many cases we use curly braces with indices to simplify notation: for example, $\{O_i\}_{i=1}^3$ denotes $O_1, O_2, O_3$, and $\{U_i O_i\}_{i=1}^3$ refers to $U_1 O_1, U_2 O_2, U_3 O_3$.

Mode-$k$ fiber and matricization: A fiber of a tensor is obtained by varying one index while fixing the others; i.e., $\mathcal{X}(i_1, \cdots, i_{k-1}, :, i_{k+1}, \cdots, i_d)$ is a mode-$k$ fiber of a $d$-th order tensor $\mathcal{X}$. Here the colon denotes $\{1, \ldots, n_k\}$.
The mode-$k$ matricization $X_{(k)} \in \mathbb{R}^{n_k \times (n_1 \cdots n_{k-1} n_{k+1} \cdots n_d)}$ of a tensor $\mathcal{X}$ is obtained by arranging the mode-$k$ fibers of $\mathcal{X}$ so that each of them is a column of $X_{(k)}$ Kolda and Bader (2009). The mode-$k$ product of a tensor $\mathcal{X}$ and a matrix $A$ is denoted by $\mathcal{X} \times_k A$, whose mode-$k$ matricization can be expressed as $(\mathcal{X} \times_k A)_{(k)} = A X_{(k)}$. For a 3rd order tensor $\mathcal{X}$ and matrices $A_1, A_2, A_3$, we use $\mathcal{X} \times_{i=1}^3 A_i$ to denote $\mathcal{X} \times_1 A_1 \times_2 A_2 \times_3 A_3$.

Inner product and norm: The inner product of two tensors of the same size is defined by $\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1, \cdots, i_d} \mathcal{X}(i_1, \cdots, i_d)\, \mathcal{Y}(i_1, \cdots, i_d)$. The Frobenius norm of a tensor $\mathcal{X}$ is defined by $\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$.

Multi-linear rank and Tucker decomposition: The multi-linear rank $\mathrm{rank}_{vec}(\mathcal{X})$ of a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is defined as the vector $(\mathrm{rank}(X_{(1)}), \mathrm{rank}(X_{(2)}), \mathrm{rank}(X_{(3)}))$. If $\mathrm{rank}_{vec}(\mathcal{X}) = (r_1, r_2, r_3)$, the Tucker decomposition factorizes $\mathcal{X}$ into a small core tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ and three matrices $U_i \in \mathbb{R}^{n_i \times r_i}$ with orthogonal columns, that is, $\mathcal{X} = \mathcal{G} \times_{i=1}^3 U_i$. Note that the Tucker decomposition of a tensor is not unique: if $\mathcal{X} = \mathcal{G} \times_{i=1}^3 U_i$, we can easily obtain $\mathcal{X} = \mathcal{H} \times_{i=1}^3 V_i$ with $\mathcal{H} = \mathcal{G} \times_{i=1}^3 O_i^\top$ and $V_i = U_i O_i$, where $O_i \in \mathbb{R}^{r_i \times r_i}$ is any orthogonal matrix. Thus we obtain the equivalence class
\[ [\mathcal{G}, \{U_i\}_{i=1}^3] \triangleq \{ (\mathcal{G} \times_{i=1}^3 O_i^\top, \{U_i O_i\}_{i=1}^3) \mid O_i^\top O_i = I_i \}. \]
For simplicity, we denote $[\mathcal{G}, \{U_i\}_{i=1}^3]$ by $[\mathcal{X}]$ when $\mathcal{X} = \mathcal{G} \times_{i=1}^3 U_i$. Usually $[\mathcal{X}]$ is called the Tucker representation of $\mathcal{X}$, while $\mathcal{X}$ is called the tensor representation of $[\mathcal{X}]$. We also use $\mathbf{X}$ to denote a specific decomposition of $\mathcal{X}$, with $\mathbf{X} \in [\mathcal{X}]$.

II.1 Search Space of Riemannian Models

The Tucker manifold used in our Riemannian model is a quotient manifold induced by the Tucker decomposition. To lay the ground for the Tucker manifold, we first describe its counterpart, the fixed multi-linear rank manifold, which is helpful in understanding the whole derivation. A fixed multi-linear rank manifold $\mathcal{F}_r$ consists of tensors with the same fixed multi-linear rank; specifically,
\[ \mathcal{F}_r = \{ \mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3} \mid \mathrm{rank}_{vec}(\mathcal{X}) = r \}. \]
To define the Tucker manifold, we first define a total space
\[ \mathcal{M}_r = \mathbb{R}^{r_1 \times r_2 \times r_3} \times \mathrm{St}(r_1, n_1) \times \mathrm{St}(r_2, n_2) \times \mathrm{St}(r_3, n_3), \qquad (1) \]
in which $\mathrm{St}(r_i, n_i)$ is the Stiefel manifold of $n_i \times r_i$ matrices with orthogonal columns. Then the Tucker manifold of multi-linear rank $r$ is
\[ \mathcal{M}_r/\!\sim \; \triangleq \; \{ [\mathcal{G}, \{U_i\}_{i=1}^3] \mid (\mathcal{G}, \{U_i\}_{i=1}^3) \in \mathcal{M}_r \}. \qquad (2) \]
The Tucker manifold is a quotient manifold of the total space (1). We use the abstract quotient manifold, rather than the concrete total space, as the search space because the non-uniqueness of the Tucker decomposition is undesirable for optimization: such non-uniqueness introduces more local optima into the minimization. The relation between the manifolds $\mathcal{F}_r$ and $\mathcal{M}_r/\!\sim$ is characterized as follows.

Proposition 1. The quotient manifold $\mathcal{M}_r/\!\sim$ is diffeomorphic to the fixed multi-linear rank manifold $\mathcal{F}_r$, with the diffeomorphism $\rho(\cdot)$ from $\mathcal{F}_r$ to $\mathcal{M}_r/\!\sim$ defined by $\rho(\mathcal{X}) = [\mathcal{G}, \{U_i\}_{i=1}^3]$, where $[\mathcal{G}, \{U_i\}_{i=1}^3]$ is the Tucker representation of $\mathcal{X}$.

This proposition says that each tensor $\mathcal{X} \in \mathcal{F}_r$ can be represented by a unique equivalence class $[\mathcal{G}, \{U_i\}_{i=1}^3] \in \mathcal{M}_r/\!\sim$, and vice versa.
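As a concrete illustration of this section's notation, the following NumPy sketch (ours, not part of the paper; the authors' experiments are in MATLAB) implements the mode-$k$ matricization, the mode-$k$ product via $(\mathcal{X} \times_k A)_{(k)} = A X_{(k)}$, and numerically confirms the non-uniqueness of the Tucker decomposition that motivates the equivalence class $[\mathcal{G}, \{U_i\}_{i=1}^3]$. The helper names `unfold`, `mode_product`, and `tucker` are our own.

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization: the mode-k fibers of T become the columns."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_product(T, A, k):
    """Mode-k product T x_k A, via the identity (T x_k A)_(k) = A T_(k)."""
    rest = [T.shape[i] for i in range(T.ndim) if i != k]
    return np.moveaxis((A @ unfold(T, k)).reshape([A.shape[0]] + rest), 0, k)

def tucker(G, U):
    """Tensor representation of the Tucker factors (G, {U_i})."""
    X = G
    for i in range(3):
        X = mode_product(X, U[i], i)
    return X

rng = np.random.default_rng(0)
r, n = (3, 4, 5), (10, 12, 14)
G = rng.standard_normal(r)
U = [np.linalg.qr(rng.standard_normal((n[i], r[i])))[0] for i in range(3)]
X = tucker(G, U)

# Non-uniqueness: counter-rotating the core by O_i^T and rotating the factors
# by O_i yields another member of the class [G, {U_i}] with the same tensor.
O = [np.linalg.qr(rng.standard_normal((r[i], r[i])))[0] for i in range(3)]
H = G
for i in range(3):
    H = mode_product(H, O[i].T, i)
V = [U[i] @ O[i] for i in range(3)]
print(np.allclose(tucker(H, V), X))  # True
```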
II.2 Vanilla Riemannian Tensor Completion

The purest incarnation of the Riemannian tensor completion model is the Riemannian model over the fixed multi-linear rank manifold. Let $\mathcal{R} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ be a partially observed tensor, and let $\Omega$ be the set containing the indices of the observed entries. The model can be expressed as
\[ \min_{\mathcal{X}} \; \tfrac{1}{2} \| P_\Omega(\mathcal{X} - \mathcal{R}) \|_F^2 \quad \text{s.t.} \quad \mathcal{X} \in \mathcal{F}_r, \qquad (3) \]
where $P_\Omega$ maps $\mathcal{X}$ to the sparsified tensor $P_\Omega(\mathcal{X})$ with $P_\Omega(\mathcal{X})(i_1, i_2, i_3) = \mathcal{X}(i_1, i_2, i_3)$ if $(i_1, i_2, i_3) \in \Omega$, and $P_\Omega(\mathcal{X})(i_1, i_2, i_3) = 0$ otherwise. Another popular model, the Tucker model, is based on the quotient manifold $\mathcal{M}_r/\!\sim$ and can be expressed as
\[ \min_{\mathcal{X}} \; \tfrac{1}{2} \| P_\Omega(\rho^{-1}([\mathcal{X}]) - \mathcal{R}) \|_F^2 \quad \text{s.t.} \quad [\mathcal{X}] \in \mathcal{M}_r/\!\sim, \qquad (4) \]
with $\rho$ defined in Prop. 1. Since the dawn of the Riemannian framework for LRTC, a quandary has existed: on one hand, sparse measurements limit the accuracy of the solution; on the other hand, rich side information cannot be incorporated into this framework. In many artificial intelligence applications, demands for high accuracy further exacerbate this dilemma.

III. Riemannian Model with Side Information

We focus on the case where the side information is encoded in feature matrices $P_i \in \mathbb{R}^{n_i \times k_i}$. Suppose $\mathcal{R} \in \mathcal{F}_r$ has Tucker factors $(\mathcal{G}, \{U_i\}_{i=1}^3)$. Without loss of generality, we assume that $k_i \ge r_i$ and that $P_i$ has orthogonal columns. In the ideal case, we assume
\[ \mathrm{span}(U_i) \subset \mathrm{span}(P_i). \qquad (5) \]
This relation means that the feature matrices contain all the information in the latent space of the underlying tensor; equivalently, there exists a matrix $W_i$ such that $U_i = P_i W_i$. In practice, due to noise, one can only expect the relation to hold approximately, i.e. $U_i \approx P_i W_i$. Incorporating this relation into a tensor completion model via penalization, we obtain the formulation
\[ \min_{\mathcal{G}, U_i, W_i} \; L(\mathcal{G}, \{U_i\}_{i=1}^3) + \sum_{i=1}^3 \frac{\alpha_i |\Omega|}{2} \| U_i - P_i W_i \|_F^2 \quad \text{s.t.} \; (\mathcal{G}, \{U_i\}_{i=1}^3) \in \mathcal{M}_r, \qquad (6) \]
where $L(\mathcal{G}, \{U_i\}_{i=1}^3) = \| P_\Omega(\mathcal{G} \times_{i=1}^3 U_i - \mathcal{R}) \|_F^2 / 2$. Fixing $\mathcal{G}$ and $U_i$, with respect to $W_i$, (6) has the closed form solution
\[ W_i = (P_i^\top P_i)^{-1} P_i^\top U_i = P_i^\top U_i. \qquad (7) \]
Since $\min_{x,y} l(x, y) = \min_x l(x, y(x))$ where $y(x) = \arg\min_y l(x, y)$, one can substitute (7) into the above problem and obtain the equivalent formulation
\[ \min_{\mathcal{G}, U_i} \; L(\mathcal{G}, \{U_i\}_{i=1}^3) + \sum_{i=1}^3 \frac{\alpha_i |\Omega|}{2} \mathrm{trace}\big(U_i^\top (I_i - P_i P_i^\top) U_i\big) \; \triangleq \; f(\mathcal{G}, \{U_i\}_{i=1}^3) \quad \text{s.t.} \; (\mathcal{G}, \{U_i\}_{i=1}^3) \in \mathcal{M}_r. \qquad (8) \]
Although the cost function is smooth over the total space $\mathcal{M}_r$, its invariance over the equivalence class $[\mathcal{G}, \{U_i\}_{i=1}^3]$ means there can be infinitely many local optima, which is extremely undesirable: if $(\mathcal{G}, \{U_i\}_{i=1}^3)$ is a local optimum of the objective, then so is every point in the infinite set $[\mathcal{G}, \{U_i\}_{i=1}^3]$. One way to reduce the number of local optima is to mathematically treat the entire set $[\mathcal{G}, \{U_i\}_{i=1}^3]$ as a single point. Consequently, we redefine the cost by $\tilde f([\mathcal{G}, \{U_i\}_{i=1}^3]) = f(\mathcal{G}, \{U_i\}_{i=1}^3)$ and obtain the following Riemannian optimization problem over the quotient manifold $\mathcal{M}_r/\!\sim$:
\[ \min_{[\mathbf{X}]} \; \tilde f([\mathbf{X}]) \quad \text{s.t.} \; [\mathbf{X}] \in \mathcal{M}_r/\!\sim. \qquad (9) \]
Remark 1. In the Riemannian optimization literature, problem (8) is called the lifted representation of problem (9) over the total space Absil et al. (2009).
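The closed form (7) and the resulting projection regularizer in (8) are easy to verify numerically. A minimal sketch (ours), assuming a random orthogonal $P_i$ and factor $U_i$; the second check is exactly the substitution step from (6) to (8):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, k = 20, 4, 6
P, _ = np.linalg.qr(rng.standard_normal((n, k)))  # feature matrix with orthogonal columns
U, _ = np.linalg.qr(rng.standard_normal((n, r)))  # a Tucker factor

# Closed form (7): with P orthogonal, the optimal W is simply P^T U.
W_star = P.T @ U
W_ls = np.linalg.lstsq(P, U, rcond=None)[0]       # explicit solve of min_W ||U - P W||_F^2
print(np.allclose(W_star, W_ls))                  # True

# Substituting W* turns the penalty into the projection form used in (8).
pen_direct = 0.5 * np.linalg.norm(U - P @ W_star, "fro") ** 2
pen_trace = 0.5 * np.trace(U.T @ (np.eye(n) - P @ P.T) @ U)
print(np.isclose(pen_direct, pen_trace))          # True
```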
This model is closely related to the Laplace regularization model Narita et al. (2011). Concretely, they share the same form:
\[ \min_{\mathcal{G}, U_i} \; L(\mathcal{G}, \{U_i\}_{i=1}^3) + \sum_{i=1}^3 \frac{C_i}{2} \mathrm{trace}(U_i^\top L_i U_i). \qquad (10) \]

Figure 1: Optimization framework for a quotient manifold. Most Riemannian solvers are based on the iteration formula $[x^+] \leftarrow R_{[x]}(t \eta_{[x]})$, where $t > 0$ is the stepsize, $\eta_{[x]}$ is the search direction picked from the current tangent space $T_{[x]}\mathcal{M}/\!\sim$, and $R_{[x]}(\cdot)$ is the retraction, i.e. a map from the current tangent space to $\mathcal{M}/\!\sim$. Due to the abstractness of the quotient manifold, such an iteration is often lifted to (represented in) the total space as $x^+ = R_x(t\eta_x)$, where $x \in [x]$, $\eta_x$ is the horizontal lift of $\eta_{[x]}$, and $R_x(\cdot)$ is the lifted retraction. Such a representation is possible only if $\mathcal{M}/\!\sim$ has the structure of a Riemannian quotient, that is, the total space is endowed with an invariant Riemannian metric.

The difference lies in that $L_i$ is a projection matrix in our case, while in the Laplace regularization model $L_i$ is a Laplacian matrix.

Remark 2. Since each $[\mathcal{X}] \in \mathcal{M}_r/\!\sim$ has a unique tensor representation $\mathcal{X} \in \mathcal{F}_r$, we show that the abstract model (9) can be represented as a concrete model over the manifold $\mathcal{F}_r$. Specifically, the following proposition interprets the proposed model as an optimization problem with a regularizer that encourages the mode-$i$ space of the estimated tensor to be close to $\mathrm{span}(P_i)$.

Proposition 2. If $[\mathcal{X}]$ is a critical point of problem (9), then its tensor representation $\mathcal{X}$ is a critical point of the following problem, and vice versa:
\[ \min_{\mathcal{X} \in \mathcal{F}_r} \; \tfrac{1}{2} \| P_\Omega(\mathcal{X} - \mathcal{R}) \|_F^2 + \sum_{i=1}^3 \frac{\alpha_i |\Omega|}{2} \mathrm{dist}^2\big(\mathrm{span}(X_{(i)}), \mathrm{span}(P_i)\big), \]
where $\mathrm{dist}(\cdot, \cdot)$ is the Chordal distance Ye and Lim (2014) between two subspaces.

IV. Riemannian Conjugate Gradient Descent

We depict the optimization framework for quotient manifolds in Fig. 1. Under this framework, we solve the proposed problem (9) by Riemannian Conjugate Gradient descent (CG). With the details specified later, we list our CG solver for problem (9) in Alg. 1, where the CG direction is composed in the Polak-Ribière+ manner with the momentum weight $\beta^{(k)}$ computed by the Fletcher-Reeves formula Absil et al. (2009), and $T_k(\cdot)$ is the projector onto the horizontal space $\mathcal{H}_{\mathbf{X}^{(k)}}$. To represent Alg. 1 in concrete tensor formulations, four items must be specified: the Riemannian metric $\langle \cdot, \cdot \rangle_{\mathbf{X}}$, the Riemannian gradient $\mathrm{grad} f(\mathbf{X})$, the retraction $R_{\mathbf{X}}(\cdot)$, and the projector $T_{\mathbf{X}}$ onto the horizontal space.

Algorithm 1 CGSI: a Riemannian CG method
Input: initializer $\mathbf{X}^{(0)} = (\mathcal{G}^{(0)}, \{U_i^{(0)}\}_{i=1}^3)$ and tolerance $\epsilon$
1: $k = 0$;
2: $\eta^{(-1)} = (\mathbf{0}, \{0\}_{i=1}^3)$;
3: repeat
4:   compute the current Riemannian gradient $\xi^{(k)} = \mathrm{grad} f(\mathbf{X}^{(k)})$;
5:   compose the CG direction $\eta^{(k)} = -\xi^{(k)} + \beta^{(k)} T_k(\eta^{(k-1)})$;
6:   choose a step size $t_k > 0$;
7:   update by retraction $\mathbf{X}^{(k+1)} = R_{\mathbf{X}^{(k)}}(t_k \eta^{(k)})$;
8:   $k = k + 1$;
9: until $\langle \xi^{(k-1)}, \xi^{(k-1)} \rangle_{\mathbf{X}^{(k-1)}} \le \epsilon$;
10: return $\mathbf{X}^{(k)}$.
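For orientation, the following toy sketch (ours) shows the skeleton of Alg. 1. It is instantiated on flat Euclidean space, where the retraction is vector addition, the transport is the identity, and the step size comes from an exact line search on a quadratic; in the actual solver these stubs are replaced by the manifold-specific items specified in the subsections below.

```python
import numpy as np

def riemannian_cg(grad, retract, transport, inner, step, x0, tol=1e-10, max_iter=100):
    """Template of Alg. 1: manifold CG with a Fletcher-Reeves momentum weight."""
    x, xi_prev, eta_prev = x0, None, None
    for _ in range(max_iter):
        xi = grad(x)                                   # Riemannian gradient
        if inner(x, xi, xi) <= tol:                    # stopping rule of line 9
            break
        if eta_prev is None:
            eta = -xi
        else:
            beta = inner(x, xi, xi) / inner(x, xi_prev, xi_prev)  # FR weight
            eta = -xi + beta * transport(x, eta_prev)  # CG direction of line 5
        x, xi_prev, eta_prev = retract(x, step(x, eta) * eta), xi, eta
    return x

# Toy instantiation on R^5: minimize 0.5 x^T A x - b^T x.
A = np.diag(np.arange(1.0, 6.0)); b = np.ones(5)
x_star = riemannian_cg(
    grad=lambda x: A @ x - b,
    retract=lambda x, v: x + v,              # Euclidean retraction
    transport=lambda x, v: v,                # trivial transport
    inner=lambda x, u, v: float(u @ v),      # Euclidean metric
    step=lambda x, eta: float(-(A @ x - b) @ eta / (eta @ A @ eta)),  # exact line search
    x0=np.zeros(5))
print(np.allclose(A @ x_star, b))  # True
```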
IV.1 Metric Tuning

A Riemannian metric $\langle \cdot, \cdot \rangle_{\mathbf{X}}$ of $\mathcal{M}_r$ is an inner product defined over each tangent space $T_{\mathbf{X}}\mathcal{M}_r$. A high-quality Riemannian solver for a quotient manifold should be equipped with a well-tuned metric, because (1) the metric determines the differential structure of the quotient manifold and, more importantly, (2) it implicitly endows the solver with a preconditioner, which heavily affects the convergence rate Mishra (2014), Mishra and Sepulchre (2014). From the perspective of preconditioning, the best candidate seems to be the Newton metric $\langle \eta, \xi \rangle_{\mathbf{X}} = D^2 f(\mathbf{X})[\eta, \xi]$, $\forall \eta, \xi \in T_{\mathbf{X}}\mathcal{M}_r$, where $D^2 f(\mathbf{X})$ is the second order differential of the cost function. However, under such a metric, computing the search direction involves solving a large system of linear equations, which precludes the Newton metric from application to huge datasets. Therefore, we propose the following alternative:
\[ \langle \eta_{\mathbf{X}}, \xi_{\mathbf{X}} \rangle_{\mathbf{X}} = D^2 g(\mathbf{X})[\eta_{\mathbf{X}}, \xi_{\mathbf{X}}] = \sum_{i=1}^3 \langle \eta_i, \xi_i G_{(i)} G_{(i)}^\top \rangle + \langle \eta_{\mathcal{G}}, \xi_{\mathcal{G}} \rangle + \sum_{i=1}^3 N \alpha_i \langle \eta_i, (I_i - P_i P_i^\top) \xi_i \rangle, \qquad (11) \]
in which $g(\mathbf{X})$ is a scaled approximation to the original cost function, that is,
\[ g(\mathbf{X}) \triangleq \tfrac{1}{2} \| \mathcal{G} \times_{i=1}^3 U_i - \mathcal{R} \|_F^2 + \sum_{i=1}^3 \frac{\alpha_i N}{2} \mathrm{trace}\big(U_i^\top (I_i - P_i P_i^\top) U_i\big), \]
with $N = n_1 n_2 n_3$. Our metric is more scalable than the Newton metric: the following proposition indicates that the scaled gradient induced by this metric can be computed with only $O\big(\sum_{i=1}^3 (n_i k_i r_i + r_i^3)\big)$ additional operations, which is much less than the operations required by the Newton metric.

Proposition 3. Suppose the cost function $f(\cdot)$ has Euclidean gradient $\nabla f(\mathbf{X}) = (\nabla_{\mathcal{G}} f, \{\nabla_{U_i} f\}_{i=1}^3)$. Then its scaled gradient $\tilde\nabla f(\mathbf{X})$ under the metric (11) can be computed by
\[ \tilde\nabla_{\mathcal{G}} f(\mathbf{X}) = \nabla_{\mathcal{G}} f(\mathbf{X}), \qquad \tilde\nabla_{U_i} f(\mathbf{X}) = E_i G_i^{-1} + F_i (G_i + N \alpha_i I_i)^{-1}, \]
where $E_i = P_i P_i^\top \nabla_{U_i} f$, $F_i = \nabla_{U_i} f - E_i$, and $G_i = G_{(i)} G_{(i)}^\top$.

Moreover, the proposed metric contains the curvature information of the cost: it is easy to validate that $D^2 f(\mathbf{X})/|\Omega| \approx D^2 g(\mathbf{X})/N$, since $f(\mathbf{X})/|\Omega| \approx g(\mathbf{X})/N$ if the observed entries are sampled uniformly at random. The final proposition suggests that the proposed metric makes the representation of solvers in the total space possible.
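Prop. 3 can be sanity-checked numerically: by the definition of the scaled gradient under metric (11), it must satisfy $\tilde\nabla_{U_i} f \, G_i + N\alpha_i (I_i - P_i P_i^\top) \tilde\nabla_{U_i} f = \nabla_{U_i} f$. A minimal sketch (ours), with random stand-ins for $G_{(i)}$, $P_i$, and $\nabla_{U_i} f$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, k, N, alpha = 30, 5, 8, 30 ** 3, 0.1

P, _ = np.linalg.qr(rng.standard_normal((n, k)))
G_unf = rng.standard_normal((r, r * r))      # stand-in for the mode-i unfolding G_(i)
Gi = G_unf @ G_unf.T                         # G_i = G_(i) G_(i)^T
grad_U = rng.standard_normal((n, r))         # Euclidean partial gradient w.r.t. U_i

# Prop. 3: split the gradient along span(P_i) and its complement, then rescale.
E = P @ (P.T @ grad_U)                       # component inside span(P_i)
F = grad_U - E                               # component orthogonal to span(P_i)
scaled = E @ np.linalg.inv(Gi) + F @ np.linalg.inv(Gi + N * alpha * np.eye(r))

# Defining property of the scaled gradient under the metric (11):
lhs = scaled @ Gi + N * alpha * (scaled - P @ (P.T @ scaled))
print(np.allclose(lhs, grad_U))  # True
```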
Table 1: Expressions of the projectors. We define $V_i := P_i P_i^\top U_i$, $W_i := U_i - V_i$, $G_i := G_{(i)} G_{(i)}^\top$, $G_{\alpha_i} := N\alpha_i I_i + G_{(i)} G_{(i)}^\top$, $j_i = \max\{k \in \{1,2,3\} \mid k \ne i\}$, and $k_i = \min\{k \in \{1,2,3\} \mid k \ne i\}$. The operators $\mathrm{sym}(\cdot)$ and $\mathrm{skw}(\cdot)$ extract the symmetric and skew components of a matrix, respectively, i.e. $\mathrm{sym}(A) = (A + A^\top)/2$ and $\mathrm{skw}(A) = (A - A^\top)/2$.

$\Psi_{\mathbf{X}}(Z_{\mathcal{G}}, \{Z_i\}_{i=1}^3)$, the projection of an ambient vector onto $T_{\mathbf{X}}\mathcal{M}_r$:
\[ \big(Z_{\mathcal{G}}, \; \{Z_i - V_i S_i G_i^{-1} - W_i S_i G_{\alpha_i}^{-1}\}_{i=1}^3\big), \quad \text{where } S_i = S_i^\top \text{ solves } \mathrm{sym}(V_i^\top V_i S_i G_i^{-1} - U_i^\top Z_i) = -\mathrm{sym}(W_i^\top W_i S_i G_{\alpha_i}^{-1}). \]

$\Pi_{\mathbf{X}}(\eta_{\mathbf{X}})$, the projection of a tangent vector of the total space onto $\mathcal{H}_{\mathbf{X}}$:
\[ \big(\eta_{\mathcal{G}} + \sum_{1 \le i \le 3} \mathcal{G} \times_i \Omega_i, \; \{\eta_i - U_i \Omega_i\}_{i=1}^3\big), \]
where the skew matrices $\Omega_i = -\Omega_i^\top$, $i \in \{1,2,3\}$, solve
\[ \mathrm{skw}(V_i^\top V_i \Omega_i G_i + G_i \Omega_i + W_i^\top W_i \Omega_i G_{\alpha_i}) - G_{(i)}(I_{j_i} \otimes \Omega_{k_i}) G_{(i)}^\top - G_{(i)}(\Omega_{j_i} \otimes I_{k_i}) G_{(i)}^\top = \mathrm{skw}(V_i^\top \eta_i G_i + W_i^\top \eta_i G_{\alpha_i}) + \mathrm{skw}(G_{(i)} (\eta_{\mathcal{G}})_{(i)}^\top). \]

Note that the above linear systems can be solved by the MATLAB command pcg in $O\big(\sum_{1 \le i \le 3} (n_i k_i^2 + r_i^3)\big)$ flops.

Proposition 4. The quotient manifold $\mathcal{M}_r/\!\sim$ admits a structure of Riemannian quotient manifold if $\mathcal{M}_r$ is endowed with the Riemannian metric defined in (11).

IV.2 Other Optimization Related Items

Projectors: To derive the optimization related items, two orthogonal projectors, $\Psi_{\mathbf{X}}(\cdot)$ and $\Pi_{\mathbf{X}}(\cdot)$, are required. The former projects a vector onto the tangent space $T_{\mathbf{X}}\mathcal{M}_r$, and the latter is a projector from the tangent space onto the horizontal space $\mathcal{H}_{\mathbf{X}}$. The orthogonality of both projectors is measured by the metric (11). Mathematical derivations of these projectors are given in Sec. VII.2.1 and Sec. VII.2.2, respectively.

Riemannian Gradient: According to Absil et al. (2009), the Riemannian gradient can be computed by projecting the scaled gradient onto the tangent space; specifically,
\[ \mathrm{grad} f(\mathbf{X}) = \Psi_{\mathbf{X}}(\tilde\nabla f(\mathbf{X})). \qquad (12) \]

Retraction: We use the retraction defined by
\[ R_{\mathbf{X}}(\eta_{\mathbf{X}}) = \big(\mathcal{G} + \eta_{\mathcal{G}}, \{\mathrm{uf}(U_i + \eta_i)\}_{i=1}^3\big), \qquad (13) \]
where $\mathrm{uf}(\cdot)$ extracts the orthogonal factor of a matrix. This retraction was proposed by Kasai and Mishra (2016). We give a rigorous analysis proving that the above retraction is compatible with the proposed metric in Sec. VII.2.3.
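The text does not spell out $\mathrm{uf}(\cdot)$; a common choice (an assumption here) is the orthogonal polar factor computed via a thin SVD, with a QR-based factor as an alternative. A minimal sketch of the retraction (13) (ours):

```python
import numpy as np

def uf(A):
    """Orthogonal factor of A. We assume the polar factor via thin SVD here;
    a QR-based Q factor is a common alternative."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def retract(G, Us, eta_G, eta_Us):
    """Retraction (13): move the core additively, re-orthogonalize the factors."""
    return G + eta_G, [uf(U + e) for U, e in zip(Us, eta_Us)]

# Sanity check: the retracted factor stays on the Stiefel manifold.
rng = np.random.default_rng(3)
U = np.linalg.qr(rng.standard_normal((12, 4)))[0]
V = uf(U + 0.1 * rng.standard_normal((12, 4)))
print(np.allclose(V.T @ V, np.eye(4)))  # True
```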
V. Experiments

We validate the effectiveness of the proposed solver CGSI by comparing it with the state-of-the-art. The baselines can be partitioned into three classes. The first class contains Riemannian solvers, including GeomCG Kressner et al. (2014), FTC Kasai and Mishra (2016), and gHOI Liu et al. (2016). The second class consists of Euclidean solvers that take no account of the side information, including AltMin Romera-Paredes et al. (2013) and HaLRTC Liu et al. (2013). The third class comprises two methods that incorporate side information, RUBIK Wang et al. (2015) and TFAI Narita et al. (2011). All the experiments are performed in MATLAB on the same machine with a 3.0 GHz Intel E5-2690 CPU and 128 GB RAM. All solvers are based on the Tucker decomposition, except that RUBIK is based on the CP decomposition. For fairness, the CP rank of RUBIK is set to $\lceil (\sum_{i=1}^3 n_i r_i + r_1 r_2 r_3) / (\sum_{i=1}^3 n_i) \rceil$.

V.1 Hyperspectral Image Inpainting

A hyperspectral image is a tensor whose slices are photographs of the same scene under different wavelengths. We adopt the dataset provided in Foster et al. (2006), which contains images of eight different rural scenes taken under 33 different wavelengths. To make all methods in our baseline applicable to the completion problem, we resize each hyperspectral image to a smaller dimension such that $n_1 = 306$, $n_2 = 402$, and $n_3 = 33$. Empirically, we treat these images as tensors of rank $r = (30, 30, 6)$. The observed pixels, or the training set, are sampled from the tensors uniformly at random, and the sample size is set to $|\Omega| = OS \times p$, in which $OS$ is the so-called Over-Sampling ratio and $p = \sum_{i=1}^3 (n_i r_i - r_i^2) + r_1 r_2 r_3$ is the number of free parameters in a size-$n$ tensor with rank $r$. In addition to the observed entries, the mode-1 feature matrix is constructed by extracting the top $(r_1 + 10)$ singular vectors from a matrix of size $n_1 \times 10 r_1$ whose columns are sampled from the mode-1 fibers of the hyperspectral images. The recovery accuracy is measured by the Normalized Root Mean Square Error (NRMSE) Kressner et al. (2014).

All the compared methods are terminated when the training NRMSE is less than 0.003 or after more than 300 epochs. We report the NRMSE and CPU time of the compared methods in Tab. 2. From the table, we can see that the proposed method achieves much higher accuracy than the other solvers in our baseline. The empirical results also indicate that our method has nearly the same running time as FTC, the fastest tensor completion method. The visual results for the 27th slice of the recovered hyperspectral image of scene 7 are illustrated in Fig. 2.

Figure 2: Visual results of the recovered 27th frame of scene 7 when OS is set to 3 (panels: Original, Observed, CGSI, RUBIK, FTC).

V.2 Recommender System

In recommendation tasks, two datasets are considered: MovieLens 10M (ML10M) and MovieLens 20M (ML20M). Both datasets contain the rating history of users for items at specific moments. For both datasets, we partition the samples into 731 slices in terms of the timestamp; the slices have identical time intervals. Accordingly, the completion tasks for the two datasets are of sizes $71567 \times 10681 \times 731$ and $138493 \times 26744 \times 731$, respectively. In addition to the rating history, both datasets contain two extra files: one describes the genres of each movie, and the other contains tags for each movie. We construct a corpus containing a text description of every movie from the genre descriptions and all the tags. The feature matrix is extracted from this corpus by the latent semantic analysis (LSA) method. The processing is efficient since LSA is implemented via randomized SVD.
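The paper gives no further detail on this step; the sketch below (ours, in Python with scikit-learn, whereas the paper's pipeline is in MATLAB) shows the standard TF-IDF plus truncated randomized SVD route to such a feature matrix. The corpus `movie_docs` and the latent dimension `k` are placeholders:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# One text document per movie (genres + tags); placeholder corpus.
movie_docs = ["comedy romance funny quirky", "action sci-fi space", "drama war epic"]
k = 2  # latent dimension (k_i in the paper's notation), chosen freely here

tfidf = TfidfVectorizer().fit_transform(movie_docs)          # movies x terms
svd = TruncatedSVD(n_components=k, algorithm="randomized")   # randomized SVD backend
P = svd.fit_transform(tfidf)                                 # movies x k feature matrix
P, _ = np.linalg.qr(P)  # the model assumes P has orthogonal columns
print(P.shape)  # (3, 2)
```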
Table 2: Performance of the compared methods on hyperspectral images (each cell reports NRMSE / Time(s)).

data    OS  AltMin      FTC        GeomCG     gHOI       HaLRTC     RUBIK      TFAI       CGSI
Scene1  3   0.161/183   0.091/52   0.113/61   0.115/65   0.080/177  0.086/197  0.161/164  0.062/77
        5   0.156/307   0.067/76   0.077/93   0.103/109  0.078/177  0.085/194  0.159/273  0.040/100
        7   0.156/429   0.060/100  0.056/124  0.092/152  0.077/177  0.085/195  0.159/382  0.039/110
        9   0.156/550   0.046/126  0.044/151  0.078/195  0.077/178  0.085/198  0.156/479  0.036/126
Scene2  3   0.173/183   0.093/50   0.114/61   0.125/65   0.066/173  0.061/197  0.173/165  0.048/83
        5   0.166/306   0.082/76   0.076/92   0.100/103  0.066/171  0.061/196  0.171/203  0.043/96
        7   0.166/428   0.073/101  0.064/123  0.091/152  0.057/172  0.061/197  0.169/386  0.040/110
        9   0.166/578   0.062/125  0.056/154  0.073/197  0.057/171  0.060/197  0.169/433  0.038/130
Scene3  3   0.033/226   0.041/68   0.044/181  0.043/187  0.034/174  0.062/189  0.063/131  0.025/83
        5   0.033/346   0.030/99   0.029/251  0.037/308  0.033/177  0.061/185  0.062/209  0.021/108
        7   0.033/486   0.023/124  0.021/389  0.033/177  0.031/177  0.059/187  0.057/210  0.018/131
        9   0.033/587   0.019/156  0.021/386  0.031/491  0.029/172  0.034/189  0.033/229  0.017/143
Scene4  3   0.033/238   0.031/78   0.036/181  0.038/193  0.047/172  0.034/182  0.033/155  0.012/105
        5   0.033/359   0.015/108  0.015/254  0.031/293  0.032/171  0.037/183  0.033/247  0.012/118
        7   0.033/486   0.012/128  0.012/391  0.021/177  0.029/177  0.027/180  0.033/181  0.011/131
        9   0.033/600   0.012/170  0.012/398  0.018/492  0.026/177  0.024/192  0.033/231  0.010/144
Scene5  3   0.059/236   0.051/75   0.077/180  0.169/187  0.086/169  0.126/180  0.062/99   0.024/104
        5   0.059/362   0.041/104  0.051/254  0.113/289  0.076/171  0.059/183  0.061/128  0.022/114
        7   0.059/483   0.034/137  0.037/325  0.089/398  0.047/173  0.054/190  0.061/181  0.021/128
        9   0.059/603   0.028/166  0.029/400  0.065/494  0.042/173  0.058/192  0.061/229  0.021/142
Scene6  3   0.090/237   0.067/76   0.057/181  0.132/189  0.095/177  0.090/180  0.091/170  0.036/107
        5   0.090/356   0.039/105  0.040/251  0.095/298  0.083/177  0.081/180  0.091/213  0.034/119
        7   0.090/489   0.039/130  0.040/325  0.095/394  0.083/178  0.081/181  0.091/300  0.034/136
        9   0.090/600   0.039/165  0.040/396  0.095/501  0.083/178  0.081/183  0.091/383  0.034/143
Scene7  3   0.071/245   0.073/82   0.069/181  0.075/193  0.077/172  0.069/181  0.072/165  0.031/119
        5   0.072/377   0.034/102  0.032/225  0.064/293  0.069/172  0.067/180  0.072/203  0.028/158
        7   0.072/581   0.028/161  0.028/336  0.052/452  0.062/171  0.064/181  0.072/302  0.026/157
        9   0.072/603   0.027/170  0.027/400  0.041/494  0.057/173  0.058/183  0.072/183  0.026/189
Scene8  3   0.039/236   0.030/74   0.042/181  0.050/187  0.071/174  0.034/179  0.040/131  0.013/103
        5   0.039/354   0.018/107  0.019/247  0.038/293  0.061/174  0.040/182  0.045/213  0.012/114
        7   0.039/701   0.013/102  0.013/381  0.030/234  0.031/181  0.030/182  0.060/363  0.011/169
        9   0.039/853   0.012/112  0.012/502  0.026/369  0.027/175  0.031/183  0.039/502  0.011/180

Figure 3: Effect of parameter α on the accuracy of CGSI (RMSE versus α from 10^-4 to 10^1 for CGSI and FTC on ML10M and ML20M).

Various empirical studies are conducted to validate the performance of the proposed method. In the first scenario, we record the CPU time and the Root Mean Square Error (RMSE) output by the compared algorithms under different choices of multi-linear rank. In this scenario, for both datasets, 80% of the samples are chosen as the training set and the rest are left for testing. The results are listed in Tab. 3, which suggests that the proposed method outperforms all other solvers in terms of accuracy. For ML10M, our method uses significantly less CPU time than its competitors. In Fig. 4 we report another scenario, in which the percentage of training samples varies from 10% to 70% and the rank parameter is fixed to (10, 10, 10). The experimental results in this figure indicate that our method has the lowest RMSE. To show the impact of the parameter α on the performance of our method, we depict the relation between RMSE and α in Fig. 3, where the rank parameter is set to (10, 10, 10) and the partitioning scheme for training and testing samples is the same as in the first scenario. From this figure we can see that our method achieves higher accuracy than FTC, the solver of the vanilla Riemannian model, for a wide range of parameter choices.
Table 3: Performance of the compared methods on recommendation tasks (each cell reports RMSE / Time(s)).

dataset  rank        AltMin      FTC         GeomCG      gHOI        TFAI        CGSI
ML10M    (4,4,4)     0.982/924   0.824/236   0.835/307   1.076/467   1.011/426   0.823/178
         (6,6,6)     0.968/1830  0.814/535   0.826/679   1.262/1035  0.9948/942  0.814/434
         (8,8,8)     1.01/3123   0.822/928   0.833/1135  1.062/1734  0.993/1617  0.810/754
         (10,10,10)  1.147/4963  0.824/1631  0.843/2220  1.094/2788  0.992/2522  0.807/1067
ML20M    (4,4,4)     1.061/690   0.822/466   0.829/601   1.050/918   1.029/797   0.818/363
         (6,6,6)     1.089/3451  0.808/982   0.822/1309  1.057/1869  1.008/1644  0.805/1107
         (8,8,8)     1.092/5890  0.812/1725  0.828/2271  1.045/3363  1.004/3144  0.804/1739
         (10,10,10)  1.092/9418  0.818/3161  0.834/4308  1.054/5795  1.025/5394  0.799/2813

Figure 4: Accuracy of the compared methods under different sizes of the training set (RMSE versus the percentage of training entries, from 10% to 70%, for AltMin, FTC, GeomCG, gHOI, TFAI, and CGSI on the two datasets).

VI. Conclusion

In this paper, we exploit side information to improve the accuracy of Riemannian tensor completion. A novel Riemannian model is proposed. To solve the model efficiently, we design a new Riemannian metric, which induces an adaptive preconditioner for solvers of the proposed model. We then devise a Riemannian conjugate gradient descent method using this adaptive preconditioner. Empirical results show that our solver outperforms the state-of-the-art.

VII. Appendix

VII.1 Proof of Propositions

Before delving into the proofs of the propositions, we construct the submersion between the total space $\mathcal{M}_r$ and the fixed multilinear rank manifold $\mathcal{F}_r$ in the following lemma.

Lemma 5. Let $\pi : \mathcal{M}_r \to \mathcal{F}_r$ be the mapping defined by $\pi(\mathcal{G}, U_1, U_2, U_3) = \mathcal{G} \times_{i=1}^3 U_i$. Then $\pi$ is a submersion from $\mathcal{M}_r$ to $\mathcal{F}_r$.

Proof. Note that $\pi(\cdot)$ is a smooth function over $\mathcal{M}_r$, and for all $\mathbf{X} = (\mathcal{G}, U_1, U_2, U_3) \in \mathcal{M}_r$ and all tangent vectors $\eta_{\mathbf{X}} = (\eta_{\mathcal{G}}, \eta_1, \eta_2, \eta_3) \in T_{\mathbf{X}}\mathcal{M}_r$, the first order derivative of $\pi(\cdot)$ can be computed as
\[ D\pi(\mathbf{X})[\eta_{\mathbf{X}}] = \eta_{\mathcal{G}} \times_{i=1}^3 U_i + \mathcal{G} \times_1 \eta_1 \times_2 U_2 \times_3 U_3 + \mathcal{G} \times_1 U_1 \times_2 \eta_2 \times_3 U_3 + \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 \eta_3. \qquad (14) \]
Note that $\eta_{\mathcal{G}} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ and $\eta_i \in T_{U_i}\mathrm{St}(r_i, n_i)$, which means they can be expressed as $\eta_i = U_i \Omega_i + U_{i,\perp} K_i$, where $\Omega_i \in \mathbb{R}^{r_i \times r_i}$ is a skew matrix, $K_i \in \mathbb{R}^{(n_i - r_i) \times r_i}$, and $U_{i,\perp} \in \mathbb{R}^{n_i \times (n_i - r_i)}$ is an orthogonal basis whose spanned subspace is the orthogonal complement of $\mathrm{span}(U_i)$. Substituting these expressions into (14), we have
\[ D\pi(\mathbf{X})[\eta_{\mathbf{X}}] = \Big(\eta_{\mathcal{G}} + \sum_{i=1}^3 \mathcal{G} \times_i \Omega_i\Big) \times_{i=1}^3 U_i + \sum_{i=1}^3 \mathcal{G} \times_i U_{i,\perp} K_i \times_{j \ne i, 1 \le j \le 3} U_j. \qquad (15) \]
Therefore, the range of the map $D\pi(\mathbf{X})[\cdot]$ over the tangent space $T_{\mathbf{X}}\mathcal{M}_r$ is
\[ \mathrm{range}(D\pi(\mathbf{X})) = \Big\{ \mathcal{H} \times_{i=1}^3 U_i + \sum_{i=1}^3 \mathcal{G} \times_i U_{i,\perp} K_i \times_{j \ne i, 1 \le j \le 3} U_j \;\Big|\; \mathcal{H} \in \mathbb{R}^{r_1 \times r_2 \times r_3}, K_i \in \mathbb{R}^{(n_i - r_i) \times r_i} \Big\}. \qquad (16) \]
Note that the tangent space of the fixed multilinear rank manifold $\mathcal{F}_r$ at the point $\mathcal{X} = \pi(\mathcal{G}, U_1, U_2, U_3)$ is
\[ T_{\mathcal{X}}\mathcal{F}_r = \Big\{ \mathcal{H} \times_{i=1}^3 U_i + \sum_{i=1}^3 \mathcal{G} \times_i V_i \times_{j \ne i, 1 \le j \le 3} U_j \;\Big|\; \mathcal{H} \in \mathbb{R}^{r_1 \times r_2 \times r_3}, V_i \in \mathbb{R}^{n_i \times r_i} \text{ and } V_i^\top U_i = 0 \Big\}. \qquad (17) \]
Using the fact that for any matrix $V_i \in \mathbb{R}^{n_i \times r_i}$ with $V_i^\top U_i = 0$ there exists $K_i \in \mathbb{R}^{(n_i - r_i) \times r_i}$ such that $V_i = U_{i,\perp} K_i$, we can infer that
\[ \mathrm{range}(D\pi(\mathbf{X})) = T_{\mathcal{X}}\mathcal{F}_r. \qquad (18) \]
As a result, $\pi(\cdot)$ is a submersion from $\mathcal{M}_r$ to $\mathcal{F}_r$.

VII.1.1 Horizontal Space

Proposition 6. Let $\mathbf{X} = (\mathcal{G}, U_1, U_2, U_3) \in [\mathcal{X}]$. The horizontal space of $\mathcal{M}_r$ at the point $\mathbf{X}$ is
\[ \{ \eta_{\mathbf{X}} \in T_{\mathbf{X}}\mathcal{M}_r \mid V_i^\top \eta_i G_i + W_i^\top \eta_i G_{\alpha_i} \text{ is symmetric}, \; \forall 1 \le i \le 3 \}, \]
where $V_i = P_i P_i^\top U_i$, $W_i = U_i - P_i P_i^\top U_i$, $G_i = G_{(i)} G_{(i)}^\top$, and $G_{\alpha_i} = N\alpha_i I_i + G_{(i)} G_{(i)}^\top$.

Proof. Let $\mathcal{X} \in \mathcal{F}_r$ be a tensor with Tucker factorization $\mathbf{X} = (\mathcal{G}, U_1, U_2, U_3) \in [\mathcal{X}]$. In the quotient manifold framework Absil et al. (2009), the equivalence class $[\mathcal{X}]$ is called the fiber of the total space. The lifted representation of the tangent space $T_{[\mathcal{X}]}\mathcal{M}_r/\!\sim$ is identified with a subspace of the tangent space $T_{\mathbf{X}}\mathcal{M}_r$ that does not produce a displacement along the fiber $[\mathcal{X}]$. This is realized by decomposing $T_{\mathbf{X}}\mathcal{M}_r$ into two complementary subspaces, the vertical and horizontal spaces, such that $T_{\mathbf{X}}\mathcal{M}_r = \mathcal{H}_{\mathbf{X}} \oplus \mathcal{V}_{\mathbf{X}}$, where $\mathcal{H}_{\mathbf{X}}$ is the horizontal space and $\mathcal{V}_{\mathbf{X}}$ is the vertical space. It should be emphasized that the decomposition is with respect to the metric (11). The vertical space $\mathcal{V}_{\mathbf{X}}$ is the tangent space of the fiber $[\mathcal{X}]$; according to Kasai and Mishra (2016), it can be expressed as
\[ \mathcal{V}_{\mathbf{X}} = \Big\{ \Big(-\sum_{i=1}^3 \mathcal{G} \times_i \Omega_i, U_1\Omega_1, U_2\Omega_2, U_3\Omega_3\Big) \;\Big|\; \Omega_i^\top = -\Omega_i \Big\}. \qquad (19) \]
Since the horizontal space $\mathcal{H}_{\mathbf{X}}$ is the orthogonal complement of $\mathcal{V}_{\mathbf{X}}$ with respect to the Riemannian metric (11), for all horizontal vectors $\eta_{\mathbf{X}} = (\eta_{\mathcal{G}}, \eta_1, \eta_2, \eta_3) \in \mathcal{H}_{\mathbf{X}}$ we have
\[ \langle \eta_{\mathbf{X}}, \zeta \rangle_{\mathbf{X}} = 0, \quad \forall \zeta \in \mathcal{V}_{\mathbf{X}}. \qquad (20) \]
Using the expression for the vertical space, this is equivalent to
\[ \sum_{i=1}^3 \langle \eta_i, U_i\Omega_i G_{(i)}G_{(i)}^\top \rangle + \Big\langle \eta_{\mathcal{G}}, -\sum_{i=1}^3 \mathcal{G} \times_i \Omega_i \Big\rangle + \sum_{i=1}^3 N\alpha_i \langle \eta_i, (I_i - P_iP_i^\top) U_i\Omega_i \rangle = 0. \qquad (21) \]
Using the properties of the Euclidean inner product that $\langle A, BCD \rangle = \langle B^\top A D^\top, C \rangle$ for matrices $A, B, C, D$, and $\langle \mathcal{A}, \mathcal{B} \times_i C \rangle = \langle \mathcal{A}_{(i)} \mathcal{B}_{(i)}^\top, C \rangle$ for tensors $\mathcal{A}, \mathcal{B}$ and a matrix $C$, equation (21) is equivalent to
\[ \sum_{i=1}^3 \Big\langle U_i^\top \eta_i G_{(i)}G_{(i)}^\top + G_{(i)}(\eta_{\mathcal{G}})_{(i)}^\top + N\alpha_i (U_i - P_iP_i^\top U_i)^\top \eta_i, \; \Omega_i \Big\rangle = 0, \quad \forall \text{ skew matrices } \Omega_i. \qquad (22) \]
Thus $\eta_{\mathbf{X}}$ satisfies the conditions
\[ (P_iP_i^\top U_i)^\top \eta_i G_{(i)}G_{(i)}^\top + (U_i - P_iP_i^\top U_i)^\top \eta_i (N\alpha_i I_i + G_{(i)}G_{(i)}^\top) \text{ is a symmetric matrix}, \quad \forall i \in \{1,2,3\}. \qquad (23) \]
Defining $V_i := P_iP_i^\top U_i$, $W_i := U_i - P_iP_i^\top U_i$, $G_i := G_{(i)}G_{(i)}^\top$, $G_{\alpha_i} := N\alpha_i I_i + G_{(i)}G_{(i)}^\top$, we obtain the formula for the horizontal space:
\[ \mathcal{H}_{\mathbf{X}} = \{ \eta_{\mathbf{X}} \in T_{\mathbf{X}}\mathcal{M}_r \mid V_i^\top \eta_i G_i + W_i^\top \eta_i G_{\alpha_i} \text{ is symmetric} \}. \qquad (24) \]

VII.1.2 Proof of Prop. 1

Suppose $\mathcal{X}$ has Tucker factors $(\mathcal{G}, U_1, U_2, U_3)$; then one can certify that $\pi^{-1}(\mathcal{X}) = [\mathcal{G}, U_1, U_2, U_3]$. Hence the equivalence relation $\sim$ defined by the equivalence classes $[\mathcal{G}, U_1, U_2, U_3]$ can also be expressed in terms of the map $\pi(\cdot)$: $(\mathcal{G}, U_1, U_2, U_3) \sim (\mathcal{H}, V_1, V_2, V_3)$ if and only if $\pi(\mathcal{G}, U_1, U_2, U_3) = \pi(\mathcal{H}, V_1, V_2, V_3)$. Since $\pi(\cdot)$ is a submersion (see Lemma 5), by the submersion theorem (Prop. 3.5.23 of Abraham et al. (2012)), the equivalence relation $\sim$ defined by the equivalence classes is regular and the quotient manifold $\mathcal{M}_r/\!\sim$ is diffeomorphic to $\mathcal{F}_r$. Moreover, according to the proof of Prop. 3.5.23 of Abraham et al. (2012), the mapping $\varrho([\mathcal{G}, U_1, U_2, U_3]) = \mathcal{G} \times_{i=1}^3 U_i$ defines the diffeomorphism from $\mathcal{M}_r/\!\sim$ to $\mathcal{F}_r$.
Therefore, $\rho(\mathcal{X}) = \varrho^{-1}(\mathcal{X}) = [\mathcal{G}, U_1, U_2, U_3]$, where $[\mathcal{G}, U_1, U_2, U_3]$ is the Tucker representation of $\mathcal{X}$.

VII.1.3 Proof of Proposition 2

Let $\mathbf{X} = (\mathcal{G}, U_1, U_2, U_3)$ be any Tucker factors of the tensor $\mathcal{X} \in \mathcal{F}_r$. According to the definition of the Chordal distance for subspaces of different dimensions Ye and Lim (2014), we have
\begin{align*} \mathrm{dist}^2(\mathrm{span}(X_{(i)}), \mathrm{span}(P_i)) &= \mathrm{dist}^2(\mathrm{span}(U_i), \mathrm{span}(P_i)) & (25) \\ &= \sum_{l=1}^{r_i} \sin^2(\theta_l) + k_i - r_i & (26) \\ &= \sum_{l=1}^{r_i} (1 - \cos^2(\theta_l)) + k_i - r_i & (27) \\ &= \mathrm{trace}(I) - \|P_i^\top U_i\|_F^2 + k_i - r_i & (28) \\ &= \mathrm{trace}(U_i^\top (I_i - P_i P_i^\top) U_i) + k_i - r_i, & (29) \end{align*}
where $\theta_l$ is the $l$-th principal angle between $\mathrm{span}(U_i)$ and $\mathrm{span}(P_i)$; (26) follows from the definition of the Chordal distance, and (28) from the fact that $\cos(\theta_l)$ is the $l$-th singular value of $P_i^\top U_i$, since $P_i$ and $U_i$ are orthogonal bases (see Alg. 12.4.3 of Golub and Van Loan (2012)). Therefore, writing $l(\cdot)$ for the objective of the problem in Prop. 2, for all $(\mathcal{G}, U_1, U_2, U_3) \in \mathcal{M}_r$ we have
\[ l(\pi(\mathcal{G}, U_1, U_2, U_3)) = \tfrac{1}{2}\|P_\Omega(\mathcal{G} \times_{i=1}^3 U_i - \mathcal{R})\|_F^2 + \sum_{i=1}^3 \frac{\alpha_i|\Omega|}{2}\big(\mathrm{trace}(U_i^\top(I_i - P_iP_i^\top)U_i) + k_i - r_i\big), \qquad (30) \]
which is equivalent to
\[ l(\pi(\mathcal{G}, U_1, U_2, U_3)) = f(\mathcal{G}, U_1, U_2, U_3) + C, \qquad (31) \]
where $C = \sum_{i=1}^3 \frac{\alpha_i|\Omega|}{2}(k_i - r_i)$ is a constant.
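The chain (25)-(29) can be verified numerically: the principal angles are obtained from the singular values of $P_i^\top U_i$, and the trace formula must agree. A minimal sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, k = 25, 3, 7
U = np.linalg.qr(rng.standard_normal((n, r)))[0]  # orthonormal basis, dimension r
P = np.linalg.qr(rng.standard_normal((n, k)))[0]  # orthonormal basis, dimension k >= r

# cos(theta_l) are the singular values of P^T U (principal angles).
cosines = np.linalg.svd(P.T @ U, compute_uv=False)
chordal_sq = np.sum(1.0 - cosines ** 2) + (k - r)  # sum of sin^2 plus dimension gap

trace_form = np.trace(U.T @ (np.eye(n) - P @ P.T) @ U) + (k - r)
print(np.isclose(chordal_sq, trace_form))  # True
```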
Note that the critical points of a function $h(x)$ over a smooth manifold $\mathcal{M}$ are those at which the Riemannian gradient vanishes, i.e. $\mathrm{grad}\, h(x) = 0$, and one can show that
\[ \mathrm{grad}\, h(x) = 0 \iff Dh(x)[\eta_x] = 0 \;\; \forall \eta_x \in T_x\mathcal{M}. \qquad (32) \]
To prove that $\mathcal{X}$ is a critical point of $l(\cdot)$ over $\mathcal{F}_r$ if and only if $[\mathcal{X}]$ is a critical point of $\tilde f(\cdot)$ over $\mathcal{M}_r/\!\sim$, we need to prove that
\[ \mathrm{grad}\, l(\mathcal{X}) = 0 \iff \mathrm{grad}\, \tilde f([\mathcal{X}]) = 0. \qquad (33) \]
Since $\mathrm{grad}\, f(\mathcal{G}, U_1, U_2, U_3)$ is the horizontal lift of $\mathrm{grad}\, \tilde f([\mathcal{X}])$ for all $(\mathcal{G}, U_1, U_2, U_3) \in [\mathcal{X}]$, we have $\mathrm{grad}\, \tilde f([\mathcal{X}]) = 0$ if and only if $\mathrm{grad}\, f(\mathcal{G}, U_1, U_2, U_3) = 0$ for at least one $(\mathcal{G}, U_1, U_2, U_3) \in [\mathcal{X}]$. Thus, to prove (33) one only needs to certify
\[ \mathrm{grad}\, l(\mathcal{X}) = 0 \iff \exists (\mathcal{G}, U_1, U_2, U_3) \in [\mathcal{X}] \text{ such that } \mathrm{grad}\, f(\mathcal{G}, U_1, U_2, U_3) = 0. \qquad (34) \]
On one side, suppose $\mathrm{grad}\, l(\mathcal{X}) = 0$ and $\mathbf{X} = (\mathcal{G}, U_1, U_2, U_3) \in [\mathcal{X}]$. Let $\eta_{\mathbf{X}}$ be any tangent vector in $T_{\mathbf{X}}\mathcal{M}_r$. We have
\begin{align*} Df(\mathbf{X})[\eta_{\mathbf{X}}] &= Dl(\pi(\mathbf{X}))[D\pi(\mathbf{X})[\eta_{\mathbf{X}}]] & (35) \\ &= Dl(\mathcal{X})[D\pi(\mathbf{X})[\eta_{\mathbf{X}}]] & (36) \\ &= 0, & (37) \end{align*}
where the first equality follows from (31) and the chain rule of the first order derivative, and the third from $\mathrm{grad}\, l(\mathcal{X}) = 0$ together with $D\pi(\mathbf{X})[\eta_{\mathbf{X}}] \in T_{\mathcal{X}}\mathcal{F}_r$, since $\pi(\cdot)$ is a submersion (see Lemma 5). Because $\eta_{\mathbf{X}}$ is an arbitrary tangent vector,
\[ Df(\mathbf{X})[\eta_{\mathbf{X}}] = 0 \quad \forall \eta_{\mathbf{X}} \in T_{\mathbf{X}}\mathcal{M}_r, \qquad (38) \]
and by (32) we have $\mathrm{grad}\, f(\mathbf{X}) = 0$. Thus we have proved
\[ \mathrm{grad}\, l(\mathcal{X}) = 0 \Rightarrow \mathrm{grad}\, f(\mathbf{X}) = 0. \qquad (39) \]
On the other side, suppose $\mathbf{X} = (\mathcal{G}, U_1, U_2, U_3) \in [\mathcal{X}]$ and $\mathrm{grad}\, f(\mathbf{X}) = 0$. Then for all $\eta_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{F}_r$ we have
\begin{align*} Dl(\mathcal{X})[\eta_{\mathcal{X}}] &= Dl(\pi(\mathbf{X}))[\eta_{\mathcal{X}}] & (40) \\ &= Dl(\pi(\mathbf{X}))[D\pi(\mathbf{X})[\eta_{\mathbf{X}}]] & (41) \\ &= Df(\mathbf{X})[\eta_{\mathbf{X}}] & (42) \\ &= 0, & (43) \end{align*}
where the second equality holds because there exists $\eta_{\mathbf{X}} \in T_{\mathbf{X}}\mathcal{M}_r$ such that $D\pi(\mathbf{X})[\eta_{\mathbf{X}}] = \eta_{\mathcal{X}}$, since $\pi(\cdot)$ is a submersion (see Lemma 5); the third follows from (31) and the chain rule; and the fourth from $\mathrm{grad}\, f(\mathbf{X}) = 0$. Thus we have proved
\[ \mathrm{grad}\, f(\mathbf{X}) = 0 \Rightarrow \mathrm{grad}\, l(\mathcal{X}) = 0. \qquad (44) \]
Having proved both (39) and (44), (34) holds.

VII.1.4 Proof of Proposition 3

The Euclidean ambient space (71), $\mathbb{R}^{r_1 \times r_2 \times r_3} \times \mathbb{R}^{n_1 \times r_1} \times \mathbb{R}^{n_2 \times r_2} \times \mathbb{R}^{n_3 \times r_3}$, is a special smooth manifold whose tangent space at each point is the ambient space itself Absil et al. (2009). Therefore, one can endow the ambient space with a metric and treat it as a Riemannian manifold. We endow the ambient space with the same metric as the total space, namely
\[ \langle \mathbf{X}, \mathbf{Y} \rangle_{\mathbf{Z}} = \sum_{i=1}^3 \langle X_{U_i}, Y_{U_i} (Z_{\mathcal{G}})_{(i)} (Z_{\mathcal{G}})_{(i)}^\top \rangle + \langle X_{\mathcal{G}}, Y_{\mathcal{G}} \rangle + \sum_{i=1}^3 N\alpha_i \langle X_{U_i}, (I_i - P_iP_i^\top) Y_{U_i} \rangle, \qquad (45) \]
where $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$ are any ambient vectors, all tuples of the form $(X_{\mathcal{G}}, X_{U_1}, X_{U_2}, X_{U_3})$. The scaled gradient of the cost, $\tilde\nabla f(\mathbf{X})$, is the ambient vector satisfying
\[ \langle \tilde\nabla f(\mathbf{X}), \mathbf{Y} \rangle_{\mathbf{X}} = Df(\mathbf{X})[\mathbf{Y}], \quad \forall \mathbf{Y} \text{ in the ambient space}. \qquad (46) \]
This equation is equivalent to
\[ \sum_{i=1}^3 \langle Y_i, \tilde\nabla_{U_i} f(\mathbf{X}) (X_{\mathcal{G}})_{(i)} (X_{\mathcal{G}})_{(i)}^\top \rangle + \langle Y_{\mathcal{G}}, \tilde\nabla_{\mathcal{G}} f(\mathbf{X}) \rangle + \sum_{i=1}^3 N\alpha_i \langle Y_i, (I_i - P_iP_i^\top)\tilde\nabla_{U_i} f(\mathbf{X}) \rangle = \sum_{i=1}^3 \langle Y_i, \nabla_{U_i} f(\mathbf{X}) \rangle + \langle Y_{\mathcal{G}}, \nabla_{\mathcal{G}} f(\mathbf{X}) \rangle, \quad \forall \mathbf{Y}. \qquad (47) \]
Taking the partial Euclidean gradient of both sides with respect to $Y_{\mathcal{G}}$ and $Y_{U_i}$, one has
\[ \tilde\nabla_{\mathcal{G}} f(\mathbf{X}) = \nabla_{\mathcal{G}} f(\mathbf{X}), \qquad \tilde\nabla_{U_i} f(\mathbf{X}) = E_i \big((X_{\mathcal{G}})_{(i)}(X_{\mathcal{G}})_{(i)}^\top\big)^{-1} + F_i \big(N\alpha_i I_i + (X_{\mathcal{G}})_{(i)}(X_{\mathcal{G}})_{(i)}^\top\big)^{-1}, \qquad (48) \]
where $E_i = P_iP_i^\top \nabla_{U_i} f(\mathbf{X})$ and $F_i = \nabla_{U_i} f(\mathbf{X}) - E_i$.

VII.1.5 Proof of Proposition 4

According to Absil et al. (2009), to prove that $\mathcal{M}_r/\!\sim$ has the structure of a Riemannian quotient manifold, one needs to show that for all $[\mathcal{X}] \in \mathcal{M}_r/\!\sim$ and all tangent vectors $\eta_{[\mathcal{X}]}, \xi_{[\mathcal{X}]} \in T_{[\mathcal{X}]}\mathcal{M}_r/\!\sim$ we have
\[ \langle \eta_{\mathbf{X}_1}, \xi_{\mathbf{X}_1} \rangle_{\mathbf{X}_1} = \langle \eta_{\mathbf{X}_2}, \xi_{\mathbf{X}_2} \rangle_{\mathbf{X}_2}, \quad \forall \mathbf{X}_1, \mathbf{X}_2 \in [\mathcal{X}], \qquad (49) \]
where $\eta_{\mathbf{X}_1}, \eta_{\mathbf{X}_2}$ are horizontal lifts of $\eta_{[\mathcal{X}]}$ and $\xi_{\mathbf{X}_1}, \xi_{\mathbf{X}_2}$ are horizontal lifts of $\xi_{[\mathcal{X}]}$. To prove this, we first express $\mathbf{X}_2, \eta_{\mathbf{X}_2}, \xi_{\mathbf{X}_2}$ in terms of $\mathbf{X}_1, \eta_{\mathbf{X}_1}, \xi_{\mathbf{X}_1}$, then verify the invariance property (49). Let $\mathbf{X}_1 = (\mathcal{G}, U_1, U_2, U_3)$. Since $\mathbf{X}_1, \mathbf{X}_2 \in [\mathcal{X}]$, there exist orthogonal matrices $O_i \in O(r_i)$ such that
\[ \mathbf{X}_2 = (\mathcal{G} \times_{i=1}^3 O_i^\top, U_1O_1, U_2O_2, U_3O_3). \qquad (50) \]
Let $\eta_{\mathbf{X}_1} = (\eta_{\mathcal{G}}, \eta_1, \eta_2, \eta_3)$. We now prove that $\eta_{\mathbf{X}_2}$ can be expressed as
\[ \eta_{\mathbf{X}_2} = (\eta_{\mathcal{G}} \times_{i=1}^3 O_i^\top, \eta_1O_1, \eta_2O_2, \eta_3O_3). \qquad (51) \]
Since $\eta_{\mathbf{X}_2}$ is the horizontal lift of $\eta_{[\mathcal{X}]}$, to prove (51) one only needs to show that $\zeta := (\eta_{\mathcal{G}} \times_{i=1}^3 O_i^\top, \eta_1O_1, \eta_2O_2, \eta_3O_3)$ satisfies the following two conditions (see Sec. 3.6.2 of Absil et al. (2009)):
\[ \zeta \in \mathcal{H}_{\mathbf{X}_2}, \qquad (52) \]
\[ D\tau(\mathbf{X}_2)[\zeta] = \eta_{[\mathcal{X}]}, \qquad (53) \]
where $\mathcal{H}_{\mathbf{X}}$ is the horizontal space at $\mathbf{X}$ (see Prop. 6 for its expression) and $\tau(\cdot)$ is the natural mapping from $\mathcal{M}_r$ to $\mathcal{M}_r/\!\sim$ defined by $\tau(\mathbf{X}) = [\mathbf{X}]$. Note that $\tau(\cdot)$ is the composition of the maps $\rho(\cdot)$ and $\pi(\cdot)$ defined in Prop. 1 and Lemma 5, namely
\[ \tau(\mathbf{X}) = \rho(\pi(\mathbf{X})). \qquad (54) \]
According to Prop. 6, $\mathcal{H}_{\mathbf{X}_1} = \{ \eta_{\mathbf{X}_1} \in T_{\mathbf{X}_1}\mathcal{M}_r \mid V_i^\top \eta_i G_i + W_i^\top \eta_i G_{\alpha_i} \text{ is symmetric} \}$, where $V_i = P_iP_i^\top U_i$, $W_i = U_i - V_i$, $G_i = G_{(i)}G_{(i)}^\top$, and $G_{\alpha_i} = N\alpha_i I_i + G_{(i)}G_{(i)}^\top$.
Using equation (50), we have
\[ \mathcal{H}_{\mathbf{X}_2} = \{ \eta_{\mathbf{X}_2} \in T_{\mathbf{X}_2}\mathcal{M}_r \mid O_i^\top V_i^\top \eta_i O_i^\top G_i O_i + O_i^\top W_i^\top \eta_i O_i^\top G_{\alpha_i} O_i \text{ is symmetric} \} \qquad (55) \]
(when proving this, we use identities like $(\mathcal{G} \times_{i=1}^3 O_i^\top)_{(1)} = O_1^\top G_{(1)} (O_2^\top \otimes O_3^\top)^\top$ Kolda and Bader (2009) and the property that $O_2^\top \otimes O_3^\top$ is an orthogonal matrix). To prove $\zeta \in \mathcal{H}_{\mathbf{X}_2}$, on one hand we note that
\begin{align*} \zeta_i^\top U_iO_i + O_i^\top U_i^\top \zeta_i &= O_i^\top \eta_i^\top U_iO_i + O_i^\top U_i^\top \eta_iO_i & (56) \\ &= O_i^\top (\eta_i^\top U_i + U_i^\top \eta_i) O_i & (57) \\ &= 0, & (58) \end{align*}
where the first equality uses $\zeta_i = \eta_iO_i$, and the last uses the fact that $\eta_i \in T_{U_i}\mathrm{St}(r_i, n_i)$ is equivalent to $\eta_i^\top U_i + U_i^\top \eta_i = 0$ (see Sec. 3.5.7 of Absil et al. (2009)). This implies $\zeta_i \in T_{U_iO_i}\mathrm{St}(r_i, n_i)$, and hence
\[ \zeta \in \mathbb{R}^{r_1 \times r_2 \times r_3} \times T_{U_1O_1}\mathrm{St}(r_1, n_1) \times T_{U_2O_2}\mathrm{St}(r_2, n_2) \times T_{U_3O_3}\mathrm{St}(r_3, n_3) = T_{\mathbf{X}_2}\mathcal{M}_r. \qquad (59) \]
On the other hand, $O_i^\top V_i^\top \zeta_i O_i^\top G_i O_i + O_i^\top W_i^\top \zeta_i O_i^\top G_{\alpha_i} O_i$ is symmetric, since
\begin{align*} (O_i^\top V_i^\top \zeta_i O_i^\top G_i O_i + O_i^\top W_i^\top \zeta_i O_i^\top G_{\alpha_i} O_i)^\top &= (O_i^\top V_i^\top \eta_i G_i O_i + O_i^\top W_i^\top \eta_i G_{\alpha_i} O_i)^\top \\ &= O_i^\top (V_i^\top \eta_i G_i + W_i^\top \eta_i G_{\alpha_i})^\top O_i \\ &= O_i^\top (V_i^\top \eta_i G_i + W_i^\top \eta_i G_{\alpha_i}) O_i \\ &= O_i^\top V_i^\top \zeta_i O_i^\top G_i O_i + O_i^\top W_i^\top \zeta_i O_i^\top G_{\alpha_i} O_i. \end{align*}
Thus we have proved $\zeta \in \mathcal{H}_{\mathbf{X}_2}$. The following equations verify that (53) holds:
\begin{align*} D\tau(\mathbf{X}_2)[\zeta] &= D\rho(\pi(\mathbf{X}_2))[D\pi(\mathbf{X}_2)[\zeta]] & (60) \\ &= D\rho(\mathcal{X})\Big[\zeta_{\mathcal{G}} \times_{i=1}^3 U_iO_i + \sum_{i=1}^3 (\mathcal{G} \times_{i=1}^3 O_i^\top) \times_i \zeta_i \times_{1 \le j \le 3, j \ne i} U_jO_j\Big] & (61) \\ &= D\rho(\mathcal{X})\Big[(\eta_{\mathcal{G}} \times_{i=1}^3 O_i^\top) \times_{i=1}^3 U_iO_i + \sum_{i=1}^3 (\mathcal{G} \times_{i=1}^3 O_i^\top) \times_i \eta_iO_i \times_{1 \le j \le 3, j \ne i} U_jO_j\Big] & \\ &= D\rho(\mathcal{X})\Big[\eta_{\mathcal{G}} \times_{i=1}^3 U_i + \sum_{i=1}^3 \mathcal{G} \times_i \eta_i \times_{1 \le j \le 3, j \ne i} U_j\Big] & (62) \\ &= D\rho(\mathcal{X})[D\pi(\mathbf{X}_1)[\eta_{\mathbf{X}_1}]] & (63) \\ &= D\rho(\pi(\mathbf{X}_1))[D\pi(\mathbf{X}_1)[\eta_{\mathbf{X}_1}]] & (64) \\ &= D\tau(\mathbf{X}_1)[\eta_{\mathbf{X}_1}] & (65) \\ &= \eta_{[\mathcal{X}]}, & (66) \end{align*}
where the first equality is the chain rule together with (54); the second uses (14); the third uses the definition of $\zeta$; the fourth uses the mode product properties $\mathcal{A} \times_i A \times_i B = \mathcal{A} \times_i (BA)$ and $\mathcal{A} \times_i A \times_j B = \mathcal{A} \times_j B \times_i A$ for $j \ne i$ Kolda and Bader (2009); the fifth follows from (14); and the last holds because $\eta_{\mathbf{X}_1}$ is the horizontal lift of $\eta_{[\mathcal{X}]}$. By similar arguments, one can verify that
\[ \xi_{\mathbf{X}_2} = (\xi_{\mathcal{G}} \times_{i=1}^3 O_i^\top, \xi_1O_1, \xi_2O_2, \xi_3O_3). \qquad (67) \]
Now we have
\begin{align*} \langle \eta_{\mathbf{X}_2}, \xi_{\mathbf{X}_2} \rangle_{\mathbf{X}_2} &= \sum_{i=1}^3 \langle \eta_iO_i, \xi_iO_i (\mathcal{G} \times_{i=1}^3 O_i^\top)_{(i)} (\mathcal{G} \times_{i=1}^3 O_i^\top)_{(i)}^\top \rangle + \langle \eta_{\mathcal{G}} \times_{i=1}^3 O_i^\top, \xi_{\mathcal{G}} \times_{i=1}^3 O_i^\top \rangle + \sum_{i=1}^3 N\alpha_i \langle \eta_iO_i, (I_i - P_iP_i^\top) \xi_iO_i \rangle & (68) \\ &= \sum_{i=1}^3 \langle \eta_iO_i, \xi_iO_i O_i^\top G_{(i)}G_{(i)}^\top O_i \rangle + \langle \eta_{\mathcal{G}} \times_{i=1}^3 O_i^\top, \xi_{\mathcal{G}} \times_{i=1}^3 O_i^\top \rangle + \sum_{i=1}^3 N\alpha_i \langle \eta_iO_i, (I_i - P_iP_i^\top) \xi_iO_i \rangle & \\ &= \sum_{i=1}^3 \langle \eta_i, \xi_i G_{(i)}G_{(i)}^\top \rangle + \langle \eta_{\mathcal{G}}, \xi_{\mathcal{G}} \rangle + \sum_{i=1}^3 N\alpha_i \langle \eta_i, (I_i - P_iP_i^\top) \xi_i \rangle & (69) \\ &= \langle \eta_{\mathbf{X}_1}, \xi_{\mathbf{X}_1} \rangle_{\mathbf{X}_1}, & (70) \end{align*}
where the first equality uses the expressions of $\mathbf{X}_2, \eta_{\mathbf{X}_2}, \xi_{\mathbf{X}_2}$ in terms of $\mathbf{X}_1, \eta_{\mathbf{X}_1}, \xi_{\mathbf{X}_1}$ (equations (50), (51), (67)); the second is derived using identities like $(\mathcal{G} \times_{i=1}^3 O_i^\top)_{(1)} (\mathcal{G} \times_{i=1}^3 O_i^\top)_{(1)}^\top = O_1^\top G_{(1)} (O_3^\top \otimes O_2^\top)^\top \big(O_1^\top G_{(1)} (O_3^\top \otimes O_2^\top)^\top\big)^\top = O_1^\top G_{(1)} G_{(1)}^\top O_1$; and the third follows from the orthogonal invariance of the Euclidean inner product. This proves the invariance property of the proposed metric.
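The invariance (49) computed in (68)-(70) can be reproduced numerically. The sketch below (ours) implements the metric (11) and checks that its value is unchanged when the core is counter-rotated and the factor components of both vectors are rotated as in (50)-(51); for this check the vectors need not be horizontal, since the identities used in (68)-(70) hold for arbitrary components transformed this way.

```python
import numpy as np

def unfold(T, k):
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def rotate(T, Os):
    """T x_1 O_1^T x_2 O_2^T x_3 O_3^T for a 3rd order tensor T."""
    return np.einsum('abc,ax,by,cz->xyz', T, Os[0], Os[1], Os[2])

def metric(G, Ps, eta, xi, N, alpha):
    """The proposed metric (11) at a point with core G and feature matrices P_i."""
    (eta_G, eta_U), (xi_G, xi_U) = eta, xi
    val = np.sum(eta_G * xi_G)
    for i in range(3):
        Gi = unfold(G, i) @ unfold(G, i).T
        val += np.trace(eta_U[i].T @ xi_U[i] @ Gi)
        val += N * alpha * np.trace(eta_U[i].T @ (xi_U[i] - Ps[i] @ (Ps[i].T @ xi_U[i])))
    return val

rng = np.random.default_rng(5)
n, r, k, alpha = (10, 11, 12), (3, 4, 5), (6, 6, 6), 0.1
N = n[0] * n[1] * n[2]
G = rng.standard_normal(r)
Ps = [np.linalg.qr(rng.standard_normal((n[i], k[i])))[0] for i in range(3)]
Os = [np.linalg.qr(rng.standard_normal((r[i], r[i])))[0] for i in range(3)]
eta = (rng.standard_normal(r), [rng.standard_normal((n[i], r[i])) for i in range(3)])
xi = (rng.standard_normal(r), [rng.standard_normal((n[i], r[i])) for i in range(3)])

# Move to the other representative: core G x O^T, vectors transformed as in (50)-(51).
G2 = rotate(G, Os)
eta2 = (rotate(eta[0], Os), [eta[1][i] @ Os[i] for i in range(3)])
xi2 = (rotate(xi[0], Os), [xi[1][i] @ Os[i] for i in range(3)])
print(np.isclose(metric(G, Ps, eta, xi, N, alpha),
                 metric(G2, Ps, eta2, xi2, N, alpha)))  # True
```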
VII.2 Derivation of the Expressions of the Optimization Related Objects

VII.2.1 Projector from ambient space onto tangent space

We call the Euclidean space
\[ \mathbb{R}^{r_1 \times r_2 \times r_3} \times \mathbb{R}^{n_1 \times r_1} \times \mathbb{R}^{n_2 \times r_2} \times \mathbb{R}^{n_3 \times r_3} \qquad (71) \]
the ambient space. A vector belonging to the ambient space is called an ambient vector, denoted by $(Z_{\mathcal{G}}, Z_1, Z_2, Z_3)$, or $\mathbf{Z}$ for brevity.

Proposition 7. Let $\mathcal{M}_r$ be the total space, endowed with the Riemannian metric (11), and let $\mathbf{X} = (\mathcal{G}, U_1, U_2, U_3) \in \mathcal{M}_r$. Then the orthogonal projection of an ambient vector $(Z_{\mathcal{G}}, Z_1, Z_2, Z_3)$ onto the tangent space $T_{\mathbf{X}}\mathcal{M}_r$ can be computed by
\[ \Psi_{\mathbf{X}}(Z_{\mathcal{G}}, Z_1, Z_2, Z_3) = \big(Z_{\mathcal{G}}, \{Z_i - V_iS_i(G_{(i)}G_{(i)}^\top)^{-1} - W_iS_i(G_{(i)}G_{(i)}^\top + \alpha_iN I_i)^{-1}\}_{i=1}^3\big), \qquad (72) \]
where $V_i = P_iP_i^\top U_i$, $W_i = U_i - V_i$, and $S_i$ is the solution of the matrix linear equation
\[ \begin{cases} \mathrm{sym}(V_i^\top V_iS_i(G_{(i)}G_{(i)}^\top)^{-1}) - \mathrm{sym}(U_i^\top Z_i) + \mathrm{sym}(W_i^\top W_iS_i(N\alpha_iI_i + G_{(i)}G_{(i)}^\top)^{-1}) = 0, \\ S_i = S_i^\top, \end{cases} \qquad (73) \]
in which $\mathrm{sym}(A) = (A + A^\top)/2$ for all square matrices.

Proof. The orthogonal projection of an ambient vector onto the tangent space is computed by subtracting its component in the normal space. To begin, we derive the normal space $\mathcal{N}_{\mathbf{X}}$, the orthogonal complement of $T_{\mathbf{X}}\mathcal{M}_r$ with respect to the Riemannian metric (11). Let $\zeta = (\zeta_{\mathcal{G}}, \zeta_1, \zeta_2, \zeta_3) \in \mathcal{N}_{\mathbf{X}}$ be any vector of the normal space. Then we have
\[ \langle \zeta, \eta \rangle_{\mathbf{X}} = 0 \quad \forall \eta \in T_{\mathbf{X}}\mathcal{M}_r. \qquad (74) \]
The tangent space of the total space can be expressed as
\[ T_{\mathbf{X}}\mathcal{M}_r = \mathbb{R}^{r_1 \times r_2 \times r_3} \times T_{U_1}\mathrm{St}(r_1, n_1) \times T_{U_2}\mathrm{St}(r_2, n_2) \times T_{U_3}\mathrm{St}(r_3, n_3), \qquad (75) \]
where the tangent space of the Stiefel manifold can be formulated as
\[ T_{U_i}\mathrm{St}(r_i, n_i) = \{ U_i\Omega_i + U_{i,\perp}K_i \mid \Omega_i \in \mathbb{R}^{r_i \times r_i} \text{ is skew and } K_i \in \mathbb{R}^{(n_i - r_i) \times r_i} \}, \qquad (76) \]
and $U_{i,\perp}$ is a matrix with orthogonal columns such that $U_{i,\perp}^\top U_i = 0$.

Using formulas (75) and (76), equation (74) is equivalent to
\[ \sum_{i=1}^3 \langle U_i\Omega_i + U_{i,\perp}K_i, \zeta_i G_{(i)}G_{(i)}^\top \rangle + \langle \eta_{\mathcal{G}}, \zeta_{\mathcal{G}} \rangle + \sum_{i=1}^3 N\alpha_i \langle U_i\Omega_i + U_{i,\perp}K_i, (I_i - P_iP_i^\top)\zeta_i \rangle = 0, \quad \forall K_i \in \mathbb{R}^{(n_i - r_i) \times r_i}, \text{ skew } \Omega_i \in \mathbb{R}^{r_i \times r_i}, \eta_{\mathcal{G}} \in \mathbb{R}^{r_1 \times r_2 \times r_3}. \qquad (77) \]
Using the fact that the condition $\langle Z, U_i\Omega_i + U_{i,\perp}K_i \rangle = 0$ for all $K_i$ and skew $\Omega_i$ is equivalent to $Z = U_iS_i$ with $S_i$ symmetric, equation (77) can be simplified to the conditions
\[ \begin{cases} \zeta_{\mathcal{G}} = 0, \\ P_iP_i^\top \zeta_i G_{(i)}G_{(i)}^\top + (I_i - P_iP_i^\top)\zeta_i(N\alpha_iI_i + G_{(i)}G_{(i)}^\top) = U_iS_i, \quad \forall i \in \{1,2,3\}, \end{cases} \qquad (78) \]
where $S_i$ is a symmetric matrix. The second equation of (78) is equivalent to
\[ P_iP_i^\top \zeta_i = V_iS_i(G_{(i)}G_{(i)}^\top)^{-1}, \qquad (I_i - P_iP_i^\top)\zeta_i = W_iS_i(N\alpha_iI_i + G_{(i)}G_{(i)}^\top)^{-1}, \qquad (79) \]
where $V_i = P_iP_i^\top U_i$ and $W_i = U_i - V_i$; the first equation is obtained by multiplying both sides of the second formula of (78) by $P_iP_i^\top$, and the second by $I - P_iP_i^\top$. The equation array (79) is further equivalent to
\[ \zeta_i = V_iS_i(G_{(i)}G_{(i)}^\top)^{-1} + W_iS_i(N\alpha_iI_i + G_{(i)}G_{(i)}^\top)^{-1}, \qquad (80) \]
since one obtains (80) by adding the two equations of (79), and one recovers the two equations of (79) from (80) by multiplying both sides by $P_iP_i^\top$ or $I - P_iP_i^\top$. Therefore, the normal space can be expressed as
\[ \mathcal{N}_{\mathbf{X}} = \big\{ (0, \zeta_1, \zeta_2, \zeta_3) \mid \zeta_i = V_iS_i(G_{(i)}G_{(i)}^\top)^{-1} + W_iS_i(N\alpha_iI_i + G_{(i)}G_{(i)}^\top)^{-1}, S_i = S_i^\top, 1 \le i \le 3 \big\}. \qquad (81) \]
Now the projection of an ambient vector can be calculated by subtracting its component in the normal space $\mathcal{N}_{\mathbf{X}}$. Specifically, writing $\Psi_{\mathbf{X}}(Z_{\mathcal{G}}, Z_1, Z_2, Z_3) = (Y_{\mathcal{G}}, Y_1, Y_2, Y_3)$, we have $Y_{\mathcal{G}} = Z_{\mathcal{G}}$, and there exist symmetric matrices $S_i$ such that
\[ Y_i = Z_i - V_iS_i(G_{(i)}G_{(i)}^\top)^{-1} - W_iS_i(N\alpha_iI_i + G_{(i)}G_{(i)}^\top)^{-1}, \qquad (82) \]
where $V_i = P_iP_i^\top U_i$, $W_i = U_i - V_i$, and $1 \le i \le 3$. Since $(Y_{\mathcal{G}}, Y_1, Y_2, Y_3) \in T_{\mathbf{X}}\mathcal{M}_r$, we have
\[ U_i^\top Y_i + Y_i^\top U_i = 0, \quad 1 \le i \le 3. \qquad (83) \]
Plugging (82) into (83), we obtain the linear equations for the symmetric matrices $S_i$:
\[ \mathrm{sym}(V_i^\top V_iS_i(G_{(i)}G_{(i)}^\top)^{-1}) - \mathrm{sym}(U_i^\top Z_i) + \mathrm{sym}(W_i^\top W_iS_i(N\alpha_iI_i + G_{(i)}G_{(i)}^\top)^{-1}) = 0. \qquad (84) \]

VII.2.2 Projector from Tangent Space onto Horizontal Space

Proposition 8. Let $\mathcal{M}_r$ be the total space, endowed with the Riemannian metric (11), and let $\mathbf{X} = (\mathcal{G}, U_1, U_2, U_3) \in \mathcal{M}_r$. Then the orthogonal projector $\Pi_{\mathbf{X}}$ from the tangent space $T_{\mathbf{X}}\mathcal{M}_r$ to the horizontal space $\mathcal{H}_{\mathbf{X}}$ has the form
\[ \Pi_{\mathbf{X}}(\eta_{\mathbf{X}}) = \Big(\eta_{\mathcal{G}} + \sum_{i=1}^3 \mathcal{G} \times_i \Omega_i, \; \eta_1 - U_1\Omega_1, \; \eta_2 - U_2\Omega_2, \; \eta_3 - U_3\Omega_3\Big), \qquad (85) \]
where $\eta_{\mathbf{X}} = (\eta_{\mathcal{G}}, \eta_1, \eta_2, \eta_3)$ is a tangent vector.
The matrices $(\Omega_1, \Omega_2, \Omega_3)$ solve the following linear matrix equation system:
\[ \begin{cases} \mathrm{skw}(V_i^\top V_i\Omega_iG_{(i)}G_{(i)}^\top) + \mathrm{skw}(G_{(i)}G_{(i)}^\top\Omega_i) + \mathrm{skw}(W_i^\top W_i\Omega_i(N\alpha_iI_i + G_{(i)}G_{(i)}^\top)) - G_{(i)}(I_{j_i} \otimes \Omega_{k_i} + \Omega_{j_i} \otimes I_{k_i})G_{(i)}^\top \\ \qquad = \mathrm{skw}\big[V_i^\top\eta_iG_{(i)}G_{(i)}^\top + W_i^\top\eta_i(N\alpha_iI_i + G_{(i)}G_{(i)}^\top) + G_{(i)}(\eta_{\mathcal{G}})_{(i)}^\top\big], \quad \forall i \in \{1,2,3\}, \\ \Omega_i^\top = -\Omega_i, \quad \forall i \in \{1,2,3\}, \end{cases} \qquad (86) \]
where $j_i = \max\{k \in \{1,2,3\} \mid k \ne i\}$, $k_i = \min\{k \in \{1,2,3\} \mid k \ne i\}$, $V_i = P_iP_i^\top U_i$, and $W_i = U_i - V_i$.

Proof. The projection from the tangent space $T_{\mathbf{X}}\mathcal{M}_r$ onto the horizontal space $\mathcal{H}_{\mathbf{X}}$ is likewise derived by subtracting the normal component from the tangent vector. Note that the complement of $\mathcal{H}_{\mathbf{X}}$ in $T_{\mathbf{X}}\mathcal{M}_r$ is the vertical space $\mathcal{V}_{\mathbf{X}}$ defined in (19). The projection $\Pi_{\mathbf{X}}(\eta_{\mathbf{X}}) = (\varsigma_{\mathcal{G}}, \varsigma_1, \varsigma_2, \varsigma_3)$ has the form
\[ \varsigma_{\mathcal{G}} = \eta_{\mathcal{G}} + \sum_{i=1}^3 \mathcal{G} \times_i \Omega_i, \qquad \varsigma_i = \eta_i - U_i\Omega_i, \quad i \in \{1,2,3\}, \qquad (87) \]
where the $\Omega_i$ are skew matrices to be determined. Since $(\varsigma_{\mathcal{G}}, \varsigma_1, \varsigma_2, \varsigma_3) \in \mathcal{H}_{\mathbf{X}}$, by Prop. 6 it must satisfy
\[ \mathrm{skw}(V_i^\top\varsigma_iG_i + W_i^\top\varsigma_iG_{\alpha_i}) = 0, \quad \forall 1 \le i \le 3, \qquad (88) \]
where $V_i = P_iP_i^\top U_i$, $W_i = U_i - P_iP_i^\top U_i$, $G_i = G_{(i)}G_{(i)}^\top$, $G_{\alpha_i} = N\alpha_iI_i + G_{(i)}G_{(i)}^\top$, and $\mathrm{skw}(\cdot)$ is the map on square matrices defined by $\mathrm{skw}(A) = (A - A^\top)/2$. Doing some algebra, we obtain the linear system (86).

VII.2.3 Retraction

We prove that the retraction is compatible with the metric (11) by showing that it induces a retraction over the quotient manifold.

Lemma 9. Let $R_\cdot(\cdot)$ be the retraction defined in (13). Then $E_{[\mathbf{X}]}(\eta_{[\mathcal{X}]}) := [R_{\mathbf{X}}(\eta_{\mathbf{X}})]$, where $\mathbf{X} \in [\mathcal{X}]$ and $\eta_{\mathbf{X}}$ is a horizontal lift of $\eta_{[\mathcal{X}]}$, defines a retraction over the quotient manifold $\mathcal{M}_r/\!\sim$.

Proof. Let $\mathbf{X}_1, \mathbf{X}_2$ be any Tucker factors belonging to the equivalence class $[\mathcal{X}]$, let $\eta_{[\mathcal{X}]}$ be any tangent vector in $T_{[\mathcal{X}]}\mathcal{M}_r/\!\sim$, and let $\eta_{\mathbf{X}_1}$ and $\eta_{\mathbf{X}_2}$ be horizontal lifts of $\eta_{[\mathcal{X}]}$. Suppose $\mathbf{X}_1 = (\mathcal{G}, U_1, U_2, U_3)$; then we have
\begin{align*} [R_{\mathbf{X}_2}(\eta_{\mathbf{X}_2})] &= \big[R_{(\mathcal{G}\times_{i=1}^3O_i^\top, \{U_iO_i\})}(\eta_{\mathcal{G}}\times_{i=1}^3O_i^\top, \{\eta_iO_i\}_{i=1}^3)\big] & (89) \\ &= \big[\big((\mathcal{G}+\eta_{\mathcal{G}})\times_{i=1}^3O_i^\top, \{\mathrm{uf}(U_iO_i+\eta_iO_i)\}_{i=1}^3\big)\big] & (90) \\ &= \big[\big((\mathcal{G}+\eta_{\mathcal{G}})\times_{i=1}^3O_i^\top, \{\mathrm{uf}(U_i+\eta_i)O_i\}_{i=1}^3\big)\big] & (91) \\ &= \big[\big(\mathcal{G}+\eta_{\mathcal{G}}, \{\mathrm{uf}(U_i+\eta_i)\}_{i=1}^3\big)\big] & (92) \\ &= [R_{\mathbf{X}_1}(\eta_{\mathbf{X}_1})], & (93) \end{align*}
where the first equality follows from (50) and (51), the second uses the definition of the retraction (13), and the third holds because $\mathrm{uf}(AO) = \mathrm{uf}(A)O$ for any orthogonal matrix $O$. Thus, according to Prop. 4.1.3 of Absil et al. (2009), $E_\cdot(\cdot)$ is a valid retraction of $\mathcal{M}_r/\!\sim$.

VII.2.4 The Euclidean Gradient of the Cost

The Euclidean gradient of the cost, $\nabla f(\mathcal{G}, U_1, U_2, U_3)$, can be decomposed as $\nabla f(\mathcal{G}, U_1, U_2, U_3) = (\nabla_{\mathcal{G}}f, \nabla_{U_1}f, \nabla_{U_2}f, \nabla_{U_3}f)$, where $\nabla_{\mathcal{G}}f$ and $\nabla_{U_i}f$ are the partial derivatives of the cost with respect to $\mathcal{G}$ and $U_i$. Doing some algebra, one has
\[ \nabla_{\mathcal{G}}f(\mathcal{G}, U_1, U_2, U_3) = \mathcal{S} \times_{i=1}^3 U_i^\top, \qquad \nabla_{U_i}f(\mathcal{G}, U_1, U_2, U_3) = S_{(i)}(U_{j_i} \otimes U_{k_i})G_{(i)}^\top + N\alpha_iW_i, \qquad (94) \]
where
\[ \mathcal{S} = P_\Omega(\mathcal{G} \times_{i=1}^3 U_i - \mathcal{R}), \qquad W_i = U_i - P_iP_i^\top U_i, \qquad (95) \]
and $j_i = \max\{k \in \{1,2,3\} \mid k \ne i\}$, $k_i = \min\{k \in \{1,2,3\} \mid k \ne i\}$.
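The data-fitting part of (94) can be sanity-checked with finite differences. A minimal sketch (ours, using `einsum` for the mode products); we check only the core gradient $\nabla_{\mathcal{G}}f = \mathcal{S} \times_{i=1}^3 U_i^\top$, since the Kronecker ordering in the $U_i$ gradient depends on the matricization convention:

```python
import numpy as np

rng = np.random.default_rng(6)
n, r = (6, 7, 8), (2, 3, 4)
G = rng.standard_normal(r)
U = [np.linalg.qr(rng.standard_normal((n[i], r[i])))[0] for i in range(3)]
R = rng.standard_normal(n)
M = rng.random(n) < 0.3                         # observation mask playing P_Omega

def full(G):                                    # G x_1 U1 x_2 U2 x_3 U3
    return np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])

def L(G):                                       # data-fitting term of the cost
    return 0.5 * np.sum((M * (full(G) - R)) ** 2)

S = M * (full(G) - R)                           # the residual tensor S of (95)
grad_G = np.einsum('ijk,ia,jb,kc->abc', S, U[0], U[1], U[2])  # S x_i U_i^T

eps, ok = 1e-6, True
for _ in range(5):                              # central differences at random entries
    idx = tuple(rng.integers(0, r[i]) for i in range(3))
    Gp = G.copy(); Gp[idx] += eps
    Gm = G.copy(); Gm[idx] -= eps
    ok &= np.isclose((L(Gp) - L(Gm)) / (2 * eps), grad_G[idx], atol=1e-5)
print(ok)  # True
```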
VII.3 More Empirical Results: Simulation

In the simulations, we complete a random tensor $R$ whose size is fixed to $5000\times5000\times5000$ and whose multilinear rank is fixed to $(10,10,10)$. It is generated by $R = A\times_1 B_1\times_2 B_2\times_3 B_3$, where $A\in\mathbb{R}^{10\times10\times10}$ and $B_i\in\mathbb{R}^{5000\times10}$ are random (multi-dimensional) arrays with i.i.d. standard Gaussian entries. The side information is encoded in three feature matrices, generated by $F_i = B_i + s\|B_i\|_F N_i$, where $N_i$ is a noise matrix with i.i.d. standard normal entries and $s$ is the noise scale. The index set $\Omega$ of observed entries is sampled uniformly at random from the full index set of the $5000\times5000\times5000$ tensor. Its cardinality $|\Omega|$ is set to $OS\times D$, where $D = 3\times(5000\times10 - 10^2) + 10^3$ is the dimension of the manifold of $5000\times5000\times5000$ tensors with multilinear rank $(10,10,10)$ and $OS$ is called the over-sampling ratio. (A code sketch of this generation protocol is given after Case 2 below.)

We compare the five tensor completion solvers under the following four scenarios. In each run, the compared solvers start from the same randomly generated initializer and stop when either the norm of the gradient falls below $10^{-4}$ or the number of iterations exceeds 300. To show the effectiveness of the proposed metric, we also implemented a Riemannian CG solver with the least-squares metric of Kasai and Mishra (2016). The parameters of CGSI and FTCSI are set to the same values, as the two solve the same problem.

VII.3.1 Case 1: influence of the sampling ratio

We study the effect of the number of observed samples on the performance of the compared solvers. We vary the over-sampling ratio $OS\in\{0.1, 1, 5\}$ while fixing the noise scale of the feature matrices to $10^{-5}$, and run the five solvers on each task. For each run, $\alpha_i$ ($1\le i\le 3$) are all set to $10/|\Omega|$ and $\lambda = 0$ for CGSI and FTCSI; the parameters of the other baselines are set to their defaults. We report the convergence behavior of the compared solvers in Fig. 5(a-c). Note that in Fig. 5(a) the RMSE curve of FTC coincides with that of GeomCG, and in Fig. 5(c) the RMSE curve of FTC coincides with that of FTCSI. From Fig. 5(a) and (b), we can see that only CGSI and FTCSI successfully bring the RMSE below $10^{-2}$ when the over-sampling ratio is low. This shows that when the observed entries are scarce, exploiting the side information in the optimization can make a large difference in the accuracy of the tensor completion task. From Fig. 5(a-c), we can also see that CGSI converges to the solution faster than FTCSI, which shows that the proposed metric can indeed accelerate the convergence of the Riemannian conjugate gradient descent method.

VII.3.2 Case 2: influence of noisy side information

To study the effect of noisy feature matrices on the performance of the proposed method, we fix the over-sampling ratio to $OS = 1$ and vary the noise scale of the feature matrices in the set $\{10^{-4}, 10^{-3}, 10^{-2}\}$. For CGSI and FTCSI, the parameters $\alpha_i$ are all set to 1 and $\lambda$ is set to 0. The convergence behavior of the compared methods is reported in Fig. 5(d-f). From these figures, we can see that at convergence the RMSEs of CGSI and FTCSI are similar, because they solve the same problem. Moreover, even when the feature matrices are noisy, the RMSEs of CGSI and FTCSI are much better than those of the other baselines. These figures also show that CGSI is much faster than FTCSI, which is attributed to CGSI being endowed with a better Riemannian metric.
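As referenced above, a minimal NumPy sketch of the data-generation protocol follows. It is illustrative only: we sample indices with replacement for simplicity (collisions are negligible at this scale) and never materialize the $5000^3$ tensor, which is an implementation convenience rather than part of the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5000, 10
os_ratio, s = 1.0, 1e-5          # over-sampling ratio OS and feature-noise scale s

# Ground truth R = A x_1 B_1 x_2 B_2 x_3 B_3, kept in factored form.
A = rng.standard_normal((r, r, r))
B = [rng.standard_normal((n, r)) for _ in range(3)]

# Side information F_i = B_i + s * ||B_i||_F * N_i.
F = [Bi + s * np.linalg.norm(Bi) * rng.standard_normal((n, r)) for Bi in B]

# |Omega| = OS * D, with D the dimension of the rank-(10,10,10) manifold.
D = 3 * (n * r - r ** 2) + r ** 3
m = int(os_ratio * D)
idx = [rng.integers(0, n, size=m) for _ in range(3)]

# Observed values R[i,j,k] = sum_{a,b,c} A[a,b,c] B_1[i,a] B_2[j,b] B_3[k,c].
vals = np.einsum("abc,ta,tb,tc->t", A, B[0][idx[0]], B[1][idx[1]], B[2][idx[2]])
```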
VII.3.3 Case 3: influence of non-relevant features

We consider the performance of the proposed method when the provided feature matrices $F_i$ have many more columns than the correct ones $B_i$. The matrices $F_i\in\mathbb{R}^{5000\times10(k+1)}$ are generated by augmenting the correct feature matrices $B_i$ with $10k$ randomly generated columns. That is, we set $F_i = [B_i, G_i] + 10^{-5}\|B_i\|_F E_i$, where $G_i\in\mathbb{R}^{5000\times10k}$ and $E_i\in\mathbb{R}^{5000\times10(k+1)}$ are random matrices with i.i.d. standard Gaussian entries. We fix the over-sampling ratio to $OS = 1$ and vary the parameter $k\in\{10, 30, 50\}$. For CGSI and FTCSI, $\alpha_i$ ($1\le i\le 3$) are set to 0.5 and $\lambda$ is set to 0; the parameters of the other baselines are set to their defaults. We report the convergence behavior of the compared solvers in Fig. 5(g-i). From these figures, we can see that both CGSI and FTCSI successfully bring the RMSE down to around $10^{-5}$ even when $F_i$ has 50 times more columns than $B_i$. These figures also show that the proposed solver CGSI converges much faster than FTCSI, which is attributed to CGSI being endowed with a better Riemannian metric.

VII.3.4 Case 4: influence of noisy samples

We consider the case where the observed entries are noisy, obtained by adding scaled Gaussian noise $P_\Omega(\sigma E)$ to $P_\Omega(R)$, where $E$ is a noise tensor with i.i.d. standard Gaussian entries and $\sigma$ is the sample-noise scale. We fix the over-sampling ratio $OS$ to 1 and the noise scale $s$ of the feature matrices to $10^{-4}$, and vary $\sigma\in\{10^{-4}, 10^{-3}, 10^{-2}\}$. For CGSI and FTCSI, the parameters are set to $\alpha_i = 5$ ($1\le i\le 3$) and $\lambda = 0$; the parameters of the other baselines are set to their defaults. We report the performance of the compared solvers in Fig. 5(j-l). From these figures, we can see that only the solvers for the proposed model, namely CGSI and FTCSI, bring the RMSE down to the level of the noise at convergence. This shows that when the observed entries are scarce and noisy, exploiting the side information significantly improves the RMSE. We can also see that CGSI converges much faster than FTCSI, which again shows that the proposed metric (11) accelerates the convergence of the Riemannian conjugate gradient descent method.
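Continuing the sketch given after Case 2, the perturbations used in Cases 3 and 4 amount to a few extra lines; `k` and `sigma` below are our names for the augmentation factor and the sample-noise scale.

```python
# Case 3: augment each feature matrix with 10k irrelevant Gaussian columns,
# then perturb: F_i = [B_i, G_i] + 1e-5 * ||B_i||_F * E_i.
k = 10
F_aug = []
for Bi in B:
    Gi = rng.standard_normal((n, 10 * k))
    Ei = rng.standard_normal((n, 10 * (k + 1)))
    F_aug.append(np.hstack([Bi, Gi]) + 1e-5 * np.linalg.norm(Bi) * Ei)

# Case 4: add scaled Gaussian noise to the observed entries only.
sigma = 1e-3
vals_noisy = vals + sigma * rng.standard_normal(m)
```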
[Figure 5: Simulation results of different solvers on the task of tensor completion. Panels (a)-(l) plot RMSE against CPU time under the four scenarios; the compared solvers are FTC, FTCSI, AltMin, GeomCG, and CGSI.]

References

Ralph Abraham, Jerrold E. Marsden, and Tudor Ratiu. Manifolds, Tensor Analysis, and Applications, volume 75. Springer Science & Business Media, 2012.

P.-A. Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2009.

Evrim Acar, Tamara G. Kolda, and Daniel M. Dunlavy. All-at-once optimization for coupled matrix and tensor factorizations. In MLG'11, 2011.

Robert M. Bell and Yehuda Koren. Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter, 9(2):75–79, 2007.

Alex Beutel, Partha Pratim Talukdar, Abhimanu Kumar, Christos Faloutsos, Evangelos E. Papalexakis, and Eric P. Xing. FlexiFaCT: Scalable flexible factorization of coupled tensors on Hadoop. In SDM, pages 109–117. SIAM, 2014.

Shouyuan Chen, Michael R. Lyu, Irwin King, and Zenglin Xu. Exact and stable recovery of pairwise interaction tensors. In NIPS, 2013.

Marko Filipović and Ante Jukić. Tucker factorization with missing data with application to low-n-rank tensor completion. Multidimensional Systems and Signal Processing, 26(3):677–692, 2015.

David H. Foster, Kinjiro Amano, Sérgio M. C. Nascimento, and Michael J. Foster. Frequency of metamerism in natural scenes. Journal of the Optical Society of America A, 23:2359–2372, 2006.

Gene H. Golub and Charles F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012.

Prateek Jain and Sewoong Oh. Provable tensor factorization with missing data. In Advances in Neural Information Processing Systems, pages 1431–1439, 2014.

Hiroyuki Kasai and Bamdev Mishra. Low-rank tensor completion: a Riemannian manifold preconditioning approach. In ICML, 2016.

Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

Daniel Kressner, Michael Steinlechner, and Bart Vandereycken. Low-rank tensor completion by Riemannian optimization. BIT Numerical Mathematics, 54(2):447–468, 2014.

Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):208–220, 2013.

Qiang Liu, Shu Wu, and Liang Wang. COT: Contextual operating tensor for context-aware recommender systems. In AAAI, pages 203–209, 2015.

Yuanyuan Liu, Fanhua Shang, Hong Cheng, James Cheng, and Hanghang Tong. Factor matrix trace norm minimization for low-rank tensor completion. In Proceedings of the 2014 SIAM International Conference on Data Mining, pages 866–874. SIAM, 2014.

Yuanyuan Liu, Fanhua Shang, Wei Fan, James Cheng, and Hong Cheng. Generalized higher order orthogonal iteration for tensor learning and decomposition. IEEE Transactions on Neural Networks and Learning Systems, 27(12):2551–2563, 2016.

Bamdev Mishra. A Riemannian Approach to Large-Scale Constrained Least-Squares with Symmetries. PhD thesis, Université de Liège, Liège, Belgium, 2014.

Bamdev Mishra and Rodolphe Sepulchre. R3MC: A Riemannian three-factor algorithm for low-rank matrix completion. In 53rd IEEE Conference on Decision and Control (CDC), pages 1137–1142. IEEE, 2014.

Bamdev Mishra and Rodolphe Sepulchre. Riemannian preconditioning. SIAM Journal on Optimization, 26(1):635–660, 2016.

Atsuhiro Narita, Kohei Hayashi, Ryota Tomioka, and Hisashi Kashima. Tensor factorization using auxiliary information. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 501–516. Springer, 2011.

Piyush Rai, Yingjian Wang, and Lawrence Carin. Leveraging features and networks for probabilistic tensor decomposition. In AAAI, pages 2942–2948, 2015.

Bernardino Romera-Paredes and Massimiliano Pontil. A new convex relaxation for tensor completion. In Advances in Neural Information Processing Systems, pages 2967–2975, 2013.

Bernardino Romera-Paredes, Hane Aung, Nadia Bianchi-Berthouze, and Massimiliano Pontil. Multilinear multitask learning. In Proceedings of the 30th International Conference on Machine Learning, pages 1444–1452, 2013.
Shaden Smith, Jongsoo Park, and George Karypis. An exploration of optimization algorithms for high performance tensor completion. In Proceedings of the 2016 ACM/IEEE Conference on Supercomputing, 2016.

Yichen Wang, Robert Chen, Joydeep Ghosh, Joshua C. Denny, Abel Kho, You Chen, Bradley A. Malin, and Jimeng Sun. Rubik: Knowledge guided tensor factorization and completion for health data analytics. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1265–1274. ACM, 2015.

Yangyang Xu, Ruru Hao, Wotao Yin, and Zhixun Su. Parallel matrix factorization for low-rank tensor completion. Inverse Problems & Imaging, 9(2), 2015.

Ke Ye and Lek-Heng Lim. Distance between subspaces of different dimensions. arXiv preprint arXiv:1407.0900, 2014.

Ming Yuan and Cun-Hui Zhang. On tensor completion via nuclear norm minimization. Foundations of Computational Mathematics, pages 1–38, 2015.

Xiaoqin Zhang, Zhengyuan Zhou, Di Wang, and Yi Ma. Hybrid singular value thresholding for tensor completion. In AAAI, pages 1362–1368, 2014.

Zemin Zhang and Shuchin Aeron. Exact tensor completion using t-SVD. IEEE Transactions on Signal Processing, 2016.