Tensor Computation: A New Framework for High-Dimensional Problems in EDA
Many critical EDA problems suffer from the curse of dimensionality, i.e. the very fast-scaling computational burden produced by large number of parameters and/or unknown variables. This phenomenon may be caused by multiple spatial or temporal factors…
Authors: Zheng Zhang, Kim Batselier, Haotian Liu
ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 1 T ensor Computation: A Ne w Fra me work for High-Dimensional Problems in ED A Zheng Zhang, Kim Batselier , Haotian Liu, Luca Daniel and Ngai W ong (In vited K e ynote P aper) Abstract —Many critical ED A pr oblems suffer fr om the curse of dimensionality , i.e. th e very f ast-scaling computational burden produced by large number of parameters and/or unk nown variables. This phenomenon may b e caused by multiple spatial or tempora l factors (e.g. 3-D field so lvers discr etizations and multi-rate circuit simulation), nonlinearity o f devices a nd circuits, large number of design or optimization paramete rs (e.g. full- chip routing/placement and circuit sizing), or extensive process variations (e.g. variability/reliability analysis and design for manufacturability). The computational challenges generated by such high di mensional pr oblems are generally hard to handle efficiently with tradi tional ED A core algorithms th at are b ased on matrix and v ector computation. This paper presents “tensor computation” as an alter nativ e general fra mework for the d e- velopment of effi cient ED A algorithms and t ools. A t ensor is a high-dimensional generalization of a m atrix and a vector , and is a natural choice for both storing and solving efficiently h igh- dimensional ED A problems. This pap er gives a basic tutorial on tensors, demonstrates some recent examples of EDA app lications (e.g., nonlinear circuit modeling and h igh-dimensional uncer - tainty qu antification), and su ggests further open ED A p roblems where the use of tensor computation could be of advantage. I . I N T R O D U C T I O N A. Success of Matrix & V ecto r Compu tation in EDA Hystory The advancement o f fabrication technolo gy and th e de- velopment of Electronic Design Automation (E D A) are tw o engines that h av e been dr iving the progr ess of semico nducto r industries. The first integrated c ircuit (IC) was in vented in 1959 by Jack Kilby . Howe ver , un til the early 197 0s de signers could o nly handle a small n umber of tran sistors man ually . The idea of E D A, namely designin g electronic circuits and systems automatically using computers, was prop osed in the 1960s. Nonetheless, this idea was regarded as science fiction u ntil SPICE [ 1] was released by UC Berkeley . Due to the success of SPICE, nume rous E D A algorithms and tools were fur ther developed to accelerate v arious design tasks, and desig ners could design large-scale co mplex chip s without spendin g months or years on lab or-intensiv e work. The EDA area indeed encom passes a very large variety of diverse topics, e.g. , hardware description lang uages, logic synthesis, formal verification . This paper mainly con cerns computatio nal prob lems in EDA. Spec ifically , we fo cus on Z. Zhang and L. Daniel are with Department of Electric al Engineeri ng and Computer Science , Massachusetts Institute of T echnology (MIT), Cambridge, MA. E-mails: { z zhang, luca } @mit.edu K. Batsel ier and N. W ong are with De partment of Elec trica l and Electroni c Engineeri ng, the Uni v ersity of Hong Ko ng. E-mails: { kimb, nwong } @ eee.hku.hk H. Liu is wit h Cadenc e Design Systems, Inc. San Jose, CA. E -mail: haotia n@cade nce.com modeling , simulation and optimizatio n pr oblems, whose per- forman ce heavily relies on effecti ve numerical implemen ta- tion. V ery o ften, numeric al mod eling or simulatio n core tools are called repeatedly by many h igher-lev el EDA too ls such as design optimization and system-le vel v erification. Many efficient matrix -based and vector-based algorithm s have been developed to address the c omputatio nal challeng es in EDA. Here we briefly summarize a small numb er of examples among the n umerou s r esearch results. In the context of circuit simulation , mod ified n odal a naly- sis [2] was proposed to d escribe the dy namic network of a general electron ic circuit. Standar d nume rical integratio n an d linear/non linear equation solvers (e .g., Gaussian elimination, LU factorizatio n, Newton’ s iteration ) were implemented in the early version of SPICE [ 1]. Driven by communication IC design , specialized RF simulator s were developed for periodic steady- state [3]–[ 7] and n oise [8] simulation . Iterative solvers and their p arallel variants were f urther im plemented to speed up large-scale linear [9] –[11] an d no nlinear circuit simulation [1 2], [13]. In ord er to han dle proc ess variations, both Mon te Carlo [ 14], [15] and stochastic sp ectral meth- ods [16]–[26]) were in vestigated to accelerate stochastic circuit simulation. Efficient models were dev eloped at almost every design lev el of hiera rchy . At the pr ocess level, many statistical and learning algor ithms wer e prop osed to cha racterize manufac- turing process variations [27]–[29]. At the device le vel, a huge number o f physics-based ( e.g., BS IM [30] for MOS- FET an d RLC intercon nect m odels) and math-b ased m odel- ing f rameworks were rep orted and implem ented. Math- based approa ches are also applicable to circ uit a nd system- lev el problem s due to their gen eric form ulation. Th ey start from a detailed math ematical d escription [e .g., a par tial d ifferen- tial equatio n (PDE ) or integral equation describin g device physics [31]–[3 3] o r a dynam ic sy stem describin g electronic circuits] o r some measurem ent data, then gen erate c ompact models by mo del orde r reduc tion [34]–[42] or system iden- tification [43]–[46]. These tech niques wer e further extende d to problem s with de sign par ameters o r pro cess uncer tain- ties [ 47]–[55]. Thanks to the progre ss of num erical o ptimization [56]– [58], a lot of alg orithmic solu tions were developed to solve ED A pr oblems such as VL SI placement [59], r outing [60], logic synthesis [61] and analog /RF circu it optimization [62], [63]. Based on design heuristics or n umerical ap prox imation, the perfo rmance of many ED A o ptimization engines were improved. For instance , in analo g/RF circ uit optim ization, posyno mial or polyn omial perform ance models were extracted ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 2 to sign ificantly red uce the numb er of circuit simulations [6 4]– [66]. B. Algorithmic Cha llenges an d Motivation Examp les Despite the success in many ED A applica tions, con ven- tional matrix-b ased an d vector-based algorith ms have certain intrinsic limitations wh en applied to p roblem s with h igh dimensiona lity . These problems generally in volve an extremely large n umber of unkn own variables o r require many sim- ulation/measu rement samples to character ize a q uantity of interest. Below we summar ize some rep resentative mo tiv ation examples among numer ous EDA pr oblems: • Parameterized 3-D Field Solvers. Lots o f devices ar e described by PDEs or integral equ ations [ 31]–[33] with d spatial dim ensions. W ith n discretization elemen ts along each spatial dimen sion ( i.e., x -, y - or z -d irection), the number of unkn own elemen ts is approxim ately N = n d in a fin ite-difference or finite-elem ent scheme. When n is large ( e.g. mor e than thousand s and of ten millio ns), ev en a fast iterative matrix solver with O ( N ) complexity cannot hand le a 3-D device simulation . If design par am- eters (e.g . ma terial pr operties) are considered and the PDE is furth er discretized in the parameter sp ace, the computatio nal cost qu ickly extends beyond th e cap ability of existing matrix - or vector-based algorithm s. • Multi-Rate Circuit Simulation. Widely separated time scales appe ar in many electronic cir cuits (e.g. switched capacitor filters and mixers), and they a re difficult to simulate u sing standard transient simulato rs. Multi-time PDE solvers [67] reduce the comp utational cost b y dis- cretizing the d ifferential equa tion alo ng d tempor al axes describing different time scales. Similar to a 3-D d evice simulator, this treatment may also be af fected by the curse of dimensiona lity . Freq uency-do main appro aches such as multi-tone harm onic balance [68], [6 9] may be more efficient for some RF circuits with d sinusoidal in puts, but th eir comp lexity also become s p rohibitively high as d in creases. • Probabilistic N oise Simulation. Whe n simula ting a c ir- cuit influ enced by n oise, som e prob abilistic appro aches (such as those based on Fokker-Planck equ ations [ 70]) compute the joint d ensity f unction of its d state variables along the time axis. In practice, th e d -variable joint density func tion must b e fin ely discretized in the d - dimensiona l space, leadin g to a huge compu tational cost. • Nonlinear or Paramet erized Model Order Reductio n. The c urse of dimen sionality is a lo ng-stand ing challenge in mo del ord er reduction . In m ulti-param eter model order reduction [4 7], [4 8], [54], [55], a h uge nu mber of mo- ments m ust be m atched, leading to a huge- size reduced - order model. In nonlinea r mode l order reductio n based on T ay lor exp ansions o r V olterra series [3 6]–[38], the com- plexity is an expon ential function of the hig hest degree of T ay lor or V o lterra series. Therefore, existing matrix-b ased algorithm s can on ly cap ture low-order n onlinear ity . • Design Space Explo ration. Consider a classical design space exploration pr oblem: optimize the circu it perf or- mance (e.g., small-signal gain of an operational amplifier) by cho osing the b est values of d de sign pa rameters (e.g. the sizes o f all transistors). When the p erform ance metric is a strongly non linear and discontinuo us fu nction of design parameters, sweeping the whole pa rameter space is po ssibly the only fea sible solution . Even if a small number of samples are used for e ach parameter, a h uge number of simulations are r equired to explore th e who le parameter space. • V ariability- A wa re Design A utoma tion. Pro cess varia- tion is a cr itical issue in nan o-scale chip design. Captu r- ing the co mplex stoch astic behavior cau sed by p rocess uncertainties can be a data-in tensiv e task. For instance, a hug e nu mber of measuremen t data p oints are r equired to characterize accurately the variability of device pa - rameters [2 7]–[29]. In circu it mod eling and simulation, the classical stocha stic collocation alg orithm [22]–[24] requires many simulatio n samples in o rder to con struct a s urrog ate mod el. Altho ugh some algo rithms s uch as compressed sensing [2 9], [71] can redu ce m easurement or co mputation al cost, lots o f hid den data informa tion cannot b e fully exploited by matrix-b ased algorith ms. C. T owar d T enso r Comp utations? In th is paper we argue that o ne effecti ve way to ad dress the above challeng es is to utilize ten sor co mputation . T ensors are high- dimension al generalizatio ns of vectors and matr ices. T ensors we re de veloped well over a century ago, but have been main ly app lied in physics, che mometrics and psycho- metrics [72]. Due to the ir high ef ficiency and convenience in rep resenting and h andling hu ge data arr ays, tensors are only recently b eginning to be successfully applied in m any engineer ing fields, includin g (but not limited to) signal pro- cessing [73], big data [ 74], machin e learn ing and scientific computin g. Noneth eless, tensor s still seem a relatively unex- plored an d u nexploited conce pt in the ED A field . The g oals an d organ ization of this p aper inclu de: • Providing a hands-on “pr imer” introdu ction to tensors and their b asic com putation techniqu es ( Section II an d append ices), as well as the mo st p ractically usefu l tec h- inques such a s tensor d ecompo sition (Section III) and tensor completion (Sectio n IV) ; • Summarizin g, as gu iding examples, a few r ecent tensor- based ED A algorithms, includ ing progress in high - dimensiona l unce rtainty qu antification (Section V) and nonlinear circu it mo deling an d simulatio n ( Section VI); • Suggesting some theor etical an d applicatio n open chal- lenges in tensor-based EDA ( Sections VII and VI II) in order to stimulate fu rther researc h co ntributions. I I . T E N S O R B A S I C S This section revie ws some basic te nsor n otions and op- erations necessary for un derstandin g the key ideas in the paper . Different fields have been using different conventions for tensors. Our exposition will tr y to u se on e of the most popular and co nsistent notations. ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 3 i 1 i 2 i 3 1 4 7 10 2 5 8 11 3 6 9 12 é ù ê ú ê ú ê ú ë û 13 16 19 22 14 17 20 23 15 18 21 24 é ù ê ú ê ú ê ú ë û Fig. 1. An example tensor A ∈ R 3 × 4 × 2 . A. Notations a nd P r eliminaries W e use b oldface capital c alligraphic letters (e.g. A ) to denote tenso rs, boldface capital letters (e.g . A ) to den ote matrices, bo ldface letters (e.g. a ) to denote vectors, and roman (e.g. a ) o r Gr eek ( e.g. α ) letters to deno te scalars. T ensor . A tenso r is a high-d imensional gene ralization of a matrix or vector . A vector a ∈ R n is a 1-way data array , and its i th elem ent a i is specified by th e index i . A matrix A ∈ R n 1 × n 2 is a 2 -way data arr ay , and each elemen t a i 1 i 2 is specified by a row ind ex i 1 and a column index i 2 . By extending this idea to th e h igh-d imensional case d ≥ 3 , a tensor A ∈ R n 1 × n 2 ×···× n d represents a d -way d ata array , and its element a i 1 i 2 ··· i d is specified by d indices. Here, the positive integer d is a lso called the order of a tensor . Fig . 1 illustrates an examp le 3 × 4 × 2 tensor . B. Basic T ensor Arithmetic Definition 1: T ensor inner product. The inner prod uct between two tensors A , B ∈ R n 1 ×···× n d is d efined as h A , B i = X i 1 ,i 2 ,...,i d a i 1 ··· i d b i 1 ··· i d . As nor m of a ten sor A , it is typically convenient to use th e Frobeniu s n orm || A || F := p h A , A i . Definition 2: T ensor k -mo de product. The k -mo de produ ct B = A × k U of a tensor A ∈ R n 1 ×···× n k ×···× n d with a matrix U ∈ R p k × n k is defined by b i 1 ··· i k − 1 j i k +1 ··· i d = n k X i k =1 u j i k a i 1 ··· i k ··· i d , (1) and B ∈ R n 1 ×···× n k − 1 × p k × n k +1 ×···× n d . Definition 3: k -mo de product shorthand notation. The multiplication of a d -way ten sor A with the matrices U (1) , . . . , U ( d ) along each of its d modes respectively is [ [ A ; U (1) , . . . , U ( d ) ] ] , A × 1 U (1) × 2 · · · × d U ( d ) . When A is diagonal with all 1’ s on its diagonal an d 0’ s elsewhere, the n A is o mitted fr om the n otation, e.g . [ [ U (1) , . . . , U ( d ) ] ] . Definition 4: Rank-1 tensor . A ran k-1 d -way ten sor ca n be written as the o uter p roduc t of d vector s A = u (1) ◦ u (2) ◦ · · · ◦ u ( d ) = [ [ u (1) , . . . , u ( d ) ] ] , (2) where u (1) ∈ R n 1 , . . . , u ( d ) ∈ R n d . The entries o f A are completely determined by a i 1 i 2 ··· i d = u (1) i 1 u (2) i 2 · · · u ( d ) i d . T ABLE I S T O R A G E C O S T S O F M A I N S T R E A M T E N S O R D E C O M P O S I T I O N A P P R O A C H E S . Decompositi on Elements to store Comments Canonic al P olyadi c [81], [82] ndr see Fig. 2 T uck er [83] r d + ndr see Fig. 3 T ensor Train [84] n ( d − 2) r 2 + 2 nr see Fig. 4 Some ad ditional notation s and operations are in troduced in Appen dix A. The applicatio ns in Sections V a nd VI will make it cle ar that the main problems in ten sor-based E D A applications are either comp uting a tensor deco mposition or solving a tenso r completio n p roblem. Bo th of them will now be d iscussed in o rder . I I I . T E N S O R D E C O M P O S I T I O N A. Computationa l Adv antage of T en sor Decomposition s. The numb er of elements in a d -way tensor is n 1 n 2 · · · n d , which grows very fast as d increases. T en sor decom positions compress and repr esent a high-dimen sional tensor by a smaller number of factors. As a re sult, it is possible to solve h igh- dimensiona l problem s (c.f. Section s V to VI I) with a lower storage and computatio nal cost. T able I summarizes the storage cost of three mainstrea m tenso r de composition s in order to intuitively show the ir advantage. State- of-the- art imp lementa- tions of these metho ds can be fo und in [ 75]–[77]. Sp ecific examples are for instance: • While the h idden layers o f a neural network could consume almost all of the memory in a server , using a canonical or ten sor-train decom position in stead results in an extraordin ary compression ( [78], [79]) by u p to a factor of 20 0 , 000 . • High-or der models describing nonlinea r dynamic systems can a lso be significan tly co mpressed u sing tensor dec om- position as will be shown in d etails in Section VI. • High-dim ensional integra tion a nd conv olution ar e lon g- standing cha llenges in many engine ering fields ( e.g. computatio nal fin ance and imag e pr ocessing). T hese two problem s can be wr itten as the inner produc t of two tensors, a nd while a direct computation w ould h av e a complexity of O ( n d ) , u sing a low-rank can onical or tensor-train decomp osition, results in an extraord inarily lower O ( nd ) complexity [80]. In this section we will brie fly d iscuss the mo st pop ular and useful tensor d ecompo sitions, highlig hting advantages of each. B. Canonical P olyad ic Decompo sition Polyadic Decomposition . A polyadic decom position ex- presses a d -way tensor as the sum of r rank - 1 terms: A = r X i =1 σ i u (1) i ◦ · · · ◦ u ( d ) i = [ [ D ; U (1) , · · · , U ( d ) ] ] . (3) The subscript i of the unit-no rm u (1) i vectors indicates a summation index an d n ot the vector entries. The u ( k ) i vectors ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 4 A s 1 s r + + … = (1) 1 u ( 2 ) 1 u (3 ) 1 u (1) r u ( 2 ) r u (3 ) r u n 1 n 2 n 3 n 1 n 2 n 3 Fig. 2. Decomposing A into the sum of r rank- 1 outer products. A = n 1 n 2 n 3 n 1 n 2 n 3 r 3 r 1 r 2 r 1 r 2 r 3 S (1 ) U ( 2 ) U (3 ) U Fig. 3. The Tuck er decomposition decomposes a 3-way tensor A into a core tensor S and fac tor matrices U (1) , U (2) , U (3) . are called the m ode- k vectors. Co llecting all vecto rs of th e same mode k in matr ix U ( k ) ∈ R n k × r , this dec omposition is rewritten as th e k -mo de prod ucts o f matrices { U ( k ) } d k =1 with a cubical diag onal tensor D ∈ R r ×···× r containing all the σ i values. Note that we can a lways absorb each of the scalars σ i into o ne of the mode vector s, th en write A = [ [ U (1) , · · · , U ( d ) ] ] . Example 1: Th e po lyadic deco mposition of a 3 -way tensor is shown in Fig. 2. T ensor Rank . The m inimum r := R fo r the equality ( 3) to hold is called the tensor rank which, unlike the matrix case, is in g eneral NP-hard to co mpute [8 5]. Canonical Polyadic Decompositio n (CPD) . T he corre- sponding decomposition with the minimal R is called the canon ical po lyadic decompo sition (CPD). It is also called Canonica l Decomp osition (CANDECOMP) [81] or P arallel F actor (P ARAF AC ) [82] in the literature. A CPD is un ique, up to scaling an d permutatio n of the mod e vectors, un der mild condition s. A classical u niquen ess re sult f or 3-way tensors is described by Kr uskal [86]. Th ese unique ness con ditions d o not a pply to th e matrix case 1 . The compu tation of a polyadic decompo sition, together with two variants ar e discussed in App endix B. C. T ucker Decompo sition T ucker Decomposition . Removin g the co nstraint that D is cubical and diagon al in (3 ) re sults in A = S × 1 U (1) × 2 U (2) · · · × d U ( d ) (4) = [ [ S ; U (1) , U (2) , . . . , U ( d ) ] ] 1 Indeed, for a gi ven matrix decompositi on A = U V and any nonsingular matrix T we ha ve t hat A = U T T − 1 V . Onl y by a dding suf ficient c onditi ons (e.g. orthogona l or triangular factors) the matrix decomposit ion can be made unique. Remarkably , the CPD for higher order tensors does not need any such conditi ons to ensure its uniquene ss. A = n 1 n 2 n 3 n 1 r 1 n 2 r 1 r 2 r 2 n 3 G (1) G (2) G (3) Fig. 4. The T ensor Train decomposition decomposes a 3-way tensor A into two matric es G (1) , G (3) and a 3 -way tensor G (2) . with the factor matrices U ( k ) ∈ R n k × r k and a cor e tensor S ∈ R r 1 × r 2 ×···× r d . The T ucker d ecompo sition can signifi- cantly reduce the storage cost wh en r k is ( much) smaller than n k . This decom position is illustrated in Fig. 3. Multilinear Rank . T he m inimal size ( r 1 , r 2 , . . . , r d ) o f th e core tensor S f or (4) to ho ld is called the multilinear ran k of A , and it can be com puted as r 1 = ra nk ( A (1) ) , . . . , r d = rank ( A ( d ) ) . Note that A ( k ) is a matrix obtained by reshapin g (see App endix A) A alo ng its k th mod e. For the m atrix case we have th at r 1 = r 2 , i.e., the r ow rank equals the colu mn rank. Th is is n ot tru e anymore when d ≥ 3 . T ucker vs. CPD . The T ucker d ecompo sition can be con - sidered as an expansion in ran k-1 terms that is not necessarily canonical, while the CPD does n ot n ecessarily have a minimal core. This indicates the d ifferent usages of these two de com- positions: the CPD is typ ically used to decom pose data into interpretab le mo de vecto rs while the T ucker decom position is most often used to comp ress da ta into a ten sor o f smaller size. Unlike th e CPD, th e T ucker dec omposition is in gen eral not unique 2 . A variant of the T ucker decomp osition, called high -order singular value decomp osition (SVD) or HOSVD, is summa- rized in App endix C. D. T ensor T rain Decompo sition T ensor T rain (TT) Decompo sition . A ten sor tra in de com- position [84] repre sents a d -way tensor A by two 2 -way tensors an d ( d − 2 ) 3 -way tensors. Spe cifically , each en try of A ∈ R n 1 ×···× n d is expressed as a i 1 i 2 ··· i d = G (1) i 1 G (2) i 2 · · · G ( d ) i d , (5) where G ( k ) ∈ R r k − 1 × n k × r k is th e k -th core ten sor, r 0 = r d = 1 , and thus G (1) and G ( d ) are matrices. The vector ( r 0 , r 1 , · · · , r d ) is called the tensor train rank . E ach elemen t of the co re G ( k ) , deno ted as g ( k ) α k − 1 i k α k +1 has three in dices. By fixing the 2 nd index i k , we o btain a matr ix G ( k ) i k (or vector for k = 1 or k = d ). Computing T ensor T rain De compositions . Computing a tensor train decomp osition consists o f doing d − 1 co nsecutive reshaping s and low-rank matrix decompositions. An adv antage of tensor tr ain decom position is that a quasi-o ptimal appro xi- mation can be obtain ed with a given error b ound and with an automatic ra nk d eterminatio n [8 4]. 2 One can alw ays right-mult iply the fac tor matrices U ( k ) with any nonsin- gular matrix T ( k ) and multiply the core tensor S with their in verse s T ( k ) − 1 . This means that the subspaces that are defined by the factor m atrice s U ( i ) are in v ar iant while the bases in these subspaces can be chosen arbitrari ly . ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 5 E. Choice of T e nsor Decompo sition Methods Canonical and ten sor tr ain d ecompo sitions are pr eferred for high-o rder tensors since the their r esulting tensor factors have a low storage cost linearly dependent on n and d . F or some cases (e.g., f unctional a pprox imation), a tensor train deco mposition is pref erred due to a unique featu re, i.e., it can be implemen ted with cross appro ximation [87] an d witho ut knowing the whole tensor . This is very attractive, because in many cases ob taining a tensor element can be expen siv e. T ucker decompositions are m ostly applied to lower-order ten sors due to th eir storage cost of O ( r d ) , and are very u seful for finding the d ominan t subspace of some m odes such a s in data m ining applicatio ns. I V . T E N S O R C O M P L E T I O N ( O R R E C O V E RY ) T ensor decompo sition is a powerful too l to red uce storag e and comp utational cost, however mo st ap proach es need a whole tensor a-priori . In practice, o btaining each eleme nt of a tensor may req uire an expen siv e c omputer simulation or n on- trivial hard ware measurem ent. T herefor e, it is necessary to estimate a whole tensor b ased on only a small number of av ailable elem ents. Th is can be do ne by tensor c ompletion or tensor recovery . This idea finds applicatio ns in many fields. For instance in biom edical imagin g, one wants to recon struct the wh ole ma gnetic reson ance imagin g data set based on a few m easurements. In design space exploration, one m ay o nly have a small num ber of tensor elements obtained from c ircuit simulations, while all oth er sweeping samples in the param eter space mu st b e estimated . A. Ill-P o sed T en sor Completion/Recovery Let I includ e all indices for the elem ents of A , a nd its subset Ω h olds th e indice s of som e av ailable ten sor e lements. A projection operato r P Ω is d efined for A : B = P Ω ( A ) ⇔ b i 1 ··· i d = a i 1 ··· i d , if i 1 · · · i d ∈ Ω 0 , other wise . In tensor comp letion, one wants to find a ten sor X such that it m atches A for th e eleme nts specified b y Ω : k P Ω ( X − A ) k 2 F = 0 . (6) This pr oblem is ill-posed , because any value can be assigned to x i 1 ··· i d if i 1 · · · i d / ∈ Ω . B. Re gularized T ensor Completio n Regularization makes th e tensor comp letion p roblem we ll- posed by add ing c onstraints to (6). Several existing ideas are summarized below . • Nuclear -Norm Minimization. This idea search es for the minimal-ran k ten sor b y solv ing the p roblem: min X k X k ∗ s . t . P Ω ( X ) = P Ω ( A ) . (7) The n uclear no rm of a m atrix is the sum of all singular values, but the n uclear no rm of a tensor does n ot have a rigoro us or unified definitio n. In [88], [8 9], the tensor nu- clear nor m k X k ∗ is h euristically app roximated using th e weighted sum of matrix nuclear n orms o f X ( k ) ’ s for all modes. This h euristic m akes (7) conve x, and its o ptimal solution can be compu ted by available alg orithms [90], [91]. Note th at in (7) one has to com pute a full tenso r X , leading to an exponential co mplexity w ith respe ct to the o rder d . • App roximation with Fixed Ra nks. Some techniques compute a ten sor X by fixin g its tensor ran k. For instance, one can so lve the following prob lem min X k P Ω ( X − A ) k 2 F s . t . m ultilinear rank( X ) = ( r 1 , . . . , r d ) (8) with X param eterized by a pr oper low-multilinear ran k factorization. Kre sner et al. [92] computes the high er- order SVD r epresentation using Rieman nian optimiza- tion [9 3]. I n [94], the unknown X is param eterized by some tenso r-train factors. The low-rank factorization significantly reduc es the n umber of u nknown variables. Howe ver , how to ch oose an optimal tensor ran k still remains an o pen qu estion. • Probabilistic T ensor Completion. In ord er to auto - matically determ ine the tensor rank, so me prob abilis- tic approa ches based on Bayesian statistics ha ve been developed. Specifically , one may treat the tensor fac- tors as unknown rand om variables assigned with proper prior pr obability density functions to enforce low-rank proper ties. This idea has been ap plied successfully to obtain poly adic de composition [95], [96] and T ucker decomp osition [97] fro m inc omplete data with autom atic rank de termination . • Low-Rank a nd Sparse Co nstraints. I n som e cases, a low-rank tenso r A m ay have a spar se pro perty after a linear transfo rmation. Let z = [ z 1 , . . . , z m ] w ith z k = h A , W k i , one may find t hat many elements of z a re close to zero. T o e xploit th e lo w-rank and sparse pro perties simultaneou sly , the following op timization p roblem [98], [99] m ay b e solved: min X 1 2 k P Ω ( X − A ) k 2 F + λ m X k =1 | h X , W k i | s . t . multilinear rank( X ) = ( r 1 , . . . , r d ) . (9) In signal processing, z may repr esent the coefficients of m ultidimension al Fourier or wavelet transforms. In uncertainty quan tification, z collects the coefficients of a generalized p olynom ial-chaos expan sion. The formu la- tion (9) is gener ally non- conv ex, an d locating its global minimum is n on-trivial. C. Choice o f T enso r Re covery Method s Low-rank constrain ts have p roven to be a go od cho ice for instance in sign al an d image pro cessing (e.g., M RI reconstru c- tion) [88], [89], [92], [96]. Both lo w-rank and sparse properties may b e con sidered for high-dimension al fu nctional ap prox- imation (e.g ., polyn omial-chao s expansions) [98]. Nuclear- norm m inimization and p robab ilistic tensor comple tion are very attrac ti ve in the sen se th at tensor ran ks can be autom ati- cally d etermined, howe ver th ey a re not so efficient o r re liable ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 6 for high-o rder ten sor pro blems. It is expensive to evaluate the nuclear norm of a high-ord er tensor . R egarding probabilis- tic tensor co mpletion, imp lementation experien ce shows that many samples may be requir ed to ob tain an accur ate result. V . A P P L I C A T I O N S I N U N C E RTA I N T Y Q U A N T I FI C A T I O N T ensor techniques can advance th e r esearch of many EDA topics due to the ubiquitou s e xistence of high -dimension al problem s in th e ED A com munity , especially when consider- ing pro cess variations. This section summarizes som e recent progr ess on ten sor-based resear ch in solv ing high-d imensional uncertainty qu antification pro blems, and co uld b e used as guiding re ference for the effecti ve employm ent of tensors in other ED A pro blems. A. Uncertainty Qu antifica tion (UQ) Process variation is on e of the main source s causing yield degradation and chip failures. In or der to improve ch ip yield, efficient stochastic algorith ms are d esired in order to simu- late n ano-scale designs. The design pro blems ar e gen erally described by comp lex differential equ ations, and they have to be solved repea tedly in trad itional Monte -Carlo simulator s. Stochastic spectr al metho ds have emerged as a pr omis- ing ca ndidate du e to their hig h efficiency in EDA app lica- tions [16]–[26]. L et th e r andom vecto r ξ ∈ R d describe process variation. Un der some a ssumptions, an ou tput of interest (e.g. , chip fr equency) y ( ξ ) can be ap proxim ated b y a truncated generalized polyn omial-chao s expansion [100]: y ( ξ ) ≈ p X | α | =0 c α Ψ α ( ξ ) . (10) Here { Ψ α ( ξ ) } ar e orthon ormal polyn omial basis fun ctions; the ind ex vector α ∈ N d indicates th e poly nomial order, and its elem ent-wise su m | α | is bound ed by p . Th e co efficient c α can b e compu ted b y c α = E ( Ψ α ( ξ ) y ( ξ )) (11) where E d enotes expectation. Main Challenge . Stochastic spectral methods b ecome inef- ficient when there are many r andom par ameters, because ev al- uating c α in volv es a challengin g d -dimensiona l n umerical inte- gration. In hig h-dimen sional cases, Monte Carlo was regarded more efficient than stochastic spectral metho ds. Howe ver , we will show that with tenso r computation , stocha stic spectral methods can outper form Mo nte Carlo for som e challen ging UQ problem s. B. High-D S tochastic Collocation by T ensor Recovery Problem Description. I n stochastic collocation [1 01]– [103], (11) is e valuated by a quadratur e r ule. For instance, with n j integration samples a nd weights [ 104] p roper ly ch osen for each e lement o f ξ , c α can b e ev aluated by c α = h Y , W α i . (12) Here b oth Y and W α are ten sors of size n 1 × · · · × n d . The rank-1 tensor W α only dep ends on Ψ α ( ξ ) and qu adrature V dd Fig. 5. Schematic of a multistage CMOS ring oscillat or (with 7 inv erte rs). T ABLE II C O M PA R I S O N O F S I M U L AT I O N C O S T F O R T H E R I N G O S C I L L ATO R , U S I N G D I FF E R E N T K I N D S O F S T O C H A S T I C C O L L O C ATI O N . method tensor product sparse grid t ensor completion total samples 1 . 6 × 10 27 6844 500 weights, a nd thus is easy to c ompute. Obtaining Y exactly is almost impo ssible because it has th e values o f y at all integration samples. In stead of com puting all elements of Y by n 1 n 2 · · · n d numerical simu lations, we estimate Y using only a small num ber of (say , sev eral hund reds) simu lations. As shown in compressive sensing [71], the approxim ation (10) usua lly h as spar se struc tures, and thus the low-rank and sparse tensor c ompletion mo del (9) can b e u sed. Using tensor recovery , stochastic co llocation may requ ire o nly a few hundr ed simulation s, thu s can be very efficient for some h igh- dimensiona l p roblem s. Example [9 8], [99]. The CMOS rin g o scillator in Fig. 5 has 57 rand om parameter s describing thr eshold voltages, gate- oxide thickne ss, and effecti ve gate length/width. Since our focus is to ha ndle high dimensionality , all par ameters are assumed mutually in depend ent. W e aim to obtain a 2 nd -order polyno mial-chao s expan sion fo r its frequen cy by repe ated periodic steady- state simu lations. Three in tegration p oints are chosen for each par ameter, leading to 3 57 ≈ 1 . 6 × 10 27 sam- ples to simulate in standa rd stochastic collocation. Advanced integration rules such as sparse grid [105] still needs over 6000 simulations. As sho wn in T able II, with tensor completion (9), the tensor representing 3 57 solution samples can be well approx imated by u sing only 50 0 samples. As shown in Fig. 6, the optimization so lver converges after 4 6 iterations, a nd the tensor factors are comp uted with less than 1% relative errors; the o btained mod el is very sparse, and the obtained den sity function of the o scillator fr equency is very accu rate. Why Not Use T ensor Decomposition? Since Y is not giv en a prio ri , neither CPD n or T ucker decomp osition is fe asible here. For the above example o ur experim ents show th at tensor train decomp osition requ ires abo ut 10 5 simulations to obtain the low-rank factors with a cceptable accu racy , an d its co st is ev en higher than Monte Carlo. C. High-D Hierar chical UQ with T ensor T rain Hierarchical UQ . In a hier archical UQ framew ork, one estimates the high-level uncertainty of a large system that consists of se veral comp onents or subsystems by ap plying ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 7 iteration # 0 10 20 30 40 50 10 -5 10 0 10 5 10 10 10 15 10 20 10 25 relative error of tensor factors iteration # 0 10 20 30 40 50 10 1 10 2 10 3 10 4 10 5 10 6 10 7 cost funct. 0 200 400 600 800 1000 1200 1400 1600 1800 -0.5 0 0.5 1 1.5 2 2.5 3 gPC coeff. for α 0 frequency (MHz) 65 70 75 80 85 90 95 0 0.05 0.1 0.15 tensor recovery MC (5K samples) Fig. 6. T en sor-rec ov ery results of the ring oscillator . T op left: relati ve error of the tensor factors; top right: decrease of the cost function in (9); bottom left: sparsity of the obtained polynomia l-chaos expan sion; bottom right: obtained density function v .s. Monte Carlo using 5000 samples. stochastic spectr al method s at d ifferent levels of the d esign hierarchy . Assume th at several po lynom ial-chaos expansions are given in the for m (1 0), and e ach y ( ξ ) describ es the outp ut of a c ompon ent or subsystem. I n Fig. 7, y ( ξ ) is used as a ne w rand om input such that the system-level simulation can b e a ccelerated by igno ring the b ottom-level variations ξ . Howe ver , the q uadratur e samp les and basis fu nctions of y are unknown, and one must c ompute suc h inform ation using a 3-term recurr ence relatio n [1 04]. This requ ires evaluating the following numeric al integration with high accur acy: E ( g ( y ( ξ ))) = h G , W i , (13) where the elem ents of tenso rs G and W ∈ R ˆ n 1 ×···× ˆ n d are g ( y ( ξ i 1 ··· i d )) an d w i 1 1 · · · w i d d , respectively . Note that ξ i 1 ··· i d and w i 1 1 · · · w i d d are the d -dimensional nu merical quadr ature samples and weights, respectively . Choice of T ensor Decompositions. W e aim to obtain a low-rank re presentation of Y , such that G and E ( g ( y ( ξ ))) can be compute d easily . Due to the extrem ely high accuracy requirem ent in th e 3-term recu rrence relation [104], tensor completion method s are not f easible. Neither cano nical te nsor decomp osition nor T ucker decomposition is applicable here, as they need the whole high -way tensor Y befor e factorizatio n. T ensor-train decompo sition is a go od choice, sinc e it can co m- pute a high- accuracy lo w-rank representatio n without kn owing the wh ole tensor Y ; there fore, it was u sed in [1 8] to ac celerate the 3 -term recurr ence relation and the subseq uent hierarch ical UQ flow . Example. The tensor-train-b ased flow h as been applied to the oscillato r with four MEMS cap acitors and 1 84 random parameters shown in Fig. 8, which pr eviously could o nly be solved usin g rand om samplin g app roaches. I n [ 18], a sparse generalized po lynomial- chaos expansion was first computed as a stoch astic model fo r the ME MS c apacitor y ( ξ ) . The Fig. 7. Hierarchical uncertai nty quantificat ion. The stochastic outputs of bottom-le vel components/ de vice s are used as ne w random inputs for upper - le ve l uncerta inty analysis. discretization of y ( ξ ) on a 4 6 -dimen sional integration grid was represented by a tensor Y (with 9 integration points along each dimension) , then Y was approximated by a tensor train decom- position. After this approx imation, ( 13) was easily co mputed to obtain th e ne w orth onorm al polyno mials and quadratur e points f or y . Finally , a stochastic oscillator simulator [21] was c alled at the system level using the newly obtained basis function s and quadr ature points. As shown in T able III, this circuit was simulated by the tenso r-train-based h ierarchical approa ch in o nly 10 min in MA TLAB, whereas Monte Carlo with 5000 sample s re quired m ore th an 15 hou rs [18]. The variations o f the steady-state wa veforms from bo th m ethods are alm ost the same, cf. Fig. 9. V I . A P P L I C AT I O N S I N N O N L I N E A R C I R C U I T M O D E L I N G Nonlinear devices or circuits mu st be well modeled in ord er to en able efficient system-level simu lation and optimization . ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 8 C R V ctrl V dd L (W/L) n 10.83fF 1 10 Ω 0-2.5V 2.5V Parameter V alue 8.8nH 4/0.25 L L R V ctrl C m R ss M n M n I ss C m C V dd V 1 V 2 0.5mA I ss R ss 10 6 Ω C m C m RF Conducting Path Secondary Actuator RF Capacitor Contact Bump Primary Actuator Fig. 8. Left: the schematic of a MEMS switch acting as capaci tor C m , which has 46 process v ariati ons; right: an oscillator using 4 MEMS switches as capac itors (with 184 random parameters in total). T ABLE III S I M U L AT I O N T I M E O F T H E M E M S - I C C O - D E S I G N I N F I G . 8 method Monte Carlo proposed [18] total samples 15 . 4 hours 10 minutes 0 0.1 0.2 0.3 0.4 0.5 0.6 0 2 4 6 t [ns] (a) 0 0.1 0.2 0.3 0.4 0.5 0.6 0 2 4 6 t [ns] (b) Fig. 9. Real izati on of the steady-st ate wav eforms for the oscillator in Fig. 8. T op: tensor-ba sed hierarc hical approach; bottom: Monte Carlo. Capturing the (p ossibly h igh) nonline arity can re sult in hig h- dimensiona l p roblems. Fortunately , the multiway nature o f a tensor allows the easy capturing of high- order no nlinearities of an alog, m ixed-signal circuits and in MEMS design. A. Nonlinear Mo deling an d Mo del Order Redu ction Similar to th e T ay lor expansion , it is shown in [3 7], [38], [106]–[108] th at many non linear dyn amical systems can b e approx imated by expanding the nonlinear terms arou nd an equilibriu m point, leading to the following ordin ary differential equation ˙ x = Ax + B x 2 + C x 3 + D ( u ⊗ x ) + E u , (14) where the state vector x ( t ) ∈ R n contains the voltages an d/or currents inside a circuit netw ork, and the vector u ( t ) ∈ R m de- notes time-varying inpu t signals. The x 2 , x 3 notation r efers to repeated Kronecker pro ducts (cf. App endix A) . Th e matr ix A ∈ R n × n describes lin ear behavior , while the matrices B ∈ R n × n 2 and C ∈ R n × n 3 describe 2 nd- and 3 rd -orde r polyno mial approx imations of som e no nlinear b ehavior . Th e matrix D ∈ R n × nm captures the co upling b etween the state B x 2 V T B x 2 V V ^ B ^ x 2 ^ Fig. 10. Tradi tional project ion-base d nonlinea r model order reducti on meth- ods reduce a large system matrix B to a small but dense matrix ˆ B through an orthogonal projectio n matrix V . variables and input sign als and E ∈ R n × m describes how the input signals are injected into the circ uit. Th is differential equation will serve as the b asis in the fo llowing model orde r reduction ap plications. Matrix-based Nonlinea r Model Order Reduction . The idea of non linear model order reductio n is to extract a compact reduced -order mode l that acc urately ap proxim ates the inpu t- output relationship of the orig inal large nonline ar system. Sim- ulation of the reduc ed-ord er mod el is usually much faster , so that efficient an d r eliable system-level verification is obtained . For i nstance, projection- based nonline ar mod el order reductio n methods reduce the original system in ( 14) to a compact reduced model with size q ≪ n ˙ ˆ x = ˆ A ˆ x + ˆ B ˆ x 2 + ˆ C ˆ x 3 + ˆ D ( u ⊗ ˆ x ) + ˆ E u , (15) where ˆ x ∈ R q , ˆ A ∈ R q × q , ˆ B ∈ R q × q 2 , ˆ C ∈ R q × q 3 , ˆ D ∈ R q × q m and ˆ E ∈ R q × m . The red uction is achieved thr ough applying an ortho gonal pro jection matrix V ∈ R n × q on the system matr ices in (14). Fig. 1 0 illustrates ho w p rojection- based metho ds red uce B to a dense system matr ix ˆ B with a smaller size. Most tradition al matr ix-based we akly nonline ar m odel or- der red uction metho ds [36]–[40] suffer fr om the exponential growth of the size o f the redu ced sy stem matrices ˆ B , ˆ C , ˆ D . As a result, simulatin g high- order strong ly n onlinear reduced models is so metimes even slower than simulating the orig inal system. T ensor -based Nonlinear Model O rder Reduction . A tensor-based reductio n schem e was proposed in [1 09]. Th e ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 9 T ABLE IV C O M P U TATI O N A N D S T O R A G E C O M P L E X I T I E S O F D I FF E R E N T N O N L I N E A R M O D E L O R D E R R E D U C T I O N A P P R O AC H E S O N A q - S TA TE R E D U C E D S Y S T E M W I T H d T H - O R D E R N O N L I N E A R I T Y . Reduct ion methods Function e v aluat ion cost Jacobian matrix ev alua tion cost Storage cost Tra dition al matrix-base d method [36]–[40] O ( q d +1 ) O ( q d +2 ) O ( q d +1 ) T ensor-based method [109] O ( q dr ) O ( q 2 dr ) O ( q dr ) Symmetric tensor-ba sed method [110] O ( q r ) O ( q 2 r ) O ( q r ) B A D C E x T x . = x T x T x T + + u T + + (2-way) (3-way) (3-way) (4-way conceptual) (2- way) (a) B x T B V T ^ x T V T B ^ x T ^ (b) Fig. 11. T enso r structure s used in [109 ]. (a) T en sor representat ion of the original nonlinear system in (14); (b) tensor B is reduced to a compact tensor ˆ B with a projectio n m atrix V in [109 ]. coefficient matrices B , C , D of the p olyno mial system (14) were reshaped in to the respec ti ve tensors B ∈ R n × n × n , C ∈ R n × n × n × n and D ∈ R n × n × m , as demonstrated in Fig. 11(a ). These tensors we re the n d ecompo sed via e.g. CPD, T ucker or T ensor T rain rank-1 SVD, resulting in a tensor approx imation o f (14) a s ˙ x = Ax + [ [ B (1) , x T B (2) , x T B (3) ] ] + [ [ C (1) , x T C (2) , x T C (3) , x T C (4) ] ] + [ [ D (1) , x T D (2) , u T D (3) ] ] + E u , (16) where B ( k ) , C ( k ) , D ( k ) , d enote the k th- mode factor matrix from th e polyad ic decomp osition of the ten sors B , C , D re- spectiv ely . Consequ ently , the reduced -order mo del inhe rits the same tenso r structure a s (1 6) (with smaller sizes of the mode factors). If we take tensor B as an example, its reduction process in [109] is shown in Fig. 1 1(b). Computational a nd Storage Benefits . Unlike p revious matrix-ba sed appro aches, simulation of the tensor-structure reduced mod el completely av oids the overhead o f solvin g high-o rder dense system matrices, since the dense Kron ecker produ cts in (14) are resolved b y m atrix-vector multiplica tions between the mo de factor ma trices and the state vector s. There- fore, substantial im provement on efficiency can b e achieved. Meanwhile, th ese mode factor matrices can significantly re- duce the memory r equirem ent since they replace all dense tensors and can be red uced and stored befor ehand. T able IV shows th e compu tational complexities of fun ction and Jaco- bian matr ix e valuations when simulating a red uced model with d th-order n onlinearity , wh ere r deno tes the tensor rank used in the p olyadic decomp ositions in [109]. The stora ge co sts of those m ethods are also listed in th e last colum n o f T able IV. Symmetric T ensor -based Nonlinear Model Order Re- duction . A symmetric tensor-based order reduction m ethod in [110] f urther utilizes the all-but-first-mode partial symme try of the system tensor B ( C ), i.e., the m ode factors of B ( C ) are exactly the same, except for the first mode only . This partial sy mmetry prop erty is also kep t b y its reduced-ord er model. T he sy mmetric tenso r-based reduction me thod in [1 10] provides f urther im provements of computation performance and storage require ment over [109], as shown in the last row of T able IV. B. V olterra-Based Simu lation an d Id entificatio n for ICs V olterra theory has lon g be en u sed in an alyzing comm uni- cation systems and in non linear con trol [11 1], [1 12]. I t c an be r egarded as a kind of T aylo r series with m emory effects since its ev aluation at a p articular time poin t requ ires inp ut informa tion fro m the p ast. Given a cer tain inpu t and a black - box mo del of a no nlinear system with time/f requen cy-domain V olterra kernels, the outpu t respo nse can be co mputed by the summation o f a ser ies o f mu ltidimensional co n volutions. For instance, a 3 rd -orde r response can be written in a d iscretized form as y 3 [ k ] = M X m 1 =1 M X m 2 =1 M X m 3 =1 h 3 [ m 1 , m 2 , m 3 ] 3 Y i =1 u [ k − m i ] , (17) where h 3 denotes the 3 rd-or der V o lterra kern el, u is the dis- cretized inp ut an d M is the mem ory . Such a mu ltidimensiona l conv olution is usually done by multidim ensional fast Fourier transform s ( FFT) and inverse fast Fourier tr ansforms (I FFT). ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 10 L C C 1 2 3 v 1 *v 2 Z Z Z (a) (b) (c) Fig. 12. (a) System diagram of a 3 rd-order mixer circuit . The symbol Π denotes a mixer; (b) the equi v ale nt circuit of the mixer . Z = R = 50 Ω ; (c) the circuit schematic diagram of the low-pa ss filters H a , H b and H c , with L = 42 . 52 nH and C = 8 . 5 pF . Although the fo rmulation does no t pr eclude itself f rom mo del- ing strong non linearities, the exponential complexity growth in multidimen sional FFT/IFFT c omputatio ns results in the curse of d imensionality that fo rbids its p ractical im plementation . T ensor -V olterra Model- based Simulat ion . Obviously , the 3 rd-o rder V olterra kernel h 3 itself ca n be viewed as a 3 -way tensor . By compr essing the V olterra kernel into a poly adic decomp osition, it is proven in [1 13] that th e com putationa lly expensiv e mu ltidimensiona l FFT/IFFT can be r eplaced by a number of cheap one-d imensional FFT/I FFTs without com- promising much acc uracy . Computational and Storage Benefits . The c hosen rank for the polyadic decom position h as a significant impact o n both the acc uracy and efficiency o f th e simulation algo - rithm. In [113], the rank s were ch osen a p riori and it was demonstra ted that the compu tational complexity for the tensor- V olterra based method to calculate an d th-order response is in O (( R real + R imag ) dm log m ) , where m is the n umber of steps in the time/f requen cy axis, a nd R real and R imag denote the pre scribed ranks of the p olyadic decompo sition used for the real and imag inary parts of the V olter ra kern el, respectively . In contrast, th e complexity for the traditional multidimen sional FFT/IFFT a pproach is in O ( dm d log m ) . In addition, the tensor-V olterra m odel requires the storage of only the factor matrices in mem ory , with space co mplexity O (( R real + R imag ) dm ) , while O ( m d ) m emory is re quired for the conventional ap proach . In [11 3], the me thod was app lied to com pute the time- domain response of a 3 rd -order mixer s ystem sho wn in Fig. 12. The 3 rd -orde r response y 3 is simulated to a square pulse inp ut with m = 201 time steps. As shown in Fig. 13(a), a ra nk- 20 (or above) p olyadic d ecompo sition f or b oth the r eal and imaginary parts of the kernel tensor matche d the reference result from multidimensional FFT/IFFT f airly well. Fi gs. 13(b) and ( c) demonstra te a c ertain trade-o ff between the a ccuracy and efficiency wh en using different ranks for the poly adic decomp osition. Nonetheless, a 60x speed up is still achiev able for ranks aroun d 1 00 with a 0 .6% er ror . System Identifica tion . In [114]–[116], sim ilar tensor- V olterra models wer e used to iden tify the blac k-box V olterr a kernels h i . It was repo rted in [11 4]–[116] that given certain input and o utput data, id entification of th e kernels in the polyadic deco mposition for m could sign ificantly r educe the parametric c omplexity with goo d accuracy . V I I . F U T U R E T O P I C S : E D A A P P L I C AT I O N S This section de scribes some EDA pro blems that could be potentially solved with, or that cou ld benefit sig nificantly fr om employing tensors. Since many EDA pr oblems are charac- terized by high dimension ality , the potential application of tensors in EDA can b e vast an d is defin itely no t limited to the top ics summarized below . A. ED A Optimization with T ensors Many ED A p roblem s require s olving a large-scale optimiza- tion p roblem in the f ollowing f orm: min x f ( x ) , s . t . x ∈ C (18 ) where x = [ x 1 , · · · , x n ] denotes n d esign o r d ecision vari- ables, f ( x ) is a cost function (e.g., p ower con sumption of a chip, layo ut are a, signal delay) , and C is a feasible set specifying some design constraints. This formu lation can describe problem s such as circuit optimizatio n [64]–[ 66], placement [59], rou ting [60], and power managem ent [11 7]. The optimization prob lem (18) is compu tationally expensive if x ha s m any elements. It is po ssible to acceler ate the ab ove large-scale optimiza- tion p roblems by e xploiting tensors. By adding some e xtra variables ˆ x with ˆ n elements, one could fo rm a lo nger vector ¯ x = [ x , ˆ x ] such th at ¯ x h as n 1 × · · · × n d variables in total. Let ¯ X be a te nsor such that ¯ x = vec ( ¯ X ) , let x = Q ¯ x with Q being the first n r ows of an identity matr ix, th en (18) can be wr itten in th e following tensor fo rmat: min ¯ X ¯ f ( ¯ X ) , s . t . ¯ X ∈ ¯ C (19) with ¯ f ( ¯ X ) = f ( Q v ec ( ¯ X )) an d ¯ C = { ¯ X | Q vec ( ¯ X ) ∈ C } . Although problem (19) has mor e u nknown variables than (18), th e low-rank represen tation of tensor ¯ X may h av e much fewer unknown elements. There fore, it is high ly possible that solving ( 19) will requ ire much lower computa tional cost for lots of a pplications. B. High-Dimension al Mod eling an d Simu lation Consider a g eneral algebr aic equation resulting fro m a hig h- dimensiona l m odeling or sim ulation pr oblem g ( x ) = 0 , with x ∈ R N and N = n 1 × n 2 · · · × n d (20) which can be so lved by Newton’ s iteration. For simplicity , we assume n i = n . When an iterative linear eq uation solver is applied in side a Newton’ s iter ation, it is possible to solve this problem at the complexity o f O ( N ) = O ( n d ) . Howev er , since N is a n exponential fun ction of n , the iter ativ e m atrix so lver ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 11 0 2 4 6 8 10 0 0.2 0.4 0.6 0.8 1 1.2 x 10 −4 time (nsec) y 3 (V) Reference Rank 10 Rank 20 Rank 40 (a) 0 20 40 60 80 100 10 −2 10 −1 10 0 rank R=R real,3 =R imag,3 relative error (b) 0 20 40 60 80 100 0 100 200 300 400 rank R=R real,3 =R imag,3 speedup (c) Fig. 13. Numerical results of the mixer . (a) Time- domain results of y 3 computed by the method in [113] with diffe rent rank approximations; (b) relati v e errors of [113] with differe nt ranks; (c) speedups brought by [113] with dif ferent ranks. quickly beco mes in efficient as d increases. In stead, we rewrite (20) as the fo llowing eq uiv alent o ptimization p roblem: min x f ( x ) = k g ( x ) k 2 2 , s . t . x ∈ R N . This least-squ are optimization is a special ca se of ( 18), and thus th e tensor-based optimiz ation idea m ay be exploited to solve the above problem at the cost of O ( n ) . A potential applicatio n lies in the PDE or integral e quation solvers f or d evice simulation. Examp les inclu de the Maxwell equations for par asitic extra ction [31]–[33], the Navier-Stokes equation d escribing b io-MEMS [1 18], an d the Poisson equa- tion describing heating effects [119]. These problem s ca n be described as (20) after nu merical discretizatio n. Th e tenso r representatio n of x can be easily ob tained based on the n umer- ical discretization scheme. For instance, on a regular 3-D cubic structure, a finite-d ifference or finite-eleme nt discretization may use n x , n y and n z discretization ele ments in the x , y and z dir ections respectively . Consequen tly , x cou ld be comp actly represented as a 3 -way ten sor with size n x × n y × n z to explo it its low-rank p roperty in th e spatial domain . This idea ca n also be exploited to simulate multi-rate circuits or m ulti-tone RF circuits. I n both cases, the tensor r ep- resentation of x can be natu rally obtained based on the time- domain discretizatio n or m ulti-dimensio nal Fourier tran sform. In multi-to ne h armon ic balan ce [68], [6 9], the d imensionality d is the total numb er o f RF in puts. I n th e multi-time PDE solver [67], d is the n umber of time axes describ ing d ifferent time scales. C. Pr o cess V aria tion Modeling In order to charac terize the inter-die and in tra-die p rocess variations across a silicon wafer with k dice, on e may n eed to measur e each d ie with a n m × n array of devices or circuits [27]–[ 29]. Th e variations of a certain par ameter (e.g., transistor thresh old voltage) on the i th die can be described by matrix A i ∈ R m × n , and thus on e could d escribe th e whole-wafer variation b y stackin g all matrices into a tensor A ∈ R k × m × n , with A i being the i th slice. This re presentation is g raphically sh own in Fig. 1 4. Fig. 14. Repre sent multiple testing chips on a wafer as a single tensor . Each slice of the tensor captures the spatial vari ations on a single die. Instead of measuring each device on ea ch d ie (which requires k mn measur ements in total), o ne cou ld measure on ly a few devices on each wafer , then estimate the full-wafer variations using ten sor completion . On e m ay employ conve x optimization to locate the g lobally optimal solution of this 3 - way tensor comp letion pro blem. V I I I . F U T U R E T O P I C S : T H E O R E T I C A L C H A L L E N G E S T ensor theory is b y itself an active research topic. This section sum marizes some th eoretical open p roblems. A. Challenges in T ensor Decomp osition Polyadic and ten sor train d ecompo sitions are preferr ed f or high-o rder tensors du e to th eir better scalability . In spite o f their better com putational scalability , the f ollowing challenges still exist: • Rank Deter mination in CPD. The tensor r anks ar e usually determined by two method s. First, one may fix the rank and search for the tensor factors. Secon d, o ne may increase th e ran k in crementally to achieve an acce ptable accuracy . Neither m ethods are optimal in the theoretical sense. • Optimization in P olyadic D ecomposition. Most rank - r poly adic d ecompo sition alg orithms employ alterna ting least-squares ( ALS) to solve non- conv ex optimizatio n problem s. Su ch schem es do not guarante e the global ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 12 optimum , and thus it is h ighly de sirable to develop global optimization algorithm s f or th e CPD. • Faster T ensor T ra in Decomposition. Computin g the tensor train d ecompo sition r equires the co mputation of many low-rank decompo sitions. The state-of-th e-art im - plementation emp loys “cross approx imation” to p erform low-rank ap proxim ations [ 87], but it still ne eds too ma ny iterations to find a “g ood” r epresentation . • Preserving T ensor Structures and/or P roperties. In some cases, the g i ven tensor m ay have som e special proper ties such as symmetry or non-n egati veness. T hese proper ties ne ed to be preserved in their d ecompo sed forms for specific ap plications. B. Challenges in T ensor Co mpletion Major challenges of tensor completio n in clude: • A ut omatic Rank Determination. In high-dim ensional tensor completio n, it is impor tant to determin e the ten- sor rank automatica lly . Althou gh so me p robabilistic ap- proach es su ch as variational Bayesian metho ds [96], [97] have been r eported , they ar e gener ally not robust for very high-o rder tenso rs. • Con vex T ensor Co mpletion. Most tensor com pletion problem s are form ulated as non-co n vex o ptimization problem s. Nuclea r-norm m inimization is co n vex, b ut it is only app licable to low-order tensors. De veloping a scal- able co n vex formulation for th e minimal- rank comp letion still remains an open pro blem fo r hig h-ord er cases. • Robust T ensor Completio n. In p ractical tensor c om- pletion, the a vailable tensor elements f rom measurem ent or simu lations can be noisy or even wrong. F or these problem s, the developed tensor completion algorithms should b e robust against noisy input. • Optimal Selection of Samples. T wo critical f undam ental questions shou ld be addressed. First, how many samples are requ ired to (faithfully ) recover a tensor? Second , h ow can we select th e samples o ptimally? I X . C O N C L U S I O N By exploiting low-rank and o ther prope rties of tensors (e.g., sparsity , symmetr y), the storage and computational cost of many challeng ing ED A proble ms can be significantly r educed. For instance, in the high-dim ensional stochastic co llocation modeling o f a CMOS ring oscillator, exploitin g tensor com- pletion require d only a few hu ndred circu it/device simulation samples vs. the huge nu mber of simulations (e. g., 10 27 ) required by standard appro aches to build a stoch astic model of similar a ccuracy . When ap plied to hier archical uncertain ty quantification , a tensor-train ap proach allowed the ea sy ha n- dling of an extrem ely challeng ing MEMS/IC co-desig n pr ob- lem with over 180 unco rrelated ra ndom parameter s describ ing process variations. In nonlinear model order reduction, the high-o rder nonlin ear terms were easily approxim ated by a tensor-based pr ojection framew ork. Finally , a 60 × speedup was ob served when using tensor co mputation in a 3rd-o rder V olterra-series n onlinear modeling examp le, while maintain- ing a 0 . 6 % relativ e error compared with the conventional FFT/IFFT ap proach. These ar e just few initial representative examples for the hug e potential th at a tensor computation framework can offered to EDA a lgorithms. W e belie ve that the space of EDA app lications th at could benefit fr om the use of tenso rs is vast an d re mains mainly un explored, ranging from EDA optimization p roblems, to device field solvers, and to pr ocess variation modeling . A C K N O W L E D G E M E N T This work was supported in p art by th e NSF NEEDS progr am, th e AIM Ph otonics Program , the Hong K ong Re- search Gran ts Counc il u nder Project 172 1231 5, the Un iv ersity Research Committee of Th e University of Ho ng Kong, and the MI T Greater China Fu nd for Innovation. A P P E N D I X A A D D I T I O N A L N OTA T I O N S A N D D E FI N I T I O N S Diagonal, Cubic a nd Symmetric T ensors . Th e diag onal entries of a tensor A are the en tries a i 1 i 2 ··· i d for which i 1 = i 2 = · · · = i d . A tensor S is diagona l if all o f its non-d iagonal e ntries are zero . A cu bical ten sor is a tensor for which n 1 = n 2 = · · · = n d . A cubical tensor A is symmetric if a i 1 ··· i d = a π ( i 1 ,...,i d ) where π ( i 1 , . . . , i d ) is any per mutation of th e indices. The Kronecker product [ 120] is d enoted b y ⊗ . W e use the notation x d = x ⊗ x ⊗ · · · ⊗ x f or the d -times repeated Kronecker prod uct. Definition 5: Reshaping. Reshap ing, also c alled unfolding , is an other often u sed tensor o peration. The most commo n reshaping is th e ma tricization, wh ich reord ers the entr ies of A into a matrix. The m ode- n m atricization o f a tensor A , denoted A ( n ) , rear ranges the entries of A such that the r ows of the r esulting matr ix are ind exed by the n th tensor index i n . The r emaining indices ar e gr ouped in ascen ding or der . Example 2: The 3 - way ten sor of Fig. 1 can be reshap ed as a 2 × 12 matrix or a 3 × 8 matr ix, an d so forth . Th e mo de- 1 and mode- 3 un folding s are A (1) = 1 4 7 10 1 3 16 19 2 2 2 5 8 11 1 4 17 20 2 3 3 6 9 12 1 5 18 21 2 4 , A (3) = 1 2 3 4 · · · 9 10 11 12 13 14 15 16 · · · 21 2 2 23 24 . The column ind ices of A (1) , A (3) are [ i 2 i 3 ] an d [ i 1 i 2 ] , resp ec- ti vely . Definition 6 : V ecto rization. Another important resh aping is the vectorization. The vectorization of a tensor A , deno ted vec ( A ) , rear ranges its entries in one vector . Example 3: For the tensor in Fig. 1 , we have vec ( A ) = 1 2 · · · 2 4 T . A P P E N D I X B C O M P U TA T I O N A N D V A R I A N T S O F T H E P O LY A D I C D E C O M P O S I T I O N Computing Polyadic Dec ompositions . Since the tensor rank is not kn own a pr iori, in pr actice, o ne u sually co mputes ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 13 a lo w-rank r < R approx imation of a given tensor A by minimizing the Frobeniu s norm of the d ifference betwe en A and its ap proxim ation. Specifically , the u ser specifies r and then solves the minim ization pro blem argmin D , U (1) ,..., U ( d ) || A − [ [ D ; U (1) , . . . , U ( d ) ] ] || F where D ∈ R r × r ×···× r , U ( i ) ∈ R n i × r ( i = { 1 , . . . , d } ) . One can then incremen t r a nd com pute new app roximatio ns until a “good eno ugh” fit is ob tained. A c ommon method for solv ing this optim ization problem is the Alternating Least Squ ares (ALS) metho d [82]. Other popular optimization algorithm s are nonlinear conjugate gr adient methods, q uasi-Newton or nonlinear least squares ( e.g. Levenberg-Marquard t) [121]. Th e computatio nal complexity per iteration of the ALS, Le venberg- Marquar dt (LM) and Enhan ced Line Search (EL S) method s to compute a poly adic decomp osition of a 3 -way tensor, wh ere n = min ( n 1 , n 2 , n 3 ) , are given in T able V. T ABLE V C O M P U TATI O N A L C O S T S O F 3 T E N S O R D E C O M P O S I T I O N M E T H O D S F O R A 3 - WA Y T E N S O R [ 1 2 2 ] . Methods Cost per iterati on ALS ( n 2 n 3 + n 1 n 3 + n 1 n 2 )(7 n 2 + n ) + 3 nn 1 n 2 n 3 LM n 1 n 2 n 3 ( n 1 + n 2 + n 3 ) 2 n 2 ELS (8 n + 9) n 1 n 2 n 3 T wo variants of the p olyadic deco mposition are summarized below . 1) P AR A TREE o r tensor-train rank-1 SVD (TT r1SVD): This po lyadic decomp osition [123], [1 24] consists of ortho g- onal rank-1 terms and is co mputed by consecutive re shapings and SVDs. This com putation implies th at the obtained decom - position d oes not n eed an initial guess and will be un ique for a fixed order of in dices. Similar to SVD in the matrix case, this decomp osition has an appro ximation e rror easily expressed in terms of the σ i ’ s [ 123]. 2) CPD for S ymmetric T enso rs: The CPD of a symmetr ic tensor does not in g eneral re sult in a summa tion of symme tric rank-1 terms. In some a pplications, it is mo re mea ningfu l to enforce the symme tric co nstraints exp licitly , an d wr ite A = P R i =1 λ i v d i , wher e λ i ∈ R , A is a d -way sym metric tensor . Here v d i is a shorthan d for the d - way o uter pr oduct o f a vector v i with itself, i.e. , v d i = v i ◦ v i ◦ · · · ◦ v i . A P P E N D I X C H I G H E R - O R D E R S V D The Higher-Order SVD (H OSVD) [125] is ob tained f rom the T ucker d ecompo sition when the factor ma trices U ( i ) are orthog onal, when any two slices of th e core te nsor S i n the same mode are orthogo nal, h S i k = p , S i k = q i = 0 if p 6 = q for any k = 1 , . . . , d , and when the slices of the core tensor S in the same mod e are ordere d accord ing to their Frobeniu s n orm, || S i k =1 || ≥ || S i k =2 || ≥ · · · ≥ || S i k = n k || for k = { 1 , . . . , d } . Its c omputatio n consists of d SVDs to com pute the factor matrices and a contractio n of their in verses with th e origina l tensor to co mpute the HOSVD core tensor . For a 3- way ten sor this entails a comp utational co st of 2 n 1 n 2 n 3 ( n 1 + n 2 + n 3 )+ 5( n 2 1 n 2 n 3 + n 1 n 2 2 n 3 + n 1 n 2 n 2 3 )2( n 3 1 + n 3 2 + n 3 3 ) / 3( n 3 1 + n 3 2 + n 3 3 ) / 3 [122]. R E F E R E N C E S [1] L. Nagel and D. O. Pederson, “SPICE (Simulatio n Program with Inte grated Circuit Emphasis), ” Unive rsity of California, Berkele y , T ech. Rep., April 1973. [2] C.-W . Ho, R. Ruehli, and P . Brennan, “The modified nodal approach to networ k analysis, ” IEEE T ran s. Cir cuits Syst. , vol. 22, no. 6, pp. 504–509, June 1975. [3] K. Kundert, J. K. White, and A. Sangiov anni -V incenteli, Steady-st ate methods for si mulation analog and micr owave cir cuits . Kluwer Academic Publishers, Boston, 1990. [4] T . Aprille and T . Tr ick, “Stea dy-state analysis of nonline ar circuits with periodi c inputs, ” IEEE Pro c. , vol. 60, no. 1, pp. 108–114, J an. 1972. [5] ——, “ A computer algorith m to determine the steady-stat e response of nonline ar oscilla tors, ” IEEE T rans. Circui t Theory , vol. CT -19, no. 4, pp. 354–360, July 1972. [6] K. Kundert , J. White, and A. Sangiov anni-V in cente lli, “ An env elo pe- follo wing method for the ef ficient transient simulation of switching po wer and filter circuits, ” in Proc. Int. Conf. Compute r-Aided Design , 1988 Nov . [7] L. Petzold, “ An ef ficient numerical met hod fo r high ly oscil latory ordinary dif ferent ial equations, ” SIAM J. Numer . Anal. , vol . 18, no. 3, pp. 455–479, June 1981. [8] A. Demir , A. Mehrotra, and J. Roycho wdhury , “Phase noise in oscil- lators: A unifying theory and numerical methods for characte rizat ion, ” IEEE T rans. Circuit s Syst. I: Fundamental Theory and Applicati ons , vol. 47, no. 5, pp. 655–674, 2000. [9] J. N. K ozh aya, S. R. Nassif, and F . N. Najm, “ A multigri d-lik e techni que for power grid analysi s, ” IE EE T r ans. CAD of Inte gr . Circ uits Syst. , vol. 21, no. 10, pp. 1148–1160, 2002. [10] T .-H. Chen and C. C.-P . Chen, “Ef ficie nt large- scale powe r grid anal ysis based on preconditione d Krylov-subspa ce iterati ve methods, ” in Proc. Design Automation Conf. , 2001, pp. 559–562. [11] Z. Feng and P . Li, “Multigri d on GPU: tackl ing powe r grid analysis on parall el SIMT platforms, ” in P r oc. Intl. Conf. Computer-Aided Design , 2008, pp. 647–654. [12] R. T eli che v esky and J. K. White, “Effic ient steady-state analysis based on m atrix-fre e Krylov-subpsa ce methods, ” in Pr oc. Design Automati on Conf . , J une 1995, pp. 480–484. [13] X. L iu, H . Y u, and S. T an, “ A GPU-accelerat ed pa ralle l shooting algorit hm for analysis of radio frequenc y and microw a ve integrate d circui ts, ” IEE E T ra ns. VLSI , vol. 23, no. 3, pp. 480–492, 2015. [14] S. W einzierl, “Introducti on to Monte Carlo methods, ” theory Group, The Netherland s, T ech. Rep. NIKHEF-00-012, 2000. [15] A. Singhee and R. A. Rutenba r , “Why Quasi-Monte Carlo is better than Monte Carlo or Latin hypercube sampling for statistica l circui t analysi s, ” IEEE T rans. CAD of Inte gr . Circui ts Syst. , vol. 29, no. 11, pp. 1763–1776, 2010. [16] Z. Z hang, X. Y ang, G. Marucci, P . Maffezz oni, I. M. Elfad el, G. Kar- niadaki s, and L. Daniel, “Stochasti c testing simulator for inte grate d circui ts and MEMS: Hierarchica l an d sparse technique s, ” in Proc. Custom Inte gr . Circui ts Conf. San Jose, CA, Sept. 2014, pp. 1–8. [17] Z. Z hang, I. A. M. Elfadel, and L . Daniel , “Uncertai nty quantifica tion for integrate d circuits: Stochastic spectral methods, ” in Proc . Int. Cont. Computer -Aided Design . San Jose, CA, Nov 2013, pp. 803–810. [18] Z. Zhang, I. Osledets, X. Y ang, G. E. Karniadakis, and L. Daniel, “Enablin g high-dimension al hierarc hical uncertai nty quantification by ANO V A and tensor-tr ain decomposition, ” IEEE T r ans. CAD of Inte gr . Cir cuit s Syst. , vol. 34, no. 1, pp. 63 – 76, Jan 2015. [19] T .-W . W eng , Z. Zhang, Z . Su, Y . Marzouk, A. Melloni , and L. Daniel, “Uncert ainty quantificat ion of silicon photonic device s with correlate d and non-Gaussian random paramete rs, ” Optics Express , vol. 23, no. 4, pp. 4242 – 4254, Feb 2015. [20] Z. Zhang , T . A. El-Moselhy , I. A. M. Elfadel, and L. Daniel, “Stoc hastic testing method for transistor-l e ve l uncertainty quantifica tion based on general ized polynomia l chaos, ” IEEE T rans. CAD Inte gr . Circuit s Syst. , vol. 32, no. 10, Oct. 2013. [21] Z. Zhang, T . A. El-Moselhy , P . Maf fezzoni , I. A. M. Elfadel , and L. Daniel, “Efficien t uncertainty quantific ation for the periodic steady state of forced and autonomous circuits, ” IEE E T rans. Cir cuits Syst. II: Exp. Briefs , vol. 60, no. 10, Oct. 2013. ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 14 [22] R. Pulch, “Modelling and simulation of autonomous oscillato rs with random parameters, ” Math. Computers in Simulation , vol. 81, no. 6, pp. 1128–1143, Feb 2011. [23] J. W ang, P . Ghanta , and S. Vrudhula , “Stochasti c analysis of inter- connec t performance in the presence of process varia tions, ” in Proc. Design Auto Conf . , 2004, pp. 880–886. [24] S. V rudhula, J. M. W ang, and P . Ghanta, “Hermite polynomial based interc onnect analysis in the presence of process variat ions, ” IEEE T rans. CA D Inte gr . Cir cuit s Syst. , vol . 25, no. 10, pp. 2001–2011, Oct. 2006. [25] M. Rufui e, E. Gad, M. Nakhla , R. Achar , and M. F arhan, “F ast va riabil - ity analysis of general nonlinear circuits using decouple d polynomial chaos, ” in W orkshop Signal and P ower Inte grit y , May 2014, pp. 1–4. [26] P . Manfredi, D. V . Ginste, D. D. Z utter , and F . Canav ero, “Stochasti c modeling of nonline ar circuits via SPICE-compatible spectral equi v a- lents, ” IEEE T ran s. Cir cuits Syst. I: Re gul ar P apers , vol. 61, no. 7, pp. 2057–2065, July 2014. [27] D. S. Boning, K. Balakrishnan, H. Cai, N. Drego, A. Farahanch i, K. M. Gettings, D. Lim, A. Somani, H. T aylor , D. Truque, and X. Xie, “V ariation, ” IEEE T rans. Semicond. Manuf. , vol. 21, no. 1, pp. 63–71, Feb . 2008. [28] L. Y u, S. Saxena, C. Hess, A. Elfadel , D. Antonia dis, and D. Boning, “Remembran ce of transistors past: Compact model parameter extrac- tion using Bayesia n inference and incomplete ne w measurements, ” in Pr oc. Design Automation Conf , 2014, pp. 1–6. [29] W . Zhang, X. Li, F . Liu, E. Acar , R. A. Rute nbar , and R. D. Blanto n, “V irtual probe: A statistica l frame w ork for low-c ost silicon charac teriz ation of nanoscale inte grate d circuits, ” IEE E T rans. CA D of Inte gr . Circ uits Syst. , vol. 30, no. 12, pp. 1814–1827, 2011. [30] Y . S. Chauhan, S. V enugopalan, M. A. Karim, S. Khandelw al, N. Pay- dav osi, P . Thakur , A. M. Nikneja d, and C. C. Hu, “BSIM–industry standard compact MOSFE T models, ” in Proc. E SSCIRC , 2012, pp. 30–33. [31] K. Nabors and J. White, “FastCap: a multip ole accelerat ed 3-D capac itanc e extract ion program, ” IEEE T r ans. CAD of Inte gr . Circ uits Syst. , vol. 10, no. 1, pp. 1447–1459, Nov 1991. [32] M. Kamon, M. J. Tsuk, and J. K. W hite, “F A STHENR Y : a multipole - accel erate d 3-D inductan ce extrac tion program, ” IEE E T rans. Micr ow . Theory T ech. , vol. 42, no. 9, pp. 1750–1758, Sept. 1994. [33] J. Phillip s and J. K. White, “A precorrecte d-FFT method for electro- static analysis of complicated 3-D s tructure s, ” IE EE T rans. CAD of Inte gr . Circ uits Syst. , vol. 16, no. 10, pp. 1059–1072, Oct 1997. [34] A. Odaba sioglu, M. Cel ik, and L. T . Pile ggi, “PRIMA: P assi ve reduc ed- order interconne ct macromodeli ng algorith m, ” IEEE T r ans. CAD of Inte gr . Circ uits Syst. , vol. 17, no. 8, pp. 645–654, Aug. 1998. [35] J. R. Phillips, L . Daniel, and L. M. Silveira , “Guarantee d passiv e balanc ing transformations for m odel order reduction, ” IEE E T ran s. CAD of Inte gr . Cir cuit s Syst. , vol. 22, no. 8, pp. 1027–1041, Aug. 2003. [36] J. Royc ho wdhury , “Reduced- order modeling of time-v aryi ng systems, ” IEEE T ran s. Cir cuits and Syst. II: Analog and Digital Signal P r ocess. , vol. 46, no. 10, pp. 1273–1288, Oct 1999. [37] P . Li and L. Pileggi , “Compact reduced-order modeling of weakly nonline ar analog and RF circuits, ” IE EE T rans. CAD of Inte gr . Cir cuit s Syst. , vol. 24, no. 2, pp. 184–203, F eb . 2005. [38] J. R. Phillips, “Project ion-base d approache s for m odel reduction of weakly nonlinear time-v aryin g systems, ” IEEE T rans. Comput.-Aided Design Inte gr . Circuit s Syst. , vol. 22, no. 2, pp. 171–187, Feb . 2003. [39] C. Gu, “QLMOR: a projectio n-based nonlinear model order reducti on approac h using quadratic-li near represen tatio n of nonlinea r systems, ” IEEE T r ans. Comput.-Aided Design Inte gr . Circui ts Syst. , vol. 30, no. 9, pp. 1307–1320, Sep. 2011. [40] Y . Z hang, H. L iu, Q. W ang, N. Fong, and N. W ong, “Fast nonlinear model order reduction via associate d transforms of high-order Volterra transfer functions, ” in Proc . Design Autom. Conf. , J un. 2012, pp. 289– 294. [41] B. N. Bond and L. Daniel, “Stable reduced models for nonlinea r descript or systems through piece wise-lin ear approximatio n and projec - tion, ” IEE E T rans. CAD of Inte gr . Cir cuits and Syst. , vol. 28, no. 10, pp. 1467–1480, 2009. [42] M. Rewien ski and J . White, “ A trajectory piece wise-l inear approach to model order reduction and fast simulation of nonlinea r circuits and micromachi ned devi ces, ” IEEE T ran s. Comput.-Aided Design Inte gr . Cir cuit s Syst. , vol. 22, no. 2, pp. 155–170, Feb . 2003. [43] B. Gustavsen and S. Semlyen, “Rational approximat ion of frequency domain responses by vect or fitting, ” IEEE T r ans. P owe r Delivery , vol. 14, no. 3, p. 10521061, Aug. [44] S. Griv et-T alocia , “Passi vity enforcement via perturbation of Hamilto- nian m atric es, ” IEEE T ra ns. Circui ts and Systems I: Regul ar P apers , vol. 51, no. 9, pp. 1755–1769, Sept. [45] C. P . Coelho, J. Phillips, and L. M. Silveira, “ A con v ex programming approac h for generatin g guarantee d passive appr oximatio ns to tab ulat ed frequenc y-da ta, ” IEEE Tr ans. CAD of Inte gr . Circuit s Syst. , vol. 23, no. 2, pp. 293–301, F eb . 2004. [46] B. N. Bond, Z. Mahmood, Y . Li, R. Sredoje vic, A. Megretsk i, V . Sto- jano vic, Y . A vniel, and L . Daniel, “Compact m odelin g of nonlinear analog circu its using system identifica tion via semidefinite programing and incremental stabili ty certific ation, ” IEEE T rans. CAD of Inte gr . Cir cuit s Syst. , vol. 29, no. 8, p. 11491162, Aug. [47] L. Daniel, C. S. Ong, S. C. Low , K. H. L ee, and J. White, “ A multi- paramete r moment-matchin g m odel-red uction approac h for generati ng geometri cally parameterized interconnect performance models, ” IEEE T rans. CAD of Inte gr . Circuit s Syst. , vol. 23, no. 5, pp. 678–693, May 2004. [48] ——, “Geometri cally parameterized interconne ct performance models for interconnec t synthesis, ” in Pr oc. IEEE/ACM Intl. Symp. Physical Design , May 2002, pp. 202–207. [49] K. C. Sou, A. Megretski, and L . Daniel, “ A quasi-con v e x optimizati on approac h to parameteriz ed model order reductio n, ” IEE E T ran s. CAD of Inte gr . Circui ts Syst. , vol. 27, no. 3, pp. 456–469, March 2008. [50] B. N. Bond and L. Daniel, “Parameter ized model order reduction of nonline ar dynamical systems, ” in Pr oc. Intl. Conf. Computer Aided Design , Nov . 2005, pp. 487–494. [51] ——, “ A piece wise- linea r moment-matchin g approach to paramet erize d model-orde r reduction for highly nonlinear systems, ” IEEE T ran s. CAD of Inte gr . Circui ts Syst. , vol. 26, no. 12, pp. 2116–2129, 2007. [52] T . Moselhy and L. Daniel, “V ariation- aw are interconn ect extract ion using statistica l moment preserving model order reducti on, ” in Pr oc. Design, Autom. T est in Euro pe , Mar . 2010, pp. 453–458. [53] F . F erranti , L. Knockaert, and T . Dhaene, “Guarante ed passi ve pa- rameteri zed admittance -based macromodel ing, ” IEEE T rans. Advanced P acka g. , vol. 33, no. 3, pp. 623–629, 2010. [54] J. F . V illen a and L. M. Silveira , “SP ARE–a scalable algorit hm for passi ve , structure preserving, parameter -aw are model order reductio n, ” IEEE T rans. CAD of Inte gr . Cir cuit s Syst. , vol. 29, no. 6, pp. 925–938, 2010. [55] L. M. Silveir a and J . R. Phillips, “Resamplin g plans for sample point select ion in multipoint model-order reduction, ” IE EE T ran s. CAD of Inte gr . Circuit s Syst. , vol. 25, no. 12, pp. 2775–2783, 2006. [56] S. Boyd and L. V andenberghe , Con ve x Optimization . Cambridge Uni ve rsity P ress, 2004. [57] L. V andenberghe and S. Boyd, “Semidefini te programming, ” SIAM Revie w , vol. 38, no. 1, pp. 49–95, 1996. [58] D. P . Bertsekas, Nonlinear pr ogr amming . Athena Scientific, 1999. [59] K. Shahookar and P . Mazumder , “VLSI cell placemen t technique s, ” ACM Comput. Surveys , vol. 23, no. 2, pp. 143–220, 1991. [60] J. Cong, L. He, C.-K. Koh, and P . H. Madden, “Performanc e opti- mizatio n of VL SI interco nnect layout, ” Inte gr ation, the VLSI Journal , vol. 21, no. 1, pp. 1–94, 1996. [61] G. De Micheli, Synthesis and Optimizat ion of Digital Circuit s . McGra w-Hill, 1994. [62] G. Gielen, H. W alsc harts, and W . Sansen, “ Analog circuit design optimiza tion based on symbolic simulation and simulated annealing, ” IEEE J . Solid-State Cir cuits , vol. 25, no. 3, pp. 707–713, 1990. [63] W . Cai, X. Zhou, and X. Cui, “Optimiz ation of a GPU implementation of multi-dime nsional RF pulse design algorithm, ” in Bioinformatic s and Biomedical Engineering, IEEE Intl. Conf. on , 2011, pp. 1–4. [64] M. H ershenson, S . P . Boyd, and T . H. Lee, “Optimal design of a CMOS op-amp via geometric programming, ” IEEE T rans. CAD of Inte gr . Circuit s Syst. , vol. 20, no. 1, pp. 1–21, 2001. [65] X. Li, P . Gopalakrishna n, Y . Xu, and T . Pileggi , “Robust analog/RF circui t design with projection -based posynomial modeling, ” in Pr oc. Intl. Conf. Computer -aide d design , 2004, pp. 855–862. [66] Y . X u, K.-L . Hsiung, X. Li, I. Nausieda, S. Boyd, and L. Pile ggi, “OPERA: optimizat ion with ellipsoidal uncertainty for robust analog IC design, ” in Proc . Design Autom. Conf . , 2005, pp. 632–637. [67] J. Roycho wdhury , “ Analyzing circ uits with widely separate d time scales using numerical PDE m ethods, ” IEE E T ra ns. Circui ts Syst.: Fundamental Theory and Application s , vol. 48, no. 5, pp. 578–594, May 2001. [68] R. C. Melville , P . Feldmann, and J. Roycho wdhury , “Effic ient multi- tone distortion analysi s of analog inte grate d circuits, ” in Proc. Custom Inte gr . Circuit s Conf. , May 1995, pp. 241–244. ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 15 [69] N. B. De Carv alho and J. C. Pedro, “Multi tone frequenc y-domain simulatio n of nonlinear circuit s in large- and sm all-signa l regimes, ” IEEE T r ans. Micr owave Theory and T echni ques , vol. 46, no. 12, pp. 2016–2024, Dec 1998. [70] M. Bonnin and F . Corinto , “Phase noise and noise induced frequency shift in stochastic nonlinear oscillator s, ” IEEE T rans. Cir cuits Syst. I: Re gular P apers , vol. 60, no. 8, pp. 2104–2115, 2013. [71] X. L i, “Finding determinist ic solution from underdete rmined equation : larg e-scale performance m odeling of analog/RF circui ts, ” IEEE T rans. CAD of Inte gr . Circui ts Syst. , vol. 29, no. 11, pp. 1661–1668, Nov . 2011. [72] T . Kol da and B. Bader , “T en sor decompositio ns and applicat ions, ” SIAM Revie w , vol. 51, no. 3, pp. 455–500, 2009. [73] A. Cichoc ki, D. Mandic, L. De Lathauwer , G. Zhou, Q. Zhao, C. Ca- iafa , and H. A. Phan, “T ensor decomposit ions for signal processin g applic ations: From two-w ay to multiway component analysis, ” IEEE Signal Pro cess. Mag. , vol. 32, no. 2, pp. 145–163, March 2015. [74] N. V ervliet, O. Debals, L . Sorber , and L. D. Lathauwer , “Breaking the curse of dimensionalit y using decompositions of incomplete tensors: T ensor-based scienti fic computing in big data analysis, ” IEEE Signal Pr ocess. Mag . , vol. 31, no. 5, pp. 71–79, Sep. 2014. [75] B. W . Ba der , T . G. Kol da et al. , “MA TL AB T ensor T oolbox V ersion 2.6, ” February 2015. [Online]. A va ilabl e: http:/ /www .sandia.gov/ ∼ tgko lda/T ensorT oolbox/ [76] N. V ervliet, O. Debals, L. Sorber , M. V an Barel, and L. De Lathauwer . (2016, Mar .) T ensorlab 3.0. [Online]. A vail able: http:/ /www .tensorlab .net [77] I. Oseledets, S. Dolgov , V . Kaze e v , O. Lebede v a, and T . Ma ch. (2012) TT -T oolb ox 2. 2. [Online]. A v ai lable : http:/ /spring.inm.ras.ru/o sel/do wnload/tt22.zip [78] A. N ovik o v , D. Podoprikhin, A. Osokin, and D. V etrov , “T ensorizing neural netwo rks, ” in Advances in Neural Information Proc essing Sys- tems 28 . Curran Associates, Inc., 2015. [79] V . Lebede v , Y . Ganin, M. Rakhuba, I. Oseledet s, and V . Lempit- sky , “Speedi ng-up con volutional neural networks using fine-tuned cp- decomposit ion, ” arX iv pre print arXiv:1412.6553 , 2014. [80] M. Rakhuba and I. V . Oseledets, “Fast multidimensional con vol ution in lo w-rank tensor formats via cross approximation , ” SIAM J ournal on Scient ific Computing , vol. 37, no. 2, pp. A565–A582, 2015. [81] J. D. Carroll and J. J . Chang, “ Analysis of indi vidual diff erence s in multidimensiona l scaling via an n-way generalizat ion of “Eckart- Young” decomposition , ” Ps ych ometrika , vol. 35, no. 3, pp. 283–319, 1970. [82] R. A. Harshman, “Founda tions of the P ARAF A C procedure: Models and condi tions for an “explan atory” multi-modal fa ctor analy sis, ” UCLA W orking P ape rs in Phonetics , vol. 16, no. 1, p. 84, 1970. [83] L. R. Tuck er , “Some mathematical notes on three-mode factor analy- sis, ” P sycho metrika , vol. 31, no. 3, pp. 279–311, 1966. [84] I. Osele dets, “T ensor-tra in decomposition, ” SIAM J . Sci . Comp. , v ol. 3 3, no. 5, pp. 2295–2317, 2011. [85] J. H ˚ astad, “T ensor rank is NP-complete , ” J. Algorithms , vol. 11, no. 4, pp. 644–654, 1990. [86] J. B. Kruskal, “Three-w ay arrays: rank and uniqueness of trilinear de- compositio ns, with applicati on to arithmet ic complexit y and statistics, ” Linear Algebra and its A pplicat ions , vol. 18, no. 2, pp. 95–138, 1977. [87] I. Oseledets and E . T yrtyshni ko v , “TT-cross approximat ion for mul- tidimensi onal arrays, ” Linear A lge bra and its Applications , vol. 422, no. 1, pp. 70–88, 2010. [88] S. Gandy , B. Recht, and I. Y amada, “T ensor completion and low-n-ran k tensor recove ry via con v e x optimizat ion, ” In ver se Probl ems , vol. 27, no. 2, p. 119, Jan. 2011. [89] J. Liu, P . Musialski, P . W onka, and J. Y e, “T ensor completion for estimati ng missing value s in visual data, ” IE EE T rans. P attern Anal. Mach ine Intellige nce , vol. 35, no. 1, pp. 208–220, Jan. 2013. [90] J. Douglas and H. Rachford, “On the numerical solution of heat con- duction problems in two and three space variabl es, ” T rans. American Math. Society , vol. 82, no. 2, pp. 421–439, Jul. 1956. [91] D. Gabay and B. Mercier , “ A dual algorithm for the solution of non- linea r variat ional problems via finite-eleme nt approximat ions, ” Comp. Math. Appl. , vol. 2, no. 1, pp. 17–40, Jan. 1976. [92] D. Kressner , M. Steinle chner , and B. V andere yck en, “Low-rank tensor completi on by Riemanni an optimiza tion, ” BIT Numer . Math. , vol. 54, no. 2, pp. 447–468, J un. 2014. [93] P .-A. Absil, R. Mahony , and R. Sepulc hre, Optimization algorithms on matrix manifolds . Princeton Uni versi ty Press, 2008. [94] S. Holtz, T . Rohwedder , and R. Schneider , “The alternatin g linear scheme for tensor optimiza tion in the tensor train format, ” SIAM J. Sci. Comput. , 2012. [95] P . Rai, Y . W ang, S. Guo, G. Chen, D. Dunson, and L. Carin, “Scalable Bayesia n low-ra nk decompositi on of incomplete multiw ay tensors, ” in Pr oc. Int. Conf. Mach ine Learning , 2014, pp. 1800–1809. [96] Q. Zhao, L. Zhang, and A. Cichocki, “Bayesian CP factoriza tion of incomple te tensors with automatic rank determination, ” IEE E T rans. P attern Anal. and Machine Intell ige nce , vol. 37, no. 9, pp. 1751–1763, 2015. [97] ——, “Bayesian sparse T uck er models for dimension reduct ion and tensor completion, ” , May 2015. [98] Z. Zhang, T .-W . W eng, and L. Daniel, “ A big-dat a approach to handle process variat ions: Uncerta inty quantificati on by tensor recov ery , ” in Pr oc. Int. W orkshop Signal and P ower Inte grit y , May 2016. [99] ——, “ A big-data approach to handle many process vari ations: tensor reco very and appli catio ns, ” IEE E T rans. Comp., P acka g . Manuf. T ech n. , submitted in 2016. [100] D. Xiu and G. E. Karniada kis, “The Wiener Askey polynomia l chaos for stochasti c differe ntial equations, ” SIAM J. Sci . Comp. , vol. 24 , no. 2, pp. 619–644, Feb . 2002. [101] D. Xiu and J. S. Hesthav en, “High-order colloca tion methods for dif feren tial equations with random inputs, ” SIAM J. Sci. Comp. , vo l. 27, no. 3, pp. 1118–1139, Mar 2005. [102] I. Babuˇ ska, F . N obile, and R. T empone, “ A stochastic collocation method for elliptic partial differen tial equati ons with random input data, ” SIAM J . Numer . Anal. , vol. 45, no. 3, pp. 1005–1034, Mar 2007. [103] F . Nobile, R. T empone , and C. G. W ebster , “ A sparse grid s tochast ic colloc ation method for partial differe ntial equations with random input data, ” SIAM J. Numer . Anal. , vol. 46, no. 5, pp. 2309–2345, May 2008. [104] G. H. Golub and J. H. W elsch , “Calculat ion of gauss quadrature rules, ” Math. Comp. , vol. 23, pp. 221–230, 1969. [105] H.-J. Bungartz and M. Griebel , “Sparse grids, ” Acta Numerica , vol. 13, pp. 147–269, 2004. [106] T . W ang, H. Liu, Y . W ang, and N. W ong, “W eakly nonlinear circuit analysi s based on fast multidimensiona l in v erse Laplace transform, ” in Pr oc. A sia South P acific Design Autom. Conf. , Jan. 2012, pp. 547–552. [107] H. L iu and N. W ong, “ Autonomous V olterra algorithm for steady- state analy sis of nonli near circuits, ” IEEE T rans. Comput.-Aide d Design Inte gr . Circuit s Syst. , vol. 32, no. 6, pp. 858–868, Jun. 2013. [108] Y . Zhang and N. W ong, “Compact model order reduction of weakly nonline ar s ystems by associated transform, ” Intl. J. Cir cuit Theory and Applicat ions , 2015. [109] H. Liu, L . Daniel , and N. W ong, “Model reduction and simulation of nonlinear circui ts via tensor decomposition, ” IEEE T rans. CAD of Inte gr . Circuit s Syst. , vol. 34, no. 7, pp. 1059–1069, Jul. 2015. [110] J. Deng, H. Liu, K. Batsel ier , Y . K. Kwok, and N. W ong, “STORM: a nonline ar model order reduct ion method via symmetric tensor decom- position, ” in Proc . Asia and South P acific Design Autom. Conf. , Jan. 2016, pp. 557–562. [111] E. Bedrosi an and S. O. Rice, “The output properti es of V olterra systems (nonline ar systems with m emory) dri ve n by harmonic and Gaussian inputs, ” Proc. IEE E , vol. 59, no. 12, pp. 1688–1707, Dec. 1971. [112] W . Rugh, Nonline ar System Theory – The Volterra-W iener A ppr oac h . Balti more, MD: Johns Hopkins Univ . Press, 1981. [113] H. Liu, X. Xiong, K . Batsel ier , L . Jiang, L. Daniel, and N. W ong, “ST A VE S: S peedy tensor-aide d vol terra-b ased electronic simulator , ” in Pr oc. Int. Computer -Aided Design , Nov 2015, pp. 583–588. [114] G. Favier , A. Y . Kibangou, and T . Bouilloc, “Nonlinea r system model- ing and ident ificati on using V olterra-P ARAF AC models, ” Int. J . Adapt. Contr ol Signal P r ocess , vol. 26, no. 1, pp. 30–53, Jan. 2012. [115] A. Khouaja and G. Favie r , “Identific ation of P ARAF AC-V olterra cubic models using an alternat ing recursi ve least squares algorithm, ” in Pr oc. Eur op. Signal Pr ocess. Conf . , 2004, pp. 1903–1906. [116] K. Batselier , Z. Chen, H. Liu, and N. W ong, “ A tensor-based Volterra series black-box nonline ar system identificati on and simulati on frame- work, ” in Pr oc. Intl. Conf . Computer Aided Design , 2016. [117] L. Benini, A. Bogliolo, G. A. Paleo logo, and G. De Micheli, “Policy optimiza tion for dynamic power management, ” IEE E T rans. CAD of Inte gr . Circuit s Syst. , vol. 18, no. 6, pp. 813–833, 1999. [118] D. V asilye v , M. Rewienski , and J. White, “Macromodel generation for BioMEMS component s using a stabilized balance d truncatio n plus trajec tory pie ce wise- linea r approach, ” IEEE T r ans. CAD of Inte gr . Cir cuit s and Syst. , vol. 25, no. 2, pp. 285–293, 2006. [119] W . Y u, T . Zhang, X. Y uan, and H. Qian, “Fast 3-D thermal simulation for integrat ed circui ts with domain de compositi on method, ” IEEE T rans. CAD of Inte gr . Circ uits Syst. , vol. 32, no. 12, pp. 2014–2018, 2013. ACCEPTED BY IEEE TRANSACTIONS ON COMPUTER -AIDED DESIGN OF INTEGR A TED CIRCUITS AND SYSTEMS, VOL. XX, NO. XX, XX 2016 16 [120] C. F . V . L oan, “The ubiquitous Kroneck er product, ” J. Comp. A ppl. Math. , vol. 123, no. 1-2, pp. 85–100, Nov . 2000. [121] L. Sorber , M. V . Barel, and L. D. Lathauwer , “Optimiz ation-b ased algorit hms for tensor decompositions: Canonica l polyadic decomposi- tion, decomposi tion in ran k- ( l r , l r , 1) terms, and a new generaliz ation, ” SIAM J. Optim. , vol. 23, no. 2, pp. 695–720, 2013. [122] P . Comon, X. Luciani, and A. L. F . de Almeida, “T ensor decompo- sitions, altern ating least squares and other tales, ” J. Chemometric s , vol. 23, no. 7-8, pp. 393–405, JUL -A UG 2009. [123] K. Batselie r , H. L iu, and N. W ong, “ A constructi ve algorithm for decomposing a tensor into a finite sum of orthonormal rank-1 terms, ” SIAM J. Matrix Anal. Appl. , vol. 26, no. 3, pp. 1315–133 7, Sep. 2015. [124] J. Salmi, A. Richter , and V . Koi vunen, “Sequent ial unfolding SVD for tensors with applicat ions in array signal processing, ” IEE E T ran s. Signal Pro cess. , vol. 57, no. 12, pp. 4719–4733, Dec. 2009. [125] L. D. Lathauwer , B. D. Moor , and J. V ande wal le, “ A multilinear singular value decomposit ion, ” SIAM J. Matrix Anal. Appl. , vol. 21, no. 4, pp. 1253–1278, 2000. PLA CE PHO TO HERE Zheng Zhang (M’15) recei v ed the Ph.D degree (2015) in Electrical Engineeri ng and Computer S ci- ence from the Massachuse tts Institut e of T echnology (MIT), Cambridge, MA. Currentl y he is a Postdoc Associate with the Research Laborato ry of Electron- ics at MIT . His research interests include uncertaint y quantific ation , tensor and model order reduction, with div erse enginee ring applicat ions includi ng na- noelec tronic s, energy systems and biomedical imag- ing. His industria l e xperie nces include Co vent or Inc. and Maxim-IC; acade mic visiting experie nces include UC San Diego, Bro wn Univ ersity and Politechni co di Milano; gov ernment lab experie nces include the Argonne National Laboratory . Dr . Zhang recei ve d the 2016 A CM Outstanding Ph.D Dissertation A ward in Electroni c Design Automation, the 2015 Doctoral Dissertation Seminar A ward (i.e., Best Thesis A ward) from the Microsystems T echnology Laboratory of MIT , the 2014 Donald O. Pederson Best Paper A ward from IEEE Transacti ons on Computer-Ai ded Design of Integrate d Circuits and Systems, the 2014 Chinese Governme nt A ward for Outstanding Students Abroad, and the 2011 Li Ka-Shing Prize from the Univ ersit y of Hong Kong. PLA CE PHO TO HERE Kim Batseli er (M’13) receiv ed the M. S. degree in Electro-Me chanic al E ngineer ing and the Ph.D. Degre e in E ngineer ing Science from the KULeuven, Belgiu m, in 2005 and 2013 respecti vely . He worked as a research enginee r at BioRICS on automated performanc e monitoring until 2009. He is currently a Post-Doctoral Research Fello w at The Univ ersity of Hong Kong since 2013. His current research intere sts include linear and nonlinear system the- ory/ide ntificat ion, algebra ic geometry , tensors, and numerica l algorithms. PLA CE PHO TO HERE Haotian L iu (S’11) recei ved the B.S. degree in Microel ectron ic Engineering from Tsinghua Univ er- sity , Beijing , China, in 2010, and the Ph.D. degree in Electroni c Engineering from the Unive rsity of Hong Kong, Hong Kong, in 2014. He is currently a software engineer with Cadence Design Systems, Inc. San Jose, CA. In 2014, Dr . L iu was a visiting scholar with the Massachuset ts Institute of T echn ology (MIT), Cam- bridge, MA. His research interests include numerical simulatio n methods for very-lar ge-scal e inte grati on (VLSI) circuits, model order reduction, parallel computatio n and static timing analysi s. PLA CE PHO TO HERE Luca Daniel (S’98- M’03) is a Full Professor in the E lectri cal Engineering and Computer Science Departmen t of the Massachusett s Institute of T ech- nology (MIT). He recei ve d the Ph.D . degree in E lec- trical Engineering from the Univ ersit y of Califor- nia, Berkel ey , in 2003. Industry experi ences include HP Research L abs, Palo Alto (1998) and Cadence Berk ele y Labs (2001). Dr . Daniel current research interests include inte- gral equati on solvers, uncerta inty quantificati on and paramete rized model order reduction, appli ed to RF circui ts, silicon photonics, MEMs, Magnetic Resonanc e Imaging scanners, and the human cardio v ascular system. Prof. Danie l was the re cipie nt of the 1999 IEEE T rans. on Powe r Electron ics best paper award; the 2003 best PhD thesis awards from the Electrica l Engineeri ng and the Applied Math departments at UC Berkele y; the 2003 A CM Out standin g Ph.D. Dissertati on A ward in Ele ctroni c Design Automatio n; the 2009 IBM Corporation Faculty A ward; the 2010 IEEE Early Career A ward in Electronic Design Automatio n; the 2014 IE EE Trans. On Computer Aided Design best paper award ; and sev en best paper awards in conference s. PLA CE PHO TO HERE Ngai W ong (S’98-M’02) recei ved his B.E ng. and Ph.D. degrees in Electri cal and Electro nic Engineer - ing from T he Uni ve rsity of Hong Kong, Hong Kong, in 1999 and 2003, respecti vely . Dr . W ong was a visiting scho lar with Purdue Uni ve rsity , W est Lafayett e, IN, in 2003. He is cur- rently an Associate Professor with the Departmen t of E lectri cal and Electroni c Engineering , The Uni- versi ty of Hong Kong. His current research interests include linear and nonline ar circuit m odeling and simulatio n, model order reduction , passivity test and enforce ment, and tensor-ba sed numeric al al gorithms in electronic design automati on (ED A).
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment