Learning Algebraic Models of Quantum Entanglement

We review supervised learning and deep neural network design for learning membership on algebraic varieties. We demonstrate that these trained artificial neural networks can predict the entanglement type for quantum states. We give examples for detecting degenerate states, as well as border rank classification for up to 5 binary qubits and 3 qutrits (ternary qubits).

Authors: Hamza Jaffali, Luke Oeding

Date: December 29, 2020.
Key words and phrases: Quantum Entanglement, Classification, Algebraic Varieties, Machine Learning, Neural Networks.

1. Introduction

Recent efforts to unite Quantum Information, Quantum Computing, and Machine Learning have largely been centered on integrating quantum algorithms and quantum information processing into machine learning architectures [9, 55, 65, 83–85, 87, 99, 100, 103]. Our approach is quite the opposite: we leverage Machine Learning techniques to build classifiers that distinguish different types of quantum entanglement. Machine Learning has been used to address problems in quantum physics and quantum information, such as quantum state tomography [81], quantum error-correcting codes [75], and wave-function reconstruction [5]. Here we focus on quantum entanglement. While we were inspired by the approach of learning algebraic varieties in [14], our methods differ in that we are not trying to find intrinsic defining equations of the algebraic models for entanglement types, but rather building a classifier that directly determines the entanglement class. Distinguishing entanglement types may be useful for quantum information processing and for improving and increasing the efficiency of quantum algorithms, quantum communication protocols, quantum cryptographic schemes, and quantum games. Our methods generalize to cases where a classification of all entanglement types is not known, and to cases where the number of different classes is not finite (see for instance [59, Ch. 10] or [94]). We focus only on pure states for representing quantum systems, which is sufficient for studying quantum computations and quantum algorithms. This is opposed to the noisy approach with density matrices and mixed states, which is used when one needs to account for noise and interaction with the environment [76].

1.1. Basic notions. A basic reference for tensors is [59]. The quantum state of a particle can be represented by a unit vector $|\psi\rangle$ in a Hilbert space $\mathcal{H}$ (typically $\mathcal{H} = \mathbb{C}^d$ or $\mathbb{R}^d$), with basis $\{|x\rangle \mid x \in \llbracket 0, d-1 \rrbracket\}$ in decimal notation. The state of an $n$-qudit quantum system is represented by a unit vector $|\psi\rangle$ in a tensor product $\mathcal{H}^{\otimes n}$ of the state spaces for each particle, where for simplicity we assume all particles are of the same type. The tensor space $\mathcal{H}^{\otimes n}$ has basis $|ij\ldots k\rangle := |i\rangle \otimes |j\rangle \otimes \cdots \otimes |k\rangle$, where $|i\rangle, |j\rangle, \ldots, |k\rangle$ are basis elements of $\mathcal{H}$. The dual vector space $\mathcal{H}^*$ is represented by vectors $\langle\varphi|$, and we use the standard Hermitian inner product $(|\psi\rangle, |\varphi\rangle) \mapsto \langle\varphi|\psi\rangle := \sum_i \overline{\varphi_i}\,\psi_i \in \mathbb{C}$, where $|\varphi\rangle = \sum_i \varphi_i |i\rangle$, and similarly for $\psi$.
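To make this notation concrete, here is a minimal NumPy sketch (ours, not code from the paper): basis states $|i\rangle$ as standard basis vectors, multi-particle basis states $|ij\ldots k\rangle$ as Kronecker products, and the Hermitian inner product $\langle\varphi|\psi\rangle$.

```python
import numpy as np

def ket(i, d=2):
    """Basis state |i> of H = C^d as a standard basis vector."""
    v = np.zeros(d, dtype=complex)
    v[i] = 1.0
    return v

def tensor(*kets):
    """|i> ⊗ |j> ⊗ ... ⊗ |k> via iterated Kronecker products."""
    out = kets[0]
    for k in kets[1:]:
        out = np.kron(out, k)
    return out

def braket(phi, psi):
    """Hermitian inner product <phi|psi> = sum_i conj(phi_i) * psi_i."""
    return np.vdot(phi, psi)

# Example: the 2-qubit state (1/sqrt(2))(|00> + |11>) is a unit vector.
bell = (tensor(ket(0), ket(0)) + tensor(ket(1), ket(1))) / np.sqrt(2)
assert np.isclose(braket(bell, bell), 1.0)
```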
The study of quantum entanglement is often focused on orbits (equivalence classes) under the action of the SLOCC (Stochastic Local Operations and Classical Communication) group, which algebraists know as $G = \mathrm{SL}(\mathcal{H})^{\times n}$, the cartesian product of special linear groups, i.e. normal changes of coordinates in each mode.

1.2. First examples: separating states by algebraic invariants. Consider a pair of binary state particles, called a 2-qubit system. Here there are only two different entanglement types up to the SLOCC action, represented by $|00\rangle$ and $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. All other 2-qubit states can be moved to one of these by the action of the SLOCC group [74]. Determining on which orbit a given state $|\varphi\rangle$ lies (and hence its entanglement type) can be done by computing a $2\times 2$ matrix determinant: set $\varphi_{i,j} := \langle ij|\varphi\rangle$. The value of $\det\begin{pmatrix} \varphi_{00} & \varphi_{01} \\ \varphi_{10} & \varphi_{11} \end{pmatrix}$ is either $0$ or $\frac{1}{2}$ if $|\varphi\rangle$ is respectively of type $|00\rangle$ or $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ [74]. In particular, the non-general states live on a quadric hypersurface [16]. An Artificial Neural Network classifier (see Section 2.1) can also be trained to test membership on these two types of states.

For 3-qubit systems, the rank (a numerical invariant) and the determinant (a polynomial invariant) generalize to the multilinear rank and the Cayley hyperdeterminant [74], respectively. A given state $|\varphi\rangle \in \mathcal{H}^{\otimes 3} = \mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2$ has coordinates $\varphi_{ijk} = \langle ijk|\varphi\rangle$. The three 1-flattenings are as follows:

$F_1(|\varphi\rangle) = (\varphi_{ijk})_{i,jk}, \qquad F_2(|\varphi\rangle) = (\varphi_{ijk})_{j,ik}, \qquad F_3(|\varphi\rangle) = (\varphi_{ijk})_{k,ij}.$

The ranks of these flattenings comprise a vector called the multilinear rank. No flattening has rank 0, since that only occurs if $|\varphi\rangle = 0$, but $|\varphi\rangle$ is a unit vector. If the multilinear rank is $(1,1,1)$, then $|\varphi\rangle$ is separable. If the multilinear rank is $(1,2,2)$, then $|\varphi\rangle$ is a bi-separable state of the form $\frac{1}{\sqrt{2}}(|000\rangle + |011\rangle)$ up to SLOCC. The other bi-separable states are permutations of this up to SLOCC. Finally, if the multilinear rank is $(2,2,2)$, then up to a SLOCC transformation $|\varphi\rangle$ is either the so-called W-state $\frac{1}{\sqrt{3}}(|001\rangle + |010\rangle + |100\rangle)$ or a general point $\frac{1}{\sqrt{2}}(|000\rangle + |111\rangle)$ [46, 74]. These states are distinguished, respectively, by the vanishing or non-vanishing of the well-known SLOCC invariant, the $2\times2\times2$ hyperdeterminant:

$$\begin{aligned} \Delta_{222}(x) ={}& x_{000}^2 x_{111}^2 + x_{011}^2 x_{100}^2 + x_{010}^2 x_{101}^2 + x_{001}^2 x_{110}^2 \\ & - 2x_{000}x_{011}x_{100}x_{111} - 2x_{000}x_{010}x_{101}x_{111} - 2x_{000}x_{001}x_{110}x_{111} \\ & - 2x_{010}x_{011}x_{100}x_{101} - 2x_{001}x_{011}x_{100}x_{110} - 2x_{001}x_{010}x_{101}x_{110} \\ & + 4x_{000}x_{011}x_{101}x_{110} + 4x_{001}x_{010}x_{100}x_{111}. \end{aligned}$$

These are all the possible types of states up to SLOCC for 3-qubit systems, and we have a complete algebraic description of these states as well. So this is a good testing ground for other methods, since we know how to test ground truth algebraically. In Section 4.2 we summarize the performance of a neural network for separating these states. In the 4-qubit case there is still a classification [93], and a complete set of invariants that can be used to classify entanglement types is known [47, 48].
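The invariants of Section 1.2 are straightforward to evaluate numerically. The following sketch (our illustration, not the authors' code) computes the three 1-flattenings, the multilinear rank, and $\Delta_{222}$, and checks the claimed values on the GHZ and W states:

```python
import numpy as np

def flattenings(phi):
    """The 1-flattenings F_1, F_2, F_3 of a tensor phi of shape (2, 2, 2)."""
    return [np.reshape(np.moveaxis(phi, i, 0), (phi.shape[i], -1))
            for i in range(phi.ndim)]

def multilinear_rank(phi, tol=1e-10):
    return tuple(np.linalg.matrix_rank(F, tol=tol) for F in flattenings(phi))

def hyperdet_222(x):
    """Cayley's 2x2x2 hyperdeterminant, written out monomial by monomial."""
    return (x[0,0,0]**2*x[1,1,1]**2 + x[0,1,1]**2*x[1,0,0]**2
          + x[0,1,0]**2*x[1,0,1]**2 + x[0,0,1]**2*x[1,1,0]**2
          - 2*x[0,0,0]*x[0,1,1]*x[1,0,0]*x[1,1,1]
          - 2*x[0,0,0]*x[0,1,0]*x[1,0,1]*x[1,1,1]
          - 2*x[0,0,0]*x[0,0,1]*x[1,1,0]*x[1,1,1]
          - 2*x[0,1,0]*x[0,1,1]*x[1,0,0]*x[1,0,1]
          - 2*x[0,0,1]*x[0,1,1]*x[1,0,0]*x[1,1,0]
          - 2*x[0,0,1]*x[0,1,0]*x[1,0,1]*x[1,1,0]
          + 4*x[0,0,0]*x[0,1,1]*x[1,0,1]*x[1,1,0]
          + 4*x[0,0,1]*x[0,1,0]*x[1,0,0]*x[1,1,1])

ghz = np.zeros((2,2,2)); ghz[0,0,0] = ghz[1,1,1] = 1/np.sqrt(2)
w   = np.zeros((2,2,2)); w[0,0,1] = w[0,1,0] = w[1,0,0] = 1/np.sqrt(3)
print(multilinear_rank(ghz), hyperdet_222(ghz))  # (2, 2, 2), 0.25 (nonzero)
print(multilinear_rank(w),   hyperdet_222(w))    # (2, 2, 2), 0.0
```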
In addition, algebraic invariants for all border ranks of 4-qubit systems are known classically (originally studied by Segre [86]) and are given by the minors of 1- and 2-flattenings [59, Ch. 7.2]. We note that the $2\times2\times2\times2$ hyperdeterminant has degree 24 and is computable by Schläfli's method [33]. However, in the 5-qubit case the $2^{\times 5}$ hyperdeterminant has degree 128, is not computable by Schläfli's method [97], and is surely very complicated, as indicated by the 4-qubit case [51]. Nevertheless, in Section 3.2 we show that neural networks can distinguish between singular and non-singular states even though algebraic invariants are likely to fail.

1.3. Prior Work and Outline. Machine Learning has been used to study entanglement detection, entanglement measurement, and entanglement classification. A common focus is the density matrix formalism, for which the asymptotic problem of deciding separability is NP-hard. Neural networks have been used for estimating entanglement measures such as logarithmic negativity, Rényi entropy, or concurrence in 2-qubit (pure and mixed) or many-body systems [8, 36, 37]; for encoding several CHSH (witness) inequalities simultaneously in a network to detect entangled states [70]; for computing the closest separable state in a complex-valued network [17]; for recognizing the entanglement class of 3-qubit systems [6]; and for detecting entanglement in families of qubit and qutrit systems in the bipartite case [101]. A convex hull approximation method combined with decision tree and standard ensemble learning (bagging) algorithms was used in [66] to classify separable and entangled states. The forest algorithm (also using decision trees) was used in [95] to detect entanglement and was compared to quantum state tomography (up to five qubits). Principal Component Analysis (PCA) was used to determine the dimension and intrinsic defining equations of certain algebraic varieties [14]. In the 2-qubit case, neural networks, support vector machines, and decision trees were used to detect non-classical correlations such as entanglement, non-locality, and quantum steering [105].

Indeed, Machine Learning can be a relevant tool for entanglement detection, at least in some limited cases. However, prior studies were mostly limited to the 2-qubit or bipartite case, because of the possibility of generating properly separable and entangled mixed states using the PPT criterion or the Schmidt decomposition. It seems difficult to generalize the prior supervised approaches to higher dimensions or to systems with more than two particles. Instead, we focus on the problem of entanglement detection and classification by learning algebraic varieties (the Segre variety, dual variety, secant varieties) that characterize different entanglement classes for pure states. Our method can be generalized to higher dimensions and to systems with several particles, bringing original tools for distinguishing non-equivalent entanglement classes for quantum systems for which we do not know the complete classification, or for which we do not have exact algorithms (as proposed by [46–48]) to determine the entanglement type (as is the case for 5-qubit systems, for instance).
As noted in [14], there are many instances of high-degree hypersurfaces (like the hypersurface of degenerate states in $\mathcal{H}^{\otimes n}$) that are easy to sample, but there is little hope of learning their equations from samples, so prior methods do not apply. Yet, in Section 3.2 we give cases where an artificial neural network can be trained to determine membership on these high-degree hypersurfaces, even without the inaccessible defining equation.

In Section 2, we investigate several architectures of feed-forward neural networks for learning membership on algebraic varieties, using prior knowledge about the defining equations to design networks. In Section 3, we study several examples by building classifiers with neural networks for the cases of qubits and qutrits. Finally, in Section 4, we use these classifiers to distinguish quantum entanglement classes.

2. Network design for learning algebraic varieties

2.1. Basics of Artificial Neural Networks. For a complete introduction to Artificial Neural Networks and related concepts, see any of [2, 3, 39, 40, 56, 63, 92, 96, 98]. Here we give a brief overview. Inspired by biological neural networks, Artificial Neural Networks (ANN) are computing systems whose goal is to reproduce the functionality and basic structure of the human brain [71]. An artificial neural network is composed of several artificial neurons, regrouped in a specific architecture, which is designed to be trained to perform a specific task (such as classification or regression) without any explicit instructions or rules [40].

In 1943, McCulloch and Pitts proposed the first model of an artificial neuron [71]. An artificial neuron (Figure 1) is defined by inputs (possibly coming from other neurons), weights (synaptic weights for each input), a weighted sum (computed from the two previous ones), a threshold (or bias), an activation function, and an output [2, 3, 39, 40, 63].

Figure 1. Illustration of an artificial neuron: inputs, weights, weighted sum, threshold, activation function, output.

The output $y = g(U)$ of the neuron is equal to the activation function $g$ applied to the weighted sum (with the threshold $\theta$ frequently added to the weighted sum with weight equal to 1) [25]. If we denote by $x_1, \ldots, x_n$ the inputs of the neuron, and by $w_i$ the weight associated with the $i$-th input, then the weighted sum, frequently denoted by $U$, is defined as follows:

$U = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + \theta. \qquad (1)$

We implemented neural networks in Python using the Keras library [21] with the Nadam solver for minimizing the loss function (either binary or categorical cross-entropy) in the learning step [28, 91]. This setup allows flexible implementation of feed-forward neural networks and choice of parameters (architecture, activation functions, loss function, etc.).

Geometrically, if one can find a hyperplane (codimension 1) that separates the data into two classes (as for the logical OR function), then the binary classification problem can be solved using a single artificial neuron [23, 39, 58, 96]. On the other hand, we are interested in cases where there is no separating hyperplane (as in the case of the logical XOR function [11, 12, 39]).
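For illustration, here is Equation (1) as a single artificial neuron (a sketch of ours, with a Heaviside-style activation for readability): it realizes OR with a suitable hyperplane, while no choice of weights and threshold realizes XOR.

```python
import numpy as np

def neuron(x, w, theta, g=lambda u: float(u > 0)):
    """Single artificial neuron: y = g(U), with U = w_1 x_1 + ... + w_n x_n + theta."""
    return g(np.dot(w, x) + theta)

X = [np.array(p, dtype=float) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# OR is linearly separable, so one neuron suffices:
print([neuron(x, w=np.array([1.0, 1.0]), theta=-0.5) for x in X])  # [0, 1, 1, 1]
# XOR is not: no (w, theta) separates {(0,1), (1,0)} from {(0,0), (1,1)}.
```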
So, we use more complex structures, such as the Multi-Layer Perceptron (MLP) model [3, 40]. We used the feed-forward dense layer configuration. Once activation and loss functions are fixed, the remaining parameters to configure are the number and size of the layers. The Universal Approximation Theorem and related studies [11, 24, 25, 38, 39, 50, 67] show that ANNs can approximate any reasonable function. However, the question of providing the best network, in terms of accuracy and computational efficiency, for a given task or problem is still open. The depth and width of neural networks are often chosen by trial and error, considering the trade-off between performance and computational cost.

2.2. Learning algebraic varieties. Here we present a model of ANN for learning membership on algebraic varieties modeling quantum entanglement. An algebraic variety is a geometric object defined as the zero locus of a set of homogeneous polynomials. In order to teach a machine how to recognize points on algebraic varieties, we must encode the polynomial defining equations of these objects (or an approximation of them) into the learning model. We would like to do this as efficiently as possible (using the fewest parameters) to avoid overfitting.

It is an instructive exercise to show that ANNs can be trained to recognize (determine membership of) points on linear spaces (of any codimension), essentially by linear interpolation; see [82], or any of [2, 11, 12, 29, 30, 34, 40, 56, 58, 63]. To classify points on algebraic varieties one can also use ANNs as an alternative to polynomial interpolation. The so-called Polynomial Neural Networks (PNN) were proposed in [77]; see also any of [11, 29, 39, 52, 58, 88–90, 98].

Let us collect some well-known results in algebraic geometry relevant to modeling polynomials with ANNs. For simplicity we focus on modeling a single polynomial. For several polynomials one may design a neural network using a copy of the one we describe here for each polynomial. One way to determine a polynomial is to calculate the coefficients of every monomial. One might expect that this would not be the most efficient way to represent polynomials that may have additional structure, such as sparseness. A sum-of-powers representation has a chance to exploit hidden sparseness or other structure.

What shape of ANN with power-function activations is necessary to model a polynomial equation of degree $d$ in $n$ variables? Such an ANN essentially accomplishes the task of writing the polynomial as a sum of powers of linear forms, and it is tightly linked (by apolarity) to interpolating polynomials. In this case a straightforward dimension count gives a good guess for the appropriate architecture. The Alexander–Hirschowitz (AH) Theorem (see [13] for a modern treatment) tells us when the naive dimension count fails. In fact, the AH theorem states that a general¹ homogeneous polynomial $p$ of degree $d$ in $n$ variables can be expressed as a sum of $T = \left\lceil \frac{1}{n}\binom{d+n-1}{d} \right\rceil$ $d$-th powers of linear forms (except for quadratic forms and a few other special cases), i.e.

$p(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{T} \left( \sum_{i=1}^{n} a_{ij} x_i \right)^{d}.$

¹By "general" we mean avoiding a measure-zero set of possible counterexamples.
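The count $T$ is immediate to compute; here is a small helper (ours) for the layer widths used below, valid away from the AH exceptions (e.g. quadratic forms):

```python
from math import ceil, comb

def ah_rank(d, n):
    """T = ceil(binom(d+n-1, d) / n): the generic rank of degree-d forms in n
    variables, by Alexander-Hirschowitz (away from the listed exceptions)."""
    return ceil(comb(d + n - 1, d) / n)

print(ah_rank(3, 4))  # cubics in 4 variables: 5 powers of linear forms
print(ah_rank(4, 8))  # degree 4 in 8 variables: 42 (cf. Section 3.2.2)
```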
Dehomogenizing the AH result (by setting the last variable to 1, for example), one obtains a bound for 2-layer neural networks modeling affine hypersurfaces. We implement $T = \left\lceil \frac{1}{n}\binom{d+n-1}{d} \right\rceil$ neurons with activation function $g\colon x \mapsto x^d$ in the first layer, and then combine all the outputs in a linear combination using a neuron in the second layer. Suppose we have $n$ inputs corresponding to the $n$ variables of the homogeneous polynomial $p$. If we denote by $w_{i,j}$ the weight associated with the $i$-th input $x_i$ and the $j$-th neuron in the first layer, and by $\theta_j$ the threshold of the $j$-th neuron, then the output $s_j$ of neuron $j$ in the first layer is

$s_j = g(U_j) = \left( \sum_{i=1}^{n} w_{i,j} x_i + \theta_j \right)^{d}.$

The threshold $\theta_j$ introduces an inhomogeneity, which we can remove by adding an extra variable $x_{n+1}$ and replacing $\theta_j$ with $\theta_j x_{n+1}$. Then the AH theorem for $n+1$ variables, with $x_{n+1} = 1$, yields the bound for non-homogeneous outputs. This idea also appears in [57].

The last step of the network design is classification: is the input point on (or outside) the variety defined by the equations modeled by the network? Note that after the first two layers, we should obtain a value $s$ which is 0 when the input point is on the variety, and a different value if the point is not. So this step is equivalent to recognizing a real number $s$ in an interval (whose size depends on the training data). Adding a single sigmoid neuron will not solve the classification problem, because single neurons only solve inequalities, not equalities [34, 58]. So we add another layer before the output layer to recognize this specific value $s$, and then solve this binary classification problem at the output of the network.

The task of recognizing a real number in an interval can be performed using a single layer with four neurons with LeakyReLU activation functions. Thus, by adding such a layer after the first two layers (modeling the equations of the variety), and by adding a last (output) layer with only one neuron (with a sigmoid activation function), one can potentially learn any algebraic variety defined by a set of homogeneous polynomials (Figure 2).

Figure 2. Representation of the network solving the binary classification problem of membership on an algebraic variety.

For practical implementations it is important to know whether we can correctly train the network, and whether the optimizer can find the right weights and thresholds. For homogeneous equations of low degree in few variables, the network is able to learn the correct weights, and one can recover the equation from the weights of the network. However, when the degree and the number of variables increase, the number of neurons needed in the first layer increases very quickly and it becomes harder to converge to a set of weights and thresholds that model the desired equation. To overcome this, in Subsection 2.3 we slightly modify the network structure to reduce the number of weights and parameters. For more information on the expressive power of deep neural networks, see [57].
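As a concrete (hypothetical) realization of Figure 2, the following Keras sketch stacks the four layers just described, compiled with the Nadam optimizer and binary cross-entropy as in Section 2.1. The function name, the lambda power activation, and the example sizes are our reading of the text, not the authors' released code; note that quadrics are an AH exception and need $n$ powers, matching the (4,4,1) first-layer width in Table 1 below.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def variety_membership_net(n, d, T):
    """Figure 2 network: T power-activation neurons, one linear neuron modeling
    the polynomial value s, four LeakyReLU neurons recognizing s in an
    interval, and a sigmoid output for the membership decision."""
    return keras.Sequential([
        keras.Input(shape=(n,)),
        layers.Dense(T, activation=lambda u: tf.pow(u, d)),  # (w.x + theta)^d
        layers.Dense(1, activation="linear"),                # polynomial value s
        layers.Dense(4), layers.LeakyReLU(),                 # recognize s ~ 0
        layers.Dense(1, activation="sigmoid"),               # membership decision
    ])

# Example: one quadric in 4 variables (quadrics need T = n = 4 powers).
model = variety_membership_net(n=4, d=2, T=4)
model.compile(optimizer="nadam", loss="binary_crossentropy", metrics=["accuracy"])
```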
Remark 2.1. The Universal Approximation Theorem leads one to ask how training a neural network differs from polynomial fitting. Indeed, neural networks are known to be essentially polynomial regression models, with the effective degree of the polynomial growing at each hidden layer [19]. The correspondence between the values of the function to be interpolated, the basis functions, and the base points on one hand, and the weights, the activation functions, and the thresholds (biases) on the other hand can be made explicit [62]. However, in practical applications data is often noisy and incomplete, and polynomial interpolation is generally subject to overfitting [102], while neural networks are able to perform when the data is noisy or incomplete, and have the ability to generalize from the input data [78]. This dilemma between fitting the training dataset and being able to generalize the model to unseen data is known as the bias–variance trade-off [7]. Raturi explains in [82] that solving the same problem as polynomial interpolation requires much less computational time and fewer resources when using neural networks. It is also known that neural networks can interpolate and model a function via sigmoidal functions to approximate $n$ samples in any dimension, with arbitrary precision and without training [64].

2.3. Hybrid networks. Here we introduce a hybrid network architecture for classifying membership on a parametrized algebraic variety and discuss its advantages. Training an ANN, which is an optimization process, doesn't always reach the set of weights that minimizes the loss function. This is due to the existence of local minima, related to the use of the non-convex $x \mapsto x^d$ activation functions.

The second layer (Figure 2) contains only one neuron (with the identity activation function), introduced to take the linear combination of $d$-th powers of weighted sums of the input variables. One would like to remove this layer and directly link the outputs of the first layer with the third layer (containing only neurons with LeakyReLU activation functions). We call these hybrid networks, because they combine layers with both $x \mapsto x^d$ and LeakyReLU activation functions. Moreover, the geometric interpretation of the network is now different, since every neuron in the second layer (the LeakyReLU neurons) takes as input a different linear combination of all the $d$-th power forms. The second layer combines different homogeneous polynomials that are not necessarily equal to the homogeneous polynomial defining the algebraic variety, but they can be used to approximate the latter as a set of inequalities. The third and last layer of the network is the output layer, which will depend on the classification problem one wants to solve (binary classification, several classes, etc.). The architecture of the hybrid network is summarized in Figure 3.

Figure 3. Representation of a hybrid network for learning a homogeneous polynomial equation of degree $d$ in $n$ variables with coefficients in $\mathbb{R}$.

This type of network showed better results and quicker learning than the previous version of the network.
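Under the same assumptions (and reusing the imports) of the previous sketch, the hybrid variant of Figure 3 simply drops the single linear neuron:

```python
def hybrid_net(n, d, T, hidden=8, classes=1):
    """Figure 3 hybrid network (our sketch): the power-activation layer feeds
    the LeakyReLU neurons directly, with no intermediate linear neuron."""
    return keras.Sequential([
        keras.Input(shape=(n,)),
        layers.Dense(T, activation=lambda u: tf.pow(u, d)),
        layers.Dense(hidden), layers.LeakyReLU(),
        layers.Dense(classes, activation="sigmoid"),
    ])
```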
The AH theorem can still be used to give an idea of the number of neurons in the first layer, and sometimes extra neurons should be added or removed to boost the performance of the network during the learning phase. In our experience, the number of neurons in the second layer can be chosen quite small (between 4 and 10 for binary classification problems) and will depend on the number of neurons in the output layer (it should not be smaller than the number of neurons in the output layer).

3. Experiments

Consider a vector space $V$ and a group $G \subset \mathrm{GL}(V)$. The set of all such pairs $(V, G)$ for which $G$ acts on $V$ with finitely many orbits has been classified by Kac [26, 54]. Since then, Vinberg and others have classified all the orbits in many of these cases. There are special cases where $G$ acts on $V$ with infinitely many orbits, yet those orbits may still be represented using finitely many parameters: the tame case. A fantastic example is [94], which gave a classification of the orbits of trivectors of a 9-dimensional space utilizing a connection to the Lie algebra $\mathfrak{e}_8$. This classification also implies a classification for 3-qutrit systems, among others.

After orbit classification, one desires effective methods to determine orbit membership. Separating orbits is difficult in general, yet this mathematical problem lies at the core of geometric approaches to Valiant's version of P versus NP [60]. One approach is to use Invariant Theory to build a set of invariants and covariants that characterize each orbit, or each family represented by parameters [46, 68]. In the four-qubit case, an infinite tame case, Verstraete et al. gave a list of 9 normal forms (depending on parameters) that parametrize all SLOCC orbits [22, 93]. An algorithm to determine orbit membership in the four-qubit case was proposed in [47, 48]. In general, the complexity of these invariants grows very rapidly with the number of particles in the system. Already in the five-qubit case, a complete description of the algebra of SLOCC-invariant polynomials is out of reach of any computer system [69]. Therefore, it is worth considering other approaches. In this section we give examples that serve as proof of concept that artificial neural networks can be trained to give efficient and effective classifiers for quantum states.

3.1. Detecting separable states. Here we focus on the real case and assume $\mathcal{H} = \mathbb{R}^d$. A state $|\varphi\rangle \in \mathcal{H}^{\otimes n}$ is separable if it can be written as a tensor product of pure states: $|\varphi\rangle = |\varphi_1\rangle \otimes \cdots \otimes |\varphi_n\rangle$, with $|\varphi_i\rangle \in \mathcal{H}$ for all $i$ [76]. Algebraic geometers call the projective variety of separable states the Segre variety, which is parametrized as

$\mathrm{Seg}\colon \mathbb{P}\mathcal{H} \times \cdots \times \mathbb{P}\mathcal{H} \to \mathbb{P}\mathcal{H}^{\otimes n}, \qquad ([x_1], [x_2], \ldots, [x_n]) \mapsto [x_1 \otimes x_2 \otimes \cdots \otimes x_n],$

where the square brackets $[\,\cdot\,]$ denote the equivalence class under rescaling, which makes physical sense to consider since we assume states are unit vectors [46]. The following is straightforward, and was essentially known to Segre in the early 1900s.

Proposition 3.1. A state $|\varphi\rangle \in \mathcal{H}^{\otimes n}$ is separable if and only if either of the following equivalent conditions holds:
(1) $|\varphi\rangle$ is in the SLOCC orbit of $|00\cdots0\rangle$.
(2) All $n$ of the 1-flattenings $F_i(|\varphi\rangle)\colon \mathcal{H}^{\otimes(n-1)} \to \mathcal{H}$ have rank 1.

Separable quantum states have no quantum entanglement.
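Proposition 3.1(2) underlies both the labeling test and the training-data generation described in the next subsection; here is a NumPy sketch of ours (not the paper's code) for both:

```python
import numpy as np
rng = np.random.default_rng(0)

def random_unit(d):
    """Uniform sample on the sphere S^{d-1} (rotation-invariant Gaussian)."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def separable_state(dims):
    """Push sphere samples through the Segre map: x_1 ⊗ ... ⊗ x_n."""
    t = random_unit(dims[0])
    for d in dims[1:]:
        t = np.tensordot(t, random_unit(d), axes=0)
    return t / np.linalg.norm(t)

def entangled_state(dims, rank=2):
    """Class '1': normalized sum of at least two random rank-1 tensors."""
    t = sum(separable_state(dims) for _ in range(rank))
    return t / np.linalg.norm(t)

def is_rank_one(t, tol=1e-8):
    """Proposition 3.1(2): every 1-flattening has exactly one significant
    singular value."""
    for i in range(t.ndim):
        F = np.reshape(np.moveaxis(t, i, 0), (t.shape[i], -1))
        if np.sum(np.linalg.svd(F, compute_uv=False) > tol) != 1:
            return False
    return True

X0 = [separable_state((2, 2, 2)) for _ in range(1000)]             # class '0'
X1 = [t for t in (entangled_state((2, 2, 2)) for _ in range(1000))
      if not is_rank_one(t)]                                       # certified '1'
```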
Detecting separable states thus becomes equivalent to detecting quantum entanglement. Therefore, one way to determine whether a state is separable is to compute the compact form of the SVD of every 1-flattening: if any 1-flattening has more than one significant singular value, the state is not separable.

To generate training data, we first sample the space of separable states by pushing uniform samples on a product of spheres through the Segre map: $S\mathcal{H} \times \cdots \times S\mathcal{H} \to S\mathcal{H}^{\otimes n}$, where $S(\cdot)$ denotes the states of unit norm. The class of separable states is labeled as the class '0' and represents 50% of the training dataset. The other half is the class of entangled states, labeled '1'; it is constructed by generating random tensors of rank at least 2, by summing at least 2 random rank-1 tensors. Note that by Proposition 3.1, to certify the training set one can check the multilinear rank of the purported rank $\ge 2$ tensors to ensure they do not have rank 1 (which would cause them to be mis-labeled).

3.1.1. Networks for separable states. The set of separable states is the zero set of the $2\times2$ minors of the 1-flattenings, and thus is defined by a set of homogeneous polynomials of degree 2. In the $2\times2$ case there is only one defining equation, the $2\times2$ determinant. In higher dimensions, several equations of degree 2 define the Segre variety. To select the number of equations, we take (at least) the codimension of the Segre variety in the projective space. For each quadratic equation, we can use the AH dimension count to deduce the number of needed neurons, and then design the hybrid networks. In some cases, we chose to add several extra neurons in the first layer (with quadratic activation functions) to accelerate the learning process. As explained before, the second layer only contains neurons with LeakyReLU activation functions. The last layer contains only one neuron, with a sigmoid activation function in this case (because of the binary classification problem).

We also implemented networks with only LeakyReLU activation neurons (except for the last layer). For all the considered cases, we used the same network architecture, composed of 4 first layers and a final output layer. We chose to implement a decreasing structure in the network, with 100 neurons in the first layer, 50 in the second, 25 in the third, and 16 in the fourth. For each case the architecture of the network is provided in the second column of the results tables.

3.1.2. Results. In Table 1 we present the results for the $2\times2$, $2\times2\times2$, $2^{\times4}$, and $3\times3\times3$ cases with hybrid networks. In the $2\times2$, $2\times2\times2$, and $3\times3\times3$ cases, we used a training dataset of size 56200, a validation dataset of size 12800, and a testing dataset of size 32000. In the $2^{\times4}$ case, we doubled the size of the training and validation datasets only (with the same size for the testing one). We reached an average accuracy of 93% on the testing datasets for separability classification.

Tensor size | Architecture | Training acc. | Validation acc. | Testing acc. | Loss
$2\times2$ | (4,4,1) | 96.65% | 96.60% | 96.63% | 0.092
$2\times2\times2$ | (21,8,1) | 94.57% | 94.06% | 94.44% | 0.15
$2^{\times4}$ | (1188,8,1) | 91.72% | 91.60% | 91.33% | 0.26
$3\times3\times3$ | (332,12,1) | 94.68% | 92.89% | 92.94% | 0.15

Table 1. Hybrid network architectures and accuracies for each tensor size, for separability classification.
We used LeakyReLU networks to study the same cases as we did with the hybrid networks, with the addition of the $2^{\times5}$ case, and we regrouped the results in Table 2. In the $2\times2$ and $2\times2\times2$ cases, we used a training dataset of size 102400, a validation dataset of size 25600, and a testing dataset of size 32000. In the $2^{\times4}$, $2^{\times5}$, and $3\times3\times3$ cases, we used a training dataset of size 502600, a validation dataset of size 55600, and a testing dataset of size 32000. We reached an average accuracy of 98% on the testing datasets for separability classification. These results show that ANNs can be trained to distinguish between separable and entangled states for small to moderately sized tensors.

Tensor size | Architecture | Training acc. | Validation acc. | Testing acc. | Loss
$2\times2$ | (100,50,25,16,1) | 98.92% | 98.78% | 98.83% | 0.043
$2\times2\times2$ | (100,50,25,16,1) | 97.80% | 97.42% | 97.55% | 0.074
$2^{\times4}$ | (100,50,25,16,1) | 99.62% | 99.50% | 99.53% | 0.016
$2^{\times5}$ | (100,50,25,16,1) | 98.83% | 98.55% | 98.55% | 0.037
$3\times3\times3$ | (100,50,25,16,1) | 98.55% | 98.01% | 97.92% | 0.051

Table 2. LeakyReLU network architectures and accuracies for each tensor size, for separability classification.

3.1.3. Complexity and efficiency. From an algebraic-geometric point of view, determining whether a tensor is on the Segre variety is equivalent to deciding whether it is a rank-one tensor, and this task can be performed using a truncated SVD of each flattening. The complexity of computing the singular values of an $m \times n$ matrix is $O(mn \min\{m, n\})$. This is roughly $\min\{m, n\}$ times the complexity of reading the matrix. Computing the singular values of the $i$-th flattening of a tensor of format $n_1 \times \cdots \times n_t$ has complexity

$O\!\left( n_i \cdot \tfrac{n_1 \cdots n_t}{n_i} \cdot \min\!\left\{ n_i, \tfrac{n_1 \cdots n_t}{n_i} \right\} \right) = O\!\left( (n_1 \cdots n_t)\, \min\!\left\{ n_i, \tfrac{n_1 \cdots n_t}{n_i} \right\} \right).$

For "balanced" cases ($2n_i \le \sum_j n_j$), the minimum is $n_i$, so this complexity is at most $O((n_1 \cdots n_t)\, n_i)$, which is roughly $n_i$ times the complexity of reading the tensor. Computing all of the singular values of the tensor (in a naive way, not informing one computation by the results of another), one would expect complexity $O\!\left( (n_1 \cdots n_t) \sum_i n_i \right)$.

The cost of evaluating a trained network is equal to the cost of a feed-forward propagation, which needs to compute all the weighted sums (equivalent to computing matrix multiplications) and evaluate the activation function for each neuron. Hence, this complexity depends on the architecture of the network. If we denote by $l$ the number of layers (not counting the input layer) and by $m_1, m_2, \ldots, m_l$ the numbers of neurons in each layer, the cost is roughly $O\!\left( (n_1 \cdots n_t)\, m_1 + m_1 m_2 + \cdots + m_{l-1} m_l \right)$. So, as the dimensions of the tensor spaces increase, evaluating an already-trained network should become more efficient than computing the SVDs of the flattenings. Moreover, flattenings cease to detect tensor ranks above the size of the flattening, whereas an ANN could be trained to detect higher ranks. Though the cost of training the network cannot be ignored, it is a one-time cost, and the trained network can be used for efficient computation for any number of tests.
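The two operation counts above can be compared directly; the helper below is our back-of-envelope estimate using the formulas of this subsection (with `widths` following the Table 2 architecture), and it shows the network evaluation becoming cheaper once $\sum_i n_i$ exceeds the first-layer width $m_1$:

```python
from math import prod

def svd_test_ops(dims):
    """Singular values of every 1-flattening: sum_i O(n_i * prod(dims))."""
    return sum(n_i * prod(dims) for n_i in dims)

def network_eval_ops(dims, widths):
    """Feed-forward pass: prod(dims)*m_1 + m_1*m_2 + ... + m_{l-1}*m_l."""
    sizes = [prod(dims)] + list(widths)
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

widths = (100, 50, 25, 16, 1)                      # Table 2 architecture
print(svd_test_ops((2,) * 5))                      # 320
print(network_eval_ops((2,) * 5, widths))          # 9866: costlier here...
print(svd_test_ops((50, 50, 50)))                  # 18750000
print(network_eval_ops((50, 50, 50), widths))      # 12506666: ...cheaper once
                                                   # sum(dims) exceeds m_1
```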
In addition, once the tensor rank is larger than the dimension of any of the factors, there are few effective methods for determining rank, so a trained ANN classifier may provide valuable insight.

3.2. Detecting degenerate states. For matrices, the opposite of rank-1 is co-rank-1. Recall that a matrix $A \in \mathbb{R}^{m \times n}$ is degenerate (rank deficient, or co-rank-1) if and only if any of the following equivalent conditions hold:
(1) There are non-zero vectors $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ such that $u^\top A x = 0$ for all $x \in \mathbb{R}^n$ and $y^\top A v = 0$ for all $y \in \mathbb{R}^m$.
(2) In the case $m = n$, the determinant vanishes: $\det(A) = 0$.
(3) Up to Gaussian elimination, $A$ has a zero row and a zero column.
These conditions carry over to the tensor setting, following [33]. A state $|\varphi\rangle \in \mathcal{H}^{\otimes n}$ is degenerate if one of the following equivalent conditions holds:
(1) There is a pure state $x = |x_1\rangle \otimes \cdots \otimes |x_n\rangle \in \mathcal{H}^{\otimes n}$ such that the contraction $(\langle x_1| \otimes \cdots \otimes \langle x_{i-1}| \otimes \langle h_i| \otimes \langle x_{i+1}| \otimes \cdots \otimes \langle x_n|)\,|\varphi\rangle = 0$ for all $i$ and for all $|h_i\rangle \in \mathcal{H}$.
(2) The hyperdeterminant vanishes: $\mathrm{Det}(A) = 0$, where $A$ is the hypermatrix satisfying $A_I = \langle I|\varphi\rangle$.
(3) There is a SLOCC-equivalent state with coordinates $\varphi_I = 0$ for all $I$ of Hamming distance $\le 1$ from $00\cdots0$.

The hyperdeterminant is a SLOCC-invariant polynomial, and thus is used to characterize quantum entanglement. The zero locus of the hyperdeterminant also defines the dual variety of the Segre variety [74], and the study of the singularities of the associated hypersurfaces also gives qualitative information about quantum entanglement [42, 45]. The hyperdeterminant can also be regarded as a quantitative measure of entanglement and can be used, for example, to study entanglement in quantum algorithms [44, 53]. In a concrete sense the non-degenerate states are the most entangled ones, and maximizing the absolute value of the hyperdeterminant can lead to maximally entangled states [18, 35, 43]. While the hyperdeterminant is a polynomial test for degeneracy, it has very high degree and is rarely feasible to compute apart from the smallest cases. We note that one of us showed that hyperdeterminants coming from the root system $E_8$ can be computed efficiently [49]. We wish to show that we can still learn degenerate states even in cases where an expression or evaluation method for the hyperdeterminant is not known explicitly.

3.2.1. A note on uniform sampling. The SLOCC description of degenerate states gives a way to uniformly sample the set. We take a random state $|\varphi\rangle$ with coordinates $\varphi_I = 0$ for all $I$ of Hamming distance $\le 1$ from $00\cdots0$, renormalize so that $\|\varphi\| = 1$, and then apply a random element of the SLOCC group. The degenerate states form half of the training data and are labeled as the class '0'. On the other hand, we generate random tensors in $\mathcal{H}^{\otimes n}$ to build the second half of the training dataset, which correspond (with high probability²) to non-degenerate states, labeled as the class '1'. When an analytical method for evaluating the hyperdeterminant is available, we use it to remove the noise from the training data (i.e., to remove from the class '1' any random states that are in fact degenerate).

²The justification of the term "high probability" is the following. For real numbers, the set of degenerate tensors has codimension at least 1 in the ambient space $\mathbb{R}^{d^n}$ and thus it has measure zero. Thus the probability is zero that a tensor chosen randomly from $\mathbb{R}^{d^n}$ lands in a measure-zero set. When floating-point precision is used, one expects that these continuous concepts are still well approximated in the discrete case, even though a non-empty subset (the exceptions) never has measure zero in that setting.
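A sketch of ours of this sampling procedure for the class '0': zero out all coordinates within Hamming distance 1 of $00\cdots0$, renormalize, then apply a random SLOCC element (here a random matrix on each factor, which is invertible with probability 1; the final renormalization absorbs the determinant scaling):

```python
import numpy as np
rng = np.random.default_rng(1)

def random_degenerate_state(n, d=2):
    """Random degenerate state: coordinates at Hamming distance <= 1 from
    00...0 are zeroed, then a random SLOCC element is applied."""
    phi = rng.standard_normal((d,) * n)
    phi[(0,) * n] = 0.0                              # distance 0
    for i in range(n):
        for a in range(1, d):
            idx = [0] * n
            idx[i] = a
            phi[tuple(idx)] = 0.0                    # distance 1
    phi /= np.linalg.norm(phi)
    for i in range(n):                               # random SLOCC element
        g = rng.standard_normal((d, d))
        phi = np.moveaxis(np.tensordot(g, phi, axes=(1, i)), 0, i)
    return phi / np.linalg.norm(phi)

example = random_degenerate_state(n=3)               # a class-'0' training sample
```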
3.2.2. Networks for degenerate states. The $2^{\times n}$ hyperdeterminant is a homogeneous polynomial whose degree is 4, 24, or 128 for $n = 3$, 4, or 5 respectively, and its degree is 36 in the $3\times3\times3$ case (see [33] for concrete degree formulas). In the $2\times2\times2$ case, the first layer of the hybrid network must contain at least $\left\lceil \frac{1}{8}\binom{4+8-1}{4} \right\rceil = 42$ neurons with $x \mapsto x^4$ activation functions. We chose to implement 60 neurons in the first layer for the 3-qubit case. In the two other cases, the number of needed neurons is far too high to hope for an implementation using Keras. One can still use fewer neurons to approximate the hyperdeterminant using hybrid networks; even if the accuracy will not be very close to 100%, one can still reach 60% or more and use the idea presented in Section 4.1 to obtain a better overall accuracy. The second possibility is to use only LeakyReLU activation functions (except for the output layer), as was done in Subsection 3.1. We also used the same decreasing structure with four layers, except in the 4-qubit case, where we chose to double the number of neurons in the first three layers. The $2\times2$ non-degenerate states are in fact the non-separable states; this case is thus already solved by the $2\times2$ separability classifiers presented in Section 3.1.

3.2.3. Results. In Table 3 we present the result for the $2\times2\times2$ case with the hybrid network. We used a training dataset of size 202600, a validation dataset of size 25600, and a testing dataset of size 32000. We reached 92% accuracy on the testing dataset.

Tensor size | Architecture | Training acc. | Validation acc. | Testing acc. | Loss
$2\times2\times2$ | (60,10,1) | 92.49% | 92.18% | 92.09% | 0.1837

Table 3. Hybrid network architecture and accuracy, for degenerate and non-degenerate state classification.

In Table 4 we present the results for the $2\times2\times2$, $2^{\times4}$, $2^{\times5}$, and $3\times3\times3$ cases with networks using LeakyReLU activation functions. In the $2\times2\times2$ case, we used a training dataset of size 502600, a validation dataset of size 55600, and a testing dataset of size 32000. In the $2^{\times4}$ and $2^{\times5}$ cases, we used a training dataset of size 252400, a validation dataset of size 55600, and a testing dataset of size 52000. In the $3\times3\times3$ case, we used a training dataset of size 352400, a validation dataset of size 55600, and a testing dataset of size 52000. We reached an average accuracy of 96% on the testing datasets for classification of degenerate states. These results are quite interesting, especially in the 5-qubit case. In fact, as mentioned before, no explicit expression or efficient evaluation method for the $2^{\times5}$ hyperdeterminant is available, and it is expected to be intractable.
The network has 98% accuracy on the testing dataset, which means we can determine with very high accuracy whether a tensor is degenerate, which is equivalent to determining whether the hyperdeterminant vanishes. In the context of a qualitative characterization of entanglement, one only needs to know whether the state annihilates the hyperdeterminant. In this sense, we provide an original result for evaluating the nullity of the hyperdeterminant for $2^{\times5}$ real tensors, which can be used to investigate quantum entanglement for 5-qubit systems (see Section 4.3).

Tensor size | Architecture | Training acc. | Validation acc. | Testing acc. | Loss
$2\times2\times2$ | (100,50,25,16,1) | 93.44% | 92.53% | 92.74% | 0.1629
$2^{\times4}$ | (200,100,50,16,1) | 99.50% | 95.95% | 95.94% | 0.01791
$2^{\times5}$ | (100,50,25,16,1) | 99.95% | 98.74% | 98.83% | 0.001533
$3\times3\times3$ | (100,50,25,16,1) | 98.18% | 96.78% | 96.83% | 0.04770

Table 4. LeakyReLU network architectures and accuracies for each tensor size, for degenerate and non-degenerate state classification.

3.3. Border-rank classification. A state $|\varphi\rangle \in \mathcal{H}^{\otimes n}$ is said to have rank $\le R$ if there is an expression $|\varphi\rangle = \sum_{r=1}^{R} \lambda_r |\varphi_r\rangle$ with $\lambda_r \in \mathbb{C}$ and each $|\varphi_r\rangle \in \mathcal{H}^{\otimes n}$ separable [16]. It is well known that when $n > 2$, rank is not a closed condition, and it is not semi-continuous. The first example occurs for 3 qubits, where the W-state $|100\rangle + |010\rangle + |001\rangle$, which has rank 3, is in the closure of the generic orbit, that of $|000\rangle + |111\rangle$, which has rank 2 [46]. In another context, Bini [10] defined the notion of border rank to regain semi-continuity. One says that a state $|\varphi\rangle \in \mathcal{H}^{\otimes n}$ has border rank $\le R$ if there exists a family of rank-$R$ states $\{|\varphi_\epsilon\rangle \mid \epsilon > 0\}$ with $\lim_{\epsilon \to 0} |\varphi_\epsilon\rangle = |\varphi\rangle$. Equivalently, the set of border rank $\le R$ states, denoted $\sigma_R$, is the Zariski closure of the set of states of rank $\le R$. By construction, when $\mathcal{H} = \mathbb{C}^d$ we have a chain

$\sigma_1 \subsetneq \sigma_2 \subsetneq \cdots \subsetneq \sigma_g = \mathcal{H}^{\otimes n},$

which ends at $\mathcal{H}^{\otimes n}$ because the set of separable states is linearly non-degenerate in $\mathcal{H}^{\otimes n}$. The minimal integer $g$ for which $\sigma_g = \mathcal{H}^{\otimes n}$ is called the generic rank. A simple dimension count gives the expected generic rank:

$e = \left\lceil \frac{d^n}{n(d-1)+1} \right\rceil.$

For complex tensors of format $d^{\times n}$ the only known case where the generic rank differs from the expected one is $3\times3\times3$ tensors, or 3 qutrits ($d = 3$, $n = 3$ in the formulation above); here the generic rank is 5 and not 4 as expected. For $2\times2\times2\times2$ tensors, or 4 qubits (the case $d = 2$, $n = 4$ above), the generic rank equals the expected generic rank of 4, even though it is known that the set of rank-3 tensors is defective. It is conjectured that these are the only such cases (see [1, 4, 20]).

Let $\sigma_s^\circ$ denote the states of rank exactly $s$. When $\mathcal{H} = \mathbb{R}^d$ there can be more than one semi-algebraic set $\sigma_s^\circ$ that is full-dimensional. These ranks are called typical ranks. Note that over the complex numbers there is only one typical rank, namely the generic rank. Given a probability measure $\mu$ on $\mathcal{H}^{\otimes n}$, the measure of $\sigma_s^\circ$ represents the probability that a random state has rank $s$.
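The expected generic rank is a one-line computation (our helper), reproducing the two comparisons just mentioned:

```python
from math import ceil

def expected_generic_rank(d, n):
    """e = ceil(d^n / (n(d-1) + 1)) for tensors in (C^d)^(⊗n)."""
    return ceil(d**n / (n * (d - 1) + 1))

print(expected_generic_rank(2, 4))  # 4: equals the true generic rank for 4 qubits
print(expected_generic_rank(3, 3))  # 4: but the true generic rank for 3 qutrits is 5
```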
The following example from [27] illustrates this situation.

Example 1. Set $n = 3$, $d = 2$. Both 2 and 3 are typical ranks, and moreover a positive-volume set of tensors with rank 3 has no optimal low-rank approximation. The $2\times2\times2$ hyperdeterminant $D$ separates $\mathbb{R}^{2\times2\times2}$ into regions of constant typical rank. We formed a sample of tensors of real rank at most 3 as a normalized sum of tensor products of vectors with entries uniformly distributed in $[-0.5, 0.5]$. For this distribution we found:
(1) with approximate frequency 86.6%, $D(\varphi) > 0$, in which case the $\mathbb{R}$-rank is 2;
(2) with approximate frequency 13.4%, $D(\varphi) < 0$, in which case the $\mathbb{R}$-rank is 3;
(3) with frequency 0%, $D(\varphi) = 0$, in which case the $\mathbb{R}$-rank can be 0, 1, 2, or 3. ♦

3.3.1. A note on uniform sampling. We construct points on algebraic varieties by utilizing rational parametrizations of the form $\varphi\colon U \to V$, with $\varphi$ a rational function and $U$, $V$ subsets of a normed linear space. In general, if $S$ is a uniform sample of $U$, then $\varphi(S)$ will not be a uniform sample of $\varphi(U)$. Instead of trying to uniformly sample the image of these varieties [31, 41, 80], we provide training data that is constructed in the way we think one might construct the test data, or how one might imagine the data is produced from a state constructed in a lab. Whether this is a reasonable assumption is open for debate. On one hand, it is always possible (by forcing incoherence, for instance) for an adversarial entity to construct a data set on the same variety that fools our classifier. On the other hand, we can say that if we train our neural network on data produced by a certain process, we can be confident that if the validation data is constructed by the same process, then the accuracy numbers we report reflect a reasonable measure of confidence in the classifier.

3.3.2. Results. In the $2\times2\times2$ case, we trained our networks to recognize the exact rank of tensors. For building the training and validation datasets, we provided tensors from the Segre variety for rank one, tensors that are SLOCC equivalent to the bi-separable and GHZ states for rank 2, and states that are SLOCC equivalent to the W state for rank 3, following the classification of 3-qubits (see Table 7). We also took into account the typical ranks (see Example 1) and used the sign of the hyperdeterminant to separate ranks 2 and 3. In the $2^{\times4}$ and $2^{\times5}$ cases, we generated tensors for each class '$k$' by summing $k+1$ rank-one tensors (and renormalizing). The class '0' corresponds to rank-one tensors, and the class '$r$' corresponds to tensors of border rank $r+1$. The training dataset is equally divided so that it contains the same number of samples for each class. In the $2^{\times4}$ case we generated tensors up to border rank 4, and up to border rank 5 in the $2^{\times5}$ case. By generating vectors for each class this way, we introduce noise in the data, especially in the real case. However, even in the presence of noise, the network is able to learn and predict the border rank of states with good accuracy (84% in the 4-qubit case and 80% in the 5-qubit case, on testing datasets generated by the same process).

For example, we repeatedly evaluated the 5-qubit network on input states SLOCC equivalent to $|W_5\rangle = |00001\rangle + |00010\rangle + |00100\rangle + |01000\rangle + |10000\rangle$ (see Section 4.1), and most of the time we (correctly) obtain the class '1' (see Figure 4), indicating border rank 2.
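The frequencies of Example 1 are easy to re-estimate; this is a sketch of ours, reusing `hyperdet_222` from the Section 1.2 sketch (the 86.6%/13.4% split is the paper's reported value):

```python
import numpy as np
rng = np.random.default_rng(2)

def random_rank_le_3():
    """Normalized sum of three rank-1 tensors, entries uniform in [-0.5, 0.5]."""
    t = sum(np.einsum('i,j,k->ijk', *(rng.uniform(-0.5, 0.5, 2) for _ in range(3)))
            for _ in range(3))
    return t / np.linalg.norm(t)

signs = np.array([np.sign(hyperdet_222(random_rank_le_3()))
                  for _ in range(100_000)])
print((signs > 0).mean())  # ~0.866: D > 0, typical real rank 2
print((signs < 0).mean())  # ~0.134: D < 0, typical real rank 3
```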
Table 5 contains our results for the $2\times2\times2$ rank classification with the hybrid network. We used a training dataset of size 102400, a validation dataset of size 25600, and a testing dataset of size 32000. We reached 87% accuracy on the testing dataset.

Tensor size | Architecture | Training acc. | Validation acc. | Testing acc. | Loss
$2\times2\times2$ | (169,25,3) | 88.19% | 88.03% | 87.95% | 0.3028

Table 5. Hybrid network architecture and accuracy for 3-qubit rank classification.

Table 6 contains results for the $2\times2\times2$, $2^{\times4}$, and $2^{\times5}$ cases using the LeakyReLU network. In the $2\times2\times2$ rank classification problem, we used a training dataset of size 502500, a validation dataset of size 55600, and a testing dataset of size 52000. In the $2^{\times4}$ and $2^{\times5}$ border-rank classification problems, we used training datasets of size 502400 and 802400 respectively, a validation dataset of size 55600, and a testing dataset of size 52000.

Tensor size | Architecture | Training acc. | Validation acc. | Testing acc. | Loss
$2\times2\times2$ | (200,100,50,25,3) | 94.19% | 94.07% | 93.79% | 0.1674
$2^{\times4}$ | (200,100,50,25,3) | 85.49% | 84.45% | 84.47% | 0.3144
$2^{\times5}$ | (200,100,50,25,3) | 81.39% | 79.88% | 79.77% | 0.4230

Table 6. LeakyReLU network architectures and accuracies for each tensor size, for rank and border-rank classification.

4. How to use the classifier after training

4.1. Predictions for a single quantum state. We study quantum entanglement from a qualitative point of view. Quantum states are regrouped into classes, which correspond to SLOCC orbits. When the number of orbits is infinite, we talk about families depending on parameters [47]. Any state belonging to a specific class is equivalent to any other state in the same class by the action of an element of this group. Moreover, other properties, such as separability, degeneracy, and tensor rank, are invariant under the SLOCC action [46]. Hence, our classifiers should give the same answer for SLOCC-equivalent states.

However, the accuracy of our classifiers is not 100%, and thus the probability of misclassification is non-zero, which gives little confidence in a single test. In order to significantly reduce misclassification for a specific quantum state (or set of states), we generate SLOCC-equivalent states and look at the most frequent answer of the neural network classifier. A histogram of these results gives a graphical representation of the neural network output distribution. Examples are investigated in the next sections. We report the results after applying the classifier to enough points of the orbit that a consensus was reached (the answer was the same for larger data samples).

Remark 4.1. In the case of an experimental implementation of our classifiers, this process of generating datasets for predictions should be put into perspective with the impossibility of generating several copies of the same state (due to the no-cloning theorem [76]) before applying a random local SLOCC transformation [61, 104].
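A sketch of ours of this prediction scheme: apply random SLOCC elements to the state, classify each representative with a trained model (`model` here stands for any of the classifiers above; a categorical softmax output, as in the Section 3.3 rank networks, is assumed), and keep the most frequent answer.

```python
import numpy as np
from collections import Counter
rng = np.random.default_rng(3)

def random_slocc_image(phi):
    """Apply one random invertible matrix per tensor factor, then renormalize."""
    t = phi
    for i in range(t.ndim):
        g = rng.standard_normal((t.shape[i], t.shape[i]))
        t = np.moveaxis(np.tensordot(g, t, axes=(1, i)), 0, i)
    return t / np.linalg.norm(t)

def orbit_prediction(model, phi, n_samples=200):
    """Most frequent class over SLOCC-equivalent representatives of phi."""
    batch = np.stack([random_slocc_image(phi).ravel() for _ in range(n_samples)])
    classes = model.predict(batch).argmax(axis=-1)   # categorical output assumed
    return Counter(classes.tolist()).most_common(1)[0][0]
```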
4.2. Three-qubit entanglement classification. The 3-qubit case is the simplest case that illustrates the existence of non-equivalent entangled states, such as the $|GHZ\rangle$ and $|W\rangle$ states, as noted in [32]. The entanglement classification of 3-qubit systems (see Table 7) can be described by join, secant, and tangential varieties [46], and by using dual varieties as well [46, 72–74]. Rank can also be used as an algebraic measure of entanglement to distinguish between several 3-qubit entanglement classes [15].

Normal form | Class | Rank | Cayley's $\Delta_{222}$
$|000\rangle + |111\rangle$ | GHZ | 2 | $\ne 0$
$|001\rangle + |010\rangle + |100\rangle$ | W | 3 | 0
$|001\rangle + |111\rangle$ | Bi-Separable C-AB | 2 | 0
$|010\rangle + |111\rangle$ | Bi-Separable B-CA | 2 | 0
$|100\rangle + |111\rangle$ | Bi-Separable A-BC | 2 | 0
$|000\rangle$ | Separable | 1 | 0

Table 7. Entanglement classification of 3-qubit systems under SLOCC.

Our binary classifiers built in Section 3 can also be used to distinguish between 3-qubit entanglement classes. In fact, separable states can be detected using the binary classifier for tensors on the Segre variety, presented in Section 3.1. All states that are SLOCC equivalent to the $|GHZ\rangle$ state correspond to non-degenerate states, and thus can be recognized using the binary classifier for tensors on the dual variety of the Segre variety, presented in Section 3.2. Finally, to distinguish between bi-separable states and states that are SLOCC equivalent to $|W\rangle$, we exploit the rank of tensors, using the rank classifier presented in Section 3.3 as a predictor for distinguishing between rank-2 and rank-3 tensors. See Figures 5, 6, 7, 8.

4.3. Entanglement of five-qubit systems. In the five-qubit case, since no complete or parametrized classification is known, distinguishing entanglement classes is a difficult task. Some prior work focused on using filters [79] or polynomial invariants [69]. In these papers, the authors considered four 5-qubit systems $|\Phi_1\rangle$, $|\Phi_2\rangle$, $|\Phi_3\rangle$, and $|\Phi_4\rangle$ (see Equations (2)–(5)):

$|\Phi_1\rangle = \tfrac{1}{\sqrt{2}}\,(|00000\rangle + |11111\rangle)$ (2)
$|\Phi_2\rangle = \tfrac{1}{2}\,(|11111\rangle + |11100\rangle + |00010\rangle + |00001\rangle)$ (3)
$|\Phi_3\rangle = \tfrac{1}{\sqrt{6}}\,(\sqrt{2}\,|11111\rangle + |11000\rangle + |00100\rangle + |00010\rangle + |00001\rangle)$ (4)
$|\Phi_4\rangle = \tfrac{1}{2\sqrt{2}}\,(\sqrt{3}\,|11111\rangle + |10000\rangle + |01000\rangle + |00100\rangle + |00010\rangle + |00001\rangle)$ (5)

These states do not belong to the same entanglement classes [69, 79]. We ask whether these states are degenerate. Our classifier for 5-qubit degenerate states predicts that each of these states is degenerate, hence the $2^{\times 5}$ hyperdeterminant should vanish on each (see Figure 9). On the other hand, the states $|\delta_1\rangle$ and $|\delta_2\rangle$ (see Equations (6) and (7)) are known to be non-degenerate [42, 45]. Our classifier also predicts they are non-degenerate; see Figure 10.

$|\delta_1\rangle = \tfrac{1}{\sqrt{14}}\,(|00000\rangle + \sqrt{3}\,|00011\rangle + |00100\rangle + |01000\rangle + |01001\rangle + \sqrt{2}\,|01111\rangle + |10001\rangle + |10110\rangle + |11000\rangle + |11011\rangle + |11101\rangle)$ (6)

$|\delta_2\rangle = \tfrac{1}{\sqrt{11}}\,(|00000\rangle + |00100\rangle + |00111\rangle + |01010\rangle - |01101\rangle + |10001\rangle + |10011\rangle + |10111\rangle - |11000\rangle + |11110\rangle)$ (7)

5. Acknowledgements

This work was partially supported by the Région Bourgogne Franche-Comté, project PHYFA (contract 20174-06235), the French "Investissements d'Avenir" programme, project ISITE-BFC (contract ANR-15-IDEX-03), and by "BQR Mobilité Doctorante" at U.T.B.M. The first author thanks the Department of Mathematics and Statistics at Auburn University for its hospitality, and Dr. Mark Carpenter (Auburn University) and Dr. Fabrice Lauri (U.T.B.M.) for lectures and discussions about Machine Learning. The authors want to thank Dr. Frédéric Holweck for his advice, comments, and discussions, as well as the reviewers for their relevant comments and remarks, which improved the exposition of the article.

References
[1] H. Abo, G. Ottaviani, and C. Peterson, Induction for secant varieties of Segre varieties, Trans. Amer. Math. Soc. 361 (2009), no. 2, 767–792.
[2] M. Anthony and P. L. Bartlett, Neural network learning: Theoretical foundations, Cambridge University Press, 2009.
[3] A. Barron and R. Barron, Statistical learning networks: A unifying view, Proceedings of the 20th Symposium on Computer Science and Statistics (1988).
[4] K. Baur, J. Draisma, and W. A. de Graaf, Secant dimensions of minimal orbits: computations and conjectures, Experiment. Math. 16 (2007), no. 2, 239–250.
[5] M. J. S. Beach, I. D. Vlugt, A. Golubeva, P. Huembeli, B. Kulchytskyy, X. Luo, R. G. Melko, E. Merali, and G. Torlai, QuCumber: wavefunction reconstruction with neural networks, SciPost Phys. 7 (2019), 9.
[6] E. C. Behrman and J. E. Steck, Dynamic learning of pairwise and three-way entanglement, 2011 Third World Congress on Nature and Biologically Inspired Computing, 2011, pp. 99–104.
[7] M. Belkin, D. Hsu, S. Ma, and S. Mandal, Reconciling modern machine-learning practice and the classical bias–variance trade-off, PNAS 116 (2019), no. 32, 15849–15854.
[8] R. Berkovits, Extracting many-particle entanglement entropy from observables using supervised machine learning, Phys. Rev. B 98 (2018), no. 24.
[9] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549 (2017), no. 7671, 195–202.
[10] D. Bini, M. Capovani, F. Romani, and G. Lotti, O(n^{2.7799}) complexity for n × n approximate matrix multiplication, Inform. Process. Lett. 8 (1979), no. 5, 234–235.
[11] C. M. Bishop, Neural networks for pattern recognition, Oxford University Press, Inc., USA, 1995.
[12] P. Borne, M. Benrejeb, and J. Haggège, Les réseaux de neurones: présentation et applications, Vol. 15, Editions OPHRYS, 2007.
[13] M. C. Brambilla and G. Ottaviani, On the Alexander–Hirschowitz theorem, Journal of Pure and Applied Algebra 212 (2008), no. 5, 1229–1251.
[14] P. Breiding, S. Kališnik, B. Sturmfels, and M. Weinstein, Learning algebraic varieties from samples, Revista Matemática Complutense 31 (2018), no. 3, 545–593.
[15] J.-L. Brylinski, Algebraic measures of entanglement, Mathematics of quantum computation, 2002, pp. 19–40.
[16] E. Carlini, N. Grieve, and L. Oeding, Four lectures on secant varieties, Connections between algebra, combinatorics, and geometry, 2014, pp. 101–146.
[17] M. Che, L. Qi, Y. Wei, and G. Zhang, Geometric measures of entanglement in multipartite pure states via complex-valued neural networks, Neurocomputing 313 (2018), 25–38.
[18] L. Chen and D. Ž. Đoković, Proof of the Gour–Wallach conjecture, Phys. Rev. A 88 (2013), no. 4.
[19] X. Cheng, B. Khomtchouk, N. Matloff, and P. Mohanty, Polynomial regression as an alternative to neural nets (2018), preprint.
[20] L. Chiantini, G. Ottaviani, and N. Vannieuwenhoven, An algorithm for generic and low-rank specific identifiability of complex tensors, SIAM J. Matrix Anal. Appl. 35 (2014), no. 4, 1265–1287.
[21] F. Chollet et al., Keras, 2015. Available at https://keras.io.
[22] O. Chterental and D. Ž. Djoković, Normal forms and tensor ranks of pure states of four qubits (2006). arXiv:quant-ph/0612184.
[23] G. Cirrincione and M. Cirrincione, Neural-based orthogonal data fitting: The EXIN neural networks, Vol. 66, John Wiley & Sons, 2011.
Cs´ aji et al., Appr oximation with artificial neur al networks , F aculty of Sciences, Etvs Lornd Univ ersity , Hungary 24 (2001), no. 48, 7. [25] G. Cybenko, Appr oximation by sup erp ositions of a sigmoidal function , Mathematics of Control, Signals, and Systems 5 (1989), no. 4, 455–455. [26] J. Dadok and V. Kac, Polar r epr esentations , J. Algebra 92 (1985), no. 2, 504–524. [27] V. de Silv a and L.-H. Lim, T ensor r ank and the il l-p ose dness of the b est low-r ank appr oximation pr oblem , SIAM J. Matrix Anal. Appl. 30 (2008), 1084–1127. [28] T. Dozat, Inc orp or ating Nester ov momentum into Adam , 2016. Av ailable at http://cs229.stanford. edu/proj2015/054_report.pdf . [29] G. Dreyfus, J-M Martinez, M. Sam uelides, M. B Gordon, F. Badran, S. Thiria, and L. H ´ erault, R´ ese aux de neur ones-m´ etho dolo gie et applic ations , Eyrolles, 2002. [30] R. O Duda, P . E Hart, et al., Pattern classific ation and sc ene analysis , V ol. 3, Wiley New Y ork, 1973. [31] E. Dufresne, P . Edw ards, H. Harrington, and J. Hauenstein, Sampling r e al algebr aic varieties for top o- lo gic al data analysis , 2019 18th IEEE In ternational Conference On Mac hine Learning And Applications (ICMLA), 2019, pp. 1531–1536. [32] W. D ¨ ur, G. Vidal, and J. I. Cirac, Thr e e qubits c an b e entangle d in two ine quivalent ways , Ph ys. Rev. A 62 (2000), 062314. [33] I. M. Gelfand, M. M. Kapranov, and A. V. Zelevinsky, Discriminants, r esultants, and multidimensional determinants , Mathematics: Theory & Applications, Boston: Birkh¨ auser, Boston, MA, 1994. [34] G. J. Gibson and C. F. N. Co wan, On the de cision r e gions of multilayer p er c eptr ons , Pro c. IEEE 78 (1990), no. 10, 1590–1594. [35] G. Gour and N. R W allac h, A l l maximal ly entangle d four-qubit states , J. Math. Ph ys. 51 (2010), no. 11, 112201. [36] Lishen. Go vender, Determination of quantum entanglement c oncurr enc e using multilayer p er c eptr on neur al networks. , Ph.D. Thesis, 2017. [37] J. Gray, L. Banc hi, A. Bay at, and S. Bose, Machine-le arning-assiste d many-b o dy entanglement me a- sur ement , Phys. Rev. Lett. 121 (2018), no. 15. [38] B. Hanin, Universal function appr oximation by de ep neur al nets with b ounde d width and R eLU activa- tions , Mathematics 7 (2019), no. 10, 992. [39] M. H Hassoun et al., F undamentals of artificial neur al networks , MIT press, 1995. [40] S. Haykin, Neur al networks: a c ompr ehensive foundation , Prentice hall, 1999. [41] V. Hern´ andez-Mederos and J. Estrada-Sarlab ous, Sampling p oints on r e gular p ar ametric curves with c ontr ol of their distribution , Computer Aided Geometric Design 20 (2003), no. 6, 363–382. [42] F. Holw eck and H. Jaffali, Thr e e-qutrit entanglement and simple singularities , J. Physics A 49 (2016), no. 46, 465301. [43] F. Holw eck, H. Jaffali, P . Lev a y, and J.-G. Luque, Maximal ly entangle d symmetric states , (in prepara- tion) (2019). [44] F. Holwec k, H. Jaffali, and I. Nounouh, Gr over’s algorithm and the se c ant varieties , Quantum Infor- mation Pro cessing 15 (2016), no. 11, 4391–4413. [45] F. Holw eck, J.-G. Luque, and M. Planat, Singularity of typ e d 4 arising fr om four qubit systems , J. Ph ysics A 47 (2014), no. 13, 135301. [46] F. Holw ec k, J.-G. Luque, and J.-Y. Thib on, Ge ometric descriptions of entangle d states by auxiliary varieties , J. Math. Ph ys. 53 (2012), no. 10, 102203. [47] F. Holw eck, J.-G. Luque, and J.-Y. Thibon, Entanglement of four-qubit systems: A ge ometric atlas with p olynomial c omp ass ii (the tame world) , J. 
Math. Phys. 58 (2017), no. 2, 022201. [48] F. Holw eck, J.-G. Luque, and J.-Y. Thib on, Entanglement of four qubit systems: A ge ometric atlas with p olynomial c omp ass i (the finite world) , J. Math. Phys. 55 (2014), no. 1, 12202. [49] F. Holwec k and L. Oeding, Hyp er determinants fr om the E 8 Discriminant (2018). . LEARNING ALGEBRAIC MODELS OF QUANTUM ENT ANGLEMENT 18 [50] K. Hornik, Appr oximation c ap abilities of multilayer fe e dforwar d networks , Neural Net works 4 (1991), no. 2, 251–257. [51] P . Huggins, B. Sturmfels, J. Y u, and D. Y uster, The hyp er determinant and triangulations of the 4-cub e , Math. Comp. 77 (2008), no. 263, 1653–1679. [52] A. G. Iv akhnenk o, Polynomial the ory of c omplex systems , IEEE T ransactions on Systems, Man, and Cyb ernetics 4 (1971), 364–378. [53] H. Jaffali and F. Holw eck, Quantum entanglement involve d in Gr over’s and Shor’s algorithms: the four-qubit c ase , Quantum Information Pro cessing 18 (2019), no. 5, 133. [54] V. Kac, Some r emarks on nilp otent orbits , J. Algebra 64 (1980), no. 1, 190–213. [55] I. Kerenidis, J. Landman, A. Luongo, and A. Prak ash, q-me ans: A quantum algorithm for unsup ervise d machine le arning , Adv ances in neural information pro cessing systems, 2019, pp. 4136–4146. [56] T. Khanna, F oundations of neur al networks , Reading: Addison W esley (1990). [57] J. Kileel, M. T rager, and J. Bruna, On the expr essive p ower of de ep p olynomial neur al networks , Adv ances in neural information pro cessing systems, 2019, pp. 10310–10319. [58] S. Y. Kung, K. Diamantaras, W. D. Mao, and J. S. T aur, Gener alize d p er c eptr on networks with non- line ar discriminant functions , Neural Net works (1992), 245–279. [59] J. M. Landsb erg, T ensors: ge ometry and applic ations , Graduate Studies in Mathematics, vol. 128, American Mathematical So ciet y, Providence, RI, 2012. [60] , Ge ometry and c omplexity the ory , Cam bridge Studies in Adv anced Mathematics, Cambridge Univ ersity Press, 2017. [61] B P Lany on and N K Langford, Exp erimental ly gener ating and tuning r obust entanglement b etwe en photonic qubits , New Journal of Physics 11 (2009jan), no. 1, 013008. [62] H.-X. Li and E. S. Lee, Interp olation functions of fe e dforwar d neur al networks , Computers & Mathe- matics With Applications 46 (2003), no. 12, 1861–1874. [63] R. P . Lippmann, An intr o duction to c omputing with neur al nets , ACM Sigarch Computer Architecture News 16 (1988), no. 1, 7–25. [64] B. Llanas and F. J. Sainz, Constructive appr oximate interp olation by neur al networks , J. Comput. Appl. Math. 188 (2006), no. 2, 283–308. [65] S. Llo yd, M. Mohseni, and P . Reb entrost, Quantum princip al c omp onent analysis , Nature Ph ysics 10 (2014), no. 9, 631–633. [66] S. Lu, S. Huang, K. Li, J. Li, J. Chen, D. Lu, Z. Ji, Y. Shen, D. Zhou, and B. Zeng, Sep ar ability- entanglement classifier via machine le arning , Phys. Rev. A 98 (2018), no. 1. [67] Z. Lu, H. Pu, F. W ang, Z. Hu, and L. W ang, The expr essive p ower of neur al networks: A view fr om the width , neural information pro cessing systems (2017), 6231–6239. [68] J.-G. Luque and J.-Y. Thib on, Polynomial invariants of four qubits , Phys. Rev. A 67 (2003), no. 4, 42303. [69] , A lgebr aic invariants of five qubits , J. Physics A 39 (2006), no. 2, 371–377. [70] Y.-C. Ma and M.-H. Y ung, T r ansforming Bel l’s ine qualities into state classifiers with machine le arning , np j Quan tum Information 4 (2018), no. 1, 34. [71] W. S. McCulloch and W. 
Pitts, A lo gic al c alculus of the ide as immanent in nervous activity , Bull. Math. Biol. 52 (1990), no. 4, 99–115. [72] A. Miy ake, Classific ation of multip artite entangle d states by multidimensional determinants , Ph ys. Rev. A 67 (2003), 012108. [73] A. Miy ake and F. V erstraete, Multip artite entanglement in 2 × 2 × n quantum systems , Phys. Rev. A 69 (2004), no. 1, 12101. [74] A. Miyak e and M. W adati, Multip artite entanglement and hyp er determinants , Quantum Information & Computation 2 (2002), no. 7, 540–555. [75] H. P . Nautrup, N. Delfosse, V. Dunjko, H. J Briegel, and N. F riis, Optimizing quantum err or c orr e ction c o des with r einfor c ement le arning , Quantum 3 (2019), 215. [76] M. A. Nielsen and I. L. Chuang, Quantum c omputation and quantum information: 10th anniversary e dition , 10th ed., Cambridge Universit y Press, New Y ork, NY, USA, 2011. [77] S.-K. Oh, W. P edrycz, and B.-J. P ark, Polynomial neur al networks ar chite ctur e: analysis and design , Computers & Electrical Engineering 29 (2003), no. 6, 703 –725. LEARNING ALGEBRAIC MODELS OF QUANTUM ENT ANGLEMENT 19 [78] H. C. Ong, C., L. Chee Kang, and Y. Y eow W ui, Non line ar appr oximations using multi-layer e d p er- c eptions and p olynomial r e gr essions , Pro ceedings of the 2nd IMT-GT Regional Conference on Mathe- matics, Statistics and Applications, Univ ersiti Sains Malaysia, Penang, June 13-15 (2006). [79] A. Osterloh and J. Siew ert, Entanglement monotones and maximal ly entangle d states in multip artite qubit systems , International Journal of Quan tum Information 4 (2006), no. 3, 531–540. [80] L. P agani and P . J. Scott, Curvatur e b ase d sampling of curves and surfac es , Computer Aided Geometric Design 59 (2018), 32–48. [81] Y. Quek, S. F ort, and H. K. Ng, A daptive quantum state tomo gr aphy with neur al networks. (2018). arXiv:1812.06693 . [82] R. Raturi, L ar ge data analysis via interp olation of functions: Interp olating p olynomials vs artificial neur al networks , American Journal of Intelligen t Systems 8 (2018), no. 1, 6–11. [83] P . Reb en trost, M. Mohseni, and S. Lloyd, Quantum supp ort ve ctor machine for big data classific ation , Ph ys. Rev. Lett. 113 (2014), no. 13, 130503. [84] M. Sc huld, I. Sinayskiy, and F. P etruccione, The quest for a quantum neur al network , Quantum Infor- mation Pro cessing 13 (2014), no. 11, 2567–2586. [85] , An intr o duction to quantum machine le arning , Con temp orary Ph ysics 56 (2015), no. 2, 172– 185. [86] C. Segre, Sul le c orrisp ondenze quadriline ari tr a forme di 1.a sp e cie e su alcune lor o r appr esentazioni sp aziali , Annali Mat. pura ed applicata 29 (1920), 105–140 (Italian). [87] Y.-B. Sheng and L. Zhou, Distribute d se cur e quantum machine le arning , Science Bulletin 62 (2017), no. 14, 1025–1029. [88] Y. Shin and J. Ghosh, Appr oximation of multivariate functions using ridge p olynomial networks , Pro- ceedings 1992 IJCNN in ternational joint conference on neural netw orks, 1992, pp. 380–385. [89] , Ridge p olynomial networks , IEEE T ransactions on Neural Netw orks 6 (1995), no. 3, 610–622. [90] D. F. Sp ech t, Gener ation of p olynomial discriminant functions for p attern r e c o gnition , IEEE T ransac- tions on Electronic Computers 16 (1967), no. 3, 308–319. [91] I. Sutskev er, J. Martens, G. E. Dahl, and G. E. Hinton, On the imp ortanc e of initialization and momentum in de ep le arning , Proceedings of the 30th international conference on mac hine learning, 2013, pp. 1139–1147. [92] V R. 
V emuri, A rtificial neur al networks: c onc epts and c ontr ol applic ations , Los Alamitos, CA: IEEE Computer So ciet y Press (1992). [93] F. V erstraete, J Dehaene, B D. Mo or, and H. V erschelde, F our qubits c an b e entangle d in nine differ ent ways , Ph ys. Rev. A 65 (2002), no. 5, 52112. [94] ` E. B. Vinberg and A. G. ` Ela ˇ svili, A classific ation of the thr e e-ve ctors of nine-dimensional sp ac e , T rudy Sem. V ektor. T enzor. Anal. 18 (1978), 197–233. [95] B. W ang, L e arning to dete ct entanglement (2017). . [96] L. W ang and D. L. Alkon, Artificial neur al networks: oscil lations, chaos, and se quenc e pr o c essing , Neural Net works T echnology Series, IEEE Computer Society Press, 1993. [97] J. W eyman and A. Zelevinsky, Singularities of hyp er determinants , Ann. Inst. F ourier (Grenoble) 46 (1996), no. 3, 591–644. [98] B. Widrow and M. A. Lehr, 30 ye ars of adaptive neur al networks: p er c eptr on, madaline, and b ackpr op- agation , Pro c. IEEE 78 (1990), no. 9, 1415–1442. [99] N. Wiebe, D. Braun, and S. Lloyd, Quantum algorithm for data fitting , Phys. Rev. Lett. 109 (2012), 050505. [100] N. Wieb e, A. Kapo or, and K. M. Svore, Quantum de ep le arning , Quan tum Info. Comput. 16 (2016), no. 7-8, 541–587. [101] J. Wi ´ sniewsk a and M. Saw erw ain, Dete cting entanglement in quantum systems with artificial neur al network , In telligent information and database systems, 2015, pp. 358–367. [102] I. H. Witten, E. F rank, M. A. Hall, and C. J. P al, Data mining, fourth e dition: Pr actic al machine le arning to ols and te chniques , 4th ed., Morgan Kaufmann Publishers Inc., San F rancisco, CA, USA, 2016. [103] L. W ossnig and S. Severini, Quantum machine le arning: Chal lenges and opp ortunities , Bulletin of the American Ph ysical So ciety (2019). LEARNING ALGEBRAIC MODELS OF QUANTUM ENT ANGLEMENT 20 [104] W. Xu, X. Zhao, and G. Long, Efficient gener ation of multi-photon w states by joint-me asur ement , Progress in Natural Science 18 (2008), no. 1, 119 –122. [105] M. Y ang, C.-l. Ren, Y.-c. Ma, Y. Xiao, X.-J. Y e, L.-L. Song, J.-S. Xu, M.-H. Y ung, C.-F. Li, and G.-C. Guo, Exp erimental simultane ous le arning of multiple nonclassic al c orr elations , Phys. Rev. Lett. 123 (2019), no. 19, 190401. Histograms 0 . 0 0 . 5 1 . 0 1 . 5 2 . 0 2 . 5 3 . 0 3 . 5 4 . 0 Bi n a r y c l a sse s 0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0 6 0 0 0 Pr o p o r t i o n M e a n v a l u e μ = 1 . 0 0 0 , V a r i a n c e σ = 0 . 7 0 1 Figure 4. Histogram of the border rank classifier predictions for 10000 p oin ts SLOCC equiv alent to the state | W 5 i . The plot predicts that the state is of b order rank 2 (class ‘1’). 0 . 0 0 . 2 0 . 4 0 . 6 0 . 8 1 . 0 Bi n a r y c l a sse s 0 2 0 0 4 0 0 6 0 0 8 0 0 1 0 0 0 Pr o p o r t i o n M e a n v a l u e μ = 0 . 0 0 0 , V a r i a n c e σ = 0 . 0 0 0 0 . 0 0 . 2 0 . 4 0 . 6 0 . 8 1 . 0 Bi n a r y c l a sse s 0 2 0 0 4 0 0 6 0 0 8 0 0 Pr o p o r t i o n M e a n v a l u e μ = 0 . 0 3 1 , V a r i a n c e σ = 0 . 1 4 1 0 . 0 0 0 . 2 5 0 . 5 0 0 . 7 5 1 . 0 0 1 . 2 5 1 . 5 0 1 . 7 5 2 . 0 0 Bi n a r y c l a sse s 0 2 0 0 0 4 0 0 0 6 0 0 0 8 0 0 0 1 0 0 0 0 Pr o p o r t i o n M e a n v a l u e μ = 0 . 0 1 3 , V a r i a n c e σ = 0 . 1 5 8 Figure 5. Histogram predictions for 1000 points SLOCC equiv alen t to | 000 i using our trained classifiers for (in order, from left to right) separable states, degenerate states and tensor rank. Being class ‘0’ in eac h plot resp ectiv ely predicts that the state is separable, degenerate, and of rank one. 
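Each histogram in Figures 4-10 is computed on many points SLOCC-equivalent to a fixed reference state. One direct way to generate such points, sketched below under our own assumptions (the paper does not pin down the sampling distribution here), is to act on each qubit factor with an independent random invertible 2×2 complex matrix and renormalize, since that local action is exactly what defines SLOCC equivalence:

```python
import numpy as np

def random_slocc_point(state, rng):
    """Apply an independent random invertible 2x2 complex matrix to each
    tensor factor of `state` (a 2 x ... x 2 array), then renormalize."""
    out = state
    for axis in range(state.ndim):
        # A complex Gaussian matrix is invertible with probability 1.
        g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
        # Contract g with the chosen factor and move the new index back.
        out = np.moveaxis(np.tensordot(g, out, axes=([1], [axis])), 0, axis)
    return out / np.linalg.norm(out)

# Example: points SLOCC-equivalent to |W5>, as in Figure 4.
rng = np.random.default_rng(0)
w5 = np.zeros((2,) * 5)
for i in range(5):
    index = [0] * 5
    index[i] = 1
    w5[tuple(index)] = 1.0   # one excitation in position i
w5 /= np.linalg.norm(w5)

samples = [random_slocc_point(w5, rng) for _ in range(10000)]
```

The sampling distribution over each orbit is a modeling choice; any absolutely continuous distribution on invertible matrices yields points dense in the SLOCC orbit.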
Figure 6. Histogram predictions for 1000 points SLOCC-equivalent to (1/√2)(|000⟩ + |011⟩) using our trained classifiers for (from left to right) separable states, degenerate states, and tensor rank (panel means μ = 0.991, 0.012, 0.941). The plots predict that the state is entangled (class '1' on the left), degenerate (class '0' in the middle), and of rank two (class '1' on the right).

Figure 7. Histogram predictions for points SLOCC-equivalent to the W-state (1/√3)(|001⟩ + |010⟩ + |100⟩) using our trained classifiers for (from left to right) separable states, degenerate states, and tensor rank (panel means μ = 1.000, 0.011, 1.954). The left and middle plots use 1000 points and respectively predict that the state is entangled (class '1') and degenerate (class '0'). The right plot, with 10000 equivalent points, predicts that the state is of rank three (class '2').

Figure 8. Histogram predictions for points SLOCC-equivalent to the GHZ-state (1/√2)(|000⟩ + |111⟩) using our trained classifiers for (from left to right) separable states, degenerate states, and tensor rank (panel means μ = 0.908, 0.992, 1.167). The left and middle plots use 1000 points and respectively predict that the state is entangled (class '1') and non-degenerate (class '1'). The right plot, with 10000 equivalent points, predicts that the state is of rank two (class '1').
Figure 9. Histograms for the degenerate states classifier for 10000 points respectively SLOCC-equivalent to the states |Φ1⟩, |Φ2⟩, |Φ3⟩, and |Φ4⟩, from left to right (panel means μ = 0.392, 0.389, 0.338, 0.333). Classes '0' and '1' respectively refer to degenerate and non-degenerate states. Here all states are predicted to be degenerate.

Figure 10. Histograms for the degenerate states classifier on 10000 points respectively SLOCC-equivalent to |δ1⟩ (left, μ = 0.631) and |δ2⟩ (right, μ = 0.665). Classes '0' and '1' respectively refer to degenerate and non-degenerate states. Here both states are predicted to be non-degenerate.

Email address: hamza.jaffali@utbm.fr
Femto-ST/UTBM, Université de Bourgogne Franche-Comté, 90010 Belfort, France

Email address: oeding@auburn.edu
Department of Mathematics and Statistics, Auburn University, Auburn, AL, USA
