Quantum-Inspired Support Vector Machine


Authors: Chen Ding, Tian-Yi Bao, He-Liang Huang

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2020

Abstract: Support vector machine (SVM) is a particularly powerful and flexible supervised learning model that analyzes data for both classification and regression, whose usual algorithm complexity scales polynomially with the dimension of the data space and the number of data points. To tackle the big data challenge, a quantum SVM algorithm was proposed, which is claimed to achieve an exponential speedup for least squares SVM (LS-SVM). Here, inspired by the quantum SVM algorithm, we present a quantum-inspired classical algorithm for LS-SVM. In our approach, an improved fast sampling technique, namely indirect sampling, is proposed for sampling the kernel matrix and classifying. We first consider the LS-SVM with a linear kernel, and then discuss the generalization of our method to non-linear kernels. Theoretical analysis shows that our algorithm can make classifications with arbitrary success probability in runtime logarithmic in both the dimension of the data space and the number of data points, for low-rank, low-condition-number, and high-dimensional data matrices, matching the runtime of the quantum SVM.

Index Terms: Quantum-inspired algorithm, machine learning, support vector machine, exponential speedup, matrix sampling.

I. INTRODUCTION

SINCE the 1980s, quantum computing has attracted wide attention due to its enormous advantages in solving hard computational problems [1], such as integer factorization [2]–[4], database searching [5], [6], machine learning [7]–[11], and so on [12], [13]. In 1997, Daniel R. Simon offered compelling evidence that the quantum model may have significantly more complexity-theoretic power than the probabilistic Turing machine [14]. However, it remains an interesting question where the border between classical computing and quantum computing lies.
Although many proposed quantum algorithms have exponential speedups over the existing classical algorithms, is there any way to accelerate such classical algorithms to the same complexity as the quantum ones?

This work was supported by the Open Research Fund from State Key Laboratory of High Performance Computing of China (Grant No. 201901-01), National Natural Science Foundation of China under Grant No. 11905294, and China Postdoctoral Science Foundation. (Corresponding author: He-Liang Huang. Email: quanhhl@ustc.edu.cn)

Chen Ding is with CAS Centre for Excellence and Synergetic Innovation Centre in Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026, China. Tian-Yi Bao is with the Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK. He-Liang Huang is with Hefei National Laboratory for Physical Sciences at Microscale and Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui 230026, China, and also with CAS Centre for Excellence and Synergetic Innovation Centre in Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026, China.

In 2018, inspired by the quantum recommendation system algorithm proposed by Iordanis Kerenidis and Anupam Prakash [15], Ewin Tang designed a classical recommendation algorithm that achieves an exponential improvement on previous algorithms [16], a breakthrough that shows how to apply the subsampling strategy based on Alan Frieze, Ravi Kannan, and Santosh Vempala's 2004 algorithm [17] to find a low-rank approximation of a matrix.
Subsequently, Tang continued to use the same techniques to dequantize two quantum machine learning algorithms, quantum principal component analysis [18] and quantum supervised clustering [19], and showed that classical algorithms could also match the bounds and runtime of the corresponding quantum algorithms, with only polynomial slowdown [20]. Later, András Gilyén et al. [21] and Nai-Hui Chia et al. [22] independently and simultaneously proposed a quantum-inspired matrix inverse algorithm with complexity logarithmic in the matrix size, which eliminates the speedup advantage of the famous Harrow-Hassidim-Lloyd (HHL) algorithm [23] under certain conditions. Recently, Juan Miguel Arrazola et al. studied the actual performance of quantum-inspired algorithms and found that they can perform well in practice under given conditions; however, the conditions should be further relaxed if we want to apply the algorithms to practical datasets [24]. All of these works promise a bright future for designing quantum-inspired algorithms in the machine learning area, where matrix inverse algorithms are universally used.

Support vector machine (SVM) is a data classification algorithm commonly used in the machine learning area [25], [26]. Extensive studies have been conducted on SVMs to boost and optimize their performance, such as the sequential minimal optimization algorithm [27], the cascade SVM algorithm [28], and the SVM algorithms based on Markov sampling [29], [30]. These algorithms offer promising speedups either by changing the way a classifier is trained, or by reducing the size of training sets. However, the time complexities of current SVM algorithms are all polynomial in the data size. In 2014, Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd proposed the quantum SVM algorithm [31], which can achieve an exponential speedup compared to classical SVMs.
The time complexity of the quantum SVM algorithm is polynomial in the logarithm of the data size. Inspired by the quantum SVM algorithm, Tang's methods [16], and András Gilyén et al.'s work [21], we propose a quantum-inspired classical SVM algorithm, which also shows an exponential speedup over previous classical SVMs for low-rank, low-condition-number, and high-dimensional data matrices. Both the quantum SVM algorithm [31] and our quantum-inspired SVM algorithm are least squares SVMs (LS-SVM), which reduce the optimization problem to finding the solution of a set of linear equations.

Our algorithm is a dequantization of the quantum SVM algorithm [31]. In the quantum SVM algorithm, the labeled data vectors ($x_j$ for $j = 1, \ldots, m$) are mapped to quantum vectors $|x_j\rangle = \frac{1}{|x_j|} \sum_k (x_j)_k |k\rangle$ via a quantum random access memory (qRAM), and the kernel matrix is prepared using quantum inner product evaluation [19]. Then the solution of the SVM is found by solving a linear equation system related to the quadratic programming problem of the SVM, using the quantum matrix inversion algorithm [23]. In our quantum-inspired SVM, the labeled vectors are stored in an arborescent data structure which supports random sampling in time logarithmic in the vector lengths. By sampling these labeled vectors both by their number and by their lengths to get a much smaller dataset, we find the approximate singular value decomposition of the kernel matrix. Finally, we solve the optimization problem and perform classification based on the solved parameters. Our methods, particularly the sampling technique, are based on [16], [21]. However, the previous sampling techniques cannot simply be copied to solve the SVM task, since we do not have efficient direct sampling access to the kernel matrix we want to invert (see Section II-B for a more detailed explanation).
Hence we have developed an indirect sampling technique to solve this problem. Throughout the whole process, we need to avoid direct multiplication on vectors or matrices of the same size as the kernel, lest we lose the exponential speedup. We first consider the LS-SVM with a linear kernel, no regularization term, and no bias of the classification hyperplane, which can be regarded as the prototype for quantum-inspired techniques applied to various SVMs. We then show in Section III that the regularization term can easily be included in the algorithm. Finally, we discuss the generalization of our method to non-linear kernels in Section VII-C and the general case without the constraint on the bias of the classification hyperplane in Section VII-D. Theoretical analysis shows that our quantum-inspired SVM can achieve an exponential speedup over existing classical algorithms under several conditions. Experiments are carried out to demonstrate the feasibility of our algorithm. The indirect sampling developed in our work opens up the possibility of a wider application of sampling methods in the field of machine learning.

II. PRELIMINARY

A. Notations

We list some matrix-related notations used in this paper in Table I.

B. Least squares SVM

Suppose we have $m$ data points $\{(x_j, y_j) : x_j \in \mathbb{R}^n, y_j = \pm 1\}_{j=1,\ldots,m}$, where $y_j = \pm 1$ depending on the class to which $x_j$ belongs. Denote $(x_1, \ldots, x_m)$ by $X$ and $(y_1, \ldots, y_m)^T$ by $y$. An SVM finds a pair of parallel hyperplanes $x \cdot w + b = \pm 1$ that divides the points into two classes depending on the given data. For any new input point, it then makes a classification according to the point's position relative to the hyperplanes. We make the following assumption on the dataset so as to simplify the problem: assume the data points are equally distributed on both sides of a hyperplane that passes through the origin, and that their labels are divided by this hyperplane. Thus we assume $b = 0$.
A generalized method for $b \neq 0$ is discussed in Section VII-D.

TABLE I
THE NOTATIONS

$A$ : matrix $A$
$y$ : vector $y$, or a matrix $y$ with only one column
$A^+$ : pseudo-inverse of $A$
$A^T$ : transpose of $A$
$A^{+T}$ : transpose of the pseudo-inverse of $A$
$A_{i,*}$ : $i$-th row of $A$
$A_{*,j}$ : $j$-th column of $A$
$\|A\|$ : 2-operator norm of $A$
$\|A\|_F$ : Frobenius norm of $A$
$Q(\cdot)$ : time complexity for querying an element of $\cdot$
$L(\cdot)$ : time complexity for sampling an element of $\cdot$

According to [26], the optimization problem of LS-SVM with a linear kernel is

$$\min_{w,b,e} \; \mathcal{L}_1(w, b, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{m} e_k^2,$$
$$\text{subject to } y_k (w^T x_k + b) = 1 - e_k, \quad k = 1, \ldots, m.$$

Taking $b = 0$, we get

$$\min_{w,e} \; \mathcal{L}_2(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{m} e_k^2,$$
$$\text{subject to } y_k w^T x_k = 1 - e_k, \quad k = 1, \ldots, m.$$

One defines the Lagrangian

$$\mathscr{L}(w, e, \mu) = \mathcal{L}_2(w, e) - \sum_{k=1}^{m} \mu_k (y_k w^T x_k - 1 + e_k).$$

The conditions for optimality,

$$\frac{\partial \mathscr{L}}{\partial w} = 0 \;\rightarrow\; w = \sum_{k=1}^{m} \mu_k y_k x_k,$$
$$\frac{\partial \mathscr{L}}{\partial e_k} = 0 \;\rightarrow\; \mu_k = \gamma e_k, \quad k = 1, \ldots, m,$$
$$\frac{\partial \mathscr{L}}{\partial \mu_k} = 0 \;\rightarrow\; y_k w^T x_k - 1 + e_k = 0, \quad k = 1, \ldots, m,$$

can be written as the solution to the following set of linear equations:

$$Z^T Z \mu + \gamma^{-1} \mu = \mathbf{1},$$

where $Z = (x_1 y_1, \ldots, x_m y_m)$. Letting $\alpha_k = \mu_k y_k$, we have

$$(X^T X + \gamma^{-1} I)\, \alpha = y. \tag{1}$$

Once $\alpha$ is solved, the classification hyperplane is $x^T X \alpha = 0$. Given a query point $x$, we evaluate $\mathrm{sgn}(x^T X \alpha)$ to make the classification.

We use our sampling techniques in solving Equation (1) and in evaluating $\mathrm{sgn}(x^T X \alpha)$ to avoid a time complexity overhead of $\mathrm{poly}(m)$ or $\mathrm{poly}(n)$, which would kill the desired exponential speedup. Note that the quantum-inspired algorithm for linear equations [21], [22] can invert a low-rank matrix in logarithmic runtime. However, that algorithm cannot be invoked directly to solve Equation (1) here, since the complexity of directly computing the matrix $X^T X + \gamma^{-1} I$ is polynomial, which would once again kill the exponential speedup.
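For orientation, Equation (1) can of course be solved by dense linear algebra in time polynomial in $m$; the sketch below (NumPy, with hypothetical toy data, not the paper's code) shows exactly this poly-time baseline that the sampling machinery is designed to avoid:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: m = 40 points in R^n with n = 100, labeled by a
# random hyperplane through the origin (so the b = 0 assumption holds).
n, m = 100, 40
w_true = rng.normal(size=n)
X = rng.normal(size=(n, m))     # columns are the data points x_j
y = np.sign(w_true @ X)         # labels y_j = +-1
gamma = 100.0

# Solve (X^T X + gamma^{-1} I) alpha = y, i.e. Equation (1), directly.
# Forming X^T X costs O(m^2 n) and the solve O(m^3): the polynomial
# overhead the indirect sampling technique avoids.
A = X.T @ X + (1.0 / gamma) * np.eye(m)
alpha = np.linalg.solve(A, y)

# Classify a query point x by sgn(x^T X alpha).
def classify(x):
    return np.sign(x @ X @ alpha)

train_acc = np.mean([classify(X[:, j]) == y[j] for j in range(m)])
print(train_acc)  # → 1.0
```

With $m < n$ and weak regularization, the training points are fit almost exactly, so the training accuracy here is 1.0; the point of the sketch is only the cost profile of the direct solve.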
Thus we need to develop the indirect sampling technique to efficiently perform matrix inversion on $X^T X + \gamma^{-1} I$ with only sampling access to $X$.

C. The sampling technique

We show the definition and idea of our sampling method for obtaining indices, elements, or submatrices, which is the key technique used in our algorithm, as well as in [16], [17], [21].

Definition 1 (Sampling on vectors). Suppose $v \in \mathbb{C}^n$; define $q(v)$ as the probability distribution such that

$$x \sim q(v): \quad \mathbb{P}[x = i] = \frac{|v_i|^2}{\|v\|^2}.$$

Picking an index according to the probability distribution $q(v)$ is called a sampling on $v$.

Definition 2 (Sampling the indices from matrices). Suppose $A \in \mathbb{C}^{n \times m}$; define $q(A)$ as the two-dimensional probability distribution such that

$$(x, y) \sim q(A): \quad \mathbb{P}[x = i, y = j] = \frac{|A_{ij}|^2}{\|A\|_F^2}.$$

Picking a pair of indices $(i, j)$ according to the probability distribution $q(A)$ is called a sampling on $A$.

Definition 3 (Sampling the submatrices from matrices). Suppose the target is to sample a submatrix $X'' \in \mathbb{C}^{c \times r}$ from $X \in \mathbb{C}^{n \times m}$. First we sample $r$ times on the vector $(\|X_{*,j}\|)_{j=1,\ldots,m}$ and get column indices $j_1, \ldots, j_r$. The columns $X_{*,j_1}, \ldots, X_{*,j_r}$ form the submatrix $X'$. Then we sample $c$ times on the $j$-th column of $X$ and get row indices $i_1, \ldots, i_c$, where each time $j$ is sampled uniformly at random from $j_1, \ldots, j_r$. The rows $X'_{i_1,*}, \ldots, X'_{i_c,*}$ form the submatrix $X''$. The matrices $X'$ and $X''$ are normalized so that $\mathbb{E}[X' X'^T] = X X^T$ and $\mathbb{E}[X''^T X''] = X'^T X'$.

The process of sampling submatrices from matrices (as described in Def. 3) is shown in Fig. 1. To put it simply, it takes several rows and columns out of the matrix by a random choice decided by the "importance" of the elements, and then normalizes them so that they are unbiased estimates of the original rows and columns.
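As a concrete illustration, the samplings of Definitions 1 and 3 can be written down directly with explicit arrays. This is a naive sketch with hypothetical data, paying $O(n)$ per draw; the tree-based storage discussed in the paper makes each draw logarithmic:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_index(v):
    """Definition 1: sample index i with probability |v_i|^2 / ||v||^2."""
    p = np.abs(v) ** 2
    return rng.choice(len(v), p=p / p.sum())

def sample_submatrix(X, r, c):
    """Definition 3: sample an r-column submatrix X' and then a c-row
    submatrix X'', each rescaled to be unbiased, i.e. E[X'X'^T] = XX^T
    and E[X''^T X''] = X'^T X'."""
    n, m = X.shape
    fro = np.linalg.norm(X)
    col_norms = np.linalg.norm(X, axis=0)
    # Columns drawn by column-norm squares, rescaled by ||X||_F / (sqrt(r) ||X_{*,j}||).
    cols = [sample_index(col_norms) for _ in range(r)]
    Xp = np.stack([fro / np.sqrt(r) * X[:, j] / col_norms[j] for j in cols], axis=1)
    # Rows drawn through a uniformly chosen sampled column, by that column's
    # entry weights; rescaled by ||X||_F / (sqrt(c) ||X'_{i,*}||).
    rows = [sample_index(Xp[:, rng.integers(r)]) for _ in range(c)]
    Xpp = np.stack([fro / np.sqrt(c) * Xp[i] / np.linalg.norm(Xp[i]) for i in rows])
    return Xp, Xpp

# A rank-3 example matrix: the subsampled Gram matrices approximate the true ones.
X = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 40))
Xp, Xpp = sample_submatrix(X, r=25, c=25)
print(Xp.shape, Xpp.shape)  # → (6, 25) (25, 25)
```

Note that every column of $X'$ has norm $\|X\|_F/\sqrt{r}$ by construction, which is what makes the estimator $X' X'^T$ unbiased for $X X^T$.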
To achieve fast sampling, we usually store vectors in an arborescent data structure (such as a binary search tree), as suggested in [16], and store matrices as lists of their row trees or column trees. The sampling is in fact an analog of quantum state measurement: it only reveals a low-dimensional projection of the vectors and matrices in each calculation. Rather than computing with the whole vector or matrix, we choose, with high probability, their most representative elements for calculation (we choose the elements according to the probabilities given by their squared magnitudes, which is again similar to the quantum measurement of quantum states). The sampling technique we use has the advantage of representing the original vector without bias while consuming fewer computing resources.

We note that there are other kinds of sampling methods for SVM, such as Markov sampling [29], [30]. Different sampling methods may work well in different scenarios. Our algorithm is designed for low-rank datasets, while the algorithms based on Markov sampling [29], [30] may work well on datasets whose columns form a uniformly ergodic Markov chain. In our algorithm, to achieve an exponential speedup, the sampling technique differs from Markov sampling in that: (i) we sample both the rows and the columns of the matrix, rather than only columns; (ii) we sample each element according to a norm-squared probability distribution; (iii) in each dot product calculation (Alg. 1), we use the sampling technique to avoid operations with high complexity.

D. The preliminary algorithms

We invoke two algorithms from [21] that employ sampling techniques to save complexity. They are treated as oracles that output certain outcomes with controlled errors in the main algorithm. Lemma 1 and Lemma 2 show their correctness and efficiency. For the sake of convenience, some minor changes to the algorithms and lemmas have been made.

1) Trace inner product estimation: Alg.
1 achieves the calculation of trace inner products in time logarithmic in the sizes of the matrices.

Algorithm 1 Trace Inner Product Estimation.
Input: $A \in \mathbb{C}^{m \times n}$, to which we have sampling access in complexity $L(A)$, and $B \in \mathbb{C}^{n \times m}$, to which we have query access in complexity $Q(B)$; relative error bound $\xi$ and success probability bound $1 - \eta$.
Goal: Estimate $\mathrm{Tr}[AB]$.
1: Repeat Step 2 $\lceil 6 \log_2(2/\eta) \rceil$ times and take the median of the values $Y$, denoted $Z$.
2: Repeat Step 3 $\lceil 9/\xi^2 \rceil$ times and calculate the mean of the values $X$, denoted $Y$.
3: Sample $i$ from the row norms of $A$; sample $j$ from $A_{i,*}$; let $X = \|A\|_F^2 \, B_{ji} / A_{ij}$.
Output: $Z$.

Lemma 1 [21]. Suppose that we have length-square sampling access to $A \in \mathbb{C}^{m \times n}$ and query access to the matrix $B \in \mathbb{C}^{n \times m}$ in complexity $Q(B)$. Then we can estimate $\mathrm{Tr}[AB]$ to precision $\xi \|A\|_F \|B\|_F$ with probability at least $1 - \eta$ in time

$$O\!\left( \frac{\log_2(1/\eta)}{\xi^2} \left( L(A) + Q(B) \right) \right).$$

Algorithm 2 Rejection sampling.
Input: $A \in \mathbb{C}^{m \times n}$, to which we have length-square sampling access, $b \in \mathbb{C}^n$, to which we have norm access, and $y = Ab$, to which we have query access.
Goal: Sample from the length-square distribution of $y = Ab$.
1: Take $D \geq \|b\|^2$.
2: Sample a row index $i$ by the row norm squares of $A$.
3: Query $|y_i|^2 = |A_{i,*} b|^2$ and calculate $\frac{|A_{i,*} b|^2}{D \|A_{i,*}\|^2}$.
4: Sample a real number $x$ uniformly distributed in $[0, 1]$. If $x < \frac{|A_{i,*} b|^2}{D \|A_{i,*}\|^2}$, output $i$; else, go to Step 2.
Output: the row index $i$.
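A direct, array-based sketch of the trace estimator of Alg. 1 (median of means over the unbiased single-draw estimate $\|A\|_F^2 B_{ji}/A_{ij}$; real-valued hypothetical data, without the tree structure that makes each draw logarithmic):

```python
import numpy as np

rng = np.random.default_rng(2)

def trace_inner_product(A, B, xi=0.1, eta=0.05):
    """Estimate Tr[AB] as in Alg. 1: sample (i, j) with probability
    |A_ij|^2 / ||A||_F^2, average ceil(9/xi^2) draws of the unbiased
    estimate ||A||_F^2 * B_ji / A_ij, and take the median over
    ceil(6*log2(2/eta)) batches, targeting precision xi*||A||_F*||B||_F
    with probability at least 1 - eta."""
    m, n = A.shape
    fro2 = np.linalg.norm(A) ** 2
    row_p = np.linalg.norm(A, axis=1) ** 2
    row_p /= row_p.sum()
    batches = int(np.ceil(6 * np.log2(2 / eta)))
    draws = int(np.ceil(9 / xi ** 2))
    means = []
    for _ in range(batches):
        total = 0.0
        for _ in range(draws):
            i = rng.choice(m, p=row_p)          # row by row-norm squares
            row_sq = A[i] ** 2                  # entry within row i
            j = rng.choice(n, p=row_sq / row_sq.sum())
            total += fro2 * B[j, i] / A[i, j]
        means.append(total / draws)
    return np.median(means)

A = rng.normal(size=(30, 20))
B = rng.normal(size=(20, 30))
est = trace_inner_product(A, B, xi=0.1)
err = abs(est - np.trace(A @ B))
print(err <= 0.1 * np.linalg.norm(A) * np.linalg.norm(B))  # → True
```

The length-square sampling weight cancels the $A_{ij}$ in the denominator, so the single-draw variance is at most $\|A\|_F^2 \|B\|_F^2$ regardless of how small individual entries are; the means control the variance and the median boosts the success probability.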
Fig. 1. A demonstration of sampling submatrices from matrices (the process described in Def. 3, which is also Steps 2 and 3 in Alg. 3). We sample columns from $X$ to get $X'$ and sample rows from $X'$ to get $X''$. Note that $X'$ and $X''$ are normalized such that $\mathbb{E}[X' X'^T] = X X^T$ and $\mathbb{E}[X''^T X''] = X'^T X'$. [Figure: a numerical example in which columns of an $8 \times 8$ matrix $X$ are sampled and renormalized to form $X'$, whose rows are then sampled and renormalized to form $X''$.]

2) Rejection sampling: Alg. 2 achieves sampling of a vector to which we do not have full query access, in time logarithmic in its length.

Lemma 2 [21]. Suppose that we have length-square sampling access to $A \in \mathbb{C}^{m \times n}$ having normalized rows, and we are given $b \in \mathbb{C}^n$.
Then we can implement queries to the vector $y := Ab \in \mathbb{C}^m$ with complexity $Q(y) = O(n\,Q(A))$, and we can length-square sample from $q(y)$ with complexity $L(y)$ such that

$$\mathbb{E}[L(y)] = O\!\left( n \frac{\|b\|^2}{\|y\|^2} \left( L(A) + n\,Q(A) \right) \right).$$

III. QUANTUM-INSPIRED SVM ALGORITHM

We show the main algorithm (Alg. 3), which makes classifications as classical SVMs do. Note that actual calculation happens only where the word "calculate" is used in this algorithm; otherwise the exponential-speedup advantage would be lost to operations on large vectors or matrices. $\gamma$ is temporarily taken as $\infty$. Fig. 2 shows the algorithm process.

Algorithm 3 Quantum-inspired SVM Algorithm.
Input: $m$ training data points and their labels $\{(x_j, y_j) : x_j \in \mathbb{R}^n, y_j = \pm 1\}_{j=1,\ldots,m}$, where $y_j = \pm 1$ depending on the class to which $x_j$ belongs; error bound $\epsilon$ and success probability bound $1 - \eta$; $\gamma$ set as $\infty$.
Goal 1: Find $\tilde{\alpha}$ such that $\|\tilde{\alpha} - \alpha\| \leq \epsilon \|\alpha\|$ with success probability at least $1 - \eta$, in which $\alpha = (X^T X)^+ y$.
Goal 2: For any given $x \in \mathbb{R}^n$, find its class.
1: Init: Set $r$, $c$ as described in (6) and (7).
2: Sample columns: Sample $r$ column indices $i_1, i_2, \ldots, i_r$ according to the column norm squares $\|X_{*,i}\|^2 / \|X\|_F^2$. Define $X'$ to be the matrix whose $s$-th column is $\frac{\|X\|_F}{\sqrt{r}} \frac{X_{*,i_s}}{\|X_{*,i_s}\|}$. Define $A' = X'^T X'$.
3: Sample rows: Sample $s \in [r]$ uniformly, then sample a row index $j$ distributed as $|X'_{js}|^2 / \|X'_{*,s}\|^2$. Sample a total of $c$ row indices $j_1, j_2, \ldots, j_c$ this way. Define $X''$ to be the matrix whose $t$-th row is $\frac{\|X\|_F}{\sqrt{c}} \frac{X'_{j_t,*}}{\|X'_{j_t,*}\|}$. Define $A'' = X''^T X''$.
4: Spectral decomposition: Calculate the spectral decomposition of $A''$, denoted $A'' = V'' \Sigma^2 V''^T$. Denote the calculated eigenvalues by $\sigma_l^2$, $l = 1, \ldots, k$.
5: Approximate eigenvectors: Let $R = X'^T X$.
Define $\tilde{V}_l = \frac{1}{\sigma_l^2} R^T V''_l$ for $l = 1, \ldots, k$, and $\tilde{V} = (\tilde{V}_l)_{l=1,\ldots,k}$.
6: Estimate matrix elements: Calculate $\tilde{\lambda}_l = \tilde{V}_l^T y$ to precision $\frac{3 \epsilon \sigma_l^2}{16\sqrt{k}} \|y\|$ by Alg. 1, each with success probability $1 - \frac{\eta}{4k}$. Let $u = \sum_{l=1}^{k} \frac{\tilde{\lambda}_l}{\sigma_l^4} V''_l$.
7: Find query access: Find query access to $\tilde{\alpha} = \tilde{R}^T u$ via $\tilde{\alpha}_p = u^T \tilde{R}_{*,p}$, in which $\tilde{R}_{ij}$ is calculated to precision $\frac{\epsilon \kappa^2}{4 \|X\|_F}$ by Alg. 1, each with success probability $1 - \frac{\eta}{4 \lceil (864/\epsilon^2) \log_2(8/\eta) \rceil}$.
8: Find sign: Calculate $x^T X \tilde{\alpha}$ to precision $\frac{\epsilon}{4} \|\alpha\| \|x\|$ with success probability $1 - \frac{\eta}{4}$ by Alg. 1, and tell its sign.
Output: The answer class depends on the sign: positive corresponds to $+1$, negative to $-1$.

The following theorem states the accuracy and time complexity of the quantum-inspired SVM algorithm, from which we conclude that the time complexity $T$ depends polylogarithmically on $m$, $n$ and polynomially on $k$, $\kappa$, $\epsilon$, $\eta$. It is proved in Section IV and Section V.

Theorem 1. Given parameters $\epsilon > 0$, $0 < \eta < 1$, and a data matrix $X$ of size $m \times n$, rank $k$, norm 1, and condition number $\kappa$, the quantum-inspired SVM algorithm will find the classification expression $x^T X \alpha$ for any vector $x \in \mathbb{C}^n$ with error less than $\epsilon \kappa^2 \sqrt{m}\, \|x\|$, success probability higher than

[Figure: flow chart of Alg. 3, from $X$ ($n \times m$) through $X'$ ($n \times r$) and $X''$ ($c \times r$) to the spectral decomposition of $A''$ ($r \times r$) and the classification expression $\mathrm{sgn}(x^T X \tilde{\alpha})$.]

Fig. 2. The quantum-inspired SVM algorithm. In the algorithm, the subsampling of $A$ is implemented by subsampling the matrix $X$ (Steps 1–3), which is called the indirect sampling technique.
After the indirect sampling, we perform the spectral decomposition (Step 4). Then we estimate approximations of the eigenvectors $\tilde{V}_l$ of $A$ (Step 5). Finally, we estimate the classification expression (Steps 6–8).

Theorem 1 (continued): the error bound holds with success probability higher than $1 - \eta$ and time complexity $T(m, n, k, \kappa, \epsilon, \eta)$ given by

$$T = O\!\left( r \log_2 m + c r \log_2 n + r^3 + \frac{\|X\|_F^2 k^2}{\epsilon^2} \log_2\!\frac{8k}{\eta}\, (\log_2(mn) + k) + \frac{1}{\epsilon^2} \log_2\frac{1}{\eta} \left( \log_2(mn) + r k \log_2\!\frac{2}{\eta_1} \frac{\|X\|_F^4}{\epsilon_1^2}\, r \log_2(mn) \right) \right),$$

in which

$$\epsilon_1 = \frac{\epsilon \|x\|}{2\sqrt{r}\, \lceil 36/\epsilon^2 \rceil \lceil 6 \log_2(16/\eta) \rceil}, \qquad \eta_1 = \frac{\eta}{8 r \lceil 36/\epsilon^2 \rceil \lceil 6 \log_2(16/\eta) \rceil}.$$

In Alg. 3, $\gamma$ is set as $\infty$, which makes the coefficient matrix $A = X^T X$. Notice that the eigenvectors of $X^T X + \gamma^{-1} I$ and $X^T X$ are the same, and their eigenvalues differ by $\gamma^{-1}$. Thus the algorithm can easily be extended to the coefficient matrix $X^T X + \gamma^{-1} I$ with arbitrary $\gamma$, simply by adding $\gamma^{-1}$ to the calculated eigenvalues in Step 4.

IV. ACCURACY

We prove that the error of computing the classification expression $x^T X \tilde{\alpha}$ in the quantum-inspired SVM algorithm does not exceed $\epsilon \kappa^2 \sqrt{m}\, \|x\|$. We take $\gamma = \infty$ in the analysis because adding $\gamma^{-1}$ to the eigenvalues introduces no error, so the analysis is the same in the case $\gamma \neq \infty$. We first show how to break the total error into multiple parts, and then analyze each part in the subsections.

Let $\alpha = (X^T X)^+ y$, $\alpha' = \sum_{l=1}^{k} \frac{\lambda_l}{\sigma_l^2} \tilde{V}_l = \tilde{V} \Sigma^{-2} \tilde{V}^T y$, in which $\lambda_l = \tilde{V}_l^T y$, and $\alpha'' = \sum_{l=1}^{k} \frac{\tilde{\lambda}_l}{\sigma_l^2} \tilde{V}_l$. Then the total error of the classification expression is¹

$$E = \Delta(x^T X \alpha) \leq |x^T X (\alpha - \tilde{\alpha})| + \Delta(x^T X \tilde{\alpha}) \leq \|x\| \left( \|\alpha - \alpha'\| + \|\alpha' - \alpha''\| + \|\alpha'' - \tilde{\alpha}\| \right) + \Delta(x^T X \tilde{\alpha}).$$

Denote $E_1 = \|x\| \|\alpha' - \alpha\|$, $E_2 = \|x\| \|\alpha'' - \alpha'\|$, $E_3 = \|x\| \|\tilde{\alpha} - \alpha''\|$, and $E_4 = \Delta(x^T X \tilde{\alpha})$. Our target is to show that each of them is no more than $\frac{\epsilon}{4} \|\alpha\| \|x\|$ with probability no less than $1 - \frac{\eta}{4}$.
It then follows that

$$E \leq E_1 + E_2 + E_3 + E_4 \leq \epsilon \kappa^2 \sqrt{m}\, \|x\|$$

with success probability no less than $1 - \eta$.

$E_1$ represents the error introduced by subsampling and eigenvector approximation (i.e., Steps 1–5 in Alg. 3); that it is less than $\frac{\epsilon}{4} \|\alpha\| \|x\|$ with probability no less than $1 - \frac{\eta}{4}$ is shown in Subsection IV-A. $E_2$ represents the error introduced by the approximation of $\lambda_l$ (i.e., Step 6 in Alg. 3); that it is less than $\frac{\epsilon}{4} \|\alpha\| \|x\|$ with probability no less than $1 - \frac{\eta}{4}$ is shown in Subsection IV-B. $E_3$ represents the error introduced in the queries of $R$ and $\tilde{\alpha}$; that it is less than $\frac{\epsilon}{4} \|\alpha\| \|x\|$ with probability no less than $1 - \frac{\eta}{4}$ is guaranteed by Step 7 of Alg. 3. $E_4$ represents the error caused by Alg. 1 in estimating $x^T X \tilde{\alpha}$, as footnote 1 suggests; that it is less than $\frac{\epsilon}{4} \|\alpha\| \|x\|$ with probability no less than $1 - \frac{\eta}{4}$ is guaranteed by Step 8 of Alg. 3.

¹For any expression $f$, $\Delta(f)$ denotes the difference between the exact value of $f$ and the value calculated by the estimation algorithms Alg. 1 and Alg. 3 (these algorithms cannot obtain the exact values because randomness is introduced).

Fig. 3. The whole procedure of proving $\|\tilde{V} \Sigma^{-2} \tilde{V}^T A - I_m\| \leq \frac{\epsilon}{4}$. Thm 2 shows the difference among $A$ and the subsampling outcomes $A'$ and $A''$. Thm 3 shows the relation between $A'$ and $V''_l$. Thm 4 shows the relation between $A$ and $\tilde{V}_l = \frac{1}{\sigma_l^2} R^T V''_l$. Thm 6 shows the final relation between $A$ and $\tilde{V} = (\tilde{V}_l)_{l=1,\ldots,k}$.

For achieving accurate classification, we only need the relative error $\frac{E}{x^T X \alpha}$ to be less than 1. Thus, by decreasing $\epsilon$, we can achieve this goal within any given probability range.

A. Proof of $E_1 \leq \frac{\epsilon}{4} \|\alpha\| \|x\|$

Notice that

$$E_1 = \|x\| \|\alpha - \alpha'\| = \|x\| \|\alpha - \tilde{V} \Sigma^{-2} \tilde{V}^T A \alpha\| \leq \|\alpha\| \|x\| \|\tilde{V} \Sigma^{-2} \tilde{V}^T A - I_m\|.$$
Here we present five theorems (Theorems 2 to 6) to prove $\|\tilde{V} \Sigma^{-2} \tilde{V}^T A - I_m\| \leq \frac{\epsilon}{4}$, in which Theorems 2 and 5 are invoked from [21]. We offer proofs of Theorems 3, 4, and 6 in Appendix A. The purpose of these theorems is to show that $\tilde{V} \Sigma^{-2} \tilde{V}^T$ is functionally close to the inverse of the matrix $A$, as the bound $\|\tilde{V} \Sigma^{-2} \tilde{V}^T A - I_m\| \leq \frac{\epsilon}{4}$ suggests. Theorem 2 states the norm distance between $A$, $A'$, and $A''$. Based on this norm distance, and on the fact that the $V''_l$ are eigenvectors of $A''$, Theorem 3 establishes the relation between $A'$ and $V''_l$. We define $\tilde{V}_l = \frac{1}{\sigma_l^2} R^T V''_l$, and Theorem 6 finally gives the relation between $A$ and $\tilde{V}$. The procedure is shown in Fig. 3.

Theorem 2 [21]. Let $X' \in \mathbb{C}^{n \times r}$, and let $X'' \in \mathbb{C}^{c \times r}$ be the sampling outcome of $X'$. Suppose $X''$ is normalized so that $\mathbb{E}[X''^T X''] = X'^T X'$. Then $\forall \epsilon \in [0, \|X'\| / \|X'\|_F]$, we have

$$\mathbb{P}\!\left[ \|X'^T X' - X''^T X''\| \geq \epsilon \|X'\| \|X'\|_F \right] \leq 2r\, e^{-\epsilon^2 c / 4}.$$

Hence, for $c \geq \frac{4 \log_2(2r/\eta)}{\epsilon^2}$, with probability at least $1 - \eta$ we have $\|X'^T X' - X''^T X''\| \leq \epsilon \|X'\| \|X'\|_F$.

When a submatrix $X''$ is randomly subsampled from $X'$, it is a matrix of multiple random variables. Theorem 2 is a concentration inequality for $X''$: it points out that the operator norm distance between $X'^T X'$ and $X''^T X''$ is small with high probability.

Theorem 3. Suppose the columns of the matrix $V''$, denoted $V''_l$, $l = 1, \ldots, k$, are orthonormal vectors, while

$$A'' = \sum_{l=1}^{k} \sigma_l^2 V''_l V''^T_l.$$

Suppose $\|A' - A''\| \leq \beta$. Then $\forall i, j \in \{1, \ldots, k\}$,

$$|V''^T_i A' V''_j - \delta_{ij} \sigma_i^2| \leq \beta.$$

Theorem 3 points out that if the matrices $A'$ and $A''$ are close in the operator norm sense, then $A''$'s eigenvectors approximately work as eigenvectors for $A'$ too.

Theorem 4. Suppose the columns of the matrix $V''$, denoted $V''_l$, $l = 1, \ldots, k$, are orthonormal vectors and $|V''^T_i A' V''_j - \delta_{ij} \sigma_i^2| \leq \beta$ for all $i, j \in \{1, \ldots, k\}$.
Suppose further that $\|X X^T - X' X'^T\| \leq \epsilon_0$, $\|X\| \leq 1$, $\frac{1}{\kappa} \leq \sigma_i^2 \leq 1$, and the condition of Thm 3 holds. Let $\tilde{V}_l = \frac{1}{\sigma_l^2} R^T V''_l$. Then

$$|\tilde{V}_i^T \tilde{V}_j - \delta_{ij}| \leq \kappa^2 \beta^2 + 2\kappa\beta + \kappa^2 \epsilon_0 \|X\|_F^2,$$

and

$$|\tilde{V}_i^T A \tilde{V}_j - \delta_{ij} \sigma_i^2| \leq (2\epsilon_0 + \beta \|X\|_F^2) \|X\|_F^2 \kappa^2,$$

in which $A' = X'^T X'$ and $A = X^T X$.

Theorem 4 points out that if $A''$'s eigenvectors approximately work as eigenvectors for $A'$ and $\|X X^T - X' X'^T\| \leq \epsilon_0$, then the $\tilde{V}_l$ approximately work as eigenvectors for $A$.

Theorem 5 [21]. If $\mathrm{rank}(B) \leq k$ and $\tilde{V}$ has $k$ columns that span the row and column space of $B$, then

$$\|B\| \leq \|(\tilde{V}^T \tilde{V})^+\| \|\tilde{V}^T B \tilde{V}\|.$$

Under the condition that the $\tilde{V}_l$ approximately work as eigenvectors for $A$, the following Theorem 6 points out that $\tilde{V} \Sigma^{-2} \tilde{V}^T$ is functionally close to the inverse of the matrix $A$.

Theorem 6. If $\forall i, j \in \{1, \ldots, k\}$,

$$|\tilde{V}_i^T \tilde{V}_j - \delta_{ij}| \leq \frac{1}{4k}, \tag{2}$$
$$|\tilde{V}_i^T A \tilde{V}_j - \delta_{ij} \sigma_i^2| \leq \zeta,$$

and the condition of Thm 4 holds, then

$$\|\tilde{V} \Sigma^{-2} \tilde{V}^T A - I_m\| \leq \frac{5}{3} \kappa k \zeta.$$

To conclude, for $\mathbb{P}[\|\alpha' - \alpha\| > \frac{\epsilon}{4}\|\alpha\|] \leq \frac{\eta}{4}$, we need to pick $\epsilon_0$ and $\beta$ such that

$$\kappa^2 \beta^2 + 2\kappa\beta + \kappa^2 \epsilon_0 \|X\|_F^2 \leq \frac{1}{4k}, \tag{3}$$
$$(2\epsilon_0 + \beta \|X\|_F^2) \|X\|_F^2 \kappa^2 \leq \zeta, \tag{4}$$
$$\frac{5}{3} \kappa k \zeta \leq \frac{\epsilon}{4}, \tag{5}$$

and decide the sampling parameters as

$$r = \left\lceil \frac{4 \log_2(8n/\eta)}{\epsilon_0^2} \right\rceil, \tag{6}$$
$$c = \left\lceil \frac{4 \kappa^2 \log_2(8r/\eta)}{\beta^2} \right\rceil. \tag{7}$$

B. Proof of $E_2 \leq \frac{\epsilon}{4} \|\alpha\| \|x\|$

Note that for $y = X^T X \alpha$ and $\alpha = X^+ X^{+T} y$, we have $\|y\| \leq \|\alpha\| \leq \kappa^2 \|y\|$. For $\|\alpha'' - \alpha'\|$, let $z$ be the vector with $z_l = \frac{\lambda_l - \tilde{\lambda}_l}{\sigma_l^2}$. Then

$$\|\alpha'' - \alpha'\| = \left\| \sum_{l=1}^{k} \frac{\lambda_l - \tilde{\lambda}_l}{\sigma_l^2} \tilde{V}_l \right\| = \|\tilde{V} z\| \leq \sqrt{\|\tilde{V}^T \tilde{V}\|}\, \|z\| \leq \sqrt{\frac{4}{3}} \cdot \frac{3\epsilon}{16\sqrt{k}} \|y\| \cdot \sqrt{k} \leq \frac{\epsilon}{4} \|\alpha\|,$$

in which $\|\tilde{V}^T \tilde{V}\| \leq \frac{4}{3}$ as shown in the proof of Theorem 6.
V. COMPLEXITY

In this section, we analyze the time complexity of each step of the main algorithm. We divide the steps into four parts and analyze each part in its own subsection: Steps 1–3 are considered in Subsection V-A, Step 4 in Subsection V-B, Steps 5–6 in Subsection V-C, and Steps 7–8 in Subsection V-D. Note that in the main algorithm the variables $R$, $\tilde{V}_l$, $\tilde{\alpha}$ are queried rather than calculated; we include the corresponding query complexity in the analysis of the steps where these variables are queried.

A. Sampling of columns and rows

In Step 1, the values of $r$ and $c$ are determined according to Inequalities (3)–(7). The time for solving these inequalities is constant. In Step 2 we sample $r$ indices, each sampling taking no more than $\log_2 m$ time using the arborescent vector data structure described in Section II-C. In Step 3 we sample $c$ indices, each sampling taking no more than $r \log_2 n$ time using the arborescent matrix data structure described in Section II-C. Thus the overall time complexity of Steps 1–3 is $O(r \log_2 m + c r \log_2 n)$.

B. The spectral decomposition

Step 4 is the spectral decomposition. For the $r \times r$ symmetric matrix $A''$, the fastest classical spectral decomposition is the classical symmetric QR method, whose complexity is $O(r^3)$.

C. Calculation of $\tilde{\lambda}_l$

In Steps 5–6 we calculate $\tilde{\lambda}_l$ by Alg. 1. We have

$$\lambda_l = \frac{1}{\sigma_l^2} V''^T_l R y = \frac{1}{\sigma_l^2} \mathrm{Tr}[V''^T_l X'^T X y] = \frac{1}{\sigma_l^2} \mathrm{Tr}[X y V''^T_l X'^T].$$

Observe that $\|y V''^T_l X'^T\|_F = \|y\| \|V''^T_l X'^T\| \leq \|y\|$, and we can query the $(i, j)$ matrix element of $y V''^T_l X'^T$ at cost $O(r)$. According to Lemma 1, the complexity of Step 6 is

$$T_6 = O\!\left( \frac{\|X\|_F^2 k^2}{\epsilon^2} \log_2\!\frac{8k}{\eta}\, (\log_2(mn) + k) \right).$$

D. Calculation of $x^T X \tilde{\alpha}$

In Steps 7–8 we calculate $x^T X \tilde{\alpha}$. This is the last step of the algorithm, and also the most important one for saving time complexity. In Step 8 of Alg.
3, we need to calculate 𝑥 𝑇 𝑋 ˜ 𝛼 , which is equal to Tr [ 𝑋 ˜ 𝛼 𝑥 𝑇 ] , with precision 𝜖 k 𝛼 k k 𝑥 k and success probability 1 − 𝜂 4 using Alg. 1. Let the 𝐴 and 𝐵 in Alg. 1 be 𝑋 and ˜ 𝛼 𝑥 𝑇 , respectiv ely . T o calculate T r [ 𝑋 ˜ 𝛼𝑥 𝑇 ] , we first establish the query access for ˜ 𝛼 𝑥 𝑇 (we already have the sampling access of 𝑋 ), and then using the Alg. 1 as an oracle. W e first analyze the time complexity of querying 𝑅 and ˜ 𝛼 , and then provide the time complexity of calculating 𝑥 𝑇 𝑋 ˜ 𝛼 : 1) Query of 𝑅 : First we find query access of 𝑅 = 𝑋 0 𝑇 𝑋 . For any 𝑠 = 1 , . . ., 𝑟 , 𝑗 = 1 , . . ., 𝑚 , 𝑅 𝑠 𝑗 = 𝑒 𝑇 𝑠 𝑋 0 𝑇 𝑋 𝑒 𝑗 = T r [ 𝑋 𝑒 𝑗 𝑒 𝑇 𝑠 𝑋 0 𝑇 ] , we calculate such trace by Alg. 1 to precision 𝜖 1 with success probability 1 − 𝜂 1 . The time complexity for one query will be 𝑄 ( 𝑅 ) = 𝑂 ( log 2 ( 2 𝜂 1 ) k 𝑋 k 4 𝐹 𝜖 2 1 𝑟 log 2 ( 𝑚 𝑛 ) ) . 2) Query of ˜ 𝛼 : F or any 𝑖 = 1 , . . ., 𝑚 , we hav e ˜ 𝛼 𝑗 = Í 𝑟 𝑠 = 1 𝑅 𝑠 𝑗 𝑢 𝑠 . One query of ˜ 𝛼 will cost time 𝑟 𝑘 𝑄 ( 𝑅 ) , with error 𝜖 1 Í 𝑟 𝑠 = 1 | 𝑢 𝑠 | and success probability more than 1 − 𝑟 𝜂 1 . 3) Calculation of 𝑥 𝑇 𝑋 ˜ 𝛼 : W e use Alg. 1 to calculate 𝑥 𝑇 𝑋 ˜ 𝛼 = T r [ 𝑋 ˜ 𝛼 𝑥 𝑇 ] to precision 𝜖 2 k 𝛼 k k 𝑥 k with success probability 1 − 𝜂 8 . Notice the query of ˜ 𝛼 is with error and success probability . W e only need 𝜖 1 𝑟  𝑠 = 1 | 𝑢 𝑠 | d 36 𝜖 2 e d 6 log 2 ( 16 𝜂 ) e ≤ 𝜖 2 k 𝛼 k k 𝑥 k , 𝑟 𝜂 1 d 36 𝜖 2 e d 6 log 2 ( 16 𝜂 ) e ≤ 𝜂 8 to fulfill the overall computing task. Notice Í 𝑟 𝑠 = 1 | 𝑢 𝑠 | ≤ √ 𝑟 k 𝑢 k and 𝛼 = 𝑅 𝑇 𝑢 W e set 𝜖 1 = 𝜖 k 𝑥 k 2 √ 𝑟 d 36 𝜖 2 e d 6 log 2 ( 16 𝜂 ) e , JOURNAL OF L A T E X CLASS FILES, VOL. 14, NO. 8, A UGUST 2020 8 𝜂 1 = 𝜂 8 𝑟 d 36 𝜖 2 e d 6 log 2 ( 16 𝜂 ) e . And the overall time complexity for computing 𝑥 𝑇 𝑋 ˜ 𝛼 is 𝑇 7 = 𝑂 ( 1 𝜖 2 log 2 1 𝜂 ( log 2 ( 𝑚 𝑛 ) + 𝑟 𝑘 𝑄 ( 𝑅 ) ) ) = 𝑂 ( 1 𝜖 2 log 2 1 𝜂 ( log 2 ( 𝑚 𝑛 ) + 𝑟 𝑘 log 2 ( 2 𝜂 1 ) k 𝑋 k 4 𝐹 𝜖 2 1 𝑟 log 2 ( 𝑚 𝑛 ) ) ) . V I . 
EXPERIMENTS

In this section, we demonstrate the proposed quantum-inspired SVM algorithm in practice by testing it on artificial datasets. The feasibility and efficiency of some other quantum-inspired algorithms (for recommendation systems and for linear systems of equations) on large datasets have been benchmarked, and the results indicate that quantum-inspired algorithms can perform well in practice under their specific conditions: low rank, low condition number, and very large dimension of the input matrix [24]. Here we show the feasibility of the quantum-inspired SVM. Firstly, we test the quantum-inspired SVM algorithm on low-rank and low-rank-approximated datasets and compare it to an existing classical SVM implementation. Secondly, we discuss the characteristics of the algorithm by analyzing its dependence on the parameters and datasets. In our experiments, we use the arborescent data structure instead of arrays for storage and sampling, so the experiments are conducted in a more realistic scenario than in the previous work [24]. All algorithms are implemented in Julia [32]. The source code and data are available at https://github.com/helloinrm/qisvm.

A. Experiment I: Comparison with LIBSVM

In this experiment, we test the quantum-inspired SVM algorithm on large datasets and compare its performance to the well-known classical SVM implementation LIBSVM [33]. We generate datasets of size $10000 \times 11000$, representing 11000 vectors of length 10000 (6000 vectors for training and 5000 for testing). All the data vectors in the training and testing sets are chosen uniformly at random from the generated data matrix, so that they are statistically independent and identically distributed. We test quantum-inspired SVM and LIBSVM on two kinds of datasets: low-rank datasets (rank 1) and high-rank but low-rank-approximated datasets (rank 10000). Each scenario is repeated 5 times.
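The arborescent data structure used here for storage and sampling can be pictured as a binary tree whose leaves hold the squared entries of a vector and whose internal nodes hold subtree sums. The following minimal Python sketch is ours (the paper's implementation is in Julia; the class name and layout are illustrative assumptions, not taken from the released source): it samples an index $i$ with probability $v_i^2/\|v\|^2$ by a single root-to-leaf walk, i.e., in $O(\log_2 m)$ time per sample, which is the per-sample cost assumed in the complexity analysis of Section V.

```python
import numpy as np

class SampleTree:
    """Binary tree over squared entries (sketch of the 'arborescent' vector structure).

    Leaves store v_i**2; each internal node stores the sum of its children,
    so the root equals ||v||^2 and one root-to-leaf walk draws an index i
    with probability v_i**2 / ||v||^2.
    """

    def __init__(self, v):
        self.n = len(v)
        size = 1
        while size < self.n:
            size *= 2
        self.size = size
        self.tree = np.zeros(2 * size)
        self.tree[size:size + self.n] = np.asarray(v, dtype=float) ** 2
        for i in range(size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def sample(self, rng):
        # Walk down, branching left with probability (left subtree sum) / (node sum).
        i = 1
        while i < self.size:
            left = self.tree[2 * i]
            i = 2 * i if rng.random() * self.tree[i] < left else 2 * i + 1
        return i - self.size
```

Updating one entry only touches the $\log_2 m$ nodes on its root-to-leaf path, which is what keeps sampling access cheap to maintain as data changes.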
The construction method for the data matrices is described in Appendix B, and the parameters for quantum-inspired SVM are chosen as $\epsilon = 5$, $\eta = 0.1$ and $b = 1$ (we explain the parameters and their settings in Experiment II). The average classification rates are shown in Table II, from which we observe the advantage of quantum-inspired SVM on such low-rank-approximated datasets (on average about 5% higher). We also find that both quantum-inspired SVM and LIBSVM perform better on low-rank datasets than on low-rank-approximated datasets.

B. Experiment II: Discussion on algorithm parameters

As analyzed in Section IV and Section V, there are two main parameters for the quantum-inspired algorithm: the relative error $\epsilon$ and the success probability $1 - \eta$. Based on them we set the subsampling sizes $r$, $c$ and run the algorithm. However, for datasets that are not large enough, setting $r$, $c$ by Equation (6) and Equation (7) is rather time-costly. For instance, when the condition number of the data matrix is 1.0, taking $\eta = 0.1$ and $\epsilon = 5.0$, the theoretical $r$, $c$ for a $10000 \times 10000$ dataset should be set as 1656 and 259973 to ensure that the algorithm calculates the classification expression with relative error less than $\epsilon$ and success probability higher than $1 - \eta$. For practical applications on datasets that are not too large, we set
$$r = b \left\lceil 4 \log_2(2n/\eta)/\epsilon^2 \right\rceil \quad \text{and} \quad c = b \left\lceil 4 \log_2(2r/\eta)/\epsilon^2 \right\rceil,$$
in which $b$ is the subsampling size control parameter. When $b = 1$, this practical choice of $r$, $c$ ensures that the relative error of subsampling (Step 2 and Step 3 in Alg. 3) does not exceed $\epsilon$ (guaranteed by Theorem 2). In Experiment I, we took the practical setting of $r$, $c$, where we already found an advantage over LIBSVM; our choice there was $\epsilon = 5$, $\eta = 0.1$ and $b = 1$. Here, we test the algorithm with other choices of $\epsilon$, $\eta$ and $b$ and check its classification rate. We test each parameter choice 50 times.
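The practical choice of $r$ and $c$ above is cheap to evaluate directly; a small Python sketch (the function name is ours) makes the computation concrete:

```python
import math

def practical_sizes(n, eps, eta, b=1):
    """Practical subsampling sizes from Experiment II:
    r = b * ceil(4 * log2(2n/eta) / eps^2), then c with r in place of n."""
    r = b * math.ceil(4 * math.log2(2 * n / eta) / eps ** 2)
    c = b * math.ceil(4 * math.log2(2 * r / eta) / eps ** 2)
    return r, c

# Experiment I settings: n = 10000, eps = 5, eta = 0.1, b = 1.
r, c = practical_sizes(n=10000, eps=5.0, eta=0.1, b=1)
```

For these settings the practical sizes come out orders of magnitude below the theoretical $r = 1656$ and $c = 259973$, which is exactly the gap between theory and practice that Experiment II probes.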
The variation intervals of the parameters are: $\epsilon$ from 1 to 10, $\eta$ from 0.1 to 1, and $b$ from 1 to 10. The results are shown in Fig. 4. We find that the average classification rates of the algorithm in each experiment are close. We notice that when using the practical $r$, $c$, which are much smaller than the theoretical ones, the algorithm maintains its performance (classification rate around 0.90). This phenomenon indicates a gap between our theoretical analysis and the actual performance, as [24] reports: "the performance of these algorithms is better than the theoretical complexity bounds would suggest".

VII. DISCUSSION

In this section, we present some discussions of the proposed algorithm. We also discuss the potential applications of our techniques to other types of SVMs, such as the non-linear SVM and the general least squares SVM, though more work on the complexity and errors is required before these extensions can be realized.

A. The cause of exponential speedup

An interesting fact is that we can achieve exponential speedup without using any quantum resources, such as superposition or entanglement. This somewhat confusing but reasonable result can be understood as follows. Firstly, the advantage of quantum algorithms, such as the HHL algorithm, is that high-dimensional vectors can be represented using only a few qubits. By replacing qRAM with the arborescent data structure for sampling, we can likewise represent a low-rank matrix by its normalized submatrix in a short time. By using the technique of sampling, large-size calculations are avoided, and we only need to deal with a problem of logarithmic size relative to the original data. Secondly, the relative error of the matrix subsampling algorithm decreases exponentially in the matrix size, which ensures the effectiveness of such logarithmic-complexity algorithms (e.g., Theorem 2 bounds the error of matrix row subsampling).

TABLE II
THE AVERAGE VALUES AND STANDARD DEVIATIONS OF CLASSIFICATION RATES (%) OF QISVM AND LIBSVM IN FIVE EXPERIMENTS.

                          Testing Set                  Training Set
                          qiSVM         LIBSVM         qiSVM         LIBSVM
  Low-rank                91.45 ± 3.17  86.46 ± 2.00   91.35 ± 3.64  86.45 ± 2.15
  Low-rank approximated   89.82 ± 4.38  84.90 ± 3.20   89.92 ± 4.23  84.69 ± 2.87

Fig. 4. The average classification rate of the quantum-inspired SVM algorithm with different parameters on the dataset with rank 1. Each point represents an average classification rate over 50 trials, and the error bar shows the standard deviation of the 50 trials. (a) Algorithm performance as $\epsilon$ varies from 1 to 10. (b) Algorithm performance as $\eta$ varies from 0.1 to 1. (c) Algorithm performance as $b$ varies from 1 to 10.

B. Improving sampling for dot products

Recall that Alg. 1 lets us estimate the dot product of two vectors. However, it does not work well in all conditions, for example when $\|x\|$ and $\|y\|$ are dominated by a single element. For such cases, [34] implies that we can apply a spherically random rotation $R$ to all $x$, which does not change the kernel matrix $K$ but makes all the elements of the data matrix follow the same distribution.

C. LS-SVM with non-linear kernels

In Section II, we considered the LS-SVM with the linear kernel $K = X^T X$. When datasets are not linearly separable, non-linear kernels are usually needed. To deal with non-linear kernels with Alg.
3, we only have to show how to establish sampling access to the non-linear kernel matrix $K$ from the sampling access of $X$. We first show how sampling access to the polynomial kernel $K_p(x_i, x_j) = (x_j^T x_i)^p$ can be established. The corresponding kernel matrix is $K_p = \left( (x_j^T x_i)^p \right)_{i = 1, \dots, m,\ j = 1, \dots, m}$. We take $Z = (x_1^{\otimes p}, x_2^{\otimes p}, \dots, x_m^{\otimes p})$, in which the $j$-th column $Z_j$ is the $p$-th tensor power of $x_j$. Notice that $Z^T Z = K_p$. Once we have sampling access to $Z$, we can sample $K_p$ as Step 2 and Step 3 of Alg. 3 do. The sampling access to $Z$ can be established by Alg. 4 (the effectiveness of Alg. 4 is shown in Appendix C):

Algorithm 4 Polynomial kernel matrix sampling.
Input: The sampling access of $X$ in logarithmic time of $m$ and $n$.
Goal: Sample a column index $j$ from the column norm vector $(\|x_1\|^p, \|x_2\|^p, \dots, \|x_m\|^p)$ of $Z$, and then sample a row index $i$ from the column $x_j^{\otimes p}$ of $Z$.
1: Sample on the column norm vector $(\|x_1\|, \|x_2\|, \dots, \|x_m\|)$ of $X$ to get an index $j$.
2: Query $\|x_j\|$ from $(\|x_1\|, \|x_2\|, \dots, \|x_m\|)$. Calculate $\|x_j\|^p$.
3: Sample a real number $a$ uniformly distributed in $[0, 1]$. If $a \ge \|x_j\|^p$, go to Step 1. Otherwise, output index $j$ as the column index and continue.
4: Repeat sampling on $x_j$ for $p$ times. Denote the outcome indices as $i_1, i_2, \dots, i_p$.
Output: Column index $j$ and row index $\sum_{\tau = 1}^{p} (i_\tau - 1)\, n^{p - \tau} + 1$.

For general non-linear kernels, we note that they can always be approximated by linear combinations of polynomial kernels (and thus can be sampled based on sampling access to polynomial kernels) whenever the corresponding non-linear feature function is continuous. For instance, the popular radial basis function (RBF) kernel $K_{\mathrm{RBF}}(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$ can be approximated by
$$\tilde{K}_{\mathrm{RBF}}(x_i, x_j) = \sum_{p=0}^{N} \frac{1}{p!} \left( -\frac{x_i^T x_i - 2 x_j^T x_i + x_j^T x_j}{2\sigma^2} \right)^p = \sum_{p=0}^{N} \left( -\frac{1}{2\sigma^2} \right)^p \sum_{\substack{q, l \ge 0 \\ q + l \le p}} \frac{(-2)^l}{q!\, l!\, (p - q - l)!}\, K_q(x_i, x_i)\, K_l(x_i, x_j)\, K_{p - q - l}(x_j, x_j).$$

D. General LS-SVM

In the preceding sections, we began with an LS-SVM with $b = 0$ and a linear kernel in Section II, and we showed how the method can be extended to non-linear kernels in Section VII-C. Finally, we deal with the last assumption, $b = 0$, and show how a general LS-SVM can be tackled using techniques like those in Alg. 3. The general LS-SVM equation [26] is
$$\begin{pmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + \gamma^{-1} I \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 \\ y \end{pmatrix}, \qquad (8)$$
in which $K$ is the kernel matrix. Equation (8) can be solved as follows. (i) Firstly, by the methods of Section VII-C, we establish sampling access to the kernel matrix $K$; suppose a sampling outcome of $K$ is $K''$. (ii) Secondly, take
$$A = \begin{pmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + \gamma^{-1} I \end{pmatrix} \quad \text{and} \quad A'' = \begin{pmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K'' + \gamma^{-1} I \end{pmatrix}.$$
We establish the eigen-relations between $A$ and $A''$ by theorems similar to Theorem 2 and Theorem 4. (iii) Once $A \in \mathbb{R}^{m \times m}$ is subsampled to $A'' \in \mathbb{R}^{r \times r}$, we can continue with Steps 3-7 of Alg. 3. (iv) Once Equation (8) is solved in Step 7 of Alg. 3, we can establish query access to $\alpha$. According to Equation (8), $b = y_j - x_j^T X \alpha - \gamma^{-1} \alpha_j$ for any $j$ such that $\alpha_j \ne 0$. We can then evaluate the classification expression $y_j + (x - x_j)^T X \alpha - \gamma^{-1} \alpha_j$ and make the classification using Alg. 1. There are two ways to find such a $j$: one is executing rejection sampling on $\alpha$ using Alg. 2; the other is checking whether $\alpha_j = 0$ after each sampling of $X$ in Step 3 of Alg. 1.

VIII. CONCLUSION

We have proposed a quantum-inspired SVM algorithm that achieves exponential speedup over the previous classical algorithms. The feasibility of the proposed algorithm is demonstrated by experiments.
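To make the kernel sampling of Section VII-C concrete, here is a runnable Python sketch of the column and row sampling behind Alg. 4 (the paper's code is in Julia; this rendering is ours). Assumptions to note: column norms are taken to be at most 1; the acceptance probability is written as $\|x_j\|^{2(p-1)}$ so that the accepted column index $j$ follows the squared column norms $\|x_j\|^{2p}$ of $Z$; and indices are 0-based, so the returned row index is $\sum_\tau i_\tau\, n^{p-\tau}$ rather than the 1-based expression in Alg. 4.

```python
import numpy as np

def sample_poly_kernel_index(X, p, rng):
    """Sample (column j, row l) of Z = (x_1^⊗p, ..., x_m^⊗p), sketched.

    Column j is drawn with probability proportional to ||x_j||^(2p);
    row l indexes the entry of x_j^⊗p picked by p independent draws.
    Assumes every column norm is at most 1 (rescale X otherwise).
    """
    n, m = X.shape
    norms2 = (X ** 2).sum(axis=0)
    col_p = norms2 / norms2.sum()
    while True:
        j = rng.choice(m, p=col_p)                 # Step 1: j ~ ||x_j||^2
        if rng.random() < norms2[j] ** (p - 1):    # Step 3: accept ~ ||x_j||^(2(p-1))
            break
    row_p = X[:, j] ** 2 / norms2[j]               # Step 4: p draws from column x_j
    idx = rng.choice(n, size=p, p=row_p)
    l = 0
    for i in idx:                                  # mixed-radix index into x_j^⊗p
        l = l * n + int(i)
    return int(j), l
```

For $p = 1$ the rejection step always accepts and this reduces to plain column sampling of $X$; Steps 1-3 are where the frugal rejection sampling of [37] could be substituted for speedup.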
Our algorithm works well on low-rank datasets or datasets that can be well approximated by low-rank matrices, which is similar to the quantum SVM algorithm [31], which applies "when a low-rank approximation is appropriate". Further investigation of applications of the algorithm is required to make quantum-inspired SVM operable for problems such as face recognition [25] and signal processing [35]. We hope that the techniques developed in our work can lead to the emergence of more efficient classical algorithms, for example by applying our method to support vector machines with more complex kernels [26], [36] or to other machine learning algorithms. The technique of indirect sampling can expand the application area of fast sampling techniques, and it will contribute to the further competition between classical and quantum algorithms. Some improvements on our work could be made in the future, such as reducing the conditions on the data matrix, further reducing the complexity, and tightening the error bounds in the theoretical analysis, which can be achieved through a deeper investigation of the algorithm and the error propagation process. The quantum-inspired non-linear SVMs and general least squares SVM discussed in Section VII also require further theoretical analysis and empirical evaluation. We note that our work, like the previous quantum-inspired algorithms, is not intended to demonstrate that quantum computing is uncompetitive. We want to find out where the boundaries of classical and quantum computing lie, and we expect new quantum algorithms to be developed that beat our algorithm.

APPENDIX A
PROOF OF THEOREMS IN SECTION IV

A. Proof of Theorem 3

Proof: We break the expression $|V''^T_i A' V''_j - \delta_{ij} \sigma_i^2|$ into two parts:
$$|V''^T_i A' V''_j - \delta_{ij} \sigma_i^2| \le |V''^T_i (A' - A'') V''_j| + |V''^T_i A'' V''_j - \delta_{ij} \sigma_i^2|.$$
For the first term, because of the condition $\|A' - A''\| \le \beta$ and since the $V''_j$ are normalized,
$$|V''^T_i (A' - A'') V''_j| \le \|V''_i\| \cdot \|(A' - A'') V''_j\| \le \beta.$$
For the second term, because of the condition $A'' = \sum_{l=1}^{k} \sigma_l^2 V''_l V''^T_l$,
$$|V''^T_i A'' V''_j - \delta_{ij} \sigma_i^2| = 0.$$
In all, $|V''^T_i A' V''_j - \delta_{ij} \sigma_i^2| \le \beta$. The argument above can be written in short as
$$|V''^T_i A' V''_j - \delta_{ij} \sigma_i^2| \le |V''^T_i (A' - A'') V''_j| + |V''^T_i A'' V''_j - \delta_{ij} \sigma_i^2| \le \|V''_i\| \cdot \|(A' - A'') V''_j\| \le \beta.$$

B. Proof of Theorem 4

Proof: Denote $|\tilde{V}^T_i \tilde{V}_j - \delta_{ij}|$ by $\Delta_1$ and $|\tilde{V}^T_i A \tilde{V}_j - \delta_{ij} \sigma_i^2|$ by $\Delta_2$. By definition, $\tilde{V}_l = \frac{1}{\sigma_l^2} R^T V''_l$. Thus
$$\Delta_1 = \left| \frac{V''^T_i R R^T V''_j - \delta_{ij} \sigma_i^4}{\sigma_i^2 \sigma_j^2} \right|.$$
We break it into two parts:
$$\Delta_1 \le \frac{1}{\sigma_i^2 \sigma_j^2} \left( |V''^T_i A' A' V''_j - \delta_{ij} \sigma_i^4| + |V''^T_i (R R^T - A' A') V''_j| \right).$$
For the first term, we have
$$|V''^T_i A' A' V''_j - \delta_{ij} \sigma_i^4| = |V''^T_i (A' - A'')^2 V''_j + V''^T_i (A' - A'') A'' V''_j + V''^T_i A'' (A' - A'') V''_j + V''^T_i A'' A'' V''_j - \delta_{ij} \sigma_i^4|$$
$$\le |V''^T_i (A' - A'')^2 V''_j| + |V''^T_i (A' - A'') A'' V''_j| + |V''^T_i A'' (A' - A'') V''_j| + |V''^T_i A'' A'' V''_j - \delta_{ij} \sigma_i^4| \le \beta^2 + \sigma_j^2 \beta + \sigma_i^2 \beta,$$
where the last step uses the same technique as the proof of Theorem 3. For the second term, we have
$$|V''^T_i (R R^T - A' A') V''_j| \le \|R R^T - A' A'\| = \|X'^T X X^T X' - X'^T X' X'^T X'\| \le \|X'\|^2 \|X X^T - X' X'^T\|.$$
Because $\|X'\| \le \|X'\|_F = \|X\|_F$, we have $|V''^T_i (R R^T - A' A') V''_j| \le \epsilon' \|X\|_F^2$. In all, since $\sigma_i \ge \kappa$ for all $i \in \{1, \dots, k\}$,
$$\Delta_1 \le \frac{1}{\sigma_i^2 \sigma_j^2} \left( \beta^2 + \sigma_j^2 \beta + \sigma_i^2 \beta + \epsilon' \|X\|_F^2 \right) \le \frac{\beta^2 + \epsilon' \|X\|_F^2}{\kappa^4} + \frac{2\beta}{\kappa^2}.$$
By definition, $\tilde{V}_l = \frac{1}{\sigma_l^2} R^T V''_l$.
Thus
$$\Delta_2 = \left| \frac{V''^T_i R A R^T V''_j - \delta_{ij} \sigma_i^6}{\sigma_i^2 \sigma_j^2} \right|.$$
We break it into two parts:
$$\Delta_2 \le \frac{1}{\sigma_i^2 \sigma_j^2} \left( |V''^T_i (R A R^T - A' A' A') V''_j| + |V''^T_i A' A' A' V''_j - \delta_{ij} \sigma_i^6| \right).$$
For the first term, we have
$$|V''^T_i (R A R^T - A' A' A') V''_j| \le \|R A R^T - A' A' A'\| \le \|X'\|^2 \|X X^T X X^T - X' X'^T X' X'^T\|$$
$$\le \|X\|_F^2 \left( \|X X^T (X X^T - X' X'^T)\| + \|(X X^T - X' X'^T) X' X'^T\| \right) \le 2 \|X\|_F^2 \|X\|^2 \|X X^T - X' X'^T\| \le 2 \|X\|_F^2 \epsilon'.$$
For the second term, we have
$$|V''^T_i A' A' A' V''_j - \delta_{ij} \sigma_i^6| = |V''^T_i (A' - A'') A' A' V''_j + V''^T_i A'' (A' - A'') A' V''_j + V''^T_i A'' A'' (A' - A'') V''_j + V''^T_i A'' A'' A'' V''_j - \delta_{ij} \sigma_i^6|$$
$$\le |V''^T_i (A' - A'') A' A' V''_j| + |V''^T_i A'' (A' - A'') A' V''_j| + |V''^T_i A'' A'' (A' - A'') V''_j| + |V''^T_i A'' A'' A'' V''_j - \delta_{ij} \sigma_i^6|$$
$$\le \|(A' - A'') A' A'\| + \|A'' (A' - A'') A'\| + \|A'' A'' (A' - A'')\| \le \|X'\|^4 \|A' - A''\| + \|X''\|^2 \|X'\|^2 \|A' - A''\| + \|X''\|^4 \|A' - A''\| \le 3 \beta \|X\|_F^4.$$
In all,
$$\Delta_2 \le \frac{1}{\sigma_i^2 \sigma_j^2} \left( 2 \|X\|_F^2 \epsilon' + 3 \beta \|X\|_F^4 \right) \le \frac{(2 \epsilon' + 3 \beta \|X\|_F^2) \|X\|_F^2}{\kappa^4}.$$

C. Proof of Theorem 6

Proof: Since the $\tilde{V}^T_i \tilde{V}_j - \delta_{ij}$ are the elements of $\tilde{V}^T \tilde{V} - I$ and $|\tilde{V}^T_i \tilde{V}_j - \delta_{ij}| \le \frac{1}{4k}$, we have
$$\|\tilde{V}^T \tilde{V} - I\| \le k \max_{i,j} |\tilde{V}^T_i \tilde{V}_j - \delta_{ij}| \le \frac{1}{4}.$$
Thus $\tilde{V}^T \tilde{V}$ is invertible and
$$\|(\tilde{V}^T \tilde{V})^{-1}\| \le \frac{1}{1 - \|\tilde{V}^T \tilde{V} - I\|} = \frac{4}{3}.$$
Take $B = \tilde{V} \Sigma^{-2} \tilde{V}^T A - I_m$. We have
$$|\tilde{V}^T_i B \tilde{V}_j| = \left| \sum_{l=1}^{k} \tilde{V}^T_i \tilde{V}_l \cdot \frac{\tilde{V}^T_l A \tilde{V}_j}{\sigma_l^2} - \tilde{V}^T_i \tilde{V}_j \right|.$$
We break it into two parts:
$$|\tilde{V}^T_i B \tilde{V}_j| \le \left| \sum_{l=1}^{k} \frac{\tilde{V}^T_i \tilde{V}_l}{\sigma_l^2} \left( \tilde{V}^T_l A \tilde{V}_j - \delta_{lj} \sigma_l^2 \right) \right| + \left| \sum_{l=1}^{k} \tilde{V}^T_i \tilde{V}_l \delta_{lj} - \tilde{V}^T_i \tilde{V}_j \right|.$$
The second term is zero because
$$\left| \sum_{l=1}^{k} \tilde{V}^T_i \tilde{V}_l \delta_{lj} - \tilde{V}^T_i \tilde{V}_j \right| = |\tilde{V}^T_i \tilde{V}_j - \tilde{V}^T_i \tilde{V}_j| = 0.$$
The first term satisfies
$$\left| \sum_{l=1}^{k} \frac{\tilde{V}^T_i \tilde{V}_l}{\sigma_l^2} \left( \tilde{V}^T_l A \tilde{V}_j - \delta_{lj} \sigma_l^2 \right) \right| \le \frac{\zeta}{\kappa^2} \sum_{l=1}^{k} |\tilde{V}^T_i \tilde{V}_l| \le \frac{\zeta}{\kappa^2} \left( \sum_{l \ne i} |\tilde{V}^T_i \tilde{V}_l| + |\tilde{V}^T_i \tilde{V}_i| \right) \le \frac{\zeta}{\kappa^2} \left( (k-1) \frac{1}{4k} + \left( \frac{1}{4k} + 1 \right) \right) \le \frac{5 \zeta}{4 \kappa^2}.$$
Thus $|\tilde{V}^T_i B \tilde{V}_j| \le \frac{5\zeta}{4\kappa^2}$ and $\|\tilde{V}^T B \tilde{V}\| \le \frac{5 k \zeta}{4 \kappa^2}$. By Theorem 5,
$$\|\tilde{V} \Sigma^{-2} \tilde{V}^T A - I_m\| = \|B\| \le \|(\tilde{V}^T \tilde{V})^{-1}\| \, \|\tilde{V}^T B \tilde{V}\| \le \frac{5 k \zeta}{3 \kappa^2}.$$

APPENDIX B
THE CONSTRUCTION METHOD OF DATASETS

In our experiments, we constructed artificial datasets that are low-rank or can be low-rank approximated. Our construction method is as follows:
1. First, we multiply a random matrix $A$ of size $n \times k$ with another random matrix $B$ of size $k \times m$; the elements of both are uniformly distributed in $[-0.5, 0.5]$. Denote the product by $X$. The rank of $X$ is then at most $k$.
2. We add a perturbation to the matrix $X$ by adding a random number uniformly distributed in $[-0.1x, 0.1x]$ to every element of $X$, in which $x$ is the average of the absolute values of the elements of $X$. After the perturbation, $X$ is no longer low-rank but is still low-rank approximated.
3. We normalize $X$ so that it has operator norm 1.
4. We divide the column vectors of $X$ into two classes by a random hyperplane $w^T x = 0$ passing through the origin (by a random hyperplane we mean that the elements of $w$ are sampled uniformly at random from $[0, 1]$), while making sure that both classes are non-empty.
5. We now have $m$ linearly separable labeled vectors, each of length $n$. We choose $m_1$ of them uniformly at random for training and leave the other $m_2 = m - m_1$ for testing, while making sure that the training set includes vectors of both classes.

APPENDIX C
THE EFFECTIVENESS OF ALG. 4

The goal of Alg.
4 is to sample a column index and a row index from $Z$; we show that it achieves this goal. Steps 1-3 sample the column index. They are essentially Alg. 2 with $A = \operatorname{Diag}(\|x_1\|^{p-1}, \dots, \|x_m\|^{p-1})$ and $b = (\|x_1\|, \dots, \|x_m\|)$, which sample from the vector $(\|x_1\|^p, \dots, \|x_m\|^p)$, the column norm vector of $Z$, to get the column index $j$. We note that in practical applications, Steps 1-3 can be adjusted for speedup, such as the frugal rejection sampling suggested in [37]. Step 4 samples the row index. Suppose $l = \sum_{\tau=1}^{p} (i_\tau - 1)\, n^{p-\tau} + 1$. According to the definition of the tensor power, the $l$-th element of $x_j^{\otimes p}$ is $(x_j^{\otimes p})_l = \prod_{\tau=1}^{p} x_{i_\tau j}$. When Step 4 executes $p$ samplings on $x_j$, the probability of getting the outcomes $i_1, i_2, \dots, i_p$ is proportional to $|\prod_{\tau=1}^{p} x_{i_\tau j}|^2$, which is exactly the probability of sampling the element $(x_j^{\otimes p})_l$ from $x_j^{\otimes p}$. Thus we output the index $l = \sum_{\tau=1}^{p} (i_\tau - 1)\, n^{p-\tau} + 1$.

ACKNOWLEDGMENT

The authors would like to thank Yi-Fei Lu for helpful discussions.

REFERENCES

[1] H.-L. Huang, D. Wu, D. Fan, and X. Zhu, "Superconducting quantum computing: a review," Science China Information Sciences, vol. 63, no. 180501, 2020.
[2] P. W. Shor, "Algorithms for quantum computation: Discrete logarithms and factoring," in Proc. 35th Annual Symposium on Foundations of Computer Science, Santa Fe, NM, USA: IEEE, Nov. 1994, pp. 124-134. Available: https://ieeexplore.ieee.org/document/365700
[3] C.-Y. Lu, D. E. Browne, T. Yang, and J.-W. Pan, "Demonstration of a compiled version of Shor's quantum factoring algorithm using photonic qubits," Physical Review Letters, vol. 99, no. 25, p. 250504, 2007. Available: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.99.250504
[4] H.-L. Huang, Q. Zhao, X. Ma, C. Liu, Z.-E. Su, X.-L. Wang, L. Li, N.-L. Liu, B. C. Sanders, C.-Y. Lu et al., "Experimental blind quantum computing for a classical client," Physical Review Letters, vol. 119, no. 5, p. 050503, 2017. Available: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.119.050503
[5] L. K. Grover, "A fast quantum mechanical algorithm for database search," in Proc. 28th Annual ACM Symposium on Theory of Computing, Philadelphia, PA, USA: ACM, May 1996, pp. 212-219. Available: http://doi.acm.org/10.1145/237814.237866
[6] T. Li, W.-S. Bao, H.-L. Huang, F.-G. Li, X.-Q. Fu, S. Zhang, C. Guo, Y.-T. Du, X. Wang, and J. Lin, "Complementary-multiphase quantum search for all numbers of target items," Physical Review A, vol. 98, no. 6, p. 062308, 2018. Available: https://journals.aps.org/pra/abstract/10.1103/PhysRevA.98.062308
[7] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, "Quantum machine learning," Nature, vol. 549, no. 7671, pp. 195-202, Sept. 2017. Available: https://doi.org/10.1038/nature23474
[8] H.-L. Huang, X.-L. Wang, P. P. Rohde, Y.-H. Luo, Y.-W. Zhao, C. Liu, L. Li, N.-L. Liu, C.-Y. Lu, and J.-W. Pan, "Demonstration of topological data analysis on a quantum processor," Optica, vol. 5, no. 2, pp. 193-198, 2018. Available: https://www.osapublishing.org/optica/abstract.cfm?uri=optica-5-2-193
[9] J. Liu, K. H. Lim, K. L. Wood, W. Huang, C. Guo, and H.-L. Huang, "Hybrid quantum-classical convolutional neural networks," arXiv preprint, 2019. Available: https://arxiv.org/abs/1911.02998
[10] H.-L. Huang, Y.-W. Zhao, T. Li, F.-G. Li, Y.-T. Du, X.-Q. Fu, S. Zhang, X. Wang, and W.-S. Bao, "Homomorphic encryption experiments on IBM's cloud quantum computing platform," Frontiers of Physics, vol. 12, no. 1, p. 120305, 2017. Available: https://link.springer.com/article/10.1007/s11467-016-0643-9
[11] H.-L. Huang, Y. Du, M. Gong, Y. Zhao, Y. Wu, C. Wang, S. Li, F. Liang, J. Lin, Y. Xu et al., "Experimental quantum generative adversarial networks for image generation," 2020.
[12] H.-L. Huang, A. K. Goswami, W.-S. Bao, and P. K. Panigrahi, "Demonstration of essentiality of entanglement in a Deutsch-like quantum algorithm," Science China Physics, Mechanics & Astronomy, vol. 61, no. 060311, 2018.
[13] H.-L. Huang, M. Narożniak, F. Liang, Y. Zhao, A. D. Castellano, M. Gong, Y. Wu, S. Wang, J. Lin, Y. Xu et al., "Emulating quantum teleportation of a Majorana zero mode qubit," Physical Review Letters, vol. 126, no. 9, p. 090502, 2021.
[14] D. R. Simon, "On the power of quantum computation," SIAM J. Comput., vol. 26, no. 5, pp. 1474-1483, July 1997. Available: https://doi.org/10.1137/S0097539796298637
[15] I. Kerenidis and A. Prakash, "Quantum recommendation systems," in 8th Innovations in Theoretical Computer Science Conference, ser. Leibniz International Proceedings in Informatics (LIPIcs), vol. 67, Berkeley, CA, USA, Jan. 2017, pp. 49:1-49:21. Available: http://drops.dagstuhl.de/opus/volltexte/2017/8154
[16] E. Tang, "A quantum-inspired classical algorithm for recommendation systems," in Proc. 51st Annual ACM SIGACT Symposium on Theory of Computing, New York, NY, USA: ACM, June 2019, pp. 217-228. Available: https://doi.org/10.1145/3313276.3316310
[17] A. Frieze, R. Kannan, and S. Vempala, "Fast Monte-Carlo algorithms for finding low-rank approximations," J. Assoc. Comput. Mach., vol. 51, no. 6, pp. 1025-1041, Nov. 2004. Available: http://doi.acm.org/10.1145/1039488.1039494
[18] S. Lloyd, M. Mohseni, and P. Rebentrost, "Quantum principal component analysis," Nat. Phys., vol. 10, no. 9, pp. 631-633, July 2014. Available: https://doi.org/10.1038/nphys3029
[19] S. Lloyd, M. Mohseni, and P. Rebentrost, "Quantum algorithms for supervised and unsupervised machine learning," arXiv preprint, Nov. 2013. Available: https://arxiv.org/abs/1307.0411
[20] E. Tang, "Quantum-inspired classical algorithms for principal component analysis and supervised clustering," arXiv preprint, Oct. 2018. Available: http://arxiv.org/abs/1811.00414
[21] A. Gilyén, S. Lloyd, and E. Tang, "Quantum-inspired low-rank stochastic regression with logarithmic dependence on the dimension," arXiv preprint, Nov. 2018. Available: http://arxiv.org/abs/1811.04909
[22] N.-H. Chia, H.-H. Lin, and C. Wang, "Quantum-inspired sublinear classical algorithms for solving low-rank linear systems," arXiv preprint, Nov. 2018. Available: https://arxiv.org/abs/1811.04852
[23] A. W. Harrow, A. Hassidim, and S. Lloyd, "Quantum algorithm for linear systems of equations," Phys. Rev. Lett., vol. 103, no. 15, p. 150502, Oct. 2009.
[24] J. M. Arrazola, A. Delgado, B. R. Bardhan, and S. Lloyd, "Quantum-inspired algorithms in practice," Quantum, vol. 4, p. 307, Aug. 2020. Available: https://doi.org/10.22331/q-2020-08-13-307
[25] P. J. Phillips, "Support vector machines applied to face recognition," in Advances in Neural Information Processing Systems, vol. 48, no. 6241, Gaithersburg, MD, USA, Nov. 1999, pp. 803-809. Available: https://doi.org/10.6028/nist.ir.6241
[26] J. A. K. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Process. Lett., vol. 9, no. 3, pp. 293-300, June 1999. Available: https://doi.org/10.1023/A:1018628609742
[27] J. Platt, "Sequential minimal optimization: A fast algorithm for training support vector machines," Apr. 1998. Available: https://www.microsoft.com/en-us/research/publication/sequential-minimal-optimization-a-fast-algorithm-for-training-support-vector-machines/
[28] H. P. Graf, E. Cosatto, L. Bottou, I. Durdanovic, and V. Vapnik, "Parallel support vector machines: The cascade SVM," in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, Eds. MIT Press, 2005, pp. 521-528. Available: http://papers.nips.cc/paper/2608-parallel-support-vector-machines-the-cascade-svm.pdf
[29] J. Xu, Y. Y. Tang, B. Zou, Z. Xu, L. Li, Y. Lu, and B. Zhang, "The generalization ability of SVM classification based on Markov sampling," IEEE Transactions on Cybernetics, vol. 45, no. 6, pp. 1169-1179, 2014. Available: https://ieeexplore.ieee.org/abstract/document/6881630
[30] B. Zou, C. Xu, Y. Lu, Y. Y. Tang, J. Xu, and X. You, "$k$-times Markov sampling for SVMC," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 4, pp. 1328-1341, 2017. Available: https://ieeexplore.ieee.org/abstract/document/7993056/
[31] P. Rebentrost, M. Mohseni, and S. Lloyd, "Quantum support vector machine for big data classification," Phys. Rev. Lett., vol. 113, p. 130503, Sept. 2014. Available: https://link.aps.org/doi/10.1103/PhysRevLett.113.130503
[32] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, "Julia: A fresh approach to numerical computing," SIAM Review, vol. 59, no. 1, pp. 65-98, 2017. Available: https://doi.org/10.1137/141000671
[33] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[34] D. Achlioptas, F. McSherry, and B. Schölkopf, "Sampling techniques for kernel methods," in Advances in Neural Information Processing Systems, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds. Vancouver, British Columbia, Canada: MIT Press, Dec. 2002, pp. 335-342. Available: https://papers.nips.cc/paper/2072-sampling-techniques-for-kernel-methods
[35] L. Wang, Support Vector Machines for Signal Processing, 1st ed. The Netherlands: Springer, Berlin, Heidelberg, 2005, ch. 15, pp. 321-342. Available: https://doi.org/10.1007/b95439
[36] L. Wang, Multiple Model Estimation for Nonlinear Classification, 1st ed. The Netherlands: Springer, Berlin, Heidelberg, 2005, ch. 2, pp. 49-76. Available: https://doi.org/10.1007/b95439
[37] I. L. Markov, A. Fatima, S. V. Isakov, and S. Boixo, "Quantum supremacy is both closer and farther than it appears," arXiv preprint, Sep. 2018. Available: http://arxiv.org/abs/1807.10749

Chen Ding received the B.S. degree from the University of Science and Technology of China, Hefei, China, in 2019. He is currently a graduate student at the CAS Centre for Excellence and Synergetic Innovation Centre in Quantum Information and Quantum Physics. His current research interests include quantum machine learning, quantum-inspired algorithm design, and variational quantum computing.

Tian-Yi Bao received the B.S. degree from the University of Michigan, Ann Arbor, USA, in 2020. She is currently a graduate student at Oxford University. Her current research interests include machine learning and human-computer interaction.

He-Liang Huang received the Ph.D. degree from the University of Science and Technology of China, Hefei, China, in 2018. He is currently an Assistant Professor at the Henan Key Laboratory of Quantum Information and Cryptography, Zhengzhou, China, and a Postdoctoral Fellow at the University of Science and Technology of China, Hefei, China. He has authored or co-authored over 30 papers in refereed international journals and co-authored 1 book.
His current research interests include secure cloud quantum computing, big data quantum computing, and the physical implementation of quantum computing architectures, in particular using linear optical and superconducting systems.
