Deep Tensor Encodings


Authors: B Sengupta, E Vazquez, Y Qian

B Sengupta* – Dept. of Engineering, University of Cambridge, Trumpington Street, Cambridge, United Kingdom CB2 1PZ – bs573@cam.ac.uk
Y Qian†, E Vazquez – Cortexica Vision Systems Limited, Capital Tower, 91 Waterloo Road, London, United Kingdom SE1 8RT – yu.qian@cortexica.com

ABSTRACT

Learning an encoding of feature vectors in terms of an over-complete dictionary or an information-geometric (Fisher vector) construct is widespread in statistical signal processing and computer vision. In content-based information retrieval using deep-learning classifiers, such encodings are learnt on the flattened last layer, without adherence to the multi-linear structure of the underlying feature tensor. We illustrate a variety of feature encodings, including sparse dictionary coding and Fisher vectors, and propose that a structured tensor factorization scheme enables retrieval that is on par, in terms of average precision, with Fisher-vector-encoded image signatures. In short, we illustrate how structural constraints increase retrieval fidelity.

CCS CONCEPTS

• Theory of computation → Pattern matching; • Computing methodologies → Neural networks;

KEYWORDS

computer vision, deep learning, image retrieval, tensor factorization, sparse coding

ACM Reference format:
B Sengupta, Y Qian, and E Vazquez. 2017. Deep Tensor Encodings. In Proceedings of KDD, Canada, August 2017 (KDD Workshop on ML meets Fashion), 6 pages.

*BS has dual appointments at Cortexica Vision Systems Ltd. and the Dept. of Bioengineering, Imperial College London.
†BS and YQ contributed equally to this paper.

1 INTRODUCTION

The success of deep learning lies in constructing feature spaces wherein competing classes of objects, sounds, etc. can be shattered using high-dimensional hyperplanes. The classifier relies on the accumulation of representation in the final convolution layer of a deep neural network. Oftentimes, the classifier performance
increases as one incorporates information from earlier layers of the neural network. Such a structural constraint has been imposed on certain deep-learning architectures via the inception module. In addition to decreasing the computational effort, utilization of 1 × 1 convolution filters enables the dimensionality of the feature map to be immensely reduced; in tandem with pooling, the dimensionality reduces even further. Thus, information from the previous layer(s) can be accumulated and concatenated in the inception module. By learning the weights feeding into the inception module there is the additional flexibility of vetting the different sources of information that reach deeper layers.

A big demerit of deep-learning architectures is their inability to perform well in the advent of small amounts of training data. Tricks such as input augmentation (rotation, blur, etc.) and feature augmentation (in terms of the inception module) have proven useful [8]. Such structural constraints regularize the optimization problem, reducing over-fitting. In this paper, we propose a much simpler structural constraint, i.e., to utilize the multi-linear structure of deep feature tensors.
W e will rst emphasize the importance of feature encod- ing, starting with Fisher encoding (of the last layer ) followed by a sparse dictionary base d on the last feature layer; this fee ds to deep tensor factorization that brings together tensor learning and deep learning – cementing the idea that taking into account the high dimensional multi-linear repr esentation increases the delity of our learnt repr esentation. Albeit, these algorithms are evaluated on a content-based-image-retrie val (CBIR) seing for a te xture dataset, many of them have been combined in Cortexica’s ndSimilar tech- nology (hps://www .cortexica.com/technology/; Figures 2 and 3), that facilitate retailers to recommend items from fashion databases comprising inventory items such as tops, trousers, handbags, etc. 2 METHODS 2.1 Dataset and deep-feature generation In this paper , Describable T extur es Dataset (DTD ) [ 4 ] is used to evaluate feature encodings for image retrieval. A wide variety of fashion inventory rely on capturing the dierences between vary- ing textures. Hence, our feature encoding comparison leverages the DTD dataset, a widely used dataset for texture discrimination. Rather than recognition and description of object, texture images in DTD are collected from wild images (Google and F lickr) and classied based on human visual perception [ 16 ], such as direction- ality (line-like), regularity (polka-doed and che quered), etc. DTD contains 5640 wild texture images with 47 describable aributes drawn from the psychology literature, and is publicly available on the web at hp://www .robots.ox.ac.uk/vgg/data/dtd/. KDD W orkshop on ML meets Fashion, August 2017, Canada Sengupta et al. T extures can be describ ed via orderless pooling of lter bank re- sponse [ 7 ]. In de ep convolutional neural network (dCNN), the con- volutional layers are akin to non-linear lter banks; these have in fact been proved to b e beer for texture descriptions [ 5 ]. 
Here, the local deep features are extracted from the last convolutional layer of a pre-trained VGG-M [3]. This is represented by $T = \{t_1, \ldots, t_i, \ldots, t_N : t_i \in \mathbb{R}^D\}$; the size of the last convolutional layer is $H \times W \times D$, where $D$ denotes the dimension of the filter response at the $i$-th pixel of the last convolutional layer and $N = H \times W$ is the total number of local features. Different feature-encoding strategies are then utilized for encoding the local descriptors, and a similarity metric is applied to rank images. We use the $\ell_2$ norm (the Frobenius norm for matrices and tensors) as the notion of distance between the query and the database images. We will now introduce five different encodings of the feature matrix – (a) Fisher encoding, (b) sparse matrix dictionary learning, (c) t-SVD factorization, (d) low-rank plus sparse factorization, and (e) multilinear principal component analysis (MPCA).

2.1.1 Fisher encoding. We use a Gaussian mixture model (GMM) to encode a probabilistic visual vocabulary on the training dataset. Images are then represented as Fisher vectors [14, 17] – derivatives of the log-likelihood of the model with respect to its parameters (the score function). Fisher encoding describes how the distribution of features of an individual image differs from the distribution fitted to the features of all training images. First, a set of $D$-dimensional local deep features is extracted from an image and denoted as $T = \{t_1, \ldots, t_i, \ldots, t_N : t_i \in \mathbb{R}^D\}$. As in Refs. [5, 15], a $K$-component GMM with diagonal covariances is fitted on the training set with parameters $\Theta = \{(\omega_k, \mu_k, \Sigma_k)\}_{k=1}^{K}$; only the derivatives with respect to the means $\{\mu_k\}_{k=1}^{K}$ and variances $\{\Sigma_k\}_{k=1}^{K}$ are encoded and concatenated to represent an image,

$$T(X, \Theta) = \left[ \frac{\partial \mathcal{L}}{\partial \mu_1}, \ldots, \frac{\partial \mathcal{L}}{\partial \mu_K}, \frac{\partial \mathcal{L}}{\partial \Sigma_1}, \ldots, \frac{\partial \mathcal{L}}{\partial \Sigma_K} \right],$$

where

$$\mathcal{L}(\Theta) = \sum_{i=1}^{N} \log \pi(t_i), \qquad \pi(t_i) = \sum_{k=1}^{K} \omega_k \, \mathcal{N}(t_i; \mu_k, \Sigma_k). \tag{1}$$

For each component $k$, the mean and covariance deviations on each vector dimension $j = 1, 2, \ldots, D$ are

$$\frac{\partial \mathcal{L}}{\partial \mu_k^j} = \frac{1}{N \sqrt{\omega_k}} \sum_{i=1}^{N} q_{ik} \, \frac{t_i^j - \mu_k^j}{\sigma_k^j}, \qquad \frac{\partial \mathcal{L}}{\partial \Sigma_k^j} = \frac{1}{N \sqrt{2 \omega_k}} \sum_{i=1}^{N} q_{ik} \left[ \left( \frac{t_i^j - \mu_k^j}{\sigma_k^j} \right)^2 - 1 \right], \tag{2}$$

where $q_{ik}$ is the soft assignment weight of feature $t_i$ to the $k$-th Gaussian, defined as

$$q_{ik} = \frac{\exp\left[ -\frac{1}{2} (t_i - \mu_k)^T \Sigma_k^{-1} (t_i - \mu_k) \right]}{\sum_{t=1}^{K} \exp\left[ -\frac{1}{2} (t_i - \mu_t)^T \Sigma_t^{-1} (t_i - \mu_t) \right]}. \tag{3}$$

As an image representation, the dimension of the Fisher vector is $2KD$, where $K$ is the number of components in the GMM and $D$ is the dimension of the local feature descriptor. After $\ell_2$ normalization of the Fisher vector, the Euclidean distance is calculated to find the similarity between two images.

2.1.2 Sparse coding on deep features. The compressed (sparse) sensing framework allows us to learn a set of over-complete bases $D$ and sparse weights $\phi$ such that the feature matrix $T$ can be represented by a linear combination of these basis vectors:

$$\arg\min_{D, \phi} \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{1}{2} \| T - D \cdot \phi_i \|_F^2 + \lambda \| \phi_i \|_1 \right]. \tag{4}$$

The k-SVD algorithm [9] comprises two stages – first, a sparse coding stage (either using matching pursuit or basis pursuit) and second, a dictionary update stage. In the first stage, when the dictionary $D$ is fixed, we solve for $\phi$ using orthogonal matching pursuit (OMP). Briefly, OMP recovers the support of the weights $\phi$ using an iterative greedy algorithm that selects at each step the column of $D$ most correlated with the current residual. Practically, we initialise the residual $r_k$, subsequently compute the column $j^*$ that reduces the residual the most, and finally add this column to the support $I_k$:

$$r_k = T - D \phi_{k-1}, \qquad j^* = \arg\min_{j = 1 \ldots n,\; \phi} \left\| r_k - d_j \phi \right\|^2, \qquad I_k = I_{k-1} \cup j^*. \tag{5}$$

Iterating through these equations for a pre-specified number of iterations, we can update the sparse weights $\phi$. After obtaining the sparse weights, we use a dictionary update stage in which we update only one column of $D$ at a time. The update for the $k$-th column is

$$\| T - D \cdot \phi \|_F^2 = \Bigg\| T - \sum_{j=1}^{K} D_j \cdot \phi_j^T \Bigg\|_F^2 = \Bigg\| \underbrace{\Bigg( T - \sum_{j \neq k} D_j \cdot \phi_j^T \Bigg)}_{E_k} - D_k \cdot \phi_k^T \Bigg\|_F^2. \tag{6}$$

In order to minimize $\| E_k - D_k \cdot \phi_k^T \|_F^2$ we decompose (SVD) $E_k$ as $U W V^T$. Using this decomposition we utilize a rank-1 approximation to form $d_k = u_0$ and $\phi_k = w_0 v_0$. We can then iterate this for every column of $D$. Sparsity is attained by collapsing each $\phi_k$ to only its non-zero entries, and constraining $E_k$ to the corresponding columns. For image retrieval, each local deep feature can be encoded by its sparse weights $\phi$. Such an image signature can be represented by max pooling over a set of $\phi$, followed by measuring a distance between such sets.

2.1.3 Tensor factorization of deep features. In the earlier section, we relied on an alternating minimization of the dictionary and the loadings (weights) to yield a convex optimization problem. Specifically, the last convolutional layer was used to form a dictionary without reference to the tensorial (multi-linear) representation of the feature spaces obtained from the deep convolutional network. Thus, our goal is to approximate these tensors as sparse conic combinations of atoms that have been learnt from a dictionary comprising the entire image training set. In other words, we would like to obtain an over-complete dictionary such that each feature tensor can be represented as a weighted sum of a small subset of the atoms (loadings) of the dictionary.
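To make the sparse-coding pipeline of Section 2.1.2 concrete, here is a minimal sketch using scikit-learn's `DictionaryLearning` with OMP as the sparse-coding stage – a stand-in for the k-SVD implementation described above; the feature matrix, dictionary size and sparsity level are all toy assumptions:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
T_feat = rng.standard_normal((200, 64))   # N=200 local features, D=64 (toy stand-in)

# Learn an over-complete dictionary: 128 atoms for 64-dimensional features,
# with at most 5 non-zero weights per local feature (OMP sparse-coding stage)
dl = DictionaryLearning(n_components=128, transform_algorithm="omp",
                        transform_n_nonzero_coefs=5, max_iter=20, random_state=0)
phi = dl.fit_transform(T_feat)            # sparse weights, shape (200, 128)

# Image signature: max-pool the sparse codes over the local features,
# as described in the text
signature = np.abs(phi).max(axis=0)
```

Two signatures would then be compared with the $\ell_2$ distance, as with the other encodings in this section.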
There are two decompositions commonly used for factorizing tensors: one is based on the Tucker decomposition, whilst the second is known as the Canonical Polyadic Decomposition (CANDECOMP/CPD), also known as Parallel Factor Analysis (PARAFAC). In the former, tensors are represented by sparse core tensors with block structures. Specifically, $T$ is approximated as a multi-linear transformation of a small "core" tensor $G$ by factor matrices $A^{(n)}$ [11],

$$T = G \bullet_1 A^{(1)} \bullet_2 A^{(2)} \cdots \bullet_N A^{(N)} \equiv \left[\!\left[ G ; A^{(1)}, A^{(2)}, \ldots, A^{(N)} \right]\!\right]. \tag{7}$$

In the latter, a tensor $T$ is written as a sum of $R$ rank-1 tensors, each of which can be written as the outer product of $N$ factor variables, i.e.,

$$T = \sum_{r=1}^{R} a_r^{(1)} \otimes a_r^{(2)} \cdots \otimes a_r^{(N)} \equiv \left[\!\left[ A^{(1)}, A^{(2)}, \ldots, A^{(N)} \right]\!\right]. \tag{8}$$

The decomposition is canonical when the rank $R$ is minimal; such a decomposition is unique under certain conditions [6]. Even then, due to numerical errors, factorization of a feature matrix obtained from a deep neural network results in a non-unique factorization. Thus, CPD proves inadequate for unique feature encoding. We therefore utilize a factorization that is similar to 2D-PCA, albeit lifted to multi-linear objects. Specifically, we use the t-SVD [10] to factorize the feature matrices.

Based on t-SVD. The t-product between two tensors $T_1$ and $T_2$ is

$$T_1 * T_2 \equiv \operatorname{fold}\left( \operatorname{circ}(T_1) \cdot \operatorname{unfold}(T_2) \right), \qquad T = U * S * V^T, \tag{9}$$

where $\operatorname{circ}(\cdot)$ creates a block-circulant matrix and the unfold operator matricizes the tensor along the tube (3rd) axis. $S$ is an f-diagonal tensor that contains the eigen-tuples of the covariance tensor on its diagonal, whereas the columns of $U$ are the eigenmatrices of the covariance tensor. The images in the training set are inserted as the second index of the tensor. In other words, using the t-SVD, we obtain an orthogonal factor dictionary of the entire training set.
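The t-product in Eqn. 9 is what makes the t-SVD computationally attractive: the block-circulant structure diagonalizes under the discrete Fourier transform, so the product reduces to independent matrix products on the Fourier-transformed frontal slices. A minimal numpy sketch (tensor shapes are illustrative assumptions):

```python
import numpy as np

def t_product(A, B):
    """t-product of 3-way tensors, fold(circ(A) @ unfold(B)), computed
    slice-wise in the Fourier domain along the tube (3rd) axis."""
    n3 = A.shape[2]
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        # one ordinary matrix product per Fourier-domain frontal slice
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
    return np.real(np.fft.ifft(Cf, axis=2))
```

The t-SVD itself is obtained the same way, by taking a matrix SVD of each Fourier-domain slice; this slice-wise FFT structure is where the cuFFT speed-up mentioned later applies.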
Ultimately, a projection of the mean-removed input tensor (i.e., a single feature tensor $T_{test}$) on the orthogonal projector ($U^T * T_{test}$) forms the tensor encoding of each image. The Frobenius norm then measures the closeness between a pair of images (individual tensor projections). Computational efficiency is guaranteed since the comparison between the test image and the database is measured in the Fourier domain – the t-product utilizes a fast Fourier transform algorithm at its core [13].

Another feature encoding that we consider is the partition of each individual tensor, i.e., $T = L + P$, where $L$ is a low-rank tensor and $P$ is a sparse tensor. We have $L = U_{1:r} * S_{1:r} * V_{1:r}^T$ and $P = T - L$; $r$ denotes the truncation index of the singular components.

Based on mPCA. For higher-order tensor analysis, multilinear principal component analysis (mPCA) or higher-order singular value decomposition (HOSVD) [12, 18] computes a set of orthonormal matrices associated with each mode of a tensor – this is analogous to the orthonormal row and column spaces of a matrix computed by the matrix PCA/SVD. In a Tucker decomposition (Eqn. 7), if the factor matrices are constrained to be orthogonal, the input tensor can be decomposed as a linear combination of rank-1 tensors.

Given a set of $N$-order training tensors $T$, the objective of mPCA is to find $N$ linear projection matrices that maximize the total tensor scatter (variance) in the projection subspace. When the factor matrices $A^{(n)}$ in Eqn. 7 are orthogonal, $\|T\|_F^2 = \|G\|_F^2$. Each query (test) tensor $T_q$ can then be projected using $Y = T_q \times_1 A^{(1)T} \times_2 A^{(2)T} \cdots \times_N A^{(N)T}$, where $Y$ lies in a low-dimensional space. The objective then becomes learning a set of matrices $A^{(n)}$ to admit

$$\arg\max_{A^{(1)} \ldots A^{(N)}} \sum_{m=1}^{M} \left\| Y_{m,train} - \bar{Y} \right\|_F^2, \tag{10}$$

where $\bar{Y}$ is the mean tensor.
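A non-iterative way to obtain orthogonal factor matrices – the truncated HOSVD, a standard initialization for the alternating mPCA procedure – can be sketched in numpy as follows (tensor sizes and ranks are illustrative assumptions):

```python
import numpy as np

def mode_unfold(X, mode):
    """Mode-n unfolding of a tensor into an (I_n x prod(other dims)) matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd_factors(X, ranks):
    """Leading left singular vectors of each mode unfolding (truncated HOSVD)."""
    return [np.linalg.svd(mode_unfold(X, n), full_matrices=False)[0][:, :r]
            for n, r in enumerate(ranks)]

def project(X, factors):
    """Y = X x_1 A(1)^T x_2 A(2)^T ... x_N A(N)^T: the low-dimensional encoding."""
    Y = X
    for n, A in enumerate(factors):
        # n-mode product with A^T: contract mode n of Y with the columns of A
        Y = np.moveaxis(np.tensordot(A.T, np.moveaxis(Y, n, 0), axes=1), 0, n)
    return Y
```

With full (untruncated) ranks the factor matrices are square and orthogonal, so the projection preserves the Frobenius norm – matching $\|T\|_F^2 = \|G\|_F^2$ above.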
Since the projection to an $N$-th order tensor subspace consists of $N$ projections to $N$ vector subspaces, in Ref. [12] the optimization of the $N$ projection matrices is solved by finding each $A^{(n)}$ that maximizes the scatter in the $n$-mode vector subspace with all other $A$-matrices fixed. This local optimization procedure can be iterated until convergence.

3 EXPERIMENTS

In this section, the five deep feature encoding methods are evaluated for image retrieval on the DTD dataset. Images are resized to the same size (256 × 256), and deep features are extracted from the last convolutional layer of a pre-trained VGG-M. For Fisher vectors and sparse coding, the deep features are flattened into a set of 1D local features. For t-SVD, the deep features are represented as 2D feature maps and treated as [H×W, 1, D] data structures (see Methods). After encoding and $\ell_2$ normalization, the Euclidean distance is calculated to find the similarity between two images.

To evaluate image retrieval, Mean Average Precision (MAP) on the top rankings (top-1, top-5 and top-10) is calculated. Two images per category, i.e., a total of 94 images, are selected as queries from the test dataset. The retrieval database includes 3760 images from the DTD training and validation datasets. MAP on DTD is listed in Table 1. An example of the retrieval obtained with each method is shown in Figure 1. In each case 10 images are displayed; the top-left image is the query used.

Figure 1: Retrieved results on DTD: a) Fisher vector b) Sparse coding c) t-SVD d) mPCA e) Low rank tensor f) Raw tensor

Table 1: Average precision for the DTD dataset. Raw tensors are feature tensors without any encoding.

Methods             top-1   top-5   top-10   time taken
Fisher Vector       0.56    0.52    0.48     12 ms
Sparse Coding       0.62    0.49    0.44     35 ms
t-SVD dictionary    0.53    0.42    0.38     2188 ms
mPCA dictionary     0.53    0.43    0.39     5 ms
Low Rank            0.51    0.42    0.38     967 ms
Raw Tensor          0.41    0.35    0.31     –
The rest of the images are arranged by similarity to the query image, as obtained with each encoding method.

Image retrieval amounts to searching large databases to return a ranked list of images that have similar properties to those of the query image. Here, we reason that, in contrast to raw feature tensors (i.e., without any encoding of the feature maps), their encodings – using Fisher vectors, sparse dictionaries or multi-linear tensors – increase the average precision of retrieval. Table 1 shows that multi-linear encodings based on t-SVD, mPCA or low-rank decomposition of individual images all have similar fidelity, whilst performing very close to the information-geometric encoding of the feature vectors.

Sparse coding supersedes the other methods in terms of average precision. This is because the dictionary learnt using k-SVD is matched to the underlying image distribution (i.e., by learning the sparse weights), whereas the tensor dictionaries (t-SVD or mPCA) use orthogonal matrices as dictionaries without the added constraint to finesse the weights, or to add additional structure in terms of sparsity, low rank or non-negativity.

As shown in Table 1, computing mPCA tensor encodings is computationally efficient in contrast to learning sparse dictionaries or a probabilistic model for Fisher vector encodings. Combined with reasonable retrieval performance, tensor encodings of deep neural features make them a contender for commercial infrastructures. The code was implemented in Matlab 2015a under Linux with an Intel Xeon CPU E5-2640 @ 2.00GHz and 125 GB RAM. Sandia's Tensor Toolbox [1], KU Leuven's Tensorlab [19] and TFOCS [2] were used to implement our algorithms.

4 CONCLUSION

Feature encoding is crucial for comparing images (or videos, text, etc.) that are similar to one another. Our experiments show that whilst sparse encoding of a feature tensor proves to be the most effective encoding for retrieval, having no encoding grossly reduces the average precision.
Taking the multi-linear properties of the feature tensor into account improves retrieval, and the fidelity is on par with Fisher encoding. We further show that computing such a multi-linear representation of the feature tensor is computationally much more efficient than constructing a sparse dictionary or learning a probabilistic model.

The sparse dictionary encoding has the highest average precision due to the added flexibility of learning the weights as well as imposing the structural constraint of sparsity. Fisher vector encoding has the second highest precision because of its ability to capture the information geometry of the underlying probabilistic model. Specifically, the Fisher tensor encodes the underlying Riemannian metric, which augments the encoding with the curvature of the underlying distribution. The multi-linear approaches based on t-SVD, mPCA and low-rank decomposition perform on par with Fisher vectors as they encode the third- and higher-order interactions in the feature tensors. Comparison of the compute times tells us that, amongst all of the methods, encoding deep feature tensors using mPCA is the most time-efficient.

Our results have not exploited the geometry exhibited by the tensors; for example, one can calculate lagged covariance tensors from the feature tensor – these tensors exhibit a Riemannian geometry due to their positive-definite structure. Therefore, building a dictionary of covariance tensors using t-SVD, wherein an Augmented Lagrangian alternating direction method can be employed to learn a sparse representation, is the next viable step for our work. We anticipate that such a multi-linear over-complete dictionary should at the very least have increased precision relative to the Fisher encoding method. Thus far, the last convolutional layer was used to form a dictionary without reference to the earlier layers of the deep neural network.
In fact, a step forward would be to constrain the construction of an image signature with tensorial information from an earlier layer. The tensor methods rely on factorizing large tensors, especially those that emerge from deep neural networks. Yet, a GPU implementation of the Fourier transform in the form of cuFFT enables us to build a scalable commercial solution (https://www.cortexica.com/technology/).

REFERENCES

[1] Brett W. Bader, Tamara G. Kolda, and others. 2015. MATLAB Tensor Toolbox Version 2.6. Available online. (February 2015). http://www.sandia.gov/~tgkolda/TensorToolbox/
[2] Stephen R Becker, Emmanuel J Candès, and Michael C Grant. 2011. Templates for convex cone problems with applications to sparse signal recovery. Mathematical Programming Computation 3, 3 (2011), 165.
[3] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. 2014. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference.
[4] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. 2014. Describing textures in the wild. In IEEE Conference on Computer Vision and Pattern Recognition.
[5] M. Cimpoi, S. Maji, and A. Vedaldi. 2015. Deep Filter Banks for Texture Recognition and Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
[6] Ignat Domanov and Lieven De Lathauwer. 2013. On the Uniqueness of the Canonical Polyadic Decomposition of Third-Order Tensors – Part II: Uniqueness of the Overall Decomposition. SIAM J. Matrix Analysis Applications 34, 3 (2013), 876–903.
[7] Yunchao Gong, Liwei Wang, Ruiqi Guo, and Svetlana Lazebnik. 2014. Multi-scale Orderless Pooling of Deep Convolutional Activation Features. In Computer Vision – ECCV 2014 – 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VII. 392–407.
[8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
[9] Zhuolin Jiang, Zhe Lin, and Larry S. Davis. 2011. Learning a discriminative dictionary for sparse coding via label consistent K-SVD. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. 1697–1704.
[10] Misha E Kilmer, Karen Braman, Ning Hao, and Randy C Hoover. 2013. Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 34, 1 (2013), 148–172.
[11] Tamara G. Kolda and Brett W. Bader. 2009. Tensor Decompositions and Applications. SIAM Rev. 51, 3 (2009), 455–500.
[12] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos. 2008. mPCA: Multilinear principal component analysis of tensor objects. IEEE Trans. Neural Networks 1, 19 (2008), 18–39.
[13] Nvidia. 2007. CUDA CUFFT Library. (2007).
[14] F. Perronnin and C. Dance. 2006. Fisher kernels on visual vocabularies for image categorization. In IEEE Conference on Computer Vision and Pattern Recognition.
[15] K. Simonyan, O. M. Parkhi, A. Vedaldi, and A. Zisserman. 2013. Fisher Vector Faces in the Wild. In British Machine Vision Conference.
[16] H. Tamura, S. Mori, and T. Yamawaki. 1978. Textural features corresponding to visual perception. IEEE Transactions on Systems, Man and Cybernetics (1978), 460–473.
[17] Tiberio Uricchio, Marco Bertini, Lorenzo Seidenari, and Alberto Del Bimbo. 2015. Fisher Encoded Convolutional Bag-of-Windows for Efficient Image Retrieval and Social Image Tagging. In ICCV Workshops. 1020–1026.
[18] M.A.O. Vasilescu and D. Terzopoulos. 2003. Multilinear Subspace Analysis for Image Ensembles. In IEEE Conference on Computer Vision and Pattern Recognition. 93–99.
[19] N. Vervliet, O. Debals, L. Sorber, M. Van Barel, and L. De Lathauwer. 2016. Tensorlab 3.0. (Mar. 2016). http://www.tensorlab.net. Available online.
A TENSOR NORMS

Let $t_{ijk}$ be the elements of tensor $T$; then the Frobenius norm is

$$\|T\|_F = \| \mathrm{vec}(T) \|_2 = \sqrt{\sum_i \sum_j \sum_k t_{ijk}^2}.$$

The nuclear (trace) norm is defined as

$$\|T\|_* = \operatorname{trace}\left( \sqrt{T^T T} \right) = \sum_{i=1}^{\min\{m, n\}} \sigma_i,$$

where the $\sigma_i$ are the singular values of $T$.

B CORTEXICA'S FINDSIMILAR TECHNOLOGY

Figure 2: The findSimilar technology: A consumer takes a photograph of a clothing item. Using proprietary versions of feature encodings that leverage deep learning as well as multi-scale analysis, the retailer presents similar items from the database.

Figure 3: Feature encodings: Each image is encoded using a (proprietary) combination of encodings described in this paper, along with other patented descriptors. Shown here are examples wherein the query is the top-left item and a ranked list is returned based on similarity. (a) Retrieval of similar bags. (b) Retrieval of similar dresses.
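The identities in Appendix A can be checked numerically; in the sketch below (numpy), the nuclear norm is computed on a matricization of the tensor, since the singular-value definition is a matrix one:

```python
import numpy as np

T = np.arange(24, dtype=float).reshape(2, 3, 4)

# Frobenius norm: the l2 norm of the vectorized tensor
fro = np.sqrt((T ** 2).sum())

# Nuclear (trace) norm of a matricization M: sum of singular values,
# equal to trace(sqrt(M^T M)). We evaluate the trace form via M M^T,
# which shares the nonzero eigenvalues of M^T M but stays full-rank.
M = T.reshape(2, 12)
nuc_svd = np.linalg.svd(M, compute_uv=False).sum()
nuc_trace = np.sqrt(np.linalg.eigvalsh(M @ M.T)).sum()
```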
