On quantum statistics in data analysis
Authors: Dusko Pavlovic
Kestrel Institute and Oxford University
dusko@{kestrel.edu, comlab.ox.ac.uk}

Abstract

Originally, quantum probability theory was developed to analyze statistical phenomena in quantum systems, where classical probability theory does not apply because the lattice of measurable sets is not necessarily distributive. On the other hand, it is well known that the lattices of concepts that arise in data analysis are in general also non-distributive, albeit for completely different reasons. In his recent book, van Rijsbergen (2004) argues that many of the logical tools developed for quantum systems are also suitable for applications in information retrieval. I explore the mathematical support for this idea on an abstract vector space model, covering several forms of data analysis (information retrieval, data mining, collaborative filtering, formal concept analysis...), and roughly based on an idea from categorical quantum mechanics (Abramsky & Coecke 2004; Coecke & Pavlovic 2007). It turns out that quantum (i.e., noncommutative) probability distributions arise already in this rudimentary mathematical framework. We show that a Bell-type inequality (Bell 1964) must be satisfied by the standard similarity measures if they are used for preference predictions. The fact that already a very general, abstract version of the vector space model yields simple counterexamples for such inequalities seems to be an indicator of a genuine need for quantum statistics in data analysis.

Introduction

Until recently, Computer Science was mainly concerned with data storage and processing in purpose-built databases and computers. With the advent of the Web and social computation, the task of finding and understanding information arising from local interactions in spontaneously evolving computational networks and data repositories has taken center stage.
As computers evolved from calculators, the key paradigm of Computer Science was computation-as-calculation, with the Turing Machine construed as a generic calculator, and with data processing performed by a small set of local operations. As computers got connected into networks, and captured a range of social functions, the paradigm of computation-as-communication emerged, with data processing performed not only locally, but also through distribution, merging, and association of data sets through various communicating processes. Such non-local data processing has been implemented through markets, elections, and many other social mechanisms for a very long time, albeit on a smaller scale, with less concrete infrastructure, and with more complex computational agents. A new family of its implementations is based on a new computational platform, which is not any more the Computer, or even its operating system, but the Web, and its knowledge systems.

But while the interfaces of the local computational processes are defined to be the interfaces of the computers which perform them, the carriers of computation-as-communication do not come with clearly defined interfaces. The task of finding and supplying reliable data within a market, or on the Web, or in a social group, carries with it many deep problems. Two of them are particularly relevant for this work.

Problem of partial information and indeterminacy

Data processing in a network is ongoing. On the other hand, the data sets are usually incomplete, and information needs to be extracted from such incomplete sets. E.g., a task in a recommender system is to extrapolate which movies (books, music...) a user will like, from a sparse sample of those that she had previously rated.

∗ Supported by EPSRC and ONR.
In information retrieval, the task is to extrapolate which information is relevant for a query, from a small set of tokens characterizing the query on one hand, and the information on the other hand.

In the standard model of data analysis, succinctly presented e.g. in (Azar et al. 2001), it is assumed that a matrix of random variables, containing complete information about the relevant properties of the objects of interest, exists out there (in some sort of a Platonic heaven of information), and can be sampled. The problem of data analysis is that the sampling process is noisy and partial; more specifically, that the distributions of the random variables are distorted by an error process and by an omission process. The task of data analysis is to eliminate the effects of these processes, and reconstruct a good approximation of the original information.

While mathematically convenient, and computationally effective, this model does not seem very realistic. If we instantiate it to a recommender system again, then its basic assumption becomes that each user has a completely defined preference distribution, albeit only over the items that he has used, and that the recommender system just needs to reconstruct this preference distribution. But if we zoom in, and ask the user himself, he will often be unable to precisely reconstruct his own preference distribution. If we ask him to rate some items again, he will often assign different ratings. One reason is that information processing is ongoing, and that the preferences evolve and change.
If we zoom in even further, we will find that the state of a user's preferences is usually not completely determined even in a completely static model: right after watching a movie, one usually needs to toss a "mental coin" to decide whether to assign 2 or 3 stars, say, to the performance of an actor; or to decide whether to pay more attention, while watching the movie, to this or that aspect: music, colors...

While the indeterminacy of information in a network can be reduced to an effect of noise, like in the standard model, and averaged out, it is interesting to ponder whether viewing this indeterminacy as an essential feature of network computation, rather than a bug, may lead to more realistic models of information systems. Is the "mental coin", which resolves the superposition of the many components of my preferences when I need to measure them, akin to a real coin, which we all agree is governed by the completely deterministic laws of classical physics, its randomness being just an appearance of its complex behavior; or is this "mental coin" governed by a more fundamental form of randomness, like the one that occurs in quantum mechanics, causing the superposition of many states to collapse under measurement?

Problem of classification and latent semantics

The task of conceptualizing data has been formulated in many ways. In information retrieval, the central task is to determine the relevance of data with respect to a query. In recommender systems, the implicit query is always: "What will I like, given my past choices and rankings?", and the task is to find the relevant recommendations. In order to tackle such tasks, one classifies the data on one hand, the queries on the other, and aligns the two classifications, in order to extrapolate the future choices from the past choices. But what are these classifications based on? The simplest approach is based on keywords.
But even classifying a corpus of purely textual documents, viewed as bags of words, according to the frequency of the occurrences of the relevant keywords, leads to significant problems: polysemy, homonymy, synonymy. The problem becomes very difficult when it comes to classifying families of non-textual objects: images, music, video, film. Only a small part of their correlations can be captured by connecting the keywords, captions, or other forms of textual annotations.

Latent semantics correlates data by extracting their intrinsic structure. For instance, the central piece of the original Google search engine, distinguishing it from other similar engines, was that the keyword search was supported by PageRank (Page et al. 1998), a reputation ranking of the Web pages, extracted from their intrinsic hyperlink structure. Even for the keyword search, the crucial step was to recognize this latent variable (Everitt 1984), extracting relevance from non-local network structure, rather than from local term occurrence. Such semantical support is even more critical for search and retrieval of non-textual information, on the Web and in other data spaces.

Overview of latent semantics

We consider the case when two types of data assign the meaning to each other.

Pattern matrices

Latent semantics is generally given as a map A : J × U → R, where

• J is a set of objects, or items,
• U is a set of properties, or users,
• R is a set of values, or ratings.

This map is conveniently presented as a pattern matrix A = (A_iu)_{J×U}. The entry A_iu can be intuitively written as a model relation i ⊨ u, especially when R = {0, 1}. In general, it can be construed as the degree to which the object i satisfies the property, or the user, u.
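As a concrete illustration, a pattern matrix can be stored as a simple table of ratings. The items, users, ratings, the `None` convention for unassigned entries, and the `satisfies` helper (which collapses the rating values to R = {0, 1}) below are all hypothetical, chosen only to make the notation tangible:

```python
# A toy pattern matrix A : J x U -> R. Rows are items (J), columns are
# users (U); entries A[i][u] are ratings in {0,...,5}; None marks an
# unassigned rating. All data is hypothetical.
items = ["Solaris", "Alien", "Amelie"]
users = ["u0", "u1", "u2"]

A = [
    [5,    3,    None],   # Solaris
    [None, 4,    2   ],   # Alien
    [1,    None, 5   ],   # Amelie
]

def satisfies(i, u, threshold=3):
    """Boolean pattern i |= u: does user u rate item i at least `threshold`?
    Collapses the rating values {0,...,5} to R = {0, 1}."""
    r = A[i][u]
    return r is not None and r >= threshold

print(satisfies(0, 0))  # prints True: u0 rates Solaris 5 >= 3
```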
While the ratings R usually carry the structure of an ordered rig¹, the attributes U often carry a more general algebraic structure, whereas the behaviors of the objects in J may be expressed coalgebraically. Clearly, the rig structure of R is just enough to support the usual matrix composition. Sometimes, but not always, we also assume that R has no nilpotents, so that it can be embedded in an ordered field.

Examples.

  domain            J            U            R           A_iu
  text analysis     documents    terms        ℕ           occurrence count
  measurement       instances    quantities   ℝ           outcome
  user preference   items        users        {0,...,5}   rating
  topic search      authorities  hubs         ℕ           hyperlinks
  concept analysis  objects      attributes   {0,1}       satisfaction
  elections         candidates   voters       {0,...,n}   preference
  market            producers    consumers    ℤ           delivery
  digital images    images       pixels       [0,1]       intensity

Balancing and normalization

Notation. For every vector x = (x_k)_{k=1}^n, we define

• the average (expectation) E(x) = (1/n) Σ_{k=1}^n x_k,
• the ℓ2-norm ‖x‖₂ = √(Σ_{k=1}^n |x_k|²),
• the ℓ∞-norm ‖x‖∞ = ⋁_{k=1}^n |x_k|.

Item balancing of a semantics matrix A reduces each of its rows A_{i•}, corresponding to the item i, to a row vector A⁰_{i•}, defined by

  A⁰_{i•} = A_{i•} − E(A_{i•})

The unassigned ratings in A_{i•} are padded by zeros. In an item-balanced matrix, the difference between the items with a higher average rating and the items with a lower average rating is factored out. Only the satisfaction profile of each item is recorded, over the set of users who have assigned it a better-than-average, or worse-than-average rating.

¹ A rig R = (R, +, ·, 0, 1) is a "ring without the negatives". This means that (R, +, 0) and (R, ·, 1) are commutative monoids satisfying a(b + c) = ab + ac and 0a = 0. The typical examples include the natural numbers and the non-negative reals, but also distributive lattices, which generally do not embed in a ring.
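A minimal sketch of item balancing, under the assumption that the average E(A_{i•}) is taken over the assigned ratings only, with the unassigned entries padded by zeros afterwards; the data and the `None` convention for unassigned ratings are hypothetical:

```python
# Item balancing: subtract each item's average rating (taken over its
# assigned ratings) from its row, then pad the unassigned entries with
# zeros, so that average and unassigned ratings coincide at 0.

def balance_row(row):
    rated = [r for r in row if r is not None]
    avg = sum(rated) / len(rated)
    return [0.0 if r is None else r - avg for r in row]

A = [
    [5, 3, None, 4],   # item 0: average rating 4
    [2, None, 2, 2],   # item 1: average rating 2
]
A0 = [balance_row(row) for row in A]
print(A0[0])  # prints [1.0, -1.0, 0.0, 0.0]
```

Note that item 1, whose assigned ratings are all equal, balances to the zero row: only deviations from the item's own average survive.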
The average and unassigned ratings are identified, and both become 0.

User balancing of a semantics matrix A reduces each of its columns A_{•u}, corresponding to the user u, to a column vector A⁰_{•u} with the expected value 0, by setting

  A⁰_{•u} = A_{•u} − E(A_{•u})

The unassigned ratings are again padded by zeros. In a user-balanced matrix, users' different rating habits, i.e. that some of them are more generous than others, are factored out. Only the satisfaction profile of each user is recorded, over the set of all items that she has rated. The average and unassigned ratings are identified, both with 0.

Item normalization of a semantics matrix A factors its rows into unit vectors; user normalization factors its columns into unit vectors, by setting

  A_{i•} ↦ A_{i•} / ‖A_{i•}‖₂        A_{•u} ↦ A_{•u} / ‖A_{•u}‖₂

Comment. The purpose of balancing and normalization of raw semantic matrices is to factor out the aspects of rating that are irrelevant for the intended analysis. Whether a particular adjustment is appropriate or not depends on the intent, and on the available data. E.g., padding the available ratings by assigning the average rating to all unrated items may be useful in some cases, but it skews the data when the sample is small.² In the rest of the paper, we assume that all such adjustments have been applied to data as appropriate, and we focus on the methods for extracting information from them.

Classification

Through pattern matrices and latent semantics, the objects and the properties lend a meaning to each other. The simplest method for extracting that meaning is based on the general ideas of Principal Component Analysis (Jolliffe 1986). This method underlies not only the vector space based approaches, like Latent Semantic Indexing (LSI) (Deerwester et al.
1990), or Hypertext Induced Topic Search (HITS) (Kleinberg 1999), but also, albeit in a less obvious way, Formal Concept Analysis (FCA) (Wille 1982), and some other approaches. The general idea is that the latent semantical structures can be obtained by factoring the pattern matrix through suitable transformations, required to preserve a conceptual distance between the objects, as well as between their properties. These distance-preserving transformations can be captured under the abstract notion of isometry.

² E.g., when only one rating is available from a user, then extrapolating his average rating to the unrated items simply erases all available information.

Suppose that the rig of values is given with an involutive automorphism (−)̄ : R → R, called conjugation. If the values are the complex numbers, R = ℂ, then of course the conjugate of a + ib is a − ib. For general rigs R, conjugation sometimes boils down to ā = a. In any case, any pattern matrix A = (A_iu)_{J×U} induces an adjoint matrix A‡ = (A‡_ui)_{U×J}, whose entries are defined to be A‡_ui = Ā_iu. The inner product of vectors x, y ∈ R^J can now be defined as ⟨x|y⟩ = y‡ ∘ x.

Definitions. An isometry is a map U : A → B such that ⟨Ux|Uy⟩ = ⟨x|y⟩ holds for all x, y. Equivalently, this means that U‡U = id_A. It is a unitary if both U and U‡ are isometries.
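The condition U‡U = id can be checked directly. A sketch over the rig of reals, where the adjoint is just the transpose; the 3×2 matrix below is a hypothetical example with orthonormal columns, chosen to show an isometry that is not a unitary:

```python
# Checking the isometry condition U‡U = id over the reals, where ‡ is the
# transpose. U below has orthonormal columns, so it is an isometry; but
# UU‡ != id, so it is not a unitary.
from math import isclose, sqrt

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def is_identity(M, tol=1e-9):
    return all(isclose(M[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
               for i in range(len(M)) for j in range(len(M[0])))

U = [[1 / sqrt(2), 0.0],
     [1 / sqrt(2), 0.0],
     [0.0,         1.0]]

print(is_identity(matmul(transpose(U), U)))  # prints True:  U‡U = id
print(is_identity(matmul(U, transpose(U))))  # prints False: UU‡ != id
```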
An isometric decomposition of an operator B : U → J consists of isometries V̂ : Û → U and Ŵ : Ĵ → J such that there is a (necessarily unique) map B̂ : Û → Ĵ satisfying

  B = Ŵ B̂ V̂‡

The spectral decomposition B = W̄ B̄ V̄‡ is minimal among B's isometric decompositions, in the sense that for every isometric decomposition B = Ŵ B̂ V̂‡ there is an isometric decomposition B̂ = W̌ B̄ V̌‡ such that W̄ = Ŵ W̌ and V̄ = V̂ V̌.

We further also need:

Correlation matrices are the self-adjoint matrices of the form M_J = AA‡ and M_U = A‡A, i.e.

  (M_J)_ij = Σ_{u∈U} Ā_ju · A_iu        (M_U)_uv = Σ_{i∈J} Ā_iu · A_iv

Examples of classification through isometric decomposition

Given a pattern matrix A : J × U → R, we set J = R^J and U = R^U, so that A becomes a linear operator A : U → J, defined by the usual matrix action on the vectors.

Latent Semantic Indexing. (Deerwester et al. 1990) Let the rig of values R be the field of real numbers ℝ, with the trivial conjugation r̄ = r. This means that J = ℝ^J and U = ℝ^U are real vector spaces. The pattern matrix A : J × U → ℝ induces the linear operator A : U → J, and the adjoint A‡ : J → U is just the transpose. The isometric decomposition boils down to the singular value decomposition. The isometries V : U′ → U and W : J′ → J are obtained by the spectral decomposition of the symmetric matrices M_U = A‡A and M_J = AA‡. Since both decompose through the same rank space, with the same spectrum Λ = {λ₁ ≥ λ₂ ≥ ... ≥ λ_n}, we get a positive diagonal matrix Λ such that A‡A = VΛV‡ and AA‡ = WΛW‡, from which A = WDV‡ follows for D = √Λ.
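The claim that M_U = A‡A and M_J = AA‡ share the same spectrum Λ, with singular values D = √Λ, can be checked on a small example. A sketch for a hypothetical 2×2 real pattern matrix, using the closed-form eigenvalues (tr ± √(tr² − 4·det)) / 2 of a 2×2 matrix:

```python
# For a real pattern matrix A, the adjoint A‡ is the transpose; the
# correlation matrices M_U = A‡A and M_J = AA‡ share the same spectrum,
# and the singular values of A are its square roots. A is hypothetical.
from math import sqrt

A = [[2.0, 1.0],
     [0.0, 1.0]]

def transpose(M):
    return [list(c) for c in zip(*M)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eigenvalues_2x2(M):
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    d = sqrt(tr * tr - 4 * det)
    return sorted([(tr + d) / 2, (tr - d) / 2], reverse=True)

MU = matmul(transpose(A), A)   # correlations between properties
MJ = matmul(A, transpose(A))   # correlations between objects
lamU, lamJ = eigenvalues_2x2(MU), eigenvalues_2x2(MJ)
print(lamU)                                  # same spectrum as lamJ
singular_values = [sqrt(l) for l in lamU]    # the diagonal of D = √Λ
```

The product of the singular values equals |det A| = 2 here, as expected.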
The eigenspaces of M_U and M_J can be viewed as pure topics captured by the pattern matrix A. The eigenvalues correspond to the degree of semantical relevance of each topic in the data set from which the pattern matrix was extracted. If U are users and J items, then the eigenspaces in U can be thought of as tastes, and the eigenspaces in J as styles. Remarkably, there is a bijective correspondence between the two, and the eigenvalues quantify the correlations. As an instance of the same decomposition, Kleinberg's (1999) analysis of Hyperlink Induced Topic Search (HITS) yields a similar correspondence between the hubs and the authorities on the Web. In all cases, the underlying view is that the information consumers and the information producers, lending each other the latent semantics, share a uniform conceptual space. An even simpler presentation of that optimistic view is

Formal Concept Analysis. (Ganter, Stumme, & Wille 2005) Let the rig of values R now be the distributive lattice B = (2, ∨, ∧, 0, 1) over the underlying set 2 = {0, 1}, with the negation ¬ : B → B̃ as the conjugation, ī = ¬i. Note that this is now an antimorphism from B = (2, ∨, ∧, 0, 1) to the dual lattice B̃ = (2̃, ∧, ∨, 1, 0). The space of the objects is thus the boolean lattice J = 2^J, ordered by inclusion, whereas the space of the properties is the boolean lattice U = 2̃^U, ordered by reverse inclusion. Given a pattern matrix, which in this case boils down to a binary relation A : J × U → 2, we consider the induced map ¬A : U → 2^J, and derive the monotone maps

  B(X) = {i ∈ J | ∃u ∈ X. ¬(iAu)}
  B‡(Y) = {u ∈ U | ∀i ∉ Y. iAu}

which are adjoint to each other in the sense that

  B(X) ⊆ Y ⟺ X ⊆ B‡(Y)

and by conjugating yield the Galois connection

  Y ⊆ ¬B(X) ⟺ X ⊆ B‡(¬̃Y)

The spectral decomposition is obtained by setting

  Ū = {X ∈ 2^U | M_U(X) = X}
  J̄ = {Y ∈ 2̃^J | M_J(Y) = Y}

where the closure operators M_U = (B‡¬̃) ∘̃ (¬B) and M_J = (¬B) ∘ (B‡¬̃) unfold to

  M_U(X) = {u ∈ U | ∀i ∈ J. (∀v ∈ X. iAv) ⇒ iAu}
  M_J(Y) = {i ∈ J | ∀u ∈ U. (∀j ∈ Y. jAu) ⇒ iAu}

Note that M_U is obtained by composing the matrices ¬B and B‡¬̃ over the space 2^J, where the composition ∘̃ is dual to the usual one, i.e. (P ∘̃ Q)_ik = ⋀_j (P_ij ∨ Q_jk).

It is easy to see that the lattices of closed sets Ū and J̄ are isomorphic, because they are both isomorphic with

  L = {⟨X, Y⟩ ∈ PU × PJ | B(X) = ¬Y ∧ B‡(Y) = ¬X}

This is the form in which a concept lattice is usually presented (Ganter, Stumme, & Wille 2005). The fact that the spectral decomposition is minimal means that it correlates users' strongest tastes, captured in Ū, with items' strongest styles, captured in J̄.

Remark. While LSI is a standard, well-studied data mining method, FCA has been less familiar in the data analysis communities, although an early proposal of a concept-lattice approach can be traced back to the earliest days of information retrieval research (Salton 1968), predating both FCA and even the standard vector space model. More recently, though, the applications of FCA in information retrieval have been tested and explained (Carpineto & Romano 2004; Priss 2006; Poshyvanyk & Marcus 2007).
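The unfolded closure operator M_U above can be computed directly on a small binary relation. A sketch; the objects, attributes, and the relation are hypothetical, chosen only for illustration:

```python
# The closure operator M_U computed directly on a hypothetical binary
# relation A ⊆ J × U (objects × attributes). M_U(X) collects every
# attribute shared by all objects that satisfy all of X; its fixed points
# are the intents of the formal concepts.

J = ["sparrow", "penguin", "salmon"]           # hypothetical objects
U = ["bird", "flies", "swims", "vertebrate"]   # hypothetical attributes
A = {("sparrow", "bird"), ("sparrow", "flies"), ("sparrow", "vertebrate"),
     ("penguin", "bird"), ("penguin", "swims"), ("penguin", "vertebrate"),
     ("salmon", "swims"), ("salmon", "vertebrate")}

def extent(X):
    """Objects satisfying every attribute in X: {i | ∀v ∈ X. iAv}."""
    return {i for i in J if all((i, v) in A for v in X)}

def closure(X):
    """M_U(X) = {u | ∀i ∈ J. (∀v ∈ X. iAv) ⇒ iAu}."""
    return {u for u in U if all((i, u) in A for i in extent(X))}

print(sorted(closure({"bird"})))  # prints ['bird', 'vertebrate']
```

The fixed points of `closure` are closed attribute sets: e.g. {"bird"} is not closed, because every bird in this relation is also a vertebrate.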
The succinct presentation of LSI and FCA as special cases of the same pattern, in our abstract model above, points to the fact that the Singular Value Decomposition, on which LSI is based, and the Galois connections, which lead to FCA, both subsume under the abstract structure of isometric decomposition, just instantiated to the rig of reals for LSI, and to the boolean rig for FCA. The simple structure of isometric decomposition, and the corresponding notion of conceptual distance, can thus be construed as the basic building block of semantical classification in data analysis. It turns out that already this rudimentary structure leads into quantum statistics.

Concept lattices are not distributive

While classical measures are defined over σ-algebras, which are distributive (and boolean) as lattices, quantum measures are defined over a more general family of algebras, which need not be distributive lattices, but only orthomodular (Meyer 1986; Meyer 1993; Redei & Summers 2006). A crucial, frequently made observation, eventually leading into quantum statistics, is that the lattices of concepts, and of topics, induced by the various forms of latent semantics, are not distributive. Indeed, since the lattice structure is induced by

  x ∧ y = x ∩ y        x ∨ y = M(x ∪ y)

the closure operator M often disturbs the distributivity of the underlying set-theoretic operations. The observation that this non-distributivity of concept lattices lifts to the realm of information retrieval is due to van Rijsbergen. For the reader's convenience, we repeat the intuitive example of x ∧ (y ∨ z) ≠ (x ∧ y) ∨ (x ∧ z) from (van Rijsbergen 2004, p. 36). In a taxonomy of animals, take x = "bird", y = "human" and z = "lizard". Then both x ∧ y and x ∧ z are empty, so that (x ∧ y) ∨ (x ∧ z) remains empty. On the other hand, y ∨ z = "vertebrates", because vertebrates are the smallest class including both humans and lizards. Hence x ∧ (y ∨ z) = "birds" is not empty.

The point is that such phenomena arise from all forms of latent semantics. But beyond this point, there are even more specific indications of quantum statistics at work.

Similarity and ranking

At the core of the vector space model of information retrieval, data mining and other forms of data analysis lies the idea that the basic similarity measure, applicable to pairs of objects, or of attributes, or to the mixtures thereof, is expressible in terms of the inner product of their normalized (often also balanced) vectors:

  s(i, j) = ⟨A_{j•}|A_{i•}⟩ = Σ_{u∈U} Ā_ju · A_iu
  s(u, v) = ⟨A_{•u}|A_{•v}⟩ = Σ_{i∈J} Ā_iu · A_iv

More generally, using the inner product one can also measure the similarity of pure topics x and y, viewed as linear combinations of the property vectors:

  s_M(x, y) = ⟨x|A‡A|y⟩ = ⟨Ax|Ay⟩

In the same vein, the ranking of mixed topics, represented by the subspaces E of the space of properties, corresponds to the trace operator:

  tr_M(x) = ⟨x|A‡A|x⟩ = ⟨Ax|Ax⟩        tr_M(E) = Σ_{x∈B_E} tr_M(x)

where B_E is a basis of E. Noting that a correlation matrix M = A‡A amounts to what is in quantum statistics called an observable, we see that the ranking measures, already in the standard vector model, correspond to quantum measures. If the pattern matrices are furthermore normalized so as to generate correlation matrices with a unit trace, then they correspond to quantum probability distributions, or quantum states.

Bell's inequality of similarities

In this final section, we attempt to use the described measure of similarity of users' tastes, derived from their past ratings of similar items, to predict the probability that they will agree in their future ratings.
Although based on a simple, intuitive view of similarity and agreement, this prediction turns out to be impossible, as it leads to a contradiction. This impossibility result can be viewed as an indicator of a quantum statistical correlation, or at least as evidence that there is a problem with the straightforward statistical model of this simple situation. The contradiction arises along the lines of Bell's derivation of his notable inequality (Bell 1964).

More precisely, for any pair of users x, y ∈ U, represented by the unit vectors x, y : J → ℝ derived from their past ratings of the same items, we consider the random variables X, Y : J′ → {0, 1}, over a possibly larger set of items. Suppose that X(i) = 1 means that the user x likes the item i, and that X(i) = 0 means that she does not like it. We assume that the probability P(X = Y) ∈ [0, 1] that X and Y will agree is proportional to their past similarity s(x, y) ∈ [−1, 1], modulo the rescaling of [−1, 1] to [0, 1]. This induces a constraint on the similarities.

Proposition. Let the past preferences of the users x₀, x₁, y₀, y₁ ∈ U be given as unit vectors x₀, x₁, y₀, y₁ : J → ℝ. If the probability of their future agreement is determined by rescaling the similarity of their past preferences

  P(X = Y) = (1 + s(x, y)) / 2

then their similarities must satisfy the following condition:

  s(x₀, y₁) + s(x₁, y₁) + s(x₁, y₀) − s(x₀, y₀) ≤ 2    (1)

This follows from the general fact that the disagreement of {0, 1}-valued random variables is a distance function.

Lemma. Any three random variables X, Y, Z : J → {0, 1} satisfy

  P(X ≠ Z) ≤ P(X ≠ Y) + P(Y ≠ Z)    (2)

Proof. Let W_XY : J → {0, 1} be the random variable

  W_XY(i) = 1 if X(i) ≠ Y(i), and 0 if X(i) = Y(i).

We claim that

  W_XZ ≤ W_XY + W_YZ    (3)

Towards a contradiction, suppose that there is j ∈ J with W_XZ(j) > W_XY(j) + W_YZ(j). This means that W_XZ(j) = 1, but W_XY(j) = W_YZ(j) = 0, and thus X(j) ≠ Z(j) but X(j) = Y(j) and Y(j) = Z(j), which is clearly impossible. Therefore (3) must be true. But since P(X ≠ Y) = E(W_XY), averaging (3) gives (2).

Proof of the Proposition. Since P(X = Y) = (1 + s(x, y)) / 2, it follows that P(X ≠ Y) = (1 − s(x, y)) / 2. Applying (2) twice gives P(X₀ ≠ Y₀) ≤ P(X₀ ≠ Y₁) + P(Y₁ ≠ X₁) + P(X₁ ≠ Y₀); substituting the rescaled similarities into this instance of (2) gives (1).

Corollary. The probability of users' future agreement P(X = Y) cannot be derived by rescaling the past similarities of their tastes s(x, y), where the similarity measure s is defined by the inner product. The reason is that formula (1), which would have to be satisfied, does not always hold.

Proof. The taste vectors x₀ = (1, 0), y₀ = (−1, 0), x₁ = (−1/2, √3/2) and y₁ = (1/2, √3/2) provide a counterexample for (1).

Interpretation. Why is it not justified to predict future agreements from past similarities, both defined in intuitively obvious ways? One line of explanation is that the independence assumptions are violated. As usual, the dependencies can be explained in terms of hidden variables (e.g., off-line interactions of the users), or in terms of non-local interactions. Another line of explanation is that the dependencies are introduced in the model itself. Intuitively, this means that the users whose agreements are predicted have not been sampled in the same measure space, and that their preferences should not be statistically mixed.

Remark. Rather than derived from similarity, users' semantical distance can be defined by P(X ≠ Y) = ‖Ax − Ay‖∞.
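The counterexample in the Corollary is easy to check numerically. A sketch computing the similarities as inner products of the four unit vectors and evaluating the left-hand side of inequality (1):

```python
# Numerical check of the Corollary's counterexample: for the four unit
# taste vectors below, the left-hand side of inequality (1) evaluates
# to 2.5 > 2, so agreement probabilities of the form (1 + s)/2 cannot
# all be consistent with the triangle inequality (2).
from math import sqrt

x0 = (1.0, 0.0)
y0 = (-1.0, 0.0)
x1 = (-0.5, sqrt(3) / 2)
y1 = (0.5, sqrt(3) / 2)

def s(a, b):
    """Similarity as the real inner product of unit vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

lhs = s(x0, y1) + s(x1, y1) + s(x1, y0) - s(x0, y0)
print(lhs)  # ≈ 2.5, violating the bound of 2 in (1)
```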
A reader familiar with quantum probability theory (Meyer 1986; Meyer 1993) will recognize this interaction of the Hilbert space ℓ2 and the Banach space ℓ∞, which acts on it as a von Neumann algebra, as the familiar interface between the quantum and the classical probabilities.

Conclusion and future work

We have shown that already in the basic, but sufficiently abstract models of information retrieval, data mining, and other forms of data analysis, a suitable version of Bell's argument applies, suggesting that the quantum statistical approach may be necessary.

The simple interpretation of Bell's argument is that the quantum statistical predictions refer to non-local interactions. More subtle interpretations lead into the issues of contextuality (Bell 1987, p. 9). In some cases, of course, both the non-local interactions and the contextual dependencies arise as a figment of the statistical model, mixing variables that cannot be sampled together. Either way, the version of the argument presented above suggests that simple-minded prediction based on the vector space model of information processing in a network may lead to problems if the locality of the interactions is not taken into account.

Is it possible that genuine entanglement phenomena arise on a network? After a moment of thought about this question, one gets a strange feeling that quantum probability might in fact be easier to comprehend in the realm of network computation than in physics.³ While action at a distance is a highly unintuitive phenomenon in physics (Einstein called it "spooky"), in network computation it can be reduced to the fact that information may flow not only through the network links, but also off the network.

³ Perhaps like the theory of parallel universes, which seems to have more convincing interpretations in everyday life, and in distributed computation, than in physics.
This fact is not only intuitively natural, in the sense that, say, the data on the Web move not only in packets, along the Internet links, but also get teleported from site to site, by people talking to each other, and then typing on their keyboards; it is also information-theoretically robust, in the sense that there are always covert channels. In abstract models, they can be represented in terms of non-local hidden variables, or in terms of entanglement. Either way, the operational content of quantum statistical methods will undoubtedly broaden the algorithmic horizons of network computation and data analysis, already by analyzing the meaning of the notable quantum algorithms in physics-free implementations. Convenient toolkits for combining quantum states, and for composing quantum operations (Coecke & Pavlovic 2007), are likely to acquire new roles in latent semantics. On the other hand, the generic no-cloning and no-broadcasting theorems (Barnum et al. 2006) are likely to point to some interesting statistical limitations, with a potential impact in security.⁴

Acknowledgement. I am grateful to Eleanor Rieffel for pointing out an error in an earlier version of this abstract, caused by some of my notational abuses.

References

[Abramsky & Coecke 2004] Abramsky, S., and Coecke, B. 2004. A categorical semantics of quantum protocols. In Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science (LICS). IEEE Computer Society. Also arXiv:quant-ph/0402130.

[Azar et al. 2001] Azar, Y.; Fiat, A.; Karlin, A. R.; McSherry, F.; and Saia, J. 2001. Spectral analysis of data. In ACM Symposium on Theory of Computing, 619–626.

[Barnum et al. 2006] Barnum, H.; Barrett, J.; Leifer, M.; and Wilce, A. 2006. Cloning and broadcasting in generic probabilistic theories.

[Bell 1964] Bell, J. S. 1964. On the Einstein-Podolsky-Rosen paradox. Physics 1:195–200.
[Bell 1987] Bell, J. S. 1987. Speakable and Unspeakable in Quantum Mechanics. Cambridge University Press.

[Carpineto & Romano 2004] Carpineto, C., and Romano, G. 2004. Exploiting the potential of concept lattices for information retrieval with CREDO. Journal of Universal Computer Science 10(8):985–1013.

[Coecke & Pavlovic 2007] Coecke, B., and Pavlovic, D. 2007. Quantum measurements without sums. In Chen, G.; Kauffman, L.; and Lomonaco, S., eds., Mathematics of Quantum Computing and Technology. Taylor and Francis. 559–596.

[Deerwester et al. 1990] Deerwester, S. C.; Dumais, S. T.; Landauer, T. K.; Furnas, G. W.; and Harshman, R. A. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6):391–407.

⁴ One direct consequence of the no-cloning theorem seems to be that only classical styles can be copied.

[Everitt 1984] Everitt, B. 1984. An Introduction to Latent Variable Models. London: Chapman & Hall.

[Ganter, Stumme, & Wille 2005] Ganter, B.; Stumme, G.; and Wille, R., eds. 2005. Formal Concept Analysis: Foundations and Applications, volume 3626 of Lecture Notes in Computer Science. Springer.

[Jolliffe 1986] Jolliffe, I. T. 1986. Principal Component Analysis. Springer Series in Statistics. Springer-Verlag.

[Kleinberg 1999] Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5):604–632.

[Meyer 1986] Meyer, P.-A. 1986. Éléments de probabilités quantiques (exposés I à IV). In Séminaire de probabilités de Strasbourg, volumes 1204, 1247 of Lecture Notes in Mathematics. Berlin: Springer-Verlag.

[Meyer 1993] Meyer, P.-A. 1993. Quantum Probability for Probabilists. Number 1538 in Lecture Notes in Mathematics. Springer-Verlag.

[Page et al. 1998] Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1998.
The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford Digital Library Technologies Project.

[Poshyvanyk & Marcus 2007] Poshyvanyk, D., and Marcus, A. 2007. Combining formal concept analysis with information retrieval for concept location in source code. In ICPC '07: Proceedings of the 15th IEEE International Conference on Program Comprehension, 37–48. Washington, DC, USA: IEEE Computer Society.

[Priss 2006] Priss, U. 2006. Formal concept analysis in information science. In Cronin, B., ed., Annual Review of Information Science and Technology, volume 40.

[Redei & Summers 2006] Redei, M., and Summers, S. J. 2006. Quantum probability theory. To appear in Studies in the History and Philosophy of Modern Physics.

[Salton 1968] Salton, G. 1968. Automatic Information Organization and Retrieval. McGraw-Hill.

[van Rijsbergen 2004] van Rijsbergen, C. J. 2004. The Geometry of Information Retrieval. New York, NY, USA: Cambridge University Press.

[Wille 1982] Wille, R. 1982. Restructuring lattice theory: an approach based on hierarchies of concepts. In Rival, I., ed., Ordered Sets. Dordrecht: D. Reidel. 445–470.