Projection pursuit for discrete data

IMS Collections
Probability and Statistics: Essays in Honor of David A. Freedman
Vol. 2 (2008) 265–288
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/193940307000000482

Persi Diaconis¹ and Julia Salzman*²
Stanford University

Abstract: This paper develops projection pursuit for discrete data using the discrete Radon transform. Discrete projection pursuit is presented as an exploratory method for finding informative low dimensional views of data such as binary vectors, rankings, phylogenetic trees or graphs. We show that for most data sets, most projections are close to uniform. Thus, informative summaries are ones deviating from uniformity. Syllabic data from several of Plato’s great works is used to illustrate the methods. Along with some basic distribution theory, an automated procedure for computing informative projections is introduced.

1. Introduction

Projection pursuit is an exploratory graphical tool for picturing high dimensional data through low dimensional projections. Introduced by Kruskal [35], [36], and developed by Friedman and Tukey [28], the idea is to have the computer select a small family of projections by numerically optimizing an index of “interest”. The original projection indices were ad hoc. In joint work with David Freedman [15], it was shown that for most data sets, most projections are about the same: approximately Gaussian. Therefore, the interesting projections, the ones which are special for a given data set, are projections that are far from Gaussian.

Peter Huber [32] found his own version of this: projections are uninformative if they are unstructured or “random”. Thus projections with high entropy are uninformative. For a fixed scale, distributions having high entropy are approximately Gaussian.
Huber also showed that the Friedman-Tukey index is a measure of non-Gaussianness.

The purpose of the present paper is to give a parallel development for data in discrete spaces: collections of binary vectors, rankings, phylogenetic trees or sets of graphs. We develop a notion of projection as a partition of the discrete data into blocks. We show that for most data sets, most projections are close to uniformly partitioned. This suggests that the informative summaries are the ones with splits that are far from uniform.

* Supported in part by NSF Grant DMS-02-41246.
¹ Stanford University, Department of Mathematics and Department of Statistics, Sequoia Hall, 390 Serra Mall, Stanford CA 94305, USA
² Stanford University, Department of Statistics, Sequoia Hall, 350 Serra Mall, Stanford CA 94305, USA, e-mail: julia.salzman@gmail.com
AMS 2000 subject classifications: 44A12, 62K10, 90C08.
Keywords and phrases: binary vector, discrete data, discrete Radon transform, least uniform partition, phylogenetic tree, Plato, projection pursuit, ranking, syllable patterns.

The outline of the paper is as follows. Definitions and first examples are given in Section 2. The ideas lean on classical developments in block designs and give new applications for that theory. A discrete version of the Radon transform along with an inversion theory is presented, determining when a collection of projections loses information. Section 3 gives a data analytic example in some detail. The data arise from the problem of putting some of Plato’s works in chronological order. Here, discrete projection pursuit leads to the discovery of a striking, easily interpretable structure that does not appear in other analyses of this data (e.g. Ahn et al. [1], Cox and Brandwood [11], Charnomordic and Holmes [8], Wishart and Leach [49]).
Section 4 proves that for most data sets, most partitions lead to approximately uniform projections. This leads directly to a usable criterion: a projection is interesting if it is far from uniform. The distance to uniformity can be measured by any distance between probabilities, and we consider the well-known total variation, Hellinger and Vasserstein metrics.

The final section gives results for the least uniform projection. Theorem 5.1 shows that if the class of projections is not too rich, for example, the affine hyperplanes in $\mathbb{Z}_2^k$, then for most data sets even the least uniform partition is close to uniform. If the class of projections contains many sets, then least uniform projections are “structured”. The final theorem attacks the problem of a data analyst finding “structure” in “noise”. Computational details for computing the metrics and automating the analysis are in an Appendix.

There has been extensive development of projection pursuit for density estimation (Friedman et al. [26]), regression (Friedman and Stuetzle [27], Hall [30]), applications to time series (Donoho [22]), discriminant analysis (Posse [42], Polzehl [41]) and standard multivariate problems such as covariance estimation (Hwang et al. [33]). This has led to a healthy development captured in the modern implementations (Xgobi, Ggobi). Online documentation for this software is an instructive catalog. We have not attempted to develop our ideas in these directions, but the beginning steps of ridge functions will be found below.

2. Projections and Radon transforms

This section introduces our notation and set up for working with discrete data. It defines projection bases and the discrete Radon transform, and gives examples with binary data and permutation data. Analysis will be performed on binary $n$-tuple data from several works of Plato.

Let $\mathcal{X}$ be a finite set.
Let $\mathcal{Y}$ be a class of subsets of $\mathcal{X}$. Let $f : \mathcal{X} \to \mathbb{R}$ be a function. The Radon transform of $f$ at $y \in \mathcal{Y}$ is defined by

\[ \bar f(y) = \sum_{x \in y} f(x). \tag{1} \]

The class $\mathcal{Y}$ is called a projection base if:

  $|y|$ is constant for $y \in \mathcal{Y}$ ($|y|$ denotes the cardinality of $y$).   (2)
  There is a partition $p_1, \ldots, p_j$ of $\mathcal{Y}$ such that each $p_i$ is a partition of $\mathcal{X}$.   (3)

For a partition $p$, the numbers $\{\bar f(y)\}_{y \in p}$ will be called the projection of $f$ in direction $p$.

The sets in $\mathcal{Y}$ may be thought of as “lines” in a geometry. If lines in the same partition are called parallel, then (3) corresponds to the Euclidean axiom: for every point $x \in \mathcal{X}$ and every line $y \in \mathcal{Y}$, there is a unique line $y^*$ parallel to $y$ such that $x \in y^*$. In the statistics literature, designs with property (3) are called “resolvable” (see Hedayat et al. [31] or Constantine [10] for examples). Assumption (2) guarantees that projections are based on averages over comparable sets. Consider the following examples:

Example 2.1. $\mathcal{X} = \mathbb{Z}_2^k$, the set of binary $k$-tuples. Here is a concrete example of a data set with this structure: L. Brandwood classified each sentence of Plato’s Republic according to its last five syllables. These can run from all short (∪) through all long (-). Identifying ∪ with 1 and - with 0, each sentence is associated with a binary 5-tuple. As $x$ ranges over $\mathbb{Z}_2^5$, let $f(x)$ denote the proportion of sentences with ending $x$. The values of $f(x)$ are given in the first column of Table 1.

Table 1
Percentage distribution of sentence endings

Type of ending   Rep.   Laws   Phil.  Pol.   Soph.  Tim.
∪ ∪ ∪ ∪ ∪        1.1    2.4    2.5    1.7    2.8    2.4
- ∪ ∪ ∪ ∪        1.6    3.8    2.8    2.5    3.6    3.9
∪ - ∪ ∪ ∪        1.7    1.9    2.1    3.1    3.4    6.0
∪ ∪ - ∪ ∪        1.9    2.6    2.6    2.6    2.6    1.8
∪ ∪ ∪ - ∪        2.1    3.0    4.0    3.3    2.4    3.4
∪ ∪ ∪ ∪ -        2.0    3.8    4.8    2.9    2.5    3.5
- - ∪ ∪ ∪        2.1    2.7    4.3    3.3    3.3    3.4
- ∪ - ∪ ∪        2.1    1.8    1.5    2.3    4.0    3.4
- ∪ ∪ - ∪        2.8    0.6    0.7    0.4    2.1    1.7
- ∪ ∪ ∪ -        4.6    8.8    6.5    4.0    2.3    3.3
∪ - - ∪ ∪        3.3    3.4    6.7    5.3    3.3    3.4
∪ - ∪ - ∪        2.6    1.0    0.6    0.9    1.6    3.2
∪ - ∪ ∪ -        4.6    1.1    0.7    1.0    3.0    2.7
∪ ∪ - - ∪        2.6    1.5    3.1    3.1    3.0    3.0
∪ ∪ - ∪ -        4.4    3.0    1.9    3.0    3.0    2.2
∪ ∪ ∪ - -        2.5    5.7    5.4    4.4    5.1    3.9
- - - ∪ ∪        2.9    4.2    5.5    6.9    5.2    3.0
- - ∪ - ∪        3.0    1.4    0.7    2.7    2.6    3.3
- - ∪ ∪ -        3.4    1.0    0.4    0.7    2.3    3.3
- ∪ - - ∪        2.0    2.3    1.2    3.4    3.7    3.3
- ∪ - ∪ -        6.4    2.4    2.8    1.8    2.1    3.0
- ∪ ∪ - -        4.2    0.6    0.7    0.8    3.0    2.8
∪ ∪ - - -        2.8    2.9    2.6    4.6    3.4    3.0
∪ - ∪ - -        4.2    1.2    1.3    1.0    1.3    3.3
∪ - - ∪ -        4.8    8.2    5.3    4.5    4.6    3.0
∪ - - - ∪        2.4    1.9    5.3    2.5    2.5    2.2
∪ - - - -        3.5    4.1    3.3    3.8    2.9    2.4
- ∪ - - -        4.0    3.7    3.3    4.9    3.5    3.0
- - ∪ - -        4.1    2.1    2.3    2.1    4.1    6.4
- - - ∪ -        4.1    8.8    9.0    6.8    4.7    3.8
- - - - ∪        2.0    3.0    2.9    2.9    2.6    2.2
- - - - -        4.2    5.2    4.0    4.9    3.4    1.8
no. sentences    3778   3783   958    770    919    762

A second example of data with this structure is the result of grading correct/incorrect in a test with $k$ questions. There are several useful choices of $\mathcal{Y}$ given next:

2.1. Projections for data in $\mathbb{Z}_2^k$

2.1.1. Marginal projections in $\mathbb{Z}_2^k$

For $i = 1, 2, \ldots, k$, let $y_i^0 = \{x \in \mathbb{Z}_2^k : x_i = 0\}$ and let $y_i^1 = \{x \in \mathbb{Z}_2^k : x_i = 1\}$. The sets $\mathcal{Y} = \{y_i^j\}$, $1 \le i \le k$, $j \in \{0, 1\}$, form a projection base. In the Plato example, the projections have a simple interpretation as the proportion of sentences with a specific syllable in the $i$th place. Displaying projections offers no problem here; a single number suffices.

A second natural choice of $\mathcal{Y}$ gives second order margins. This is based on sets $y_{ij}^{ab} = \{x \in \mathbb{Z}_2^k : x_i = a, x_j = b\}$, $1 \le i < j \le k$, $a, b \in \{0, 1\}$. In this case, a projection consists of 4 numbers. In the Plato example, the projection along coordinates $i, j$ gives the proportion of sentences with each of the 4 possible patterns ∪∪, ∪-, -∪, -- in positions $i, j$. Table 3 in Section 3 is an example of one method to display such projections. Section 3 contains an analysis of the data in Table 1 based on these projections. The analysis gives a clear interpretation to a classical way of dating the books of Plato. The analysis is independent of the other examples in this section and can be read at this time.

Here are some examples to show how the structure of $f$ is reflected in $\bar f$. If $f(x) = \delta_{x, x_0}$, then $\bar f(y) = 1$ if $x_0 \in y$ and zero otherwise. If $f(x) = \frac{1}{2^k}$ for all $x$, then $\bar f(y) = \frac{|y|}{2^k}$ and hence is constant for all $y$. As a final example, consider a fixed, non-zero vector $y^* \in \mathbb{Z}_2^k$. Let $S$ be the hyperplane determined by $y^*$: $S = \{x \in \mathbb{Z}_2^k : x \cdot y^* = 0 \bmod 2\}$. Let

\[ f(x) = \begin{cases} \frac{1}{2^{k-1}}, & \text{if } x \in S, \\ 0, & \text{otherwise,} \end{cases} \]

so that $f$ is the uniform probability on $S$. With $y_z^a = \{x \in \mathbb{Z}_2^k : x \cdot z = a \bmod 2\}$ for non-zero $z$, an easy computation shows

\[ \bar f(y_z^0) = \begin{cases} 1, & \text{if } z = y^*, \\ \frac12, & \text{otherwise,} \end{cases} \qquad \bar f(y_z^1) = \begin{cases} 0, & \text{if } z = y^*, \\ \frac12, & \text{otherwise.} \end{cases} \]

The hyperplane transform is essentially the same as the ordinary Fourier transform on the group $\mathbb{Z}_2^k$. This is defined by

\[ \hat f(z) = \sum_x (-1)^{x \cdot z} f(x). \]
If $f$ is a probability on $\mathbb{Z}_2^k$, $\hat f(z) = 2 \bar f(y_z^0) - 1$. The transform $\hat f$ has been widely used for data analysis of this type of data. See Solomon [44] or Diaconis [18, 19], Chapter 11. The discrete Radon transform with projections onto affine hyperplanes is also used by Ahn et al. [1].

2.1.2. Affine hyperplanes in $\mathbb{Z}_2^k$

This is one natural way of “filling out” the marginal projections presented above. For $z \in \mathbb{Z}_2^k$ and $a \in \{0, 1\}$, let $y_z^a = \{x \in \mathbb{Z}_2^k : x \cdot z = a \bmod 2\}$. The collection $\mathcal{Y} = \{y_z^a\}_{z \in \mathbb{Z}_2^k,\ a \in \{0, 1\}}$ forms a projection base. Observe that when $z$ has a 1 in position $i$ and zeros elsewhere, $y_z^a$ equals the $y_i^a$ of the previous example. The sets in $\mathcal{Y}$ are the affine hyperplanes in $\mathbb{Z}_2^k$. Similarly, the affine planes of any dimension form a projection base. An analysis of the Plato data using all affine hyperplanes is in Appendix A.3 below.

2.2. Projections for data in $\mathcal{X} = S_n$, the set of permutations of $n$ letters

Permutation data arises in taste testing, ranking and elections; for example, in presidential elections of the American Psychological Association, members are asked to rank order 5 candidates. Here, for a permutation $\pi$, $f(\pi)$ is taken as the proportion of voters choosing the order $\pi$. For background and many examples, see Critchlow [12], Fligner and Verducci [25] or Marden [39].

2.2.1. Partitions based on marginal projections of permutations in $S_n$

Let $y_{ij} = \{\pi \in S_n : \pi(i) = j\}$, $1 \le i, j \le n$. These sets form a projection base. For fixed $i$, the sets $y_{i1}, y_{i2}, \ldots, y_{in}$ form a partition $p(i)$. The projection in direction $p(i)$ has a natural interpretation in the example: how did people rank candidate $i$? The projection can be displayed by making a histogram.
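
The marginal projection in direction $p(i)$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `marginal_projection`, the 0-based tuple encoding of permutations (so `perm[i]` plays the role of $\pi(i)$) and the ballot proportions are all hypothetical and are not data from the paper.

```python
from itertools import permutations

def marginal_projection(f, i, n):
    """Projection of f in direction p(i): entry j is the Radon transform
    of f at the block y_ij = {pi : pi(i) = j}, i.e. the sum of f over it."""
    proj = [0.0] * n
    for perm, weight in f.items():
        proj[perm[i]] += weight
    return proj

# Hypothetical data: proportion of voters choosing each ordering, n = 3.
n = 3
weights = [0.30, 0.10, 0.20, 0.05, 0.25, 0.10]
f = dict(zip(permutations(range(n)), weights))

for i in range(n):
    # Each printed list is the histogram for one partition p(i).
    print(i, marginal_projection(f, i, n))
```

Because the blocks $y_{i1}, \ldots, y_{in}$ partition $S_n$, each printed list sums to the total mass of $f$, which is 1 for a probability.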
A second useful choice of $\mathcal{Y}$ is based on considering two positions: $y_{ij}^{kl} = \{\pi \in S_n : \pi(i) = k, \pi(j) = l\}$ with $i \ne j$, $k \ne l$. This leads to projections giving the joint rankings of a fixed pair of candidates in the example. Such projections can be displayed by making a 2-dimensional picture and gray scaling the $(i, j)$ square to correspond to the proportion of voters ranking the pair of candidates in order $(i, j)$. Similarly third and higher order projections can be defined.

2.2.2. Partitions based on subgroups of $S_n$

When $\mathcal{X}$ is a group such as $S_n$, the following constructions for $\mathcal{Y}$ are available. Let $N$ be a subgroup of $\mathcal{X}$. The orbits of $N$ acting on $\mathcal{X}$ are the cosets $\{Ny\}_{y \in \mathcal{X}}$, and the distinct orbits partition $\mathcal{X}$. Varying $N$ by conjugation, $\{yNy^{-1}\}_{y \in \mathcal{X}}$, gives a projection base for $\mathcal{X}$. When $N$ is taken as $S_{n-1} = \{\pi \in S_n : \pi(1) = 1\}$ the projections are the marginal projections defined above. Taking $N$ as $S_{n-2} = \{\pi \in S_n : \pi(1) = 1, \pi(2) = 2\}$ gives the second order margins.

An important class of subgroups are the so-called Young subgroups: let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ be a partition of $n$, so $\sum_i \lambda_i = n$. Let $S_{\lambda_1} \times S_{\lambda_2} \times \cdots \times S_{\lambda_n}$ be the permutations that permute the first $\lambda_1$ elements among themselves, the next $\lambda_2$ elements among themselves, etc. These include the previous examples and provide enough transforms for an inversion theory, as will be shown below. Display of such projections is not a well studied problem. In the case of a projection corresponding to a Young subgroup, one suggestion is a 1-dimensional histogram using one of the orderings suggested in Chapter 3 of James [34].

If $\mathcal{X} = G/H$ where $G$ is a group and $H$ is a subgroup and $H \subset N \subset G$, with $N$ a subgroup, then the orbits of $N$ in $\mathcal{X}$ are a partition and the orbits of $\{gNg^{-1}\}_{g \in G}$ form a projection base.
One approach to the display of such projections is a 2-dimensional histogram using the ordering given by one of the metrics suggested in Chapter 7 of Diaconis [18].

2.3. Projections for $\mathcal{X} = \mathbb{R}^p$: Euclidean data

Consider data vectors $x_1, x_2, \ldots, x_n \in \mathbb{R}^p$. For $\gamma$ in the $p$-dimensional unit sphere, the projection in direction $\gamma$ is just $\gamma \cdot x_1, \ldots, \gamma \cdot x_n$. This is the classical Radon transform, with $\mathcal{Y}$ consisting of the affine hyperplanes $y_\gamma^t = \{x \in \mathbb{R}^p : x \cdot \gamma = t\}$. For fixed $\gamma$ these partition the space $\mathbb{R}^p$ as $t$ varies, and the partitions vary as $\gamma$ varies. In statistical applications, a histogram is made of $\gamma \cdot x_i$ and one varies $\gamma$, trying to understand the structure of the $p$-dimensional data from the varying histograms. This leads to the classical version of projection pursuit considered in the introduction.

2.4. Projections when $\mathcal{X}$ is a finite set with $n$ elements, and $\mathcal{Y}$ is the class of $k$-element subsets

In this example, if $k$ divides $n$, it is a non-trivial theorem of Baranyai that $\mathcal{Y}$ forms a projection base. Details and discussion may be found in Cameron [7]. This example occurs naturally when considering extensions of a given class of partitions. For example, consider the marginal projections $y_i^a$ in $\mathbb{Z}_2^k$ defined above. These sets all have cardinality $|y_i^a| = 2^{k-1}$. It is natural to consider the extension to projections based on the class of all subsets of cardinality $2^{k-1}$.

2.5. Uniqueness of Radon transforms

We now consider the question: when is $f \to \bar f$ one to one? A convenient criterion involves the notion of a block design. Let $|\mathcal{X}| = n$. The class of sets $\mathcal{Y}$ is a block design with parameters $(n, c, k, l)$ provided

  $|y| = c$ for all $y \in \mathcal{Y}$,   (4)
  each $x \in \mathcal{X}$ is contained in $k$ subsets $y$,   (5)
  each pair $x \ne x'$ is contained in $l$ subsets $y$.   (6)

Affine planes of $\mathbb{Z}_2^k$ and $k$-sets of an $n$-set are block designs.
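
Conditions (4) to (6) can be checked by brute force on a small case. The sketch below is an illustration only: the helper names `affine_hyperplanes` and `design_parameters` are hypothetical, and the choice to skip $z = 0$ (which yields the whole space or the empty set rather than a hyperplane) is an assumption made here. It enumerates the affine hyperplanes of $\mathbb{Z}_2^3$ and reads off the design parameters.

```python
from itertools import combinations, product

def affine_hyperplanes(dim):
    """All sets {x in Z_2^dim : x.z = a mod 2} for non-zero z and a in {0, 1}."""
    points = list(product((0, 1), repeat=dim))
    blocks = []
    for z in points:
        if not any(z):
            continue  # assumption: z = 0 is excluded, it does not give a hyperplane
        for a in (0, 1):
            blocks.append(frozenset(
                x for x in points
                if sum(xi * zi for xi, zi in zip(x, z)) % 2 == a))
    return points, blocks

def design_parameters(points, blocks):
    """Return (n, c, k, l) if conditions (4)-(6) hold, otherwise raise ValueError."""
    sizes = {len(y) for y in blocks}                            # condition (4)
    per_point = {sum(x in y for y in blocks) for x in points}   # condition (5)
    per_pair = {sum(x in y and x2 in y for y in blocks)         # condition (6)
                for x, x2 in combinations(points, 2)}
    if len(sizes) != 1 or len(per_point) != 1 or len(per_pair) != 1:
        raise ValueError("not a block design")
    return len(points), sizes.pop(), per_point.pop(), per_pair.pop()

points, blocks = affine_hyperplanes(3)
print(len(blocks), design_parameters(points, blocks))
```

Note that the code parameter `dim` is the exponent in $\mathbb{Z}_2^k$, while the third entry of the returned tuple is the design parameter $k$ of condition (5); the paper uses the letter $k$ for both.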
A great many other examples are discussed in the literature of combinatorial designs. In the statistics literature they are sometimes called balanced incomplete block designs. In the combinatorial literature they are often called 2-designs, or 2-$(n, c, l)$ designs. It is easy to see that the parameters $n, c, k, l$ satisfy

\[ |\mathcal{Y}|\, c = nk, \tag{7} \]
\[ (n - 1)\, l = k (c - 1). \tag{8} \]

Bailey [3], Dembroski [14] and Lander [38] are useful references for block designs. The following result is well known in the theory of designs. We first learned it from Bolker [4].

Theorem 2.2. If $\mathcal{X}$ is a finite set and $\mathcal{Y}$ is a block design with $|\mathcal{Y}| > 1$, then the Radon transform $f \to \bar f$ is one to one, with an explicit inversion formula given by (12) below.

Proof. For any $x$,

\[ \sum_{y : x \in y} \bar f(y) = k f(x) + l \sum_{x' \in \mathcal{X},\ x' \ne x} f(x') \tag{9} \]
\[ = (k - l) f(x) + l \sum_{x' \in \mathcal{X}} f(x'). \tag{10} \]

If $\sum_{x \in \mathcal{X}} f(x) = 1$, this determines $f$ as

\[ f(x) = \frac{1}{k - l} \sum_{y : x \in y} \bar f(y) - \frac{l}{k - l}. \tag{11} \]

Observe that $k > l$ follows from the assumption that $|\mathcal{Y}| > 1$. When $\sum_{x \in \mathcal{X}} f(x)$ is not known, it can be recovered by summing both sides of (9) in $x$. The left side becomes $c \sum_{y \in \mathcal{Y}} \bar f(y)$, since each block contains $c$ points, and the right side becomes $(k - l + nl) \sum_{x \in \mathcal{X}} f(x)$. This gives

\[ \sum_{x \in \mathcal{X}} f(x) = \frac{c}{k - l + nl} \sum_{y \in \mathcal{Y}} \bar f(y) \]

and so the inversion formula

\[ f(x) = \frac{1}{k - l} \sum_{y : x \in y} \bar f(y) - \frac{l c}{(k - l)^2 + nl (k - l)} \sum_{y \in \mathcal{Y}} \bar f(y). \tag{12} \]

Remarks.
• It is not necessary that $\mathcal{Y}$ be a block design for $f \to \bar f$ to be one to one. For example, Kung [37] shows that the Radon transform is one to one when $\mathcal{Y}$ consists of the sets of rank $i$ in a matroid. Diaconis and Graham [17] give examples where the transform is one to one when $\mathcal{Y}$ consists of the nearest neighbors in a metric space. For example, when $\mathcal{X} = \mathbb{Z}_2^{2k}$ and $\mathcal{Y}$ consists of the balls of Hamming distance less than or equal to 1, the transform is one to one, and an explicit inversion theorem is known.
  When $\mathcal{X}$ is $S_n$, the symmetric group, and $\mathcal{Y}$ is unit balls in the Cayley metric, the transform is one to one if and only if $n$ is in $\{1, 2, 4, 5, 6, 8, 10, 12\}$. Further work on inversion formulas for functions on finite symmetric spaces is found in Velasquez [47] and for functions on the torus $\mathbb{Z}_n^k$ in Dedeo and Velasquez [13]. Fill [24] discusses invertibility when the Radon transform of $f$ at $x$ averages over a set of translates of $f(x)$, which has applications to directional data and time series.
• The transform can still be useful and interesting if it is not one to one. For example, the marginal projections in the example above do not capture all aspects of the data but are often the first things to be looked at. In $\mathbb{Z}_2^k$, if high enough marginal distributions are considered, the function $f$ can be completely recovered. In the symmetric group, the projections corresponding to all Young subgroups determine $f$ because they determine its Fourier transform. See Diaconis [18] for details.

3. Data analysis of syllable patterns in the works of Plato

This section presents a new analysis of data arising from syllable patterns in the works of Plato. The data are given in Table 1. They record, for each of 6 books, the pattern of long (-) and short (∪) syllables among the last 5 syllables in each sentence. It is known that Plato wrote Republic early and Laws late. Plato also mentions that he changed his rhyming patterns over time. This led Brandwood to collect the data in Table 1. The other books were written between these but it is not known in what order. The goal of the analysis is to try to order the books.

Our approach will be to study the books one at a time, trying to find patterns. Projection pursuit suggests looking at various partitions of the data, searching for structured partitions which are far from uniform.
Using first and second order margins as partitions, a reasonably striking difference between Republic and Laws is observed. This suggests a simple, interpretable way of ordering the other books as Republic, Timaeus, Sophist, Politicus, Philebus, Laws. This agrees with the standard ordering as discussed in Brandwood ([6], pg. xviii) and in Ahn et al. [1].

Other analyses of this data set are in Cox and Brandwood [11], Atkinson [2], Wishart and Leach [49], Boneva [5], and Charnomordic and Holmes [8]. [11] contains a history and explanation for the choice of data. The first three analyses all use statistical models. Boneva’s analysis uses a form of scaling. None of the previous analyses seem to have picked up the simple, striking pattern in the data that projection pursuit leads to.

The analysis is presented below, in a somewhat discursive style, in the order it was actually performed: first looking at the Republic, then Laws and finally the other books. In the Appendix, we present a more automated and formal version.

3.1. Republic

Table 2 shows the first order margins; i.e., the proportion of sentences with ∪ in position $i$, $1 \le i \le 5$. Roughly, positions 1-4 are evenly divided between long and short. The last position is clearly different. Table 3 shows the second order margins.

A glance at Table 3 shows that the first order effects are all too visible in the second order margins. For example, the numbers in the first column (∪∪) are all “small” while the numbers in the last column are “large”. One simple way of adjusting for the first order structure is to divide each number in Table 3 by the product of the marginal totals. For example, in the first row, .194 would be divided by (.465)(.471) (from Table 2) while .271 would be divided by (.465)(1 − .471). The results are shown in Table 4.
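
The margin computations of this subsection can be reproduced directly from the Republic column of Table 1. The Python sketch below is illustrative only: the letter `u` stands in for the symbol ∪, and the names `republic`, `first_order`, `second_order` and `adjusted` are hypothetical helpers introduced here, not anything defined in the paper.

```python
from itertools import combinations

# Republic column of Table 1 (percentages); "u" = short syllable, "-" = long.
republic = {
    "uuuuu": 1.1, "-uuuu": 1.6, "u-uuu": 1.7, "uu-uu": 1.9,
    "uuu-u": 2.1, "uuuu-": 2.0, "--uuu": 2.1, "-u-uu": 2.1,
    "-uu-u": 2.8, "-uuu-": 4.6, "u--uu": 3.3, "u-u-u": 2.6,
    "u-uu-": 4.6, "uu--u": 2.6, "uu-u-": 4.4, "uuu--": 2.5,
    "---uu": 2.9, "--u-u": 3.0, "--uu-": 3.4, "-u--u": 2.0,
    "-u-u-": 6.4, "-uu--": 4.2, "uu---": 2.8, "u-u--": 4.2,
    "u--u-": 4.8, "u---u": 2.4, "u----": 3.5, "-u---": 4.0,
    "--u--": 4.1, "---u-": 4.1, "----u": 2.0, "-----": 4.2,
}

def first_order(f, i):
    """Proportion of endings with a short syllable in position i (0-based)."""
    return sum(p for x, p in f.items() if x[i] == "u")

def second_order(f, i, j, a="u", b="u"):
    """Proportion of endings with symbol a in position i and b in position j."""
    return sum(p for x, p in f.items() if x[i] == a and x[j] == b)

def adjusted(f, i, j):
    """Short-short margin divided by the product of the first order margins."""
    return second_order(f, i, j) / (first_order(f, i) * first_order(f, j))

f = {x: pct / 100 for x, pct in republic.items()}
print([round(first_order(f, i), 3) for i in range(5)])
for i, j in combinations(range(5), 2):
    # Positions are printed 1-based to match the tables.
    print((i + 1, j + 1), round(second_order(f, i, j), 3), round(adjusted(f, i, j), 2))
```

For position 1 this gives .465, and for the pair (1,2) it gives .194 with adjusted ratio .89, as in the tables. Other entries can differ from the printed tables in the last digit, because the percentages in Table 1 are themselves rounded.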
Table 2
First order margins for Republic

Position           1       2       3       4       5
Proportion of ∪    0.465   0.471   0.466   0.511   0.362

Table 3
Second order margins for Republic

Position   ∪∪      ∪-      -∪      --
(1,2)      0.194   0.271   0.277   0.258
(1,3)      0.208   0.257   0.258   0.277
(1,4)      0.238   0.227   0.272   0.263
(1,5)      0.177   0.288   0.185   0.350
(2,3)      0.209   0.262   0.257   0.272
(2,4)      0.241   0.230   0.269   0.260
(2,5)      0.162   0.309   0.200   0.329
(3,4)      0.211   0.255   0.299   0.235
(3,5)      0.170   0.296   0.192   0.342
(4,5)      0.167   0.343   0.195   0.295

Most of the ratios are close to 1, so a product model is a reasonable first description. The projection pursuit approach suggests that a partition of the data (here a row) is “interesting” if the partition is far from uniform. By eye, looking at Table 4, positions (1,2), (2,3), (3,4), (4,5) are far from being all 1. Observe that these positions are adjacent, as $(i, i+1)$. Next observe that each of the 4 designated rows has a common pattern: the first and last entries are small, the middle two entries are large. Going back to the definitions, this pattern arises from a negative association of adjacent syllables; in the Republic, adjacent syllables tend to alternate. The pattern in positions (1,3) shows that this cannot be a complete description; after all, if the symbols alternate, the positions two apart should be positively associated, but (1,3) displays negative association.

Looking at the other rows of the table, we observe that the size goes big, small, small, big or its opposite, small, big, big, small. This is an artifact. Consider the first row of Table 4. It was formed from 4 proportions that sum to 1: $w, x, y, z$ say. The 4 adjusted entries are

\[ \frac{w}{(w + x)(w + y)}, \qquad \frac{x}{(w + x)(x + z)}, \qquad \frac{y}{(y + z)(y + w)}, \qquad \frac{z}{(z + y)(z + x)}. \]
It is easy to show that the first entry is less than 1 if and only if the second is larger than 1, if and only if the third is larger than 1, if and only if the fourth is less than 1. This means that the first column in Table 4, together with the first order margins, determines the remaining entries. This artifact in no way reflects on the association pattern noted earlier: the most structured rows correspond to adjacent syllables, and adjacent syllables are negatively associated.

Table 4
Adjusted second order margins for Republic

Position   ∪∪     ∪-     -∪     --
(1,2)      0.89   1.10   1.10   0.91
(1,3)      0.96   1.00   1.00   0.97
(1,4)      1.00   1.00   1.00   1.00
(1,5)      1.10   0.97   0.96   1.00
(2,3)      0.95   1.00   1.00   0.96
(2,4)      1.00   1.00   1.00   1.00
(2,5)      0.95   1.00   1.00   0.97
(3,4)      0.89   1.10   1.10   0.90
(3,5)      1.00   1.00   0.99   1.00
(4,5)      0.90   1.10   1.10   0.94

3.2. Laws and a comparison with Republic

The first order margins for Laws are only slightly different from those in Republic (see Table 5). The pattern is the same: overall, fewer than half ∪’s; the last position sharply smaller. The similarity between the first order margins in Republic and Laws suggests that second or higher order margins must be used to order the remaining books. The analog of the first column of Table 4 is given in Table 6. The entries of Table 6 are the proportion of sentences with ∪∪ in the $(i, j)$ position divided by the product of the marginal proportions.

Table 5
First order margins for Laws

Position           1       2       3       4       5
Proportion of ∪    0.477   0.489   0.411   0.599   0.375

Table 6
Adjusted second order margins for Laws

Positions      (1,2)   (1,3)   (1,4)   (1,5)   (2,3)
Adjusted ∪∪    1.07    1.03    0.92    0.99    1.43
Positions      (2,4)   (2,5)   (3,4)   (3,5)   (4,5)
Adjusted ∪∪    0.97    0.98    1.04    1.09    1.02

Again, pairwise adjacent positions are associated, all in the same way.
Here, the association is positive, whereas for Republic, the association is negative. This is the striking pattern referred to above. It suggests a method of ranking the other books: compare the sign pattern or actual ratios of the adjusted second order margins of other books with Republic and Laws. For definiteness, the sum of absolute deviations between second order margins over all 10 positions will be used. This is carried out data analytically in Sections 3.3–3.5.

3.3. Analysis for Philebus and Politicus

These books are somewhat similar to each other. The first and second order margins for Philebus are given in Tables 7 and 8. Note the difference in first order margins: between Philebus and Republic (or Laws) position 1 is high, as are positions 4 and 5. For second order margins, the adjacent patterns are all positively associated ((2,3) being truly extreme). Comparing Table 8 with Table 6, the association pattern matches Laws in direction, except in position (1,5).

The relevant averages for Politicus are given in Tables 9 and 10. The first order margins are, very roughly, like those in both Republic and Laws, but again the third position has a low proportion of short syllables. The second order margins have the same pattern as Laws. The same remarks made for the second order margins of Philebus apply.

Both Philebus and Politicus seem very similar to Laws. Which of these two is closest to Laws? One simple approach is to consider the sum of the absolute values of the differences between the entries of Tables 8 and 6 along with the differences between Tables 10 and 6. The sum for Laws to Philebus is .64, while the sum for Laws to Politicus is .83. Thus a tentative ranking is: Politicus, Philebus, Laws.
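
The distance used in this comparison is a plain sum of absolute deviations over the 10 pairs of positions. As a rough sketch, the Python snippet below applies it to the two-decimal entries printed in Tables 6, 8 and 10; the list and function names are hypothetical. Computed from these rounded entries, the sums need not reproduce the values quoted in the text exactly (presumably those came from unrounded ratios, which is an assumption here); what the sketch is meant to show is the ordering.

```python
# Adjusted short-short margins as printed in Tables 6, 8 and 10, in the order
# (1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5).
laws      = [1.07, 1.03, 0.92, 0.99, 1.43, 0.97, 0.98, 1.04, 1.09, 1.02]
philebus  = [1.11, 1.03, 0.85, 1.11, 1.48, 0.92, 0.85, 1.02, 0.95, 1.01]
politicus = [1.17, 1.10, 0.96, 1.01, 1.26, 0.86, 0.90, 1.05, 1.10, 1.13]

def abs_deviation(a, b):
    """Sum of absolute deviations between two rows of adjusted margins."""
    return sum(abs(x - y) for x, y in zip(a, b))

distances = {
    "Philebus": abs_deviation(laws, philebus),
    "Politicus": abs_deviation(laws, politicus),
}

# Books listed from farthest from Laws to closest, mirroring the tentative ranking.
ranking = sorted(distances, key=distances.get, reverse=True)
print({name: round(d, 2) for name, d in distances.items()}, ranking)
```

With the printed table entries, Philebus comes out closer to Laws than Politicus does, which is the ordering used for the tentative ranking.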
Table 7
First order margins for Philebus

Position           1       2       3       4       5
Proportion of ∪    0.522   0.464   0.398   0.594   0.465

Table 8
Adjusted second order margins for Philebus

Positions      (1,2)   (1,3)   (1,4)   (1,5)   (2,3)
Adjusted ∪∪    1.11    1.03    0.85    1.11    1.48
Positions      (2,4)   (2,5)   (3,4)   (3,5)   (4,5)
Adjusted ∪∪    0.92    0.85    1.02    0.95    1.01

Table 9
First order margins for Politicus

Position           1       2       3       4       5
Proportion of ∪    0.477   0.457   0.348   0.524   0.469

Table 10
Adjusted second order margins for Politicus

Positions      (1,2)   (1,3)   (1,4)   (1,5)   (2,3)
Adjusted ∪∪    1.17    1.10    0.96    1.01    1.26
Positions      (2,4)   (2,5)   (3,4)   (3,5)   (4,5)
Adjusted ∪∪    0.86    0.90    1.05    1.10    1.13

3.4. Analysis for Sophist and Timaeus

These books are quite similar to each other and, as we shall see, quite different from Laws, Philebus and Politicus. The first order margins for Sophist (Table 11) are quite different from those of the books examined previously. They are roughly consistent with all syllables being equally likely to be long or short. The first order pattern seems closest to Politicus. The second order associations (Table 12) are closer to 1 than in Laws, Politicus or Philebus. Adjacent positions are positively associated, except for (3,4). The direction of association matches Laws in only 6 of 10 positions. The sum of absolute deviations between the entries of Tables 6 and 12 is .87.

We now give the analysis for the final book, Timaeus. A distinctive feature of the first order margins (Table 13) is the large proportion of short syllables in the third position. The adjusted second order margins (Table 14) are close to 1, so Timaeus seems closest to Sophist. Of the 4 adjacent positions, two show positive association and two show negative association. The direction of association matches Laws in 6 positions; the sum of absolute deviations between Tables 14 and 6 is .94.
The distance between Timaeus and the Republic (Tables 14 and 4) is .6, so Timaeus seems closer to Republic than to Laws using this measure. Because of the decrease in the number of matches and the increase in the sum of absolute deviations, it seems reasonable to rank order the three as Republic, Timaeus, Sophist. This completes the discussion of this example. The Appendix contains an automated version.

Table 11
First order margins for Sophist

Position           1       2       3       4       5
Proportion of ∪    0.474   0.491   0.454   0.527   0.487

Table 12
Adjusted second order margins for Sophist

Positions      (1,2)   (1,3)   (1,4)   (1,5)   (2,3)
Adjusted ∪∪    1.07    1.03    1.01    0.93    1.07
Positions      (2,4)   (2,5)   (3,4)   (3,5)   (4,5)
Adjusted ∪∪    0.88    1.01    0.97    0.98    1.10

Table 13
First order margins for Timaeus

Position           1       2       3       4       5
Proportion of ∪    0.494   0.476   0.565   0.521   0.496

Table 14
Adjusted second order margins for Timaeus

Positions      (1,2)   (1,3)   (1,4)   (1,5)   (2,3)
Adjusted ∪∪    0.98    1.02    0.97    1.04    0.92
Positions      (2,4)   (2,5)   (3,4)   (3,5)   (4,5)
Adjusted ∪∪    0.94    0.97    0.96    0.97    1.06

4. Most projections are uniform

Graphical projection pursuit is a standard tool in data analysis. The classical survey of Huber [32], the survey of Posse [42] and the online documentation in the Xgobi and Ggobi packages contain extensive pointers to a large literature.

The theorems of this section imply that for most data sets $f(x)$, most projections $\bar f(y)$ are about the same: close to uniform. This necessitates projection pursuit, that is, choosing projections that are far from uniformly distributed, to determine what is special about a particular $f$. This gives an independent rationale for Huber’s suggestion that Euclidean projections are interesting if they are far from uniform in the sense of having minimum entropy (of course, the uniform distribution on a finite set has maximum entropy).

Theorem 4.1.
Let $\mathcal{X}$ be a finite set with $n$ elements. Let $\mathcal{Y}$ be a block design with block size $c$ (so $|y| = c$ for $y \in \mathcal{Y}$). Let $f : \mathcal{X} \to \mathbb{R}$ be any function and let $\mu(f) = \sum_{x \in \mathcal{X}} f(x)$. Let $y$ be chosen uniformly in $\mathcal{Y}$. Then

$$E\,\bar f(y) = \frac{c}{n}\,\mu(f), \tag{13}$$

$$\operatorname{var}\,\bar f(y) = \frac{c}{n}\left(1 - \frac{c-1}{n-1}\right)\mu\!\left(\Bigl(f - \frac{\mu(f)}{n}\Bigr)^{2}\right). \tag{14}$$

Proof. (13) follows from computing

$$E\,\bar f(y) = \frac{1}{|\mathcal{Y}|}\sum_{y} \bar f(y) = \frac{1}{|\mathcal{Y}|}\sum_{x} f(x)\,|\{y : x \in y\}| = \frac{k}{|\mathcal{Y}|}\,\mu(f) = \frac{c}{n}\,\mu(f),$$

where $k = |\{y : x \in y\}|$ is the number of blocks containing a given point, so that $k\,n = c\,|\mathcal{Y}|$. For (14), assume without loss of generality that $\mu(f) = 0$. Then

$$\operatorname{var}(\bar f(y)) = \frac{1}{|\mathcal{Y}|}\sum_{y} \bar f(y)^{2} = \frac{1}{|\mathcal{Y}|}\sum_{y}\Biggl\{\sum_{x \in y} f(x)\Bigl(f(x) + \sum_{\substack{x' \in y \\ x' \neq x}} f(x')\Bigr)\Biggr\} = \frac{k - l}{|\mathcal{Y}|}\,\mu(f^{2}).$$

From (7) and (8), $\frac{k-l}{|\mathcal{Y}|} = \frac{c(n-c)}{n(n-1)}$, giving the result.

Example 4.2. When $\mathcal{Y}$ is the $j$ sets of an $n$ set, $|\mathcal{Y}| = \binom{n}{j}$, $c = j$, and the result reduces to the usual mean and variance for a sample without replacement.

Example 4.3. Let $\mathcal{X} = \mathbb{Z}_2^k$ and $\mathcal{Y}$ be the $j$-dimensional affine planes. Then $n = 2^k$ and $c = 2^{k-j}$. If $\mu(f) = 1$, the result becomes

$$E(\bar f(y)) = \frac{1}{2^{j}}, \qquad \operatorname{var}(\bar f(y)) = \frac{1}{2^{j}}\left(1 - \frac{2^{k-j} - 1}{2^{k} - 1}\right)\mu\!\left(\Bigl(f - \frac{1}{2^{k}}\Bigr)^{2}\right).$$

For future use, observe that the cardinality of $\mathcal{Y}$ in this case is

$$2^{j}\,\frac{(2^{k} - 1)(2^{k} - 2)\cdots(2^{k} - 2^{j-1})}{(2^{j} - 1)\cdots(2^{j} - 2^{j-1})}.$$

Returning to the situation in Theorem 2.2, Chebychev's inequality implies:

Corollary 4.4. With notation as in Theorem 2.2, the proportion of $y \in \mathcal{Y}$ such that $|\bar f(y) - \frac{c}{n}\mu(f)| > \epsilon$ is smaller than

$$\frac{1}{\epsilon^{2}}\,\frac{c}{n}\left(1 - \frac{c-1}{n-1}\right)\mu\!\left(\Bigl(f - \frac{\mu(f)}{n}\Bigr)^{2}\right).$$

Remarks. The corollary implies that for functions $f$ which are "not too wild" in the sense that $\mu\bigl((f - \frac{\mu(f)}{n})^{2}\bigr)$ is small, most transforms $\bar f(y)$ are uninformative in the sense of being close to their mean value. As an example, take $\mathcal{X} = \mathbb{Z}_2^5$ and $f$ the function defined by the first column of Table 1. Then $\mu\bigl((f - \frac{1}{32})^{2}\bigr) = .0021$.
If $\mathcal{Y}$ is taken as the set of all affine hyperplanes, the corollary gives that 95% of the transforms have $|\bar f(y) - \frac{1}{2}| < .04$.

The next theorem says that for most probabilities $f$, $\mu\bigl((f - \frac{1}{n})^{2}\bigr)$ is small (about $\frac{1}{n}$).

Theorem 4.5. Let $(U_1, U_2, \ldots, U_n)$ be chosen uniformly on the $n$ simplex. For large $n$, the random variable

$$\frac{n^{3/2}}{2}\left(\sum_{i=1}^{n}\Bigl(U_i - \frac{1}{n}\Bigr)^{2} - \frac{1}{n}\right)$$

has an approximate standard normal distribution.

Proof. The argument uses the representation of a uniform distribution by means of exponential variables. Let $X_1, X_2, \ldots, X_n$ be independent standard exponential variables with density $e^{-x}$ on $[0, \infty)$. Let

$$S_1 = \sum_{i=1}^{n} X_i, \qquad S_2 = \sum_{i=1}^{n} X_i^{2}.$$

For large $n$, the random vector

$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \frac{1}{\sqrt{n}}\begin{pmatrix} S_1 - n \\ S_2 - 2n \end{pmatrix}$$

has an approximate bivariate normal distribution with mean vector zero and covariance matrix

$$\begin{pmatrix} 1 & 4 \\ 4 & 20 \end{pmatrix}.$$

To check the covariance matrix, note that $\operatorname{var}\bigl(\frac{S_1 - n}{\sqrt{n}}\bigr) = \operatorname{var}(X_1) = 1$, $\operatorname{var}\bigl(\frac{S_2 - 2n}{\sqrt{n}}\bigr) = \operatorname{var}(X_1^{2}) = 20$ and

$$\frac{1}{n}E\bigl((S_1 - n)(S_2 - 2n)\bigr) = E\bigl((X_1 - 1)(X_1^{2} - 2)\bigr) = E(X_1^{3}) - E(X_1^{2}) - 2E(X_1) + 2 = 4.$$

Represent a uniform vector on the $n$ simplex as $U_i = \frac{X_i}{S_1}$. Then

$$\sum_{i=1}^{n}\Bigl(U_i - \frac{1}{n}\Bigr)^{2} = \frac{1}{S_1^{2}}\sum_{i=1}^{n} X_i^{2} - \frac{1}{n} = \frac{1}{S_1^{2}}\sum_{i=1}^{n}(X_i^{2} - 2) + \frac{2n}{S_1^{2}} - \frac{1}{n}.$$

Now $S_1 = n\bigl(1 + \frac{Z_1}{\sqrt{n}}\bigr)$ with $Z_1 = \frac{S_1 - n}{\sqrt{n}}$. Thus $S_1^{2} = n^{2}\bigl(1 + \frac{2}{\sqrt{n}}Z_1 + \frac{Z_1^{2}}{n}\bigr)$. Using the standard $O_p$ notation (see Pratt [43]),

$$\frac{1}{S_1^{2}} = \frac{1}{n^{2}} - \frac{2Z_1}{n^{5/2}} + O_p\!\left(\frac{1}{n^{3}}\right).$$

Thus,

$$\frac{1}{S_1^{2}}\sum_{i=1}^{n}(X_i^{2} - 2) = \frac{1}{n^{3/2}}\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i^{2} - 2) + O_p\!\left(\frac{1}{n^{2}}\right), \qquad \frac{2n}{S_1^{2}} = \frac{2}{n} - \frac{4Z_1}{n^{3/2}} + O_p\!\left(\frac{1}{n^{2}}\right).$$

The bivariate limiting normality of $(Z_1, Z_2)$ implies that $Z_2 - 4Z_1$ has an approximate normal distribution with mean 0 and variance

$$\operatorname{var}(Z_2) + 16\operatorname{var}(Z_1) - 8\operatorname{cov}(Z_1, Z_2) = 4.$$

Combining the displays, $\frac{n^{3/2}}{2}\bigl(\sum_{i}(U_i - \frac{1}{n})^{2} - \frac{1}{n}\bigr) = \frac{1}{2}(Z_2 - 4Z_1) + O_p(n^{-1/2})$, which has limiting variance 1.
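As a numerical sanity check on Theorems 4.1 and 4.5, the following Python sketch (not part of the original analysis) verifies (13) and (14) exactly for the $j$ sets of Example 4.2 and estimates the mean and variance of the statistic in Theorem 4.5 by simulation. The function names, the test vector, and the settings n = 400, 2000 replications and seed 0 are illustrative assumptions.

```python
import itertools
import random
from fractions import Fraction


def check_theorem_4_1(f, c):
    """Exact mean and variance of f-bar(y) over all c-subsets y, next to (13) and (14).

    f is a list of integers (hypothetical test data); y is uniform over c-subsets.
    """
    n = len(f)
    # f-bar(y) is the sum of f over the block y; enumerate every c-subset.
    sums = [sum(block) for block in itertools.combinations(f, c)]
    mean = Fraction(sum(sums), len(sums))
    var = Fraction(sum(s * s for s in sums), len(sums)) - mean ** 2
    mu = sum(f)  # mu(f)
    # mu((f - mu(f)/n)^2): sum of squared deviations from the average value
    spread = sum((Fraction(v) - Fraction(mu, n)) ** 2 for v in f)
    mean_13 = Fraction(c, n) * mu
    var_14 = Fraction(c, n) * (1 - Fraction(c - 1, n - 1)) * spread
    return mean, mean_13, var, var_14


def simplex_statistic(n, rng):
    """One draw of (n^{3/2}/2) * (sum_i (U_i - 1/n)^2 - 1/n), U uniform on the simplex."""
    x = [rng.expovariate(1.0) for _ in range(n)]  # exponential representation
    s1 = sum(x)
    total = sum((xi / s1 - 1.0 / n) ** 2 for xi in x)
    return n ** 1.5 / 2.0 * (total - 1.0 / n)


def simulate_theorem_4_5(n=400, reps=2000, seed=0):
    """Sample mean and sample variance of the Theorem 4.5 statistic."""
    rng = random.Random(seed)
    draws = [simplex_statistic(n, rng) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    print(check_theorem_4_1([3, 1, 4, 1, 5, 9, 2, 6], c=3))
    print(simulate_theorem_4_5())
```

The exact check uses rational arithmetic, so the enumerated mean and variance should agree with (13) and (14) without rounding. The simulation should give a mean near 0 and a variance near 1; at finite $n$ a small negative bias of order $1/\sqrt{n}$ in the mean is expected.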
Corollary 4.4 and Theorem 4.5 imply that for most probabilities $f$, most transforms $\bar f(y)$ are close to uniform. The final result of this section deals with the entire projection $\{\bar f(y)\}_{y \in p}$ where $p$ is a partition of $\mathcal{X}$ into blocks in $\mathcal{Y}$.

Let $\mathcal{X}$ be a finite set. Let $\mathcal{Y}$ be a block design on $\mathcal{X}$ with parameters $(n, c, k, l)$. Suppose that $\mathcal{Y}$ is also a projection base for $\mathcal{X}$ with $p_1, p_2, \ldots, p_j$ being a partition of $\mathcal{Y}$, and each $p_i$ being a partition of $\mathcal{X}$. Of course, $j = |\mathcal{Y}|\frac{c}{n}$. The next theorem implies that for most functions, the projection onto a randomly chosen partition is uniformly close to $\frac{c}{n}$.

Theorem 4.6. Let $\mathcal{Y}$ be a block design on $\mathcal{X}$ with parameters $(n, c, k, l)$. Suppose that $\mathcal{Y}$ is a projection base. Let $f$ be a fixed probability on $\mathcal{X}$. Let the partition $p$ be chosen uniformly at random over all partitions $p_i$ of $\mathcal{X}$, where $p_i \subset \mathcal{Y}$. For $\epsilon > 0$,

$$\sum_{y \in p}\Bigl|\bar f(y) - \frac{c}{n}\Bigr| \le \epsilon \tag{15}$$

with probability at least

$$1 - \frac{1}{\epsilon}\left\{\frac{n(n-c)}{c(n-1)}\,\mu\!\left(\Bigl(f - \frac{1}{n}\Bigr)^{2}\right)\right\}^{1/2}.$$

Proof. The probability model for choosing a random partition is based on a fixed enumeration $p_1, p_2, \ldots, p_j$ of the partitions that make up $\mathcal{Y}$. Each partition is assumed to be taken in a fixed order $p_i = \{(y_i^{1}, \ldots, y_i^{n/c})\}$. The random variable $S(p) = \sum_{y \in p}|\bar f(y) - \frac{c}{n}|$ is invariant under permuting the $y \in p$ among themselves. Thus a random variable with the same distribution as $S(p)$ but exchangeable $\{\bar f(y)\}_{y \in p}$ exists. For this realization, $E\bigl(\sum_{y \in p}|\bar f(y) - \frac{c}{n}|\bigr) = \frac{n}{c}E|\bar f(y^{*}) - \frac{c}{n}|$ with $y^{*}$ chosen uniformly in $\mathcal{Y}$. Using Cauchy-Schwarz and Theorem 4.1, the expectation is bounded above by

$$\frac{n}{c}\sqrt{\frac{c}{n}\left(1 - \frac{c-1}{n-1}\right)\mu\!\left(\Bigl(f - \frac{1}{n}\Bigr)^{2}\right)} = \left\{\frac{n(n-c)}{c(n-1)}\,\mu\!\left(\Bigl(f - \frac{1}{n}\Bigr)^{2}\right)\right\}^{1/2}.$$

Theorem 4.6 follows from this bound and Markov's inequality applied to the original random variable.

Remarks. From Theorem 4.5, $\mu\bigl((f - \frac{1}{n})^{2}\bigr) \doteq \frac{1}{n}$ for most functions $f$.
For such $f$, the theorem implies that for large block size $c$, most partitions are close to uniform in variation distance. This may be contrasted with Theorems 4.1 and 4.5 which imply that the components $\bar f(y)$ of most projections are close to $\frac{c}{n}$. When $c$ is small, there are many terms in the sum (15). As an example, consider the 2-sets of an $n$ set where $n = 2j$. Let $p$ be a random partition into 2-element sets. Let $f$ be chosen at random from the $n$ simplex and $p$ any fixed partition into two element sets. It is straightforward to show that with probability tending to 1 as $n$ tends to infinity,

$$\sum_{y \in p}\Bigl|\bar f(y) - \frac{2}{n}\Bigr| \to 8e^{-2}.$$

The analogous result holds with the same assumptions when $p$ is any fixed partition of fixed size $c$. Similarly, it is natural to ask for a central limit theorem in connection with Theorems 4.1 and 4.5. For $j$ sets of an $n$ set, such a theorem is available from the usual results on sampling without replacement from a finite population. Most likely, there is a similar set of results for block designs with $|\mathcal{Y}|$ and $c$ large. See Stein [45] for results for designs arising from subgroups of a finite group.

5. Least uniform partitions

The results of Section 4 imply that, under suitable conditions, for most functions the projection along most partitions is close to uniform. This suggests that the special properties of particular functions are only seen in partitions that are far from uniform. In this section, properties of least uniform partitions are examined. Theorem 5.1 shows that for most functions, even the least uniform partitions will be close to uniform if the number of sets in $\mathcal{Y}$ is small in the sense that $\log|\mathcal{Y}|$ is small compared both to $n$ and the block size $c$. This is true, in particular, for affine hyperplanes in $\mathbb{Z}_2^k$.

Theorem 5.1. Let $\mathcal{X}$ be a set of $n$ elements.
Let $\mathcal{Y}$ be a class of subsets in $\mathcal{X}$ of fixed cardinality $c$. Suppose that $p_1, \ldots, p_j$ is a partition of $\mathcal{Y}$ into partitions of $\mathcal{X}$. Let $f$ be chosen at random in the $n$ simplex. Let $p^{*}$ be the partition among the $p_i$ that maximizes $\sum_{y \in p}|\bar f(y) - \frac{c}{n}|$. For any $\epsilon > 0$,

$$\sum_{y \in p^{*}}\Bigl|\bar f(y) - \frac{c}{n}\Bigr| < \epsilon,$$

except for a set of $f$'s of probability smaller than $(|\mathcal{Y}| + 1)\beta$ with $\beta$ equal to 1 minus

$$\frac{1}{\beta(c, n)}\int_{\frac{c}{n}(1-\epsilon)}^{\frac{c}{n}(1+\epsilon)} x^{c-1}(1 - x)^{n-c-1}\,dx, \tag{16}$$

where $\beta(c, n)$ denotes the beta function.

Proof. Represent the $i$th component of a randomly chosen $f$ as $\frac{X_i}{S}$ where the $X_i$ are independent standard exponentials and $S = \sum_{i=1}^{n} X_i$. Let $y^{*}$ be the set in $\mathcal{Y}$ with the largest value of $\bar f(y)$. The argument begins by bounding the probability that $|\bar f(y^{*}) - \frac{c}{n}| < \epsilon\frac{c}{n}$. To begin with,

$$P\Bigl\{\bar f(y^{*}) < \frac{c}{n}(1 - \epsilon)\Bigr\} \le P\Bigl\{\frac{X_1 + \cdots + X_c}{S} < \frac{c}{n}(1 - \epsilon)\Bigr\}.$$

Further,

$$P\Bigl\{\bar f(y^{*}) > \frac{c}{n}(1 + \epsilon)\Bigr\} \le \sum_{y \in \mathcal{Y}} P\Bigl\{\bar f(y) > \frac{c}{n}(1 + \epsilon)\Bigr\} = |\mathcal{Y}|\,P\Bigl\{\frac{X_1 + \cdots + X_c}{S} > \frac{c}{n}(1 + \epsilon)\Bigr\}.$$

Next, let $y_{*}$ denote the set in $\mathcal{Y}$ with the smallest value of $\bar f(y)$. To bound the probability that $|\bar f(y_{*}) - \frac{c}{n}| < \epsilon\frac{c}{n}$, observe that $\bar f(y_{*}) = 1 - \bar f(y_{**})$ with $y_{**}$ the union of sets in a partition omitting the one element that maximizes $\bar f$. Thus,

$$P\Bigl\{\bar f(y_{*}) < \frac{c}{n}(1 - \epsilon)\Bigr\} = P\Bigl\{\bar f(y_{**}) > 1 - \frac{c}{n}(1 - \epsilon)\Bigr\} \le |\mathcal{Y}|\,P\Bigl\{\frac{X_1 + \cdots + X_{n-c}}{S} > 1 - \frac{c}{n}(1 - \epsilon)\Bigr\} = |\mathcal{Y}|\,P\Bigl\{\frac{X_1 + \cdots + X_c}{S} < \frac{c}{n}(1 - \epsilon)\Bigr\}.$$

Further,

$$P\Bigl\{\bar f(y_{*}) > \frac{c}{n}(1 + \epsilon)\Bigr\} = P\Bigl\{\bar f(y_{**}) < 1 - \frac{c}{n}(1 + \epsilon)\Bigr\} \le P\Bigl\{\frac{X_1 + \cdots + X_{n-c}}{S} < 1 - \frac{c}{n}(1 + \epsilon)\Bigr\} = P\Bigl\{\frac{X_1 + \cdots + X_c}{S} > \frac{c}{n}(1 + \epsilon)\Bigr\}.$$

Summing the four bounds thus obtained we see that both

$$\Bigl|\bar f(y^{*}) - \frac{c}{n}\Bigr| < \epsilon\frac{c}{n}, \qquad \Bigl|\bar f(y_{*}) - \frac{c}{n}\Bigr| < \epsilon\frac{c}{n} \tag{17}$$

except for a set of $f$'s of probability smaller than $(|\mathcal{Y}| + 1)\beta$ as defined by (16).
Now (17) implies that $|\bar f(y) - \frac{c}{n}| < \epsilon\frac{c}{n}$ for all $y \in \mathcal{Y}$. Summing this last inequality over the partition $p^{*}$ completes the proof of the theorem.

Remarks. The beta integral that appears in the bound is straightforward to approximate numerically. A raft of techniques and approximations appear in the first chapter of Pearson [40]. For example, consider cases where $\frac{c}{n} = \frac{1}{2}$. Then, using the Peizer-Pratt approximation given in Pearson [40], and Mills' ratio, the $\beta$ in (16) is approximately

$$\frac{2}{\sqrt{2\pi}}\,\frac{e^{-x^{2}/2}}{1 + x} \quad\text{with}\quad x = \sqrt{2c\,\log\frac{1}{4(\frac{1}{2} - \epsilon)(\frac{1}{2} + \epsilon)}}.$$

For this to be small when multiplied by $|\mathcal{Y}| + 1$, it clearly suffices that $\log|\mathcal{Y}|$ be small compared to $c$. This is the case for the affine subspaces of dimension $j$ in $\mathbb{Z}_2^k$ if $j$ is bounded and $k$ is large. As a numerical example, consider the affine hyperplanes in $\mathbb{Z}_2^{10}$. Then $|\mathcal{Y}| + 1 = 2049$, $c = 512$, $n = 1024$. Taking $\epsilon = 0.1$, $(|\mathcal{Y}| + 1)\beta \doteq 2.595 \times 10^{-7}$.

The next theorem shows that when there are many sets in $\mathcal{Y}$, the least uniform projection is typically far from uniform. The theorem deals with $n$ sets in a set of cardinality $2n$. The variation distance of a typical probability projected along the least uniform half split is shown to be about 0.3. This may be compared with Theorems 4.5 and 4.6 which show that for a typical probability $f$ on $2n$ points, $|\bar f(y) - \frac{1}{2}|$ is close to zero for most sets $y$ of cardinality $n$.

Theorem 5.2. Let $f$ be chosen at random on the $2n$ simplex. Let $S_{-}$ be the sum of the $n$ smallest $f(x)$. Then for large $n$, the random variable

$$\sqrt{2n}\left\{S_{-} - \Bigl(\frac{1}{2} - \frac{\log 2}{2}\Bigr)\right\}$$

has an approximate normal distribution with mean 0 and variance $\frac{3}{2} - 2\log 2$.

Proof. Represent a randomly chosen $f$ as $\frac{X_i}{S}$ where the $X_i$ are independent standard exponential random variables and $S = \sum_{i=1}^{2n} X_i$.
Denote the order statistics by round brackets: $X_{(1)} \le X_{(2)} \le \cdots \le X_{(2n)}$. Let $L_1 = X_{(1)}$, $L_2 = X_{(2)} - X_{(1)}$, ..., $L_{2n} = X_{(2n)} - X_{(2n-1)}$. Then the $L_i$ are independent, and $L_{i+1}$ has the distribution of a standard exponential times $\frac{1}{2n - i}$ (see Feller [23], Section III.3). With this notation,

$$S = \sum_{i=1}^{2n} X_i = \sum_{i=0}^{2n-1}(2n - i)L_{i+1}, \tag{18}$$

$$S_{-} = \frac{1}{S}\sum_{i=1}^{n} X_{(i)} = \frac{1}{S}\sum_{i=0}^{n-1}(n - i)L_{i+1}. \tag{19}$$

The proof is completed by approximating the sums in this representation of $S$ and $S_{-}$. Let $\mu_i = \frac{n - i}{2n - i}$, so $(n - i)L_{i+1}$ has the same distribution as $\mu_i$ times a standard exponential. Let

$$\sigma^{2} = \sum_{i=0}^{2n-1}\mu_i^{2} = \sum_{i=0}^{2n-1}\left(1 - \frac{2n}{2n - i} + \frac{n^{2}}{(2n - i)^{2}}\right) = 2\left\{n - (2n\log 2 + O(1)) + \frac{3}{2} + \frac{n}{2} + O(1)\right\} = 2n\left\{\frac{3}{2} - 2\log 2\right\} + O(1).$$

Now, let $Z_1 = \frac{S - 2n}{\sqrt{2n}}$ and $Z_2 = \frac{\sum_{i=1}^{n}(X_{(i)} - \mu_i)}{\sqrt{2n}}$. The vector $(Z_1, Z_2)$ has a limiting bivariate normal distribution, with mean $(0, 0)$ and covariance matrix

$$\begin{pmatrix} \sigma_1^{2} & \rho \\ \rho & \sigma_2^{2} \end{pmatrix}$$

with $\sigma_1^{2} = 2$, $\sigma_2^{2} = \frac{3}{2} - 2\log 2$, and $\rho = \frac{1}{2}(1 - \log 2)$. To check the value of $\rho$, observe that the covariance of $Z_1$ and $Z_2$ is $\frac{1}{2n}$ times

$$\sum_{i=0}^{n} E\left\{\bigl[(2n - i)L_{i+1} - 1\bigr]\,(n - i)\Bigl[L_{i+1} - \frac{1}{2n - i}\Bigr]\right\} = \sum_{i=0}^{n}\frac{n - i}{2n - i} = n - n\log 2 + O(1).$$

Using the standard $O_p$ calculus,

$$\frac{1}{S} = \frac{1}{2n}\,\frac{1}{1 + \frac{Z_1}{\sqrt{2n}}} = \frac{1}{2n}\left\{1 - \frac{Z_1}{\sqrt{2n}}\right\} + O_p\!\left(\frac{1}{n^{2}}\right).$$

In particular, $\frac{1}{S} = \frac{1}{2n} + O_p\bigl(\frac{1}{n^{3/2}}\bigr)$. The representation (19) for $S_{-}$ can be rewritten as

$$S_{-} = \frac{\sqrt{2n}\,Z_2 + \sum\mu_i}{S} = \left\{\frac{Z_2}{\sqrt{2n}} + \frac{1 - \log 2}{2}\right\}\left\{1 - \frac{Z_1}{\sqrt{2n}}\right\} + O_p\!\left(\frac{1}{n}\right).$$

It follows that $\sqrt{2n}\bigl(S_{-} - \frac{1 - \log 2}{2}\bigr)$ has the same limiting distribution as $Z_2 - \frac{(1 - \log 2)}{2}Z_1$. This is normal with mean 0 and variance

$$\left\{\frac{3}{2} - 2\log 2\right\} + 2\left\{\frac{1 - \log 2}{2}\right\}^{2} - 2\left\{\frac{1 - \log 2}{2}\right\}^{2} = \frac{3}{2} - 2\log 2.$$

Corollary 5.3. Let $f$ be chosen at random on the $2n$ simplex.
Let $(y, y^{c})$ be a partition of $\mathcal{X}$ into an $n$ set and its complement which maximizes the value of $|\bar f(y) - \frac{1}{2}| + |\bar f(y^{c}) - \frac{1}{2}|$. Then, as $n$ tends to infinity, the maximum discrepancy tends to $\log 2 \doteq 0.693$ (natural logarithm) with probability tending to 1.

Proof. For almost all $f$, the maximum is taken on uniquely at the partition $S_{-}$, $(S_{-})^{c}$ as defined in Theorem 5.2. The maximum discrepancy equals $2|S_{-} - \frac{1}{2}|$, and the result follows from Theorem 5.2.

Remark. The proof of Theorem 5.2 and its corollary can easily be extended to cover the $j$ sets of an $n$ set. The argument shows that for most probabilities $f$, the variation distance between the least uniform projection and the uniform distribution is bounded away from zero if $j$ is an appreciable fraction of $n$.

For the final theorem, a different method of choosing a random probability is introduced. Let $\mathcal{X}$ be a set of cardinality $2n$. Fix an integer $b$. Drop $b$ balls into $2n$ boxes, and let $f(x)$ be the proportion of balls in the box labeled $x$. Let $\mathcal{Y}$ be the subsets of $\mathcal{X}$ with cardinality $n$. Clearly, if $b$ is large with respect to $n$, $f(x)$ is approximately $\frac{1}{2n}$ and so for any $y \in \mathcal{Y}$, $\bar f(y) \doteq \frac{1}{2}$, even for the $y_{*}$ minimizing $\bar f(y)$. At the other extreme, if $b$ is small with respect to $n$, $\bar f(y_{*})$ will be close to zero. For example, if $b = n$, $\bar f(y_{*}) = 0$. It will follow from Theorem 5.4 that $\bar f(y_{*})$ is approximately zero for $b \le 2n\log 2$.

This model for generating a random probability gives insight into the following problem. If data is generated from a structureless model, random fluctuations may produce structure that is picked up by a rich enough data analytic procedure. As $b$ increases in the above model, the random probability converges to a uniform distribution.
The following theorem gives an indication of how large $b$ must be for all projections to be close to uniform. Some required notation: For $\lambda > 0$, let $p_\lambda(j) = \frac{e^{-\lambda}\lambda^{j}}{j!}$ denote the Poisson density. Let $P_\lambda(j) = \sum_{i=0}^{j} p_\lambda(i)$. Let $m$ be the largest integer with $P_\lambda(m) \le \frac{1}{2}$, $P_\lambda(m + 1) > \frac{1}{2}$. Define $\theta = \theta(\lambda)$ by $P_\lambda(m) + \theta\,p_\lambda(m + 1) = \frac{1}{2}$, so $0 \le \theta < 1$. When $\lambda$ is an integer, Ramanujan showed that $\theta = \frac{1}{3} + O(\frac{1}{\lambda})$ as $\lambda \to \infty$. See Cheng [9] for references and extensions of Ramanujan's results.

Theorem 5.4. Suppose that $n$ and $b$ tend to infinity in such a way that $\frac{b}{2n} \to \lambda$. Let $y_{*}$ be the $n$ set with smallest value of $\bar f(y_{*})$. Then

$$\Bigl|\bar f(y_{*}) - \frac{1}{2}\Bigr| + \Bigl|\bar f(y_{*}^{c}) - \frac{1}{2}\Bigr| = 2\,\frac{e^{-\lambda}\lambda^{m}}{m!}\left\{1 + \theta\Bigl(\frac{\lambda}{m + 1} - 1\Bigr)\right\} + o_p(1).$$

Remarks. For $\lambda \le \log 2$ and $m = 0$, the variation distance can be shown to tend to one. For large $\lambda$, $\frac{e^{-\lambda}\lambda^{m}}{m!}$ is roughly $\frac{1}{\sqrt{2\pi\lambda}}$; thus for large $\lambda$, the variation distance tends to zero like $\frac{1}{\sqrt{\lambda}}$. This is not very rapid as Table 15 shows. (Note that for integer $\lambda$, $m + 1 = \lambda$, so the asymptotic value of the variation distance is $2\frac{e^{-\lambda}\lambda^{m}}{m!}$.)

Table 15

| $\lambda$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $2e^{-\lambda}\lambda^{m}/m!$ | 0.74 | 0.54 | 0.44 | 0.40 | 0.36 | 0.32 | 0.30 | 0.28 | 0.26 | 0.24 |

Proof. The argument will only be sketched. For $b$ and $n$ large, the number of balls in the $i$th box has a limiting Poisson distribution with parameter $\lambda$, and different boxes can be treated as independent. The arguments in Diaconis and Freedman ([15], Section 3) can be used to justify this step.

Thus let $X_1, X_2, \ldots$ be independent Poisson variables with mean $\lambda$. With probability 1, eventually the median of $X_1, X_2, \ldots, X_{2n}$ is $m + 1$ and the proportion of $X_i$, $1 \le i \le 2n$, equal to $j$ is $p_\lambda(j) + o(1)$ uniformly for $0 \le j \le m + 1$. Let $S_{-}$ be the sum of the $n$ smallest $X_i$, $1 \le i \le 2n$.
It follows that $\frac{S_{-}}{2n}$ equals

$$0\cdot p_\lambda(0) + p_\lambda(1) + \cdots + m\,p_\lambda(m) + \theta(m + 1)\,p_\lambda(m + 1) + o(1).$$

Since $j\,p_\lambda(j) = \lambda\,p_\lambda(j - 1)$ and $P_\lambda(m) = \frac{1}{2} - \theta\,p_\lambda(m + 1)$, this sum equals

$$\frac{\lambda}{2} - \lambda\,\frac{e^{-\lambda}\lambda^{m}}{m!}\left\{1 + \theta\Bigl(\frac{\lambda}{m + 1} - 1\Bigr)\right\} + o(1).$$

The identity asserted in the theorem follows from noting that $\bar f(y_{*})$ has the same limiting value as $\frac{S_{-}}{2\lambda n}$.

Appendix: Automating the analysis

In Section 2, we used the adjusted second order margins in a graphical, data analytic fashion to seriate the books of Plato. For some purposes, it may be desirable to have a more formal ranking procedure. We carry this out in Section A.1. The procedure is based on a collection of metrics between probabilities. These are explained in Section A.2. Finally, in Section A.3, we carry out a fully automated analysis of the Plato data based on all affine projections, not just first and second order statistics. We conclude that most methods agree, and suggest that the structures described in Section 3 are robustly embedded in the Plato data. In this section, we have added a seventh book, Criticus, to the analysis.

A.1. A metric approach

In our data analysis, the adjusted second order statistics emerged as an informative summary of the rhyming patterns in Plato's Republic. As explained in Section 2, this is a vector of ten numbers (one for each pair of the last five syllables, i.e. $\binom{5}{2} = 10$). For the moment, call this vector $p^{R} = (p_1^{R}, \ldots, p_{10}^{R})$ with "R" denoting Republic. A similar ten-vector can be computed for each of the other books. We may then use the distance between these vectors and $p^{R}$ to order the books. Books closest to $p^{R}$ are ranked earlier. We also compute a ranking based on the distance to $p^{L}$, the adjusted second order statistics for Plato's Laws. These two rankings generally agree, and agree with the conclusions of Section 3. To proceed, we need to choose a distance between vectors.
We have examined three standard distances between probability vectors: the Hellinger distance $H$, the Total Variation distance $TV$, and the Vasserstein distance $V$. These are explained more carefully in Section A.2. The rankings are given in Table 16: R denotes Republic, L denotes Laws, $\cdot$ denotes row variable.

Almost the same seriation is obtained when any of the three metrics are used to compute distances between Republic and the other books. Similarly, almost the same seriation is obtained when any of the three metrics are used to compute distances between Laws and the other books. Most clearly, Politicus is closest to Laws and furthest from the Republic. Timaeus and Sophist, as a pair, are closest to Republic and furthest from Laws. However, Sophist is both closer to Laws and to Republic than Timaeus. From these calculations, aside from Politicus, Philebus is closest to Laws and furthest from Republic. This is then followed by Criticus. All of this points to the ordering:

Republic, {Sophist, Timaeus}, Criticus, Philebus, {Politicus, Laws}.

This ordering is consistent with the ordering produced data analytically in Section 3 and with the ordering based on the exponential model of Cox and Brandwood [11]. In Ahn et al. [1], a total of ten books were used for analysis. They found "roughly three clusters" (618):

{Tim., Soph., Crit., Pol., *}, {Laws, Phil.}, {Rep., *, *}.

Here * denotes a book not analyzed in our work. Their final ordering based on a cluster analysis using the Euclidean metric is Republic, Timaeus, Criticus, Sophist, Politicus, Philebus, Laws.

Table 16. Ranking of book in row based on distance in column

| Book | $d_H(R,\cdot)$ | $d_{TV}(R,\cdot)$ | $d_V(R,\cdot)$ | $d_H(L,\cdot)$ | $d_{TV}(L,\cdot)$ | $d_V(L,\cdot)$ |
|---|---|---|---|---|---|---|
| Tim. | 2 | 2 | 2 | 5 | 5 | 5 |
| Soph. | 1 | 1 | 1 | 4 | 4 | 4 |
| Pol. | 6 | 5 | 6 | 1 | 1 | 1 |
| Crit. | 3 | 3 | 3 | 3 | 3 | 3 |
| Phil. | 4 | 4 | 4 | 2 | 2 | 2 |
| Laws | 5 | 6 | 5 | 0 | 0 | 0 |

A.2. Some metrics

Let $p = (p_1, \ldots, p_n)$, $q = (q_1, \ldots, q_n)$ be probability vectors. Thus $p_i \ge 0$ and $p_1 + \cdots + p_n = 1$, and the same holds for $q$. Three widely used metrics are:

Total Variation: $d_{TV}(p, q) = \frac{1}{2}\sum_i |p_i - q_i|$.

Hellinger: $d_H(p, q) = \sum_i (\sqrt{p_i} - \sqrt{q_i})^{2}$.

Vasserstein: $d_V(p, q) = \min_{X,Y} E(d(X, Y))$,

where the minimum is over all joint distributions of $X$ and $Y$ with marginals $p$ and $q$. These metrics, their strengths, weaknesses and relations are discussed in Dudley [21], Villani [48] and Diaconis et al. [20].

In Section A.1, we used these metrics between vectors of positive entries which did not necessarily have sum one. This was done by forming $\bar p = \sum_i p_i$, $\bar q = \sum_i q_i$, $\tilde p_i = \frac{p_i}{\bar p}$, $\tilde q_i = \frac{q_i}{\bar q}$. We used the distance between $\tilde p$ and $\tilde q$ and added a penalty term to account for differences in mass between the profiles $p$ and $q$. For total variation, the penalty was $|\bar p - \bar q|$. We computed and compared two penalty terms for Hellinger: both $|\bar p - \bar q|$ and $(\sqrt{\bar p} - \sqrt{\bar q})^{2}$. The distances between the ten-vector of adjusted second order margins of Republic and the other books, using Vasserstein, are given in Table 17.

For completeness, we note that the Vasserstein metric requires an underlying distance on a probability space; in our case, this amounts to an underlying distance between the ten entries in each table. We take these entries to be binary 5-tuples containing two ones. We use the distance between two of these as the minimum number of pairwise adjacent switches required to bring one to the other. Thus the distance between 11000 and 00011 is 6. Further background can be found in Diaconis et al. [20] or Thompson [46].
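The definitions in this section translate directly into code. The Python sketch below is illustrative and is not the authors' implementation: all function names are assumptions, the penalized total variation simply adds the mass penalty to the distance between normalized profiles, and the Vasserstein distance itself is omitted because it requires solving a transportation problem.

```python
import math


def total_variation(p, q):
    """d_TV(p, q) = (1/2) * sum_i |p_i - q_i|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """d_H(p, q) = sum_i (sqrt(p_i) - sqrt(q_i))^2, as defined in this section."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def normalize(v):
    """Return (total mass, v rescaled to sum to one) for a vector of positive entries."""
    mass = sum(v)
    return mass, [a / mass for a in v]


def penalized_tv(p, q):
    """Total variation between normalized profiles plus the mass penalty |p-bar - q-bar|.

    Assumption: the penalty is added with weight one.
    """
    p_mass, p_tilde = normalize(p)
    q_mass, q_tilde = normalize(q)
    return total_variation(p_tilde, q_tilde) + abs(p_mass - q_mass)


def adjacent_switch_distance(a, b):
    """Minimum number of adjacent switches turning binary string a into b.

    Requires equal length and equal number of ones; the k-th one of a is
    moved to the position of the k-th one of b.
    """
    ones_a = [i for i, ch in enumerate(a) if ch == "1"]
    ones_b = [i for i, ch in enumerate(b) if ch == "1"]
    if len(a) != len(b) or len(ones_a) != len(ones_b):
        raise ValueError("strings must have equal length and equal number of ones")
    return sum(abs(i - j) for i, j in zip(ones_a, ones_b))


if __name__ == "__main__":
    print(adjacent_switch_distance("11000", "00011"))  # 6, the value quoted in the text
```

The ground distance reproduces the example in the text: moving the ones in 11000 from positions 1 and 2 to positions 4 and 5 takes 3 + 3 = 6 adjacent switches.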
With this choice specified, the minimization problem is equivalent to the Monge-Kantorovich transshipment problem. We computed distances using the CS-2 code of Andrew Goldberg (www.avglab.org/andrew).

Table 17. $d_V$ for Republic to other books

| Book | Vass. Dist. | Mass diff | Total | Rank |
|---|---|---|---|---|
| Laws | 109 | 951 | 1060 | 5 |
| Phil. | 119 | 748 | 867 | 4 |
| Pol. | 112 | 952 | 1064 | 6 |
| Soph. | 82 | 97 | 179 | 1 |
| Tim. | 41 | 263 | 304 | 2 |
| Crit. | 71 | 675 | 746 | 3 |

A.3. Using all affine projections

The data analysis of Section 2 used projections into first and second order margins. The general theory developed later points to all affine projections as a natural base for analysis. In this section, we complete our analysis of the Plato data by looking at all affine projections. In the following, $x$ and $z$ range over all binary 5-tuples. If $f(x)$ is the proportion of sentences in a fixed book (e.g. Republic) with rhyming pattern $x$, the projection of $f$ in direction $z$ is

$$\sum_{x \cdot z = 0} f(x), \qquad \sum_{x \cdot z = 1} f(x).$$

To use the information that Republic was written early and Laws was written late, we find 5-tuples, $z$, that maximize

$$\left(\sum_{x \cdot z = 0} f(x) - \sum_{x \cdot z = 1} f(x)\right) - \left(\sum_{x \cdot z = 0} g(x) - \sum_{x \cdot z = 1} g(x)\right),$$

where $g(x)$ codes patterns for Laws. The largest three differences occur at $z = (00010)$, $(01100)$ and $(11000)$. For each of these, we calculated

$$\sum_{x \cdot z = 0} h(x) - \sum_{x \cdot z = 1} h(x)$$

for each of the books (where $h$ codes the patterns for a particular book), and use the linear order of these values to order the books. The rank orders resulting from the three binary 5-tuples $z$ with the largest three differences above, $z = (00010)$, $(01100)$ and $(11000)$, are given in Table 18.

Table 18

| Book | (00010) | (01100) | (11000) |
|---|---|---|---|
| Rep. | 1 | 2 | 1 |
| Tim. | 2 | 3 | 2 |
| Soph. | 3 | 4 | 4 |
| Pol. | 4 | 5 | 7 |
| Phil. | 7 | 6 | 3 |
| Laws | 6 | 7 | 5 |
| Crit. | 5 | 1 | 6 |

The first column thus gives the ranking: Rep., Tim., Soph., Pol., Crit., Laws, Phil.
This is based on the difference between a single syllable (second from the end). It is close to, but not the same as, the ranking based on adjusted second order margins found above. The other columns differ and show that not 'any old' projection gives the same ranking.

Acknowledgments. This paper is written in tribute to David Freedman with thanks for his integrity and brilliance.

References

[1] Ahn, J. S., Hofmann, H. and Cook, D. (2003). A projection pursuit method on the multidimensional squared contingency table. Comput. Statist. 18 605–626. MR2019385
[2] Atkinson, A. C. (1970). A method for discriminating between models. J. Roy. Statist. Soc. Ser. B 32 323–353.
[3] Bailey, R. (2004). Association Schemes: Designed Experiments, Algebra and Combinatorics. Cambridge Univ. Press. MR2047311
[4] Bolker, E. (1987). The finite Radon transform. Contemp. Math. 63 27–50. MR0876312
[5] Boneva, L. I. (1971). A new approach to a problem of chronological associated with the works of Plato. In Mathematics in the Archaeological and Historical Sciences (R. R. Hodson, D. G. Kendall and F. Tautu, eds.). Edinburgh Univ. Press.
[6] Brandwood, L. (1976). A Word Index to Plato. W. S. Maney, Leeds.
[7] Cameron, P. (1976). Parallelisms of Complete Designs. Cambridge Univ. Press. MR0419245
[8] Charnomordic, B. and Holmes, S. (2001). Correspondence analysis with R. Statist. Comput. Graph. 12 19–25.
[9] Cheng, T. T. (1949). The normal approximation to the Poisson distribution and a conjecture of Ramanujan. Bull. Amer. Math. Soc. 55 396–401. MR0029487
[10] Constantine, G. M. (1987). Combinatorial Theory and Statistical Designs. Wiley, New York. MR0891185
[11] Cox, D. E. and Brandwood, L. (1959). On a discriminatory problem connected with the works of Plato. J. Roy. Statist. Soc. Ser. B 21 195–200. MR0109102
[12] Critchlow, D. (1988). Metric Methods for Analyzing Partially Ranked Data. Springer-Verlag, Berlin. MR0818986
[13] Dedeo, M. and Velasquez, E. (2003). The Radon transform on $\mathbb{Z}_n^k$. SIAM J. Discrete Math. 18 472–478. MR2134409
[14] Dembrowski, P. (1968). Finite Geometries. Springer, New York. MR0233275
[15] Diaconis, P. and Freedman, D. (1982). The mode of an empirical histogram. Pacific J. Math. 100 359–385. MR0669330
[16] Diaconis, P. and Freedman, D. (1982). Asymptotics of graphical projection pursuit. Ann. Statist. 12 793–815. MR0751274
[17] Diaconis, P. and Graham, R. (1985). Finite Radon transforms on $\mathbb{Z}_2^k$. Pacific J. Math. 118 323–345. MR0789174
[18] Diaconis, P. (1988). Group representations in probability and statistics. In IMS Lecture Notes – Monograph Series 11 (S. S. Gupta, ed.). Institute of Mathematical Statistics, Hayward, CA. MR0964069
[19] Diaconis, P. (1989). The 1987 Wald Memorial Lectures: A generalization of spectral analysis with application to ranked data. Ann. Statist. 17 949–979. MR1015133
[20] Diaconis, P., Holmes, S., Janson, S., Lalley, S. and Pemantle, R. (1995). Metrics on compositions and coincidences among renewal processes. In Random Discrete Structures 81–101. IMA Vol. Math. Appl. 76. Springer, New York.
[21] Dudley, R. M. (2002). Real Analysis and Probability. Cambridge Univ. Press. MR1932358
[22] Donoho, D. (1981). On minimum entropy deconvolution. In Applied Time Series Analysis (D. F. Findley, ed.) II 565–608. Academic Press, New York.
[23] Feller, W. (1971). An Introduction to Probability Theory and Its Applications. II, 2nd ed. Wiley, New York.
[24] Fill, J. A. (1989). The Radon transform on $\mathbb{Z}_n$. SIAM J. Discrete Math. 2 262–283. MR0990456
[25] Fligner, M. A. and Verducci, J. S. (1993). Probability Models and Statistical Analyses for Ranking Data. Springer, New York. MR1237197
[26] Friedman, J., Stuetzle, W. and Schroeder, A. (1984). Projection pursuit density estimation. J. Amer. Statist. Assoc. 79 599–608. MR0763579
[27] Friedman, J. and Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817–823. MR0650892
[28] Friedman, J. and Tukey, J. W. T. (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput. 9 881–890.
[29] Goldberg, A. CS-2 algorithm. Available at www.avglab.org/andrew.
[30] Hall, P. (1989). On polynomial-based projection indices for exploratory projection pursuit. Ann. Statist. 17 589–605. MR0994252
[31] Hedayat, A. S., Sloane, N. J. and Stufken, J. (1999). Orthogonal Arrays: Theory and Applications. Springer, New York. MR1693498
[32] Huber, P. (1985). Projection pursuit. Ann. Statist. 13 435–475. MR0790553
[33] Hwang, J.-N., Law, S-R. and Lippman, A. (1994). Nonparametric multivariate density estimation: a comparative study. IEEE Trans. Signal Processing 42 2795–2810.
[34] James, G. D. (1978). The Representation Theory of the Symmetric Groups. Springer, New York. MR0513828
[35] Kruskal, J. B. (1969). Toward a practical method which helps uncover the structure of a set of multivariate observations by finding the linear transformation that optimizes a new index of condensation. In Statistical Computation (R. C. Milton and J. A. Nelder, eds.). Academic Press, New York.
[36] Kruskal, J. B. (1972). Linear transformation of multivariate data to reveal clustering. In Multidimensional Scaling: Theory and Applications in the Behavioral Sciences 1. Theory. Seminar Press, New York.
[37] Kung, J. (1979). The Radon transform of a combinatorial geometry. I. J. Combin. Theory A 37 97–102. MR0530281
[38] Lander, E. (1982). Symmetric Designs, an Algebraic Approach. Cambridge Univ. Press. MR0697566
[39] Marden, J. I. (1995). Analyzing and Modeling Rank Data. Chapman and Hall, New York. MR1346107
[40] Pearson, K. (1968). Tables of the Incomplete Beta-Function, 2nd ed. Cambridge Univ. Press. MR0226815
[41] Polzehl, J. (1993). Projection Pursuit Discriminant Analysis. CORE Discussion Paper, Université Catholique de Louvain.
[42] Posse, C. (1995). Tools for two-dimensional projection pursuit. J. Comput. Graph. Statist. 4 83–100.
[43] Pratt, J. (1959). On a general concept of "In probability". Ann. Math. Statist. 30 549–558. MR0104283
[44] Solomon, H. (1961). Studies in Item Analysis and Prediction. Stanford Univ. Press. MR0120758
[45] Stein, C. (1992). A way of using auxiliary randomization. In Probability Theory (Singapore, 1989), 159–180. de Gruyter, Berlin. MR1188718
[46] Thompson, G. L. (1993). Generalized permutation polytopes and exploratory graphical methods for ranked data. Ann. Statist. 21 1401–1430. MR1241272
[47] Velasquez, E. (1997). The Radon transform on finite symmetric spaces. Pacific J. Math. 177 369–376. MR1444787
[48] Villani, C. (2003). Topics in Optimal Transportation. Amer. Math. Soc., Providence, RI. MR1964483
[49] Wishart, D. and Leach, S. V. (1970). A multivariate analysis of Platonic prose rhythm. Computer Studies 3 90–99.
