Error Probabilities for Halfspace Depth

Authors: Michael A. Burr, Robert Fabrizio

Abstract. Data depth functions are a generalization of one-dimensional order statistics and medians to real spaces of dimension greater than one; in particular, a data depth function quantifies the centrality of a point with respect to a data set or a probability distribution. One of the most commonly studied data depth functions is halfspace depth. It is of interest to computational geometers because it is highly geometric, and it is of interest to statisticians because it shares many desirable theoretical properties with the one-dimensional median. As the sample size increases, the halfspace depth for a sample converges to the halfspace depth for the underlying distribution, almost surely. In this paper, we use the geometry of halfspace depth to improve the explicit bounds on the rate of convergence.

1. Introduction

Data depth functions generalize order statistics and the median in one dimension to higher dimensions; in particular, they provide a quantitative estimate for the centrality of a point relative to a data set or a probability distribution (see [12] and [11] for surveys). For these functions, larger values at a point indicate that the point is deeper or more central with respect to a data set or distribution. The point with largest depth is defined to be the median of a data set or distribution. For data sets, data depth functions are typically defined in terms of the geometry of the data, and they reflect the geometric combinatorics of the data set. Halfspace depth is a data depth measure that was first introduced by Hodges [10] and Tukey [15].
Halfspace depth has attracted the interest of computational geometers because of its strong geometric properties (see, for example, [1] and [3]) and is of interest to statisticians because it shares many theoretical properties with the one-dimensional median [17]. We recall the definition of halfspace depth for distributions and data sets, briefly using the notation H(q) for the set of halfspaces in R^d containing q ∈ R^d.

Definition 1.1. Let X be an R^d-valued random variable. For a point q ∈ R^d, the halfspace depth of q with respect to X is the following minimum over all halfspaces H of R^d containing q:

    HD(q; X) = min_{H ∈ H(q)} Pr(X ∈ H).

Let X^(n) = (X_1, ..., X_n) be a finite sample of n points in R^d. For a point q ∈ R^d, the halfspace depth of q with respect to X^(n) is the following minimum over all halfspaces H of R^d containing q:

    HD(q; X^(n)) = n^{-1} · min_{H ∈ H(q)} #{X^(n) ∩ H}.

In [8], the authors prove that for iid random variables X_1, X_2, ... equal in distribution to X, as n → ∞,

    sup_{q ∈ R^d} |HD(q; X) − HD(q; X^(n))| → 0 a.s.

Date: August 7, 2018.
Key words and phrases. Data depth, Halfspace depth, Convergence, Glivenko-Cantelli.
This work was partially supported by grants from the Simons Foundation (#282399 to Michael Burr) and the NSF (#CCF-1527193).

In this paper, we use the geometry of halfspace depth to improve explicit bounds on the rate of convergence of this limit. In particular, we show:

Theorem 1.2. Let X be an R^d-valued random variable (obeying certain Lipschitz continuity conditions defined below) and let (X_1, X_2, ...) be a sequence of iid random variables, equal in distribution to X. Fix ε > 0. Then, for n sufficiently large, there is a constant C such that

    Pr( sup_{q ∈ R^d} |HD(q; X) − HD(q; X^(n))| ≤ ε ) ≥ 1 − C n^{(3/2)(d−1)} e^{−2nε²}.

This theorem represents an improvement by a factor of n^{(1/2)d + 7/2} over the previous bound.

The remainder of this paper is organized as follows: In Section 2, we provide the necessary background on halfspace depth and discuss what is known about its convergence. In Section 3, we quote results from computational geometry and probability theory that we need in the main results. In this section, we also provide the notation for the Lipschitz continuity conditions mentioned in Theorem 1.2. In Section 4, we prove our main result. In Section 5, we illustrate the main theorem with a few examples and special cases, and, finally, we conclude in Section 6.

2. Background on Halfspace Depth and its Convergence

We begin this section by presenting equivalent formulations of halfspace depth and discuss the geometric properties and notations that are used in the remainder of this paper. In the second part of this section, we recall probabilistic estimates on the error of the empirical measure of a sample and apply these estimates to halfspace depth.

2.1. Equivalent Definitions of Halfspace Depth. Halfspace depth is a commonly studied and used data depth function because it is simple to define, can be computed efficiently, and satisfies all of the desirable properties for a data depth function as defined in [17]. In the context of this paper, we recall two equivalent formulations of halfspace depth.

Observe, first, that a parallel translation of the boundary of a halfspace in the direction of the halfspace only decreases the measure or number of points in the halfspace. Therefore, in the definition of halfspace depth, it is enough to consider only halfspaces where q lies in the boundary of H. Consequently, we can rewrite the definition of halfspace depth as follows:

Proposition 2.1. Let X be an R^d-valued random variable and X^(n) = (X_1, ..., X_n) a finite sample of n points in R^d.
For a point q ∈ R^d, the halfspace depth of q with respect to X or X^(n) is the following minimum over all halfspaces H of R^d whose boundary ∂H contains q:

    HD(q; X) = min_{∂H ∋ q} Pr(X ∈ H)   and   HD(q; X^(n)) = n^{-1} · min_{∂H ∋ q} #{X^(n) ∩ H}.

Next, we reinterpret this definition in terms of projections onto one-dimensional subspaces of R^d as in [8]; this allows us to interpret a high-dimensional problem as a collection of one-dimensional problems. For q ∈ R^d, the set of halfspaces H with q ∈ ∂H can be parametrized by points in the (d−1)-dimensional sphere, S^{d−1}. More precisely, each direction in S^{d−1} corresponds to a vector v which describes a halfspace H as follows: the bounding hyperplane ∂H of the halfspace passes through q and has normal v, and the halfspace opens in the direction away from v.

We now take this idea and look at it from a different perspective; instead of fixing the point q and considering all halfspaces whose boundary passes through q, we focus on the parameterization given by S^{d−1}. In order to simplify the discussion, we use the following notation:

Notation 2.2. Let θ ∈ S^{d−1}; define u_θ to be the vector in R^d pointing in the direction of θ, ℓ_θ to be the line through the origin in the direction of θ, and π_θ : R^d → ℓ_θ to be the orthogonal projection onto ℓ_θ. Moreover, for a point p ∈ R^d, define d_θ(p) = ⟨p, u_θ⟩ (where ⟨·,·⟩ is the standard inner product in R^d) to be the signed length of p in the direction of u_θ. In other words, π_θ(p) = d_θ(p) u_θ.

Using this notation, we define probability distributions and finite samples on R corresponding to each direction in S^{d−1}.

Notation 2.3. Let X be an R^d-valued random variable and X^(n) = (X_1, ..., X_n) a finite sample of n points in R^d. For each θ ∈ S^{d−1}, X_θ is the R-valued random variable d_θ(X), and F_θ is the cdf for this variable, i.e., F_θ(t) = Pr(X_θ ≤ t). Similarly, F_{n,θ} is the empirical cdf for the points of X^(n) in the direction of θ, i.e., F_{n,θ}(t) = n^{-1} · #{i : d_θ(X_i) ≤ t}.

For each t and θ, there is a halfspace H_{θ,t} such that F_θ(t) = Pr(X ∈ H_{θ,t}) and F_{n,θ}(t) = n^{-1} · #{X^(n) ∩ H_{θ,t}}. In particular, H_{θ,t} is the halfspace whose bounding hyperplane passes through the point t · u_θ, whose bounding hyperplane has normal u_θ, and the halfspace opens in the direction of −u_θ. Since the point q lies in the bounding hyperplane for H_{θ,t} iff d_θ(q) = t, we can reinterpret Proposition 2.1 as follows (see Figure 1):

Proposition 2.4. Let X be an R^d-valued random variable and X^(n) = (X_1, ..., X_n) a finite sample of n points in R^d. For a point q ∈ R^d, the halfspace depth of q with respect to X or X^(n) is the following minimum over directions θ ∈ S^{d−1}:

    HD(q; X) = min_{θ ∈ S^{d−1}} F_θ(d_θ(q))   and   HD(q; X^(n)) = min_{θ ∈ S^{d−1}} F_{n,θ}(d_θ(q)).

Figure 1. The number of points in the halfspace H_{θ, d_θ(q)} (the unshaded halfspace) equals the number of points such that d_θ(X_i) ≤ d_θ(q). In the diagram, these are the points such that π_θ(X_i) is to the left of π_θ(q). A similar statement and diagram can be made for an R^d-valued random variable X.

These equivalent formulations for halfspace depth illustrate that the halfspace depth of a point q measures how extreme q is under all orthogonal projections onto one-dimensional subspaces.
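As a concrete illustration of Proposition 2.4 (a sketch, not from the paper): in R², the sample halfspace depth can be approximated by scanning a finite grid of directions θ on S¹ and taking the minimum of the empirical cdfs F_{n,θ} evaluated at d_θ(q). The direction count below is an assumption chosen for illustration; a finite grid gives an upper bound on the true sample depth, and a fine grid is exact for generic data.

```python
import numpy as np

def halfspace_depth_2d(q, X, num_dirs=3600):
    """Approximate sample halfspace depth HD(q; X^(n)) in R^2.

    Following Proposition 2.4, this scans directions theta on S^1 and
    returns the minimum over theta of
        F_{n,theta}(d_theta(q)) = (1/n) * #{i : <X_i, u_theta> <= <q, u_theta>}.
    """
    q = np.asarray(q, dtype=float)
    X = np.asarray(X, dtype=float)
    n = len(X)
    angles = np.linspace(0.0, 2.0 * np.pi, num_dirs, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])  # the vectors u_theta
    proj_X = dirs @ X.T          # d_theta(X_i) for every direction: (num_dirs, n)
    proj_q = dirs @ q            # d_theta(q) for every direction: (num_dirs,)
    counts = (proj_X <= proj_q[:, None] + 1e-12).sum(axis=1)
    return counts.min() / n

rng = np.random.default_rng(0)
sample = rng.standard_normal((500, 2))
# A central point should have depth near 1/2; a far-away point has depth near 0.
print(halfspace_depth_2d([0.0, 0.0], sample))
print(halfspace_depth_2d([10.0, 0.0], sample))
```

Exact algorithms in R² only need the O(n) directions normal to lines through q and the sample points; the grid scan above trades exactness for simplicity.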
This formulation gives rise to the common description of the halfspace depth of a point q relative to a sample X^(n) as the smallest fraction of points that must be removed from the sample so that q is outside the convex hull of the remaining sample points. The formulation of halfspace depth in Proposition 2.4 gives the key approach that is exploited in the main result of this paper.

2.2. Prior Convergence Estimates. Suppose that X is an R^d-valued random variable and that (X_1, X_2, ...) is a sequence of iid random variables equal in distribution to X. In [8], the authors prove that for all q ∈ R^d, HD(q; X^(n)) → HD(q; X) almost surely as n → ∞. This result is proved by observing that the collection of all halfspaces in R^d satisfies the Glivenko-Cantelli property (for additional details, see [14]), so that, uniformly for all halfspaces H,

    n^{-1} · #{X^(n) ∩ H} → Pr(X ∈ H) a.s. as n → ∞.

The convergence given by the Glivenko-Cantelli property can be strengthened by observing that the set of halfspaces is a Vapnik-Červonenkis class. In particular, it is shown in [9] that halfspaces in R^d cannot shatter sets of size d + 2 (and sets of size d + 1 in general position can be shattered). We write m(n) for the maximum number of subsets formed by intersecting finite samples of size n with halfspaces in R^d. Then, in [16] and [5], the following result is proved:

Proposition 2.5 (See [14, Chapter 26]). Let H be the set of all halfspaces in R^d and suppose that ε > 0. Define m to be the function described above. Then, for n sufficiently large,

    (1) Pr( sup_{H ∈ H} |n^{-1} · #{X^(n) ∩ H} − Pr(X ∈ H)| ≥ ε ) ≤ 4 m(2n) e^{−nε²/8}.
    (2) Pr( sup_{H ∈ H} |n^{-1} · #{X^(n) ∩ H} − Pr(X ∈ H)| ≥ ε ) ≤ 4 m(n²) e^{−2nε²}.

Moreover, m(r) ≤ (3/2) · r^{d+1}/(d+1)!.
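The trade-off between the two bounds in Proposition 2.5 can be made concrete with a small numeric sketch (not from the paper): using the stated estimate m(r) ≤ (3/2)·r^{d+1}/(d+1)!, bound (2) pays a larger polynomial factor, m(n²) ~ n^{2d+2}, in exchange for the sharper exponential e^{−2nε²}. For moderate sample sizes the exponential wins.

```python
import math

def m_bound(r, d):
    """Upper bound on the halfspace shatter function: (3/2) * r^(d+1) / (d+1)!."""
    return 1.5 * r ** (d + 1) / math.factorial(d + 1)

def vc_bound_1(n, d, eps):
    """Inequality (1) of Proposition 2.5: 4 m(2n) e^{-n eps^2 / 8}."""
    return 4 * m_bound(2 * n, d) * math.exp(-n * eps ** 2 / 8)

def vc_bound_2(n, d, eps):
    """Inequality (2) of Proposition 2.5: 4 m(n^2) e^{-2 n eps^2}."""
    return 4 * m_bound(n ** 2, d) * math.exp(-2 * n * eps ** 2)

# For d = 2 and eps = 0.05, bound (2) is already far smaller at n = 50,000,
# despite its larger polynomial coefficient.
for n in (10_000, 50_000):
    print(n, vc_bound_1(n, 2, 0.05), vc_bound_2(n, 2, 0.05))
```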
We improve these bounds on the error by decreasing the exponent or the degree of the polynomial coefficient.

3. Additional Tools

Since data depth combines discrete geometry with probability and statistics, our main result requires results from both of these fields. In this section, for convenience, we collect a few additional theorems and notations that are used in the remainder of the paper.

3.1. Spherical Covering. As observed in Section 2.1, halfspace depth is related to the directions in a (d−1)-dimensional sphere. In our main result, we consider small neighborhoods on a (d−1)-dimensional sphere. The following result indicates how many of these neighborhoods are necessary.

Lemma 3.1 ([2, Corollary 1.2]). For any 0 < ψ < arccos(d^{−1/2}), there is an absolute constant C_2 such that the (d−1)-dimensional unit sphere can be covered by

    C_2 (cos ψ / sin^{d−1} ψ) (d−1)^{3/2} ln(1 + (d−1) cos² ψ) ≤ C_2 (√d/ψ)^{d−1} (d−1)^{3/2} ln(d)

spherical balls of radius ψ (i.e., the radius on the surface of the sphere is ψ).

The final inequality in this lemma follows from the facts that cos(ψ) ≤ 1 and, for 0 < ψ < arccos(d^{−1/2}), sin(ψ) ≥ ψ · cos(ψ) > d^{−1/2} ψ. The √d could be replaced by a constant by using a better bound on sin(x), such as sin(x) ≥ x − x³/3!. We leave the details to the interested reader.

3.2. The Dvoretzky-Kiefer-Wolfowitz Inequality. In the one-dimensional case, the bounds in Proposition 2.5 can be improved with the Dvoretzky-Kiefer-Wolfowitz inequality.

Lemma 3.2 (See, for example, [6]). Let X be an R-valued random variable, and let (X_1, X_2, ...) be a sequence of iid random variables equal in distribution to X. Let F be the cdf of X, and let F_n be the empirical cdf, i.e., F_n(t) = n^{-1} · #{i ≤ n : X_i ≤ t}. For each ε > 0,

    Pr( sup_{t ∈ R} |F(t) − F_n(t)| ≥ ε ) ≤ 2 e^{−2nε²}.

In this paper, we use the Dvoretzky-Kiefer-Wolfowitz inequality on one-dimensional projections, as in Proposition 2.4, in order to improve the bound on the convergence rate of the sample halfspace depth.

3.3. Lipschitz and Tail Behavior. In this section, we define a few conditions on the probability distributions that we use for our main result. Let X be an R^d-valued random variable with d > 1.

Definition 3.3. We say that an R^d-valued random variable X decays quickly if there is some λ > 0 such that Pr(|X| > R) = O(R^{3d−5} · e^{−λR²/2}). In other words, there exists a constant C_1 > 0 such that for R > 1, Pr(|X| > R) < C_1 · R^{3d−5} · e^{−λR²/2}. We say that λ is the decay rate of X. For example, the multivariate normal distribution with mean the origin and covariance matrix I decays quickly with decay rate λ = 1.

Definition 3.4. We say that X is Lipschitz continuous in projection if there is a constant L_π > 0 such that for all θ ∈ S^{d−1}, F_θ is Lipschitz continuous with Lipschitz constant L_π. We say that X is radially Lipschitz continuous if there is a constant L_θ such that for any fixed t, the function F_θ(t) is Lipschitz continuous as a function of θ on the unit sphere with Lipschitz constant L_θ. We say that L_π is the projection Lipschitz constant and L_θ is the radial Lipschitz constant for X. For example, radially symmetric distributions about the origin have L_θ = 0.

These two Lipschitz constants indicate that as θ and t change, F_θ(t) changes continuously. In other words, these two constants can be used to bound the difference in the probability between two halfspaces and to show that this difference varies continuously.

4. Main Result

We begin this section by highlighting the difference between the previous bounds and our approach. In Proposition 2.5, the convergence rates are computed over all halfspaces in R^d, and the goal is to find a bound on

    Pr( sup_{H ∈ H} |n^{-1} · #{X^(n) ∩ H} − Pr(X ∈ H)| ≥ ε ).

In our approach, however, the goal is to find bounds on the family of one-dimensional cdfs F_θ and F_{n,θ}. In particular, in our main result, we find a uniform upper bound on the following probability:

    Pr( sup_{t ∈ R, θ ∈ S^{d−1}} |F_θ(t) − F_{n,θ}(t)| ≥ ε ).

Then, by applying Proposition 2.4, when |F_θ(t) − F_{n,θ}(t)| < ε for all t and θ, it follows that for all q ∈ R^d, |HD(q; X) − HD(q; X^(n))| < ε. The advantage of this approach is that by considering only one-dimensional objects, we can use the improved bounds of the Dvoretzky-Kiefer-Wolfowitz inequality.

Throughout the remainder of this section, we assume that the R^d-valued random variable X with d > 1 has decay rate λ with constant C_1, projection Lipschitz constant L_π, and radial Lipschitz constant L_θ. Also, let (X_1, X_2, ...) be a sequence of iid random variables equal in distribution to X. Our proof follows these steps:

(1) First, we quantify how small changes in θ affect the cdfs F_θ and F_{n,θ}.
(2) Second, we use this description of F_θ and F_{n,θ} to argue that it is sufficient to examine finitely many θ's instead of the infinitely many possible θ's in S^{d−1}.
(3) Finally, we carefully choose a few parameters in order to achieve a bound which improves upon the bounds achieved in Proposition 2.5.

For our first step, we begin by collecting, for later convenience, a consequence of the Lipschitz condition, and then provide a geometric description of the behavior of d_θ as θ changes.
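Before proceeding, the Dvoretzky-Kiefer-Wolfowitz bound of Lemma 3.2, which drives this approach, is easy to probe empirically. The following sketch (an illustration under our own assumptions: a standard normal X and NumPy; not from the paper) estimates the one-dimensional statistic sup_t |F(t) − F_n(t)| over repeated samples and compares the exceedance frequency against the bound 2e^{−2nε²}.

```python
import numpy as np
from math import erf, exp, sqrt

def ks_statistic_standard_normal(sample):
    """sup_t |F(t) - F_n(t)| for a sample against the exact N(0,1) cdf."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    F = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in x])
    # The supremum is attained at a jump of the empirical cdf, so it suffices
    # to compare F against the ecdf just before and just after each data point.
    return max(np.abs(F - np.arange(0, n) / n).max(),
               np.abs(F - np.arange(1, n + 1) / n).max())

rng = np.random.default_rng(1)
n, eps, trials = 200, 0.1, 1000
exceed = sum(ks_statistic_standard_normal(rng.standard_normal(n)) >= eps
             for _ in range(trials)) / trials
dkw_bound = 2 * exp(-2 * n * eps ** 2)  # Lemma 3.2: 2 e^{-2 n eps^2}
print(exceed, dkw_bound)
```

The constant 2 in Lemma 3.2 is known to be tight, so the empirical frequency typically sits just below the bound rather than far beneath it.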
For notational convenience, for θ, ϕ ∈ S^{d−1}, we write |θ − ϕ| for the spherical distance between them, i.e., the central angle between the two points.

Observation 4.1. For all θ, ϕ ∈ S^{d−1} and t ∈ R, |F_θ(t) − F_ϕ(t)| ≤ L_θ |θ − ϕ|.

Lemma 4.2. Let θ, ϕ ∈ S^{d−1} and x ∈ R^d. Then, |d_θ(x) − d_ϕ(x)| ≤ ‖x‖ · |θ − ϕ|.

Proof. Observe first that |d_θ(x) − d_ϕ(x)| = |⟨x, u_θ − u_ϕ⟩| ≤ ‖x‖ · ‖u_θ − u_ϕ‖. Note that the vector u_θ − u_ϕ is the chord of an arc on the great circle between θ and ϕ; since chords are shorter than the arcs they cut off, and the length of the arc is |θ − ϕ|, the result follows. ∎

Moreover, Lemma 4.2 allows us to bound F_{n,ϕ} in terms of F_{n,θ}. More precisely:

Corollary 4.3. Fix R > 0 and θ, ϕ ∈ S^{d−1}. Suppose that for all i ≤ n, ‖X_i‖ ≤ R. Then,

    F_{n,θ}(t − R|θ − ϕ|) ≤ F_{n,ϕ}(t) ≤ F_{n,θ}(t + R|θ − ϕ|).

Proof. This result follows from Lemma 4.2, because we know that if d_ϕ(X_i) ≤ t, then d_θ(X_i) ≤ t + R·|θ − ϕ|, and if d_θ(X_i) ≤ t − R·|θ − ϕ|, then d_ϕ(X_i) ≤ t. ∎

With Observation 4.1 and Corollary 4.3 in hand, we know how the distributions F_θ and F_{n,θ} vary as θ varies. This leads to the following proposition, which bounds the error between F_θ and F_{n,θ} as θ varies.

Proposition 4.4. Fix R > 0, ε > 0, δ > 0, and θ, ϕ ∈ S^{d−1}. Suppose that for all i ≤ n, ‖X_i‖ ≤ R. Suppose that sup_{t ∈ R} |F_θ(t) − F_{n,θ}(t)| < ε · (1 + δ)^{−1}. Then,

    sup_{t ∈ R} |F_ϕ(t) − F_{n,ϕ}(t)| < ε(1 + δ)^{−1} + L_θ|θ − ϕ| + L_π R|θ − ϕ|.

Proof. By Corollary 4.3, we know that

    F_{n,θ}(t − R|θ − ϕ|) − F_ϕ(t) ≤ F_{n,ϕ}(t) − F_ϕ(t) ≤ F_{n,θ}(t + R|θ − ϕ|) − F_ϕ(t).

Then, it follows that

    |F_{n,ϕ}(t) − F_ϕ(t)| ≤ max{ |F_{n,θ}(t − R|θ − ϕ|) − F_ϕ(t)|, |F_{n,θ}(t + R|θ − ϕ|) − F_ϕ(t)| }.

We proceed to bound the first of the expressions in the maximum (the other expression is similar). By the triangle inequality,

    |F_{n,θ}(t − R|θ − ϕ|) − F_ϕ(t)| ≤ |F_{n,θ}(t − R|θ − ϕ|) − F_θ(t − R|θ − ϕ|)| + |F_θ(t − R|θ − ϕ|) − F_θ(t)| + |F_θ(t) − F_ϕ(t)|.

By assumption, the first expression is bounded above by ε · (1 + δ)^{−1}. By the projection Lipschitz constant, the second expression is bounded above by L_π · R · |θ − ϕ|. Finally, by Observation 4.1, the third expression is bounded by L_θ · |θ − ϕ|. Combining these bounds, the result follows. ∎

By choosing |θ − ϕ| to be sufficiently small, as in the following corollary, we can ensure that the errors are all bounded above by ε.

Corollary 4.5. Fix R > 0, ε > 0, δ > 0, and θ ∈ S^{d−1}. Suppose that for all i ≤ n, ‖X_i‖ ≤ R. Let ϕ ∈ S^{d−1} be such that |θ − ϕ| < ε · δ · (1 + δ)^{−1} · (L_θ + L_π · R)^{−1}. Suppose that sup_{t ∈ R} |F_θ(t) − F_{n,θ}(t)| < ε · (1 + δ)^{−1}. Then,

    sup_{t ∈ R} |F_ϕ(t) − F_{n,ϕ}(t)| < ε.

Corollary 4.5 completes the second main step of the proof and gives us a way to study neighborhoods on S^{d−1} instead of individual projections. In particular, if {θ_1, ..., θ_k} satisfy the conditions of Corollary 4.5, then we have estimates on the error in the k spherical balls of radius ε · δ · (1 + δ)^{−1} · (L_θ + L_π · R)^{−1} centered at the θ_i's. We now determine the θ_i's and use the Dvoretzky-Kiefer-Wolfowitz bound in these directions to get our initial estimate on the error between the halfspace depth of a sample and the underlying distribution. In particular:

Proposition 4.6. Fix R > 1, ε > 0, and δ > 0. Suppose that ε · δ · (1 + δ)^{−1} · (L_θ + L_π · R)^{−1} < arccos(d^{−1/2}). Then,

(1)    Pr( sup_{q ∈ R^d} |HD(q; X) − HD(q; X^(n))| ≤ ε )
           ≥ 1 − 2C_2 ( (1 + δ)(L_θ + L_π R)√d / (εδ) )^{d−1} (d−1)^{3/2} ln(d) e^{−2nε²(1+δ)^{−2}} − C_1 n R^{3d−5} e^{−λR²/2}.

Proof. Observe first that, in each of the statements above, we require that for all i ≤ n, ‖X_i‖ ≤ R. By the decay property, for each i, the probability of this occurring is bounded below by 1 − C_1 R^{3d−5} e^{−λR²/2}. Next, observe that if θ ∈ S^{d−1} is such that sup_{t ∈ R} |F_θ(t) − F_{n,θ}(t)| < ε · (1 + δ)^{−1}, then all ϕ ∈ S^{d−1} within a spherical ball of radius ε · δ · (1 + δ)^{−1} · (L_θ + L_π · R)^{−1} centered at θ satisfy the conditions of Corollary 4.5. Therefore, we cover the (d−1)-dimensional sphere with balls of radius ε · δ · (1 + δ)^{−1} · (L_θ + L_π · R)^{−1}. By Lemma 3.1,

    C_2 · ((1 + δ)(L_θ + L_π R)√d)^{d−1} · (εδ)^{1−d} · (d−1)^{3/2} · ln(d)

such spherical balls are required. Fix θ to be the center of one of these spherical balls. Applying the Dvoretzky-Kiefer-Wolfowitz inequality, Lemma 3.2, it follows that

    Pr( sup_{t ∈ R} |F_θ(t) − F_{n,θ}(t)| ≤ ε(1 + δ)^{−1} ) ≥ 1 − 2e^{−2nε²(1+δ)^{−2}}.

Since the spherical balls cover S^{d−1}, for any ϕ ∈ S^{d−1}, there is some spherical ball with center θ such that |θ − ϕ| < ε · δ · (1 + δ)^{−1} · (L_θ + L_π · R)^{−1}. Hence, by Corollary 4.5, it follows that sup_{t ∈ R} |F_ϕ(t) − F_{n,ϕ}(t)| < ε. Finally, using the fact that for a, b > 0, (1 − a)(1 − b) > 1 − (a + b), and that there are n sample points and C_2 · ((1 + δ)(L_θ + L_π R)√d)^{d−1} · (εδ)^{1−d} · (d−1)^{3/2} · ln(d) spherical balls, the result follows. ∎

We now simplify the expression above by eliminating the parameters R and δ. Since the bound in Inequality (1) has two exponentials, we can choose R = 2ε√n/(√λ(1 + δ)) to equate the exponentials. This choice results in the following corollary:

Corollary 4.7. Fix ε > 0 and δ > 0. Suppose that ε · δ · (1 + δ)^{−1} · (L_θ + L_π · 2ε√n/(√λ(1 + δ)))^{−1} < arccos(d^{−1/2}) and 2ε√n > √λ(1 + δ). Then,

(2)    Pr( sup_{q ∈ R^d} |HD(q; X) − HD(q; X^(n))| ≤ ε )
           ≥ 1 − [ 2C_2 ( (L_θ√λ(1 + δ) + 2L_π√n ε)√d / (εδ√λ) )^{d−1} (d−1)^{3/2} ln(d) + C_1 ( 2ε/(√λ(1 + δ)) )^{3d−5} n^{(3/2)(d−1)} ] e^{−2nε²(1+δ)^{−2}}.

Finally, we choose δ = n^{−1} in order to eliminate δ. Additionally, this choice causes the exponential to simplify to e^{−2nε²}. More precisely:

Theorem 4.8. Fix ε > 0 and suppose that ε · (n + 1)^{−1} · (L_θ + L_π · 2εn^{3/2}/(√λ(n + 1)))^{−1} < arccos(d^{−1/2}) and 2εn^{3/2} > √λ(n + 1). Then,

(3)    Pr( sup_{q ∈ R^d} |HD(q; X) − HD(q; X^(n))| ≤ ε )
           ≥ 1 − [ 2C_2 ( (L_θ√λ(n + 1) + 2L_π n^{3/2} ε)√d / (ε√λ) )^{d−1} (d−1)^{3/2} ln(d) + C_1 ( 2εn/(√λ(n + 1)) )^{3d−5} n^{(3/2)(d−1)} ] e^4 e^{−2nε²}.

Proof. Setting δ = n^{−1} in Inequality (2) yields the bound in Inequality (3), except with the exponential factor

    e^{−2ε²( n − (2n² + n)/(n² + 2n + 1) )}

in place of e^4 e^{−2nε²}. Since ε is a difference between two quantities whose values are between 0 and 1, only ε ≤ 1 has content. Moreover, the quotient (2n² + n)/(n² + 2n + 1) increases to 2 for positive n; therefore, the exponent is bounded above by −2nε² + 4, and the result follows. ∎

We summarize Theorem 4.8 for n sufficiently large.

Corollary 4.9. Fix ε > 0. Then, for n sufficiently large, there is a constant C depending on λ, ε, C_1, C_2, d, and L_π such that

    Pr( sup_{q ∈ R^d} |HD(q; X) − HD(q; X^(n))| ≤ ε ) ≥ 1 − C n^{(3/2)(d−1)} e^{−2nε²}.

We can compare this result with the known convergence rates in Proposition 2.5.
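A quick numeric sketch of this comparison (our own illustration, with all distribution-dependent constants set to 1; not from the paper): both Corollary 4.9 and Inequality (2) of Proposition 2.5 carry the exponential e^{−2nε²}, so the improvement is the ratio of the polynomial factors, which grows like n^{(1/2)d + 7/2}.

```python
import math

def vc_coefficient(n, d):
    """Polynomial factor of Inequality (2) in Proposition 2.5:
    4 * m(n^2) with m(r) <= (3/2) r^(d+1)/(d+1)!, growing like n^(2d+2)."""
    return 4 * 1.5 * n ** (2 * (d + 1)) / math.factorial(d + 1)

def new_coefficient(n, d, C=1.0):
    """Polynomial factor of Corollary 4.9: C * n^{(3/2)(d-1)}, with C set to 1."""
    return C * n ** (1.5 * (d - 1))

# The ratio of the two coefficients should grow like n^{(1/2) d + 7/2}.
n, d = 10_000, 3
ratio = vc_coefficient(n, d) / new_coefficient(n, d)
print(ratio)
print(n ** (0.5 * d + 3.5))  # same order of growth, up to constants
```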
The exponential in Inequality (1) of Proposition 2.5 is much larger (the exponent is a negative number with smaller magnitude) than what appears in Corollary 4.9, so our bound is tighter. On the other hand, the best polynomial coefficient in Inequality (2) of Proposition 2.5 is n^{2d+2}, which is a factor of n^{(1/2)d + 7/2} times larger than our bound.

5. Examples

In this section, we apply Theorem 4.8 to several examples. We take special care to develop explicit bounds on the Lipschitz constants whenever possible.

Example 5.1. Consider the case where X is distributed according to an elliptically symmetric distribution with pdf

    f(x) = det(Σ)^{−1/2} ψ((x − µ)^T Σ^{−1} (x − µ)),

where x and µ are in R^d, Σ is a positive definite symmetric matrix, and ψ : [0, ∞) → [0, ∞) (see, for example, [4]). Since Σ is positive definite, we can decompose Σ = QD²Q^T, where the columns of Q are the orthonormal eigenvectors of Σ and D is a diagonal matrix whose entries are the square roots of the eigenvalues of Σ. Consider the affine transformation Y = D^{−1}Q^T(X − µ); under this transformation, the pdf of Y becomes ψ(∑ y_i²). Recall that halfspace depth is invariant under affine transformations (see, for example, [17]). Therefore,

    sup_{q ∈ R^d} |HD(q; X) − HD(q; X^(n))| = sup_{q ∈ R^d} |HD(q; Y) − HD(q; Y^(n))|,

and we can study the convergence of X by studying the convergence of Y. Throughout the remainder of this example, all statements refer to Y. We suppose, additionally, that ψ has the following properties:

(1) ∫_0^∞ r^{d−1} ψ(r²) dr = (Vol_{d−1}(S^{d−1}))^{−1}. This condition guarantees that the integral of the pdf for Y over R^d is 1.
(2) There exists a λ > 0 such that ∫_R^∞ r^{d−1} ψ(r²) dr = O(R^{3d−5} e^{−λR²/2}). This condition guarantees that X has a decay rate of λ.
(3) The function Ψ(t) = ∫_{R^{d−1}} ψ(t² + x_1² + ⋯ + x_{d−1}²) dx_1 ⋯ dx_{d−1} is bounded. This function is the derivative of F_{e_d}(t), where e_d is the north pole of S^{d−1}. The upper bound on this function is a Lipschitz constant for F_{e_d}(t). When Conditions (1) and (2) hold, this is not a strong condition; for example, it holds when ψ is bounded.

Like all distributions which are spherically symmetric about the origin, Y has L_θ = 0. Moreover, by Condition (2), Y has a decay rate of λ. Finally, since this is a spherically symmetric distribution, F_θ(t) is independent of θ. By Condition (3), F'_{e_d}(t) = ∫_{R^{d−1}} ψ(t² + x_1² + ⋯ + x_{d−1}²) dx_1 ⋯ dx_{d−1} is bounded. Therefore, the projection Lipschitz constant is

    L_π = sup_{t ∈ R} ∫_{R^{d−1}} ψ(t² + x_1² + ⋯ + x_{d−1}²) dx_1 ⋯ dx_{d−1}.

In particular cases, we can derive more precise bounds and constants.

Example 5.2. Consider the case where X is distributed according to a non-degenerate normal distribution in R^d with mean µ and (positive definite) covariance matrix Σ. By applying Example 5.1, it is enough to consider the standard normal distribution in R^d centered at the origin with covariance matrix I. In this case, the pdf of X is (2π)^{−d/2} e^{−(1/2)∑x_i²}. Moreover, via spherical integration it follows that Pr(|X| > R) = O(R^{d−2} e^{−R²/2}), so the decay rate of X is 1. Additionally, for any θ, F_θ is the cdf of a standard normal distribution in one variable; since the Lipschitz constant for F_θ is bounded by the maximum of its derivative, L_π = 1/√(2π).

The computation for C_1 in Definition 3.3 is more technical (and not particularly interesting); we observe that since Pr(|X| > R) = O(R^{d−2} e^{−R²/2}), we could, in fact, replace the 3d − 5 by d − 2 in Definition 3.3 and subsequent computations to achieve a better bound, with C_1 appearing only in a lower order term. Therefore, we choose to ignore the C_1 term. We leave the details to the interested reader.

Example 5.3. For the two-dimensional normal, we can say even more. More precisely, assume that X is distributed according to a bivariate normal centered at the origin with covariance I. The bounds above apply, but can be made even sharper. For example, we can cover a circle using at most π/w + 1 intervals of width 2w. Moreover, Pr(|X| > R) = e^{−R²/2}. Therefore, in this case, the bound in Theorem 4.8 has a smaller constant, and, explicitly, the bound becomes

    1 − ( 2√(2π) n^{3/2} + n + 2 ) e^4 e^{−2nε²}.

Additionally, in the two-dimensional case, the function m from Proposition 2.5 can be computed explicitly. The largest number of subsets of a set of size n occurs when the n points are in convex position. In this case, there are n² − n + 2 subsets formed from intersections with halfspaces, so m(r) = r² − r + 2. Even with this smaller degree polynomial for m, our bounds are still an improvement by a factor of √n.

6. Conclusion

The results in this paper illustrate how, using the geometry and topology of halfspace depth and R^d, one can achieve better convergence bounds for the sample version of halfspace depth, as compared to general Glivenko-Cantelli bounds. With the improved bounds in this paper, we have improved estimates on the quality of the halfspace median statistic (see [12] and [1]), which is applicable in statistics. The approach in this paper is related to the ideas of projection pursuit in [7] and [13]; it is possible that incorporating such techniques may further improve the convergence rates of halfspace depth; we leave such improvements as future work.

Acknowledgments. The authors would like to thank their colleagues at Clemson University, in particular, Billy Bridges, Brian Fralix, Peter Kiessler, and June Luo, for their constructive feedback on earlier versions of this work.

References

[1] Greg Aloupis. Geometric measures of data depth. In Data depth: robust multivariate analysis, computational geometry and applications, volume 72 of DIMACS Series on Discrete Mathematics and Theoretical Computer Science, pages 147–158. American Mathematical Society, Providence, RI, 2006.
[2] Károly Böröczky, Jr. and Gergely Wintsche. Covering the sphere by equal spherical balls. In Discrete and computational geometry, volume 25 of Algorithms and Combinatorics, pages 235–251. Springer, Berlin, 2003.
[3] Michael A. Burr, Eynat Rafalin, and Diane L. Souvaine. Dynamic maintenance of half-space depth for points and contours. Technical Report arXiv:1109.1517 [cs.CG], arXiv, 2011.
[4] M. A. Chmielewski. Elliptically symmetric distributions: a review and bibliography. International Statistical Review, 49(1):67–74, 1981.
[5] Luc Devroye. Upper and lower class sequences for minimal uniform spacings. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 61(2):237–254, 1982.
[6] Luc Devroye and Gábor Lugosi. Combinatorial methods in density estimation. Springer Series in Statistics. Springer-Verlag, New York, 2001.
[7] Persi Diaconis and David Freedman. Asymptotics of graphical projection pursuit. The Annals of Statistics, 12(3):793–815, 1984.
[8] David L. Donoho and Miriam Gasko. Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics, 20(4):1803–1827, 1992.
[9] Richard M. Dudley. Balls in R^k do not cut all subsets of k + 2 points. Advances in Mathematics, 31(3):306–308, 1979.
[10] Joseph L. Hodges, Jr. A bivariate sign test. Annals of Mathematical Statistics, 26:523–527, 1955.
[11] Regina Y. Liu. Data depth: center-outward ordering of multivariate data and nonparametric multivariate statistics. In Michael G. Akritas and Dimitris N. Politis, editors, Recent advances and trends in nonparametric statistics, pages 155–167. Elsevier B.V., Amsterdam, 2003.
[12] Regina Y. Liu, Jesse M. Parelius, and Kesar Singh. Multivariate analysis by data depth: descriptive statistics, graphics and inference. The Annals of Statistics, 27(3):783–858, 1999.
[13] Elizabeth S. Meckes. Quantitative asymptotics of graphical projection pursuit. Electronic Communications in Probability, 14:176–185, 2009.
[14] Galen R. Shorack and Jon A. Wellner. Empirical processes with applications to statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, 1986.
[15] John W. Tukey. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians (Vancouver, B.C., 1974), Vol. 2, pages 523–531. Canadian Mathematical Congress, Montreal, Que., 1975.
[16] V. N. Vapnik and A. Ja. Červonenkis. The uniform convergence of frequencies of the appearance of events to their probabilities. Akademija Nauk SSSR. Teorija Verojatnostei i ee Primenenija, 16:264–279, 1971.
[17] Yijun Zuo and Robert Serfling. General notions of statistical depth function. The Annals of Statistics, 28(2):461–482, 2000.

Department of Mathematical Sciences, Clemson University, Clemson, SC 29634
E-mail address: burr2@clemson.edu

Department of Mathematical Sciences, Clemson University, Clemson, SC 29634
E-mail address: rfabriz@clemson.edu
