Local depth-based classification of directional data



Giuseppe Gismondi*, Rebecca Rivieccio†, and Giuseppe Pandolfo*

* Dept. of Economics and Statistics, University of Naples Federico II, Napoli, Italy
† Dept. of Physics, University of Naples Federico II, Napoli, Italy

Abstract

Directional data arise in many applications where observations are naturally represented as unit vectors, or as observations on the surface of a unit hypersphere. In this context, statistical depth functions provide a center–outward ordering of the data. This work proposes the use of a local notion of data depth within the DD-plot (Depth vs. Depth plot) to classify directional data. The proposed method is investigated through an extensive simulation study and two real-data examples.

1 Introduction

Directional data analysis is a branch of statistics concerned with the exploration and modelling of data expressed as angles or unit vectors. Such data lie on the surface of the unit hypersphere $S^{q-1} := \{x \in \mathbb{R}^q : \|x\|_2 = 1\}$ of $q-1$ dimensions, where $\|x\|_2 := \sqrt{\sum_{i=1}^q x_i^2}$ and $x = (x_1, \ldots, x_q)$. They naturally arise in a variety of real-world contexts where the vectors represent directions, rotations, or cyclic phenomena. Prominent applications can be found in geology, where the orientation of magnetic fields in rocks is studied, as well as in meteorology and psychology, in the analysis of wind directions or of the perception of spatial orientation. Further examples and theoretical developments are discussed by Mardia and Jupp (1999), which remains a fundamental reference in the field of directional statistics, later complemented by the more recent work of Ley and Verdebout (2017). There are several other application domains in which the orientation of vectors in space carries richer information than their magnitude, such as compositional data (i.e., vectors of nonnegative components that sum to one), for instance the relative frequencies of words in a document (see Pandolfo and D'Ambrosio, 2021). As noted by Stephens (1982), applying a square-root transformation to each vector maps these compositions to directional data lying on the surface of a $(q-1)$-dimensional unit hypersphere.

When dealing with this type of data, several challenges arise due to the absence of a natural reference direction and the lack of a unique definition of orientation or sense of rotation. Moreover, since directional data do not have a natural ordering, the development of suitable depth functions can be quite useful. Indeed, depth functions provide a notion of centrality, allowing a center–outward ordering of locations on the manifold (Agostinelli and Romanazzi, 2013) by generalizing the univariate notions of median and rank to the multivariate setting. Several notions of depth for directional data have been proposed and employed as feature spaces for implementing supervised classification methods (Pandolfo and D'Ambrosio, 2021; Dey and Jana, 2025). Traditional global angular depth functions aim to describe the overall centrality of a point with respect to the entire data distribution, providing a single measure of how central or peripheral an observation is. However, this approach is reliable only when dealing with unimodal and convexly distributed data.
In the case of multimodality or non-convex data structures, typically arising in mixture models or clustering problems, global depths fail to provide a meaningful representation of centrality, since multiple local centers may exist (Paindaveine and Van Bever, 2013). To overcome this limitation, local depth functions have been proposed to provide a more refined assessment of centrality. Specifically, they aim to evaluate the position of a point within a restricted neighbourhood of the data, so that centrality can be captured at a specific scale of locality (Agostinelli and Romanazzi, 2011). Such an approach allows for a more flexible characterization of complex data structures, making local depths particularly effective also for classification tasks. Hence, the goal of this work is to define a local version of the cosine distance depth (CDD) proposed by Pandolfo et al. (2018), to be exploited for directional data classification purposes. More specifically, we consider its application within the two-step procedure known as the DD-plot (Depth vs. Depth plot), introduced by Liu et al. (1999) and later used to perform classification by Li et al. (2012) (the DD-classifier). Roughly speaking, for two given samples, the corresponding DD-plot represents the depth values of the sample points with respect to the two underlying distributions, and thus transforms samples in any dimension into a simple two-dimensional scatter plot. A curve that best separates the two samples in the DD-plot is then sought, in the sense that the separation yields the smallest classification error in the DD space.

The remainder of the paper is organized as follows. Sections 2 and 3 briefly recall the concept of data depth for directional data and classification via the Depth vs. Depth (DD)-plot. Section 4 introduces a notion of local cosine distance depth, which is then used to build the classifier introduced in Section 5. Section 6 presents an extensive simulation study investigating the performance of the proposed method. Section 7 provides two real-data examples. Finally, some concluding remarks are offered in Section 8.

2 Data depths for directional data

Statistical depth functions extend univariate ordering to higher dimensions. In particular, they offer a center–outward ordering by providing a measure of how central a point is with respect to a certain distribution. The concept of data depth was first extended to the analysis of directional data by Small (1987) and later by Liu and Singh (1992). Accordingly, directional depth functions measure the degree of centrality of a point with respect to a directional distribution, and they provide a center–outward ordering on circles or on hyperspheres. Within the literature we can find the angular Tukey depth (ATD) and the angular simplicial depth (ASD), which are the directional extensions of Tukey's halfspace depth (Tukey, 1975) and of the simplicial depth originally introduced for data in $\mathbb{R}^q$, respectively. Such depths are also known as geometric depths because they are based on geometric structures (hemispheres and simplices). Their main drawback is consequently their high computational cost, which makes them unfeasible when $q > 3$. Because of this computational issue, here we focus on the class of distance-based depth functions introduced by Pandolfo et al. (2018).
This class includes the arc distance depth (ADD) of Liu and Singh (1992), the chord distance depth (ChDD), and the cosine distance depth (CDD). Such distance-based depths are computationally feasible even in high dimensions and are strictly positive everywhere on $S^{q-1}$, except in the uninteresting case of a point mass distribution, whereas ASD and ATD may take zero values, which can cause issues in supervised classification. In addition, they do not produce ties in the sample case. The computational advantage of CDD stems from the fact that it requires only pairwise inner products $\langle x_i, x_j \rangle$, which can be computed efficiently even in high dimensions. This contrasts with geometric depths, which require solving complex optimization problems over hemispheres or simplices. One more notion of depth for directional data is the angular Mahalanobis depth, studied by Ley et al. (2014) and developed using the concept of directional quantiles. However, its application is often limited by the necessary prior choice of a spherical location functional.

Here we focus on the CDD because of its computational ease and its properties, which are particularly useful in defining our proposal. In the following, we recall its definition.

Definition 1 (Cosine Distance Depth). The cosine distance depth of a point $x \in S^{q-1}$ with respect to the distribution $F$ on $S^{q-1}$ is defined as
$$\mathrm{CDD}(x, F) := 2 - \mathbb{E}_F[d_{\cos}(x, W)],$$
where $d_{\cos}(x, w) = 1 - \langle x, w \rangle$ is the cosine distance, and $W \sim F$.

The sample version is obtained by replacing $F$ with its empirical counterpart $\hat{F}_n$ calculated from the sample $x_1, \ldots, x_n$. The CDD satisfies all the following properties:

P1. Rotation invariance: $\mathrm{CDD}(x, F) = \mathrm{CDD}(Ox, OF)$ for any $q \times q$ orthogonal matrix $O$.

P2. Maximality at center: $\max_{x \in S^{q-1}} \mathrm{CDD}(x, F) = \mathrm{CDD}(x_0, F)$ for any $F$ with center at $x_0$.

P3. Monotonicity on rays from the deepest point: $\mathrm{CDD}(\cdot, \cdot)$ decreases along any geodesic path $t \mapsto x_t$ from the deepest point $x_0$ to its antipodal point $-x_0$.

P4. Minimality at the point antipodal to the center: $\mathrm{CDD}(-x_0, F) = \inf_{x \in S^{q-1}} \mathrm{CDD}(x, F)$ for any $F$ with center at $x_0$.

While CDD provides a robust global measure of centrality, it may not adequately capture local structure in complex data distributions. In Section 4 we address this issue by introducing a local version of CDD that adapts to the scale of locality.
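To make the sample version concrete, the following minimal R sketch (the paper's experiments are implemented in R) computes the empirical CDD of a point with respect to a sample of unit vectors; the function name cdd and the toy data are illustrative, not part of any package.

```r
# Sample cosine distance depth: CDD(x, F_n) = 2 - mean of d_cos(x, x_k),
# where d_cos(x, w) = 1 - <x, w> and the rows of X are unit vectors.
# (When x belongs to X, the zero self-distance is included in the average.)
cdd <- function(x, X) {
  2 - mean(1 - drop(X %*% x))
}

# Toy usage: points roughly uniform on S^2.
set.seed(1)
X <- matrix(rnorm(100 * 3), ncol = 3)
X <- X / sqrt(rowSums(X^2))        # project onto the unit sphere
depths <- apply(X, 1, cdd, X = X)  # depth of each sample point
```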
3 Classification in the depth space

After the first suggestion in Liu and Singh (1992), the use of data depth to perform supervised classification has been investigated by many authors. Two main approaches have been proposed in the literature: (i) the maximum depth classifier and (ii) the Depth vs. Depth (DD)-classifier. The first simply assigns a point $x$ to the distribution (or group) with respect to which it attains the highest depth value, for any considered depth $D(\cdot, \cdot)$, that is,
$$D(x, \hat{F}_i) > D(x, \hat{F}_j),\ i \neq j \ \Rightarrow\ \text{assign } x \text{ to } \hat{F}_i,$$
where $D(x, \hat{F}_i)$ and $D(x, \hat{F}_j)$ are the empirical depths of $x$ with respect to the $i$-th and $j$-th distributions, respectively. The second approach was proposed by Li et al. (2012) and is a refinement of the maximum depth classifier. It is based on the DD-plot (Depth vs. Depth plot), introduced by Liu et al. (1999), which is a two-dimensional scatterplot where each data point is represented with coordinates given by its depth evaluated with respect to two distributions. A classification rule $s(\cdot)$ is then applied directly in the DD-space:
$$C_s(x) = \begin{cases} \text{assign } x \text{ to } \hat{F}_i, & \text{if } D(x, \hat{F}_i) > s\big(D(x, \hat{F}_j)\big), \\ \text{assign } x \text{ to } \hat{F}_j, & \text{if } D(x, \hat{F}_i) \le s\big(D(x, \hat{F}_j)\big), \end{cases}$$
where $s(\cdot)$ is a real increasing function. Li et al. (2012) suggested looking for a polynomial separator chosen to minimize the empirical misclassification error rate on the training sample. Note that when $s(\cdot)$ is the identity function, the classification rule becomes the maximum depth classifier.

The maximum depth classifier is certainly quite intuitive and easy to implement. In addition, it can deal with classification problems involving a large number of groups. Conversely, the DD-classifier is more flexible, but it requires searching for the degree of the polynomial function for which the misclassification rate is minimized. The procedure can be applied to any kind of data, provided that a corresponding depth function exists. For instance, DD-plots have been developed for functional data (Cuesta-Albertos et al., 2017) and also for directional data by means of standard global depths (Pandolfo and D'Ambrosio, 2021; Demni et al., 2019).

Obviously, an important aspect to be considered is the choice of the depth function. Indeed, from a classification perspective, it must be noted that the halfspace and simplicial depths assign zero depth to all points which do not belong to the convex hull of the support of the distribution. This implies that sample points lying outside the convex hull of the training set have zero empirical depth, and thus cannot be assigned to one of the competing groups. On the contrary, this does not occur when adopting distance-based depths, since they are always positive for any data point in the sample space. While existing works have applied DD-classifiers with global depth functions, this approach may not capture local structure in complex directional distributions. In the next sections, we introduce a local version of the cosine distance depth and develop a corresponding DD-classifier that adapts to the scale of locality.

4 Local Cosine Distance Depth

To develop a notion of depth capable of describing local features and mode(s) in directional distributions, Agostinelli and Romanazzi (2012) proposed a local version of the ASD by constraining the size of the spherical simplices, while later Pandolfo (2022) proposed a local extension of the distance-based depths, restricting a global distance-based depth measure to the points within a certain distance of a given point. Drawing inspiration from the work of Paindaveine and Van Bever (2013), we propose a local version of the cosine distance depth. It is obtained by computing the depth with respect to the empirical distribution associated with the sample obtained by adding to the original observations their reflections with respect to a given point $x$. We provide the definition of this depth below and briefly discuss why this approach can be problematic, or even fail altogether, with other measures of angular depth. To do so, we first need to define a symmetric reflection operator on the unit hypersphere.
Given a point $x_i \in S^{q-1}$, another point $x_j \in S^{q-1}$ can be reflected symmetrically through $x_i$ as follows:
$$R(x_j, x_i) = 2 x_i \langle x_i, x_j \rangle - x_j,$$
where $\langle \cdot, \cdot \rangle$ is the scalar product between the two vectors. This operator has the following properties:

1. $R(-x_i, x_i) = -x_i$;
2. $R(x_j, x_i) = R(x_j, -x_i)$;
3. given a distance function $d(\cdot, \cdot)$ defined on the unit hypersphere, $d(x_i, R(x_j, x_i)) = d(x_i, x_j)$.

Consider a sample $X$ on $S^{q-1}$ and any given point $x_i \in X$. Let $X_{-i} := X \setminus \{x_i\}$ denote the sample without $x_i$. The set of all reflected points is denoted by $R_i := \{R(x_j, x_i) : x_j \in X_{-i}\}$, and the augmented sample is $X^{R_i} := X \cup R_i$. Here we assume that $|X| = n$, so $|X_{-i}| = n - 1$ and $|X^{R_i}| = 2n - 1$.

One would expect $x_i$ to be the depth median of its own reflected region; however, this is not always true. When considering CDD, it is possible to identify a condition that precisely characterizes when this occurs.

Proposition 1. Given a point $x_i \in X \subseteq S^{q-1}$ and the reflected region $X^{R_i}$, we have:
$$x_i = \operatorname*{argmax}_{x_j \in X} \mathrm{CDD}\big(x_j, X^{R_i}\big) \quad \text{if } 1 + 2\sum_{k \neq i} \langle x_i, x_k \rangle > 0,$$
$$\operatorname*{argmax}_{x_k \in X_{-i}} d_{\cos}(x_i, x_k) = \operatorname*{argmax}_{x_j \in X} \mathrm{CDD}\big(x_j, X^{R_i}\big) \quad \text{if } 1 + 2\sum_{k \neq i} \langle x_i, x_k \rangle < 0,$$
$$X = \operatorname*{argmax}_{x_j \in X} \mathrm{CDD}\big(x_j, X^{R_i}\big) \quad \text{if } 1 + 2\sum_{k \neq i} \langle x_i, x_k \rangle = 0,$$
where $d_{\cos}(x, y) = 1 - \langle x, y \rangle$. Hence, $x_i$ is either a depth median or antipodal to the depth median of the region.

Proof. Let $X = \{x_1, \ldots, x_n\} \subset S^{q-1}$. For a fixed $x_i$, define the set of reflections through $x_i$:
$$R_i = \big\{x_k^{R_i} : x_k^{R_i} = 2 x_i \langle x_i, x_k \rangle - x_k,\ k \neq i\big\}.$$
The CDD of a point $x_j$ with respect to the augmented sample $X^{R_i} = X \cup R_i$ is
$$\mathrm{CDD}\big(x_j, X^{R_i}\big) = 2 - \frac{1}{2(n-1)} \Big[ \sum_{k \neq j} d_{\cos}(x_j, x_k) + \sum_{k \neq i} d_{\cos}\big(x_j, x_k^{R_i}\big) \Big].$$
Maximizing CDD is equivalent to minimizing
$$f(x_j) = \sum_{k \neq j} d_{\cos}(x_j, x_k) + \sum_{k \neq i} d_{\cos}\big(x_j, x_k^{R_i}\big).$$
For the first sum, since $d_{\cos}(x_j, x_j) = 0$,
$$\sum_{k \neq j} d_{\cos}(x_j, x_k) = \sum_{k=1}^n \big[1 - \langle x_j, x_k \rangle\big] = n - \langle x_j, S \rangle,$$
where $S = \sum_{k=1}^n x_k$. For the second sum, using $\langle x_j, x_k^{R_i} \rangle = 2 \langle x_i, x_k \rangle \langle x_j, x_i \rangle - \langle x_j, x_k \rangle$, we obtain
$$d_{\cos}\big(x_j, x_k^{R_i}\big) = 1 - 2 \langle x_i, x_k \rangle \langle x_j, x_i \rangle + \langle x_j, x_k \rangle.$$
Summing over $k \neq i$:
$$\sum_{k \neq i} d_{\cos}\big(x_j, x_k^{R_i}\big) = (n-1) - 2 \langle x_j, x_i \rangle \sum_{k \neq i} \langle x_i, x_k \rangle + \Big\langle x_j, \sum_{k \neq i} x_k \Big\rangle.$$
Thus
$$f(x_j) = \Big[n - \langle x_j, S \rangle\Big] + \Big[(n-1) - 2 A \langle x_j, x_i \rangle + \Big\langle x_j, \sum_{k \neq i} x_k \Big\rangle\Big],$$
where $A = \sum_{k \neq i} \langle x_i, x_k \rangle$. Since $S = x_i + \sum_{k \neq i} x_k$, we have
$$-\langle x_j, S \rangle + \Big\langle x_j, \sum_{k \neq i} x_k \Big\rangle = -\langle x_j, x_i \rangle.$$
Therefore
$$f(x_j) = 2n - 1 - \langle x_j, x_i \rangle - 2 A \langle x_j, x_i \rangle.$$
Let $v = \langle x_j, x_i \rangle$ and $g(v) = f(x_j)$. Then $g(v) = 2n - 1 - v(1 + 2A)$. Since $g$ is linear in $v$, and $v \in [-1, 1]$ for unit vectors, the minimizer depends on the sign of $C = 1 + 2A$:

• If $C > 0$: $g$ is decreasing in $v$. The minimum occurs at the largest possible $v$ in $X$, which is $v = 1$, attained at $x_j = x_i$. Hence $x_i$ is the unique minimizer of $f$, i.e., $x_i$ maximizes CDD.

• If $C < 0$: $g$ is increasing in $v$. The minimum occurs at the smallest possible $v$ in $X$. The smallest $v$ is $-1$ if $-x_i \in X$; otherwise it is $\min_{x_j \in X} \langle x_j, x_i \rangle$, which corresponds to $\max_{x_j \in X} d_{\cos}(x_i, x_j)$. Hence the maximizer of $d_{\cos}(x_i, \cdot)$ in $X_{-i}$ maximizes CDD.

• If $C = 0$: $g$ is constant in $v$, so all $x_j \in X$ give the same CDD and every point in $X$ is a maximizer.

In the case $C < 0$, the farthest point from $x_i$ in cosine distance is the closest to the antipode $-x_i$. Thus, $x_i$ is either the depth median of $X^{R_i}$ (when $C > 0$) or antipodal to it (when $C < 0$).
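A small R sketch of the reflection operator and of the sign condition $C = 1 + 2\sum_{k \neq i} \langle x_i, x_k \rangle$ appearing in Proposition 1 (the function names are ours, chosen for illustration):

```r
# Reflection of x_j through x_i on the sphere: R(x_j, x_i) = 2 <x_i, x_j> x_i - x_j.
reflect <- function(xj, xi) 2 * sum(xi * xj) * xi - xj

# Sign condition C = 1 + 2 * sum_{k != i} <x_i, x_k> of Proposition 1.
C_sign <- function(i, X) 1 + 2 * sum(X[-i, , drop = FALSE] %*% X[i, ])

# Property 3: reflection preserves the cosine distance from x_i.
xi <- c(1, 0, 0); xj <- c(0, 1, 0)
r  <- reflect(xj, xi)
all.equal(1 - sum(xi * r), 1 - sum(xi * xj))  # TRUE
```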
This result has an intuitive interpretation: when $x_i$ is centrally located relative to the other points (positive $C$), it becomes the depth median of its reflected region; when $x_i$ is peripheral (negative $C$), its antipode becomes more central in the reflected region. Hence a point $x_i$ may fail to be the depth median of its own reflected region. In particular, when a point $x_i$ is antipodal to the depth median of $X^{R_i}$, the points should be reordered by increasing depth: in such a case, points with lower depth are more similar to $x_i$, according to property P3.

We can now define the depth-based neighbourhood of a point $x_i$ at level $\beta$ using any angular depth measure $\mathrm{AD}(\cdot, \cdot)$.

Definition 2 ($\beta$-depth-based neighbourhood). The $\beta$-depth neighbourhood of a point $x_i \in X \subseteq S^{q-1}$, denoted $DN_i^{(\beta)}$, with $\beta \in (0, 1]$, is defined as the set of the first $\beta(n-1)$ points of $X_{-i}$ reordered in the following way:
$$\mathrm{AD}\big(x_j, X^{R_i}\big) > \mathrm{AD}\big(x_k, X^{R_i}\big) > \cdots > \mathrm{AD}\big(x_n, X^{R_i}\big) \quad \text{if } x_i = \operatorname*{argmax}_{x_j} \mathrm{AD}\big(x_j, X^{R_i}\big);$$
$$\mathrm{AD}\big(x_j, X^{R_i}\big) < \mathrm{AD}\big(x_k, X^{R_i}\big) < \cdots < \mathrm{AD}\big(x_n, X^{R_i}\big) \quad \text{if } x_i = \operatorname*{argmin}_{x_j} \mathrm{AD}\big(x_j, X^{R_i}\big).$$

Proposition 2. Consider a point $x_i \in X \subseteq S^{q-1}$ and the $\beta$-depth-based neighbourhood $DN_i^{(\beta)}$, and let $CN_i$ denote the set of the first $\beta(n-1)$ points of $X_{-i}$ reordered by increasing cosine distance from $x_i$:
$$d_{\cos}(x_j, x_i) < d_{\cos}(x_k, x_i) < \cdots < d_{\cos}(x_n, x_i).$$
Then, if the CDD is adopted as the angular depth measure, we have $DN_i^{(\beta)} = CN_i$.

Proof. From Proposition 1, there are three cases depending on $C = 1 + 2\sum_{k \neq i} \langle x_i, x_k \rangle$.

Case 1: $C > 0$. Here $x_i = \operatorname*{argmax}_{x_k} \mathrm{CDD}(x_k, X^{R_i})$, so by Definition 2 points are ordered by decreasing CDD. We need to show
$$\mathrm{CDD}(x_j, X^{R_i}) > \mathrm{CDD}(x_l, X^{R_i}) \iff d_{\cos}(x_i, x_j) < d_{\cos}(x_i, x_l).$$
From the proof of Proposition 1 we have $f(x_j) = 2n - 1 - \langle x_i, x_j \rangle (1 + 2A)$, where $A = \sum_{k \neq i} \langle x_i, x_k \rangle$. Since maximizing CDD is equivalent to minimizing $f$,
$$\mathrm{CDD}(x_j, X^{R_i}) > \mathrm{CDD}(x_l, X^{R_i}) \iff f(x_j) < f(x_l).$$
Now, $f(x_l) - f(x_j) = -[\langle x_i, x_l \rangle - \langle x_i, x_j \rangle](1 + 2A)$. Since $C = 1 + 2A > 0$ in this case,
$$f(x_l) - f(x_j) > 0 \iff \langle x_i, x_l \rangle - \langle x_i, x_j \rangle < 0 \iff \langle x_i, x_j \rangle > \langle x_i, x_l \rangle.$$
Converting to cosine distances,
$$\langle x_i, x_j \rangle > \langle x_i, x_l \rangle \iff 1 - d_{\cos}(x_i, x_j) > 1 - d_{\cos}(x_i, x_l) \iff d_{\cos}(x_i, x_j) < d_{\cos}(x_i, x_l).$$
Thus, in Case 1, ordering by decreasing CDD is equivalent to ordering by increasing cosine distance from $x_i$.

Case 2: $C < 0$. Here $x_i = \operatorname*{argmin}_{x_k} \mathrm{CDD}(x_k, X^{R_i})$, so by Definition 2 points are ordered by increasing CDD. We need to show
$$\mathrm{CDD}(x_j, X^{R_i}) < \mathrm{CDD}(x_l, X^{R_i}) \iff d_{\cos}(x_i, x_j) < d_{\cos}(x_i, x_l).$$
Since $f(x_j) = 2n - 1 - \langle x_i, x_j \rangle(1 + 2A)$ and $C = 1 + 2A < 0$, $f$ is increasing in $v = \langle x_i, x_j \rangle$. Therefore
$$f(x_j) < f(x_l) \iff \langle x_i, x_j \rangle < \langle x_i, x_l \rangle.$$
Recalling that maximizing CDD is equivalent to minimizing $f$,
$$\mathrm{CDD}(x_j, X^{R_i}) > \mathrm{CDD}(x_l, X^{R_i}) \iff f(x_j) < f(x_l) \iff \langle x_i, x_j \rangle < \langle x_i, x_l \rangle.$$
Taking the contrapositive,
$$\mathrm{CDD}(x_j, X^{R_i}) < \mathrm{CDD}(x_l, X^{R_i}) \iff \langle x_i, x_j \rangle > \langle x_i, x_l \rangle.$$
Converting to cosine distances,
$$\langle x_i, x_j \rangle > \langle x_i, x_l \rangle \iff 1 - d_{\cos}(x_i, x_j) > 1 - d_{\cos}(x_i, x_l) \iff d_{\cos}(x_i, x_j) < d_{\cos}(x_i, x_l).$$
Thus, in Case 2, ordering by increasing CDD is also equivalent to ordering by increasing cosine distance from $x_i$.

Case 3: $C = 0$. Here all points have equal CDD, so any ordering yields the same set $DN_i^{(\beta)}$, which trivially equals $CN_i$.

Therefore, in all cases, $DN_i^{(\beta)} = CN_i$.
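Proposition 2 can also be illustrated numerically. The sketch below (reusing cdd, reflect, and C_sign from the earlier sketches) builds the augmented sample $X^{R_i}$ and checks that ordering $X_{-i}$ by depth, in the direction prescribed by Definition 2, coincides with ordering by cosine distance from $x_i$:

```r
set.seed(2)
X <- matrix(rnorm(20 * 3), ncol = 3); X <- X / sqrt(rowSums(X^2))
i   <- 1
Ri  <- t(apply(X[-i, ], 1, reflect, xi = X[i, ]))  # reflections through x_i
XRi <- rbind(X, Ri)                                # augmented sample, 2n - 1 points

dep  <- apply(X[-i, ], 1, cdd, X = XRi)            # depths of X_{-i} w.r.t. X^{R_i}
dist <- drop(1 - X[-i, ] %*% X[i, ])               # cosine distances from x_i

# Decreasing depth when C > 0, increasing depth when C < 0 (Definition 2).
ord_dep <- if (C_sign(i, X) > 0) order(-dep) else order(dep)
identical(ord_dep, order(dist))                    # TRUE (ties a.s. absent)
```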
Hence, defining the $\beta$-depth neighbourhood of a point $x_i$ is equivalent to looking for the $\beta(n-1)$ nearest neighbours of $x_i$. This result was already encountered by Paindaveine and Van Bever (2013) for depths in $\mathbb{R}^q$ with $q = 1$, while here it holds for $q \geq 1$ as long as the CDD is considered. It allows us to compute the reflected region and the depth of a point by simply looking for the nearest points. The local cosine distance depth can then be defined as follows.

Definition 3 (Local cosine distance depth). The local cosine distance depth of a point $x_i \in X \subseteq S^{q-1}$ at a locality level $\beta \in (0, 1]$ is defined as
$$\mathrm{LCDD}^{(\beta)}(x_i, X) = \mathrm{CDD}\big(x_i, DN_i^{(\beta)}\big).$$
When $\beta$ is set equal to one, the global cosine distance depth is obtained.

It is worth noting that the chord and arc distance depths do not guarantee that a point $x_i$ is either the depth median of $X^{R_i}$ or antipodal to it. This makes reordering the points according to depth in a given region difficult, if possible at all. It occurs because the chord and arc distances lack the linear structure that makes the minimization problem in Proposition 1 tractable: the function analogous to $f(x_j)$ in the proof does not simplify to a linear function of $\langle x_i, x_j \rangle$ for these distances, so no simple condition characterizes when $x_i$ is the depth median of $X^{R_i}$.

Note that since cosine distances and neighbourhoods are rotation invariant, so too are nearest neighbours and the local cosine distance depth, i.e., $\mathrm{LCDD}^{(\beta)}(Ox_i, OX) = \mathrm{LCDD}^{(\beta)}(x_i, X)$. LCDD is also bounded and strictly positive, i.e., $0 < \mathrm{LCDD}^{(\beta)}(x_i, X) \leq 2$ for a non-point-mass $X$, since neighbourhoods are finite and the CDD is strictly positive on $S^{q-1}$. LCDD does not satisfy the monotonicity property (P3). This is expected and desirable: it is designed to capture local centrality and may exhibit multiple local maxima in multimodal distributions. It is, however, monotone in the locality level: as $\beta \to 1$, $\mathrm{LCDD}^{(\beta)}(x) \to \mathrm{CDD}(x)$, recovering the global depth.

Figure 1 depicts the behaviour of the proposed local cosine distance depth compared with its global version in the case of a trimodal spherical distribution, showing their contour plots (for $\beta = 0.25$) alongside the density contours of the data. For the sake of illustration, the plots are two-dimensional and the data are reported in spherical coordinates. As one can see, the CDD fails to capture the three modes, identifying only a single global center. In contrast, the proposed LCDD (with $\beta = 0.25$) clearly identifies all three modes.

Figure 1: Density contour plot of a trimodal distribution on the sphere (a), along with the corresponding contour plots of CDD (b) and of LCDD with $\beta = 0.25$ (c).
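Thanks to Proposition 2, the sample LCDD of Definition 3 reduces to averaging the $\lfloor \beta(n-1) \rfloor$ smallest cosine distances. A minimal R sketch (our own helper, not library code):

```r
# Local cosine distance depth of the i-th sample point at locality level beta:
# LCDD = 2 - average of the floor(beta * (n - 1)) smallest cosine distances.
lcdd <- function(i, X, beta) {
  d <- drop(1 - X[-i, , drop = FALSE] %*% X[i, ])  # cosine distances from x_i
  k <- max(1, floor(beta * (nrow(X) - 1)))         # neighbourhood size
  2 - mean(sort(d)[seq_len(k)])
}

local_depths <- sapply(seq_len(nrow(X)), lcdd, X = X, beta = 0.25)
```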
Theorem 3 (Sample behaviour in $\beta$-neighbourhoods). For $0 < \beta_1 < \beta_2 \leq 1$:
1. $DN_x^{(\beta_1)} \subseteq DN_x^{(\beta_2)}$;
2. $\mathrm{LCDD}^{(\beta_1)}(x, X) \geq \mathrm{LCDD}^{(\beta_2)}(x, X)$.

Proof. By Proposition 2, we have $DN_x^{(\beta)} = CN_x^{(\beta)}$ for all $\beta$. Let $r_{(1)}(x) \leq r_{(2)}(x) \leq \cdots \leq r_{(n-1)}(x)$ denote the ordered cosine distances from $x$ to the points of $X \setminus \{x\}$. Define $k_\beta = \lfloor \beta(n-1) \rfloor$, the integer part of $\beta(n-1)$. Since $\beta_1 < \beta_2$, we have $k_{\beta_1} < k_{\beta_2}$. By definition,
$$CN_x^{(\beta_1)} = \{x_j \in X \setminus \{x\} : r_j(x) \leq r_{(k_{\beta_1})}(x)\} \quad \text{and} \quad CN_x^{(\beta_2)} = \{x_j \in X \setminus \{x\} : r_j(x) \leq r_{(k_{\beta_2})}(x)\}.$$
Since $r_{(k_{\beta_1})}(x) \leq r_{(k_{\beta_2})}(x)$, every point of $CN_x^{(\beta_1)}$ also satisfies $r_j(x) \leq r_{(k_{\beta_2})}(x)$ and hence belongs to $CN_x^{(\beta_2)}$. Therefore,
$$DN_x^{(\beta_1)} = CN_x^{(\beta_1)} \subseteq CN_x^{(\beta_2)} = DN_x^{(\beta_2)}.$$
Recall that the local cosine depth of a point $x$ at level $\beta$ can be written as
$$\mathrm{LCDD}^{(\beta)}(x, X) = 2 - \frac{1}{k_\beta} \sum_{j=1}^{k_\beta} r_{(j)}(x).$$
Let $S_{\beta_1} = \sum_{j=1}^{k_{\beta_1}} r_{(j)}(x)$ and $S_{\beta_2} = \sum_{j=1}^{k_{\beta_2}} r_{(j)}(x)$. Since $k_{\beta_2} > k_{\beta_1}$, we can write $S_{\beta_2} = S_{\beta_1} + \Delta S$, where $\Delta S = \sum_{j=k_{\beta_1}+1}^{k_{\beta_2}} r_{(j)}(x)$. Now consider the average distances $S_{\beta_1}/k_{\beta_1}$ and $S_{\beta_2}/k_{\beta_2} = (S_{\beta_1} + \Delta S)/k_{\beta_2}$. Since the $r_{(j)}(x)$ are non-decreasing, for any $j$ with $k_{\beta_1} < j \leq k_{\beta_2}$ we have $r_{(j)}(x) \geq r_{(k_{\beta_1})}(x)$. Moreover, $r_{(k_{\beta_1})}(x) \geq S_{\beta_1}/k_{\beta_1}$, because the maximum of the first $k_{\beta_1}$ distances is at least their average. Therefore,
$$\Delta S = \sum_{j=k_{\beta_1}+1}^{k_{\beta_2}} r_{(j)}(x) \geq (k_{\beta_2} - k_{\beta_1}) \cdot \frac{S_{\beta_1}}{k_{\beta_1}}.$$
This inequality implies
$$S_{\beta_1} + \Delta S \geq S_{\beta_1} \Big(1 + \frac{k_{\beta_2} - k_{\beta_1}}{k_{\beta_1}}\Big) = S_{\beta_1} \cdot \frac{k_{\beta_2}}{k_{\beta_1}}.$$
Dividing by $k_{\beta_2}$ gives $S_{\beta_2}/k_{\beta_2} \geq S_{\beta_1}/k_{\beta_1}$. Finally, since $\mathrm{LCDD}^{(\beta)}(x, X) = 2 - S_\beta/k_\beta$, we obtain
$$\mathrm{LCDD}^{(\beta_1)}(x, X) = 2 - \frac{S_{\beta_1}}{k_{\beta_1}} \geq 2 - \frac{S_{\beta_2}}{k_{\beta_2}} = \mathrm{LCDD}^{(\beta_2)}(x, X).$$

This theorem establishes two key properties: (i) neighbourhoods expand as $\beta$ increases (nesting property), and (ii) local depth decreases as neighbourhoods expand. This is intuitive: as we consider larger neighbourhoods, the average distance to the neighbours increases, reducing the measure of local centrality.

4.1 Population version and properties

While Definition 3 provides a computationally tractable measure for finite samples, theoretical analysis requires its population counterpart.

Definition 4 (Population local cosine distance depth). Let $F$ be a distribution on $S^{q-1}$ with continuous density $f$ that is bounded away from zero on its support. For $x \in S^{q-1}$ and $\beta \in (0, 1]$, let $\rho^{(\beta)}(x)$ be the unique radius satisfying
$$F\big(\{y \in S^{q-1} : d_{\cos}(x, y) \leq \rho^{(\beta)}(x)\}\big) = \beta.$$
The population local cosine distance depth of $x$ with respect to $F$ at locality level $\beta$ is defined as
$$\mathrm{LCDD}^{(\beta)}(x, F) := 2 - \frac{1}{\beta}\, \mathbb{E}_{Y \sim F}\big[d_{\cos}(x, Y)\, \mathbb{I}\{d_{\cos}(x, Y) \leq \rho^{(\beta)}(x)\}\big].$$

Remark 1. Equivalently, $\mathrm{LCDD}^{(\beta)}(x, F) = \mathrm{CDD}(x, F_{\beta,x})$, where $F_{\beta,x}$ denotes the conditional distribution of $F$ restricted to the geodesic ball of probability mass $\beta$ around $x$. This makes explicit the connection between the local and global depth measures.

Monotonicity in $\beta$ holds at the population level as well, since the conditional expectation of $d_{\cos}$ is non-decreasing in the radius $\rho^{(\beta)}(x)$.
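The sample counterpart of this monotonicity (Theorem 3, part 2) is easy to verify numerically with the lcdd sketch above:

```r
# LCDD is non-increasing in beta (Theorem 3, part 2).
betas <- c(0.05, 0.10, 0.25, 0.50, 1)
vals  <- sapply(betas, function(b) lcdd(1, X, b))
all(diff(vals) <= 1e-12)  # TRUE: larger neighbourhoods give smaller local depth
```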
Proposition 4 (Continuity with respect to $\beta$). Let $F$ be a distribution on $S^{q-1}$ with continuous density $f$ bounded away from zero on its support. For $\beta \in (0, 1)$ and $\Delta\beta > 0$ with $\beta + \Delta\beta \leq 1$,
$$\sup_{x \in S^{q-1}} \big| \mathrm{LCDD}^{(\beta)}(x, F) - \mathrm{LCDD}^{(\beta+\Delta\beta)}(x, F) \big| = O(\Delta\beta).$$

Proof. For fixed $x \in S^{q-1}$, let $\rho^{(\beta)}(x)$ be the unique radius satisfying $F(\{y \in S^{q-1} : d_{\cos}(x, y) \leq \rho^{(\beta)}(x)\}) = \beta$. Since $f$ is continuous and bounded away from zero, the quantile function $\beta \mapsto \rho^{(\beta)}(x)$ is Lipschitz continuous in $\beta$, uniformly in $x$. Define the conditional expectations
$$\mu_\beta(x) = \mathbb{E}\big[d_{\cos}(x, Y) \mid d_{\cos}(x, Y) \leq \rho^{(\beta)}(x)\big], \qquad \mu_{\beta+\Delta\beta}(x) = \mathbb{E}\big[d_{\cos}(x, Y) \mid d_{\cos}(x, Y) \leq \rho^{(\beta+\Delta\beta)}(x)\big].$$
Let $A_x = \{y : \rho^{(\beta)}(x) < d_{\cos}(x, y) \leq \rho^{(\beta+\Delta\beta)}(x)\}$, so that $F(A_x) = \Delta\beta$. Then we can write
$$\mu_{\beta+\Delta\beta}(x) = \frac{\beta}{\beta + \Delta\beta}\, \mu_\beta(x) + \frac{\Delta\beta}{\beta + \Delta\beta}\, \mu_A(x),$$
where $\mu_A(x) = \mathbb{E}[d_{\cos}(x, Y) \mid Y \in A_x]$. Rearranging gives
$$\mu_{\beta+\Delta\beta}(x) - \mu_\beta(x) = \frac{\Delta\beta}{\beta + \Delta\beta}\, \big(\mu_A(x) - \mu_\beta(x)\big).$$
Now, since $\rho^{(\beta)}(x) \leq d_{\cos}(x, y) \leq \rho^{(\beta+\Delta\beta)}(x)$ for $y \in A_x$, we have $\mu_A(x) \in [\rho^{(\beta)}(x), \rho^{(\beta+\Delta\beta)}(x)]$. Also, by definition, $\mu_\beta(x) \leq \rho^{(\beta)}(x)$. Therefore
$$|\mu_A(x) - \mu_\beta(x)| \leq \rho^{(\beta+\Delta\beta)}(x) - \mu_\beta(x) \leq 2,$$
since $d_{\cos} \in [0, 2]$. However, we can obtain a tighter bound using the continuity of $\rho^{(\beta)}(x)$. From the Lipschitz continuity of $\rho^{(\beta)}(x)$ in $\beta$, there exists $L > 0$ such that, for all $x$, $|\rho^{(\beta+\Delta\beta)}(x) - \rho^{(\beta)}(x)| \leq L \Delta\beta$. Thus
$$|\mu_A(x) - \mu_\beta(x)| \leq |\rho^{(\beta+\Delta\beta)}(x) - \rho^{(\beta)}(x)| + |\rho^{(\beta)}(x) - \mu_\beta(x)| \leq L \Delta\beta + |\rho^{(\beta)}(x) - \mu_\beta(x)|.$$
The term $|\rho^{(\beta)}(x) - \mu_\beta(x)|$ is bounded by a constant $M$ uniformly in $x$, because $f$ is bounded away from zero, ensuring that the conditional distribution is not too concentrated at the boundary. Hence $|\mu_A(x) - \mu_\beta(x)| \leq L \Delta\beta + M$. Returning to the difference,
$$|\mu_{\beta+\Delta\beta}(x) - \mu_\beta(x)| = \frac{\Delta\beta}{\beta + \Delta\beta}\, |\mu_A(x) - \mu_\beta(x)| \leq \frac{\Delta\beta}{\beta}\, (L \Delta\beta + M) = O(\Delta\beta).$$
Since $\mathrm{LCDD}^{(\beta)}(x, F) = 2 - \mu_\beta(x)$, we have
$$|\mathrm{LCDD}^{(\beta)}(x, F) - \mathrm{LCDD}^{(\beta+\Delta\beta)}(x, F)| = |\mu_{\beta+\Delta\beta}(x) - \mu_\beta(x)| = O(\Delta\beta).$$
The bound holds uniformly in $x$ because all constants ($L$, $M$, and the implicit constant in $O(\Delta\beta)$) are independent of $x$, due to the compactness of $S^{q-1}$ and the uniform bounds on $f$.

Corollary 1 (Limit behaviour). Let $F$ be a distribution on $S^{q-1}$ with continuous density $f$ bounded away from zero on its support. Then, for all $x \in S^{q-1}$,
$$\lim_{\beta \to 0^+} \mathrm{LCDD}^{(\beta)}(x, F) = 2, \qquad \lim_{\beta \to 1^-} \mathrm{LCDD}^{(\beta)}(x, F) = \mathrm{CDD}(x, F),$$
where $\mathrm{CDD}(x, F) = 2 - \mathbb{E}_F[d_{\cos}(x, Y)]$ is the population cosine distance depth.

Proof. Recall that $\mathrm{LCDD}^{(\beta)}(x, F) = 2 - \mu_\beta(x)$, where $\mu_\beta(x) = \mathbb{E}[d_{\cos}(x, Y) \mid d_{\cos}(x, Y) \leq \rho^{(\beta)}(x)]$ and $\rho^{(\beta)}(x)$ satisfies $F\{d_{\cos}(x, Y) \leq \rho^{(\beta)}(x)\} = \beta$.

(i) As $\beta \to 0^+$, the radius $\rho^{(\beta)}(x) \to 0$ because the density $f$ is bounded away from zero. More precisely, since $f$ is continuous and positive, for small $\beta$ we have $\rho^{(\beta)}(x) = O(\beta)$. For any $Y$ with $d_{\cos}(x, Y) \leq \rho^{(\beta)}(x)$, we have $0 \leq d_{\cos}(x, Y) \leq \rho^{(\beta)}(x) \to 0$. By dominated convergence (since $d_{\cos} \leq 2$), we obtain
$$\lim_{\beta \to 0^+} \mu_\beta(x) = \lim_{\beta \to 0^+} \mathbb{E}\big[d_{\cos}(x, Y) \mid d_{\cos}(x, Y) \leq \rho^{(\beta)}(x)\big] = 0.$$
Therefore, $\lim_{\beta \to 0^+} \mathrm{LCDD}^{(\beta)}(x, F) = 2 - 0 = 2$.
(ii) As $\beta \to 1^-$, the radius $\rho^{(\beta)}(x)$ increases toward $\rho_{\max}(x) = \inf\{t \geq 0 : F\{d_{\cos}(x, Y) \leq t\} = 1\}$, the smallest radius such that the ball of that radius around $x$ contains the entire support of $F$. The conditional distribution given $d_{\cos}(x, Y) \leq \rho^{(\beta)}(x)$ converges weakly to the unconditional distribution $F$ as $\beta \to 1^-$. By the bounded convergence theorem (since $d_{\cos}$ is bounded),
$$\lim_{\beta \to 1^-} \mu_\beta(x) = \mathbb{E}_F[d_{\cos}(x, Y)].$$
Therefore, $\lim_{\beta \to 1^-} \mathrm{LCDD}^{(\beta)}(x, F) = 2 - \mathbb{E}_F[d_{\cos}(x, Y)] = \mathrm{CDD}(x, F)$, the population version of the cosine distance depth.

These limits have intuitive interpretations: as $\beta \to 0^+$ the neighbourhood shrinks to a point, so the average distance goes to 0 and LCDD approaches its maximum value 2; as $\beta \to 1^-$ the neighbourhood expands to cover the entire distribution, recovering the global CDD.

Remark 2. Under appropriate regularity conditions (continuity of $F$ and boundedness away from zero), the sample version (Definition 3) is a uniformly consistent estimator of the population LCDD: as $n \to \infty$, the empirical $\beta(n-1)$ nearest neighbours converge to the population geodesic ball containing mass $\beta$, and the sample average of cosine distances converges to the population expectation. The following lemma establishes that this convergence is uniform over the hypersphere.

Lemma 1 (Uniform consistency of LCDD). Let $F_n$ denote the empirical measure of a random sample $X_1, \ldots, X_n$ i.i.d. from a distribution $F$ on $S^{q-1}$ with density $f$ that is continuous and bounded away from zero on its support. For any $\beta \in (0, 1]$, we have
$$\sup_{x \in S^{q-1}} \big| \mathrm{LCDD}^{(\beta)}(x, F_n) - \mathrm{LCDD}^{(\beta)}(x, F) \big| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty,$$
where $\mathrm{LCDD}^{(\beta)}(x, F_n)$ denotes the sample LCDD based on $X_n = \{X_1, \ldots, X_n\}$.

Proof. Let $k_n = \lfloor \beta(n-1) \rfloor$. For $x \in S^{q-1}$, let $r_n^{(1)}(x) \leq r_n^{(2)}(x) \leq \cdots \leq r_n^{(n)}(x)$ denote the ordered cosine distances $\{d_{\cos}(x, X_i)\}_{i=1}^n$. Recall the definitions: $\mathrm{LCDD}^{(\beta)}(x, F) = 2 - \mu_\beta(x)$, where
$$\mu_\beta(x) = \frac{1}{\beta}\, \mathbb{E}_F\big[d_{\cos}(x, Y)\, \mathbb{I}\{d_{\cos}(x, Y) \leq \rho^{(\beta)}(x)\}\big],$$
with $\rho^{(\beta)}(x)$ satisfying $F\{d_{\cos}(x, Y) \leq \rho^{(\beta)}(x)\} = \beta$. The sample LCDD is
$$\mathrm{LCDD}^{(\beta)}(x, F_n) = 2 - \hat{\mu}_n(x), \qquad \hat{\mu}_n(x) = \frac{1}{k_n} \sum_{i=1}^{k_n} r_n^{(i)}(x).$$
We prove uniform convergence in three steps.

Step 1. Define the empirical process
$$F_n(t; x) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}\{d_{\cos}(x, X_i) \leq t\},$$
and its population counterpart $F(t; x) = P(d_{\cos}(x, Y) \leq t)$. Since the class of sets $\{\{y : d_{\cos}(x, y) \leq t\} : x \in S^{q-1}, t \in [0, 2]\}$ is a VC class (geodesic balls on the sphere), the uniform Glivenko–Cantelli theorem gives
$$\sup_{x \in S^{q-1}} \sup_{t \in [0, 2]} |F_n(t; x) - F(t; x)| \xrightarrow{a.s.} 0.$$
Let $\rho^{(\beta)}(x)$ be the population $\beta$-quantile, $F(\rho^{(\beta)}(x); x) = \beta$, and define the empirical quantile $\hat{\rho}_n(x) = r_n^{(k_n)}(x)$, which satisfies $F_n(\hat{\rho}_n(x); x) = k_n/n \to \beta$. By the uniform continuity of $F(t; x)$ in $t$ (implied by $f$ being bounded away from zero) and the uniform convergence of $F_n$, we obtain
$$\sup_{x \in S^{q-1}} |\hat{\rho}_n(x) - \rho^{(\beta)}(x)| \xrightarrow{a.s.} 0.$$
Step 2. Define the truncated empirical process
$$G_n(t; x) = \frac{1}{n} \sum_{i=1}^n d_{\cos}(x, X_i)\, \mathbb{I}\{d_{\cos}(x, X_i) \leq t\},$$
and its population counterpart $G(t; x) = \mathbb{E}[d_{\cos}(x, Y)\, \mathbb{I}\{d_{\cos}(x, Y) \leq t\}]$. The class of functions
$$\mathcal{F} = \big\{(x, y) \mapsto d_{\cos}(x, y)\, \mathbb{I}\{d_{\cos}(x, y) \leq t\} : x \in S^{q-1},\ t \in [0, 2]\big\}$$
is uniformly bounded (by 2) and, as the product of the Lipschitz function $d_{\cos}(x, \cdot)$ with the indicator of a VC class, is itself a Glivenko–Cantelli class. Therefore,
$$\sup_{x \in S^{q-1}} \sup_{t \in [0, 2]} |G_n(t; x) - G(t; x)| \xrightarrow{a.s.} 0.$$
Now, by the continuous mapping theorem and Step 1, $G_n(\hat{\rho}_n(x); x) \xrightarrow{a.s.} G(\rho^{(\beta)}(x); x) = \beta \mu_\beta(x)$ uniformly in $x$. But also,
$$G_n(\hat{\rho}_n(x); x) = \frac{1}{n} \sum_{i=1}^n d_{\cos}(x, X_i)\, \mathbb{I}\{d_{\cos}(x, X_i) \leq \hat{\rho}_n(x)\} = \frac{k_n}{n}\, \hat{\mu}_n(x).$$
Since $k_n/n \to \beta$ almost surely, we conclude that $\sup_{x \in S^{q-1}} |\hat{\mu}_n(x) - \mu_\beta(x)| \xrightarrow{a.s.} 0$.

Step 3. From Steps 1 and 2,
$$\sup_{x \in S^{q-1}} |\mathrm{LCDD}^{(\beta)}(x, F_n) - \mathrm{LCDD}^{(\beta)}(x, F)| = \sup_{x \in S^{q-1}} |\hat{\mu}_n(x) - \mu_\beta(x)| \xrightarrow{a.s.} 0,$$
which establishes the desired uniform almost sure convergence.

Corollary 2. Under the assumptions of Lemma 1, for any $\beta^* \in (0, 1]$,
$$\sup_{x \in S^{q-1}} \big| \mathrm{LCDD}^{(\beta^*)}(x, \hat{F}_n) - \mathrm{LCDD}^{(\beta^*)}(x, F) \big| \xrightarrow{p} 0.$$

Note that the continuity result in Proposition 4 does not hold in the empirical case, where the LCDD is in general a piecewise constant function of $\beta$. Specifically, for a fixed point $x$ and $k \in \{1, 2, \ldots, n-1\}$, $\mathrm{LCDD}^{(\beta)}(x, \hat{F}_n)$ remains constant on each interval $\big[\frac{k}{n-1}, \frac{k+1}{n-1}\big)$, while continuity at $\beta = k/(n-1)$ holds if and only if the cosine distance of the $k$-th nearest point coincides with the average of the previous distances. Since this equality is highly unlikely in non-degenerate samples, the empirical LCDD is typically discontinuous in $\beta$.

5 DD-classifier with local depth

The depth of a given point characterizes its location with respect to the whole distribution. Thus, classifiers based on a global depth function perform well only if the considered distributions enjoy some global properties such as symmetry or unimodality. To obtain good performance in more general settings, some local depth should be preferred; the issue that then arises, and that needs to be handled, is the choice of the localization level. The first classifier employing local depth was proposed by Hlubinka and Vencalek (2013), who used a weighted halfspace depth. Later, Paindaveine and Van Bever (2013) developed the more sophisticated approach from which we draw inspiration: it enables the localization of any global depth function, which is then used in the maximum depth classifier. Here, instead, we apply the proposed local cosine distance depth in the DD-plot, where a polynomial separating function is then adopted to discriminate between groups.

We focus on the two-class classification problem. Let $\{X_1, \ldots, X_m\}$ ($\equiv X$) and $\{Y_1, \ldots, Y_n\}$ ($\equiv Y$) be two random samples from $F_1$ and $F_2$, respectively, which are distributions defined on $S^{q-1}$. As seen in Pandolfo (2022) and from the definition of the DD-plot, if $F_1 = F_2$ then the DD-plot should be concentrated along the 45-degree line. Conversely, if the two distributions differ, the DD-plot will exhibit a departure from the 45-degree line.
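Before stating the classification rule formally, the following hedged R sketch shows how the DD-plot coordinates can be constructed. The helper lcdd_wrt evaluates the LCDD of an arbitrary point with respect to a reference sample; using $\lfloor \beta\, n \rfloor$ as the neighbourhood size for an external point is our own convention, not a prescription of the paper.

```r
# LCDD of an arbitrary point z with respect to a reference sample S.
lcdd_wrt <- function(z, S, beta) {
  d <- drop(1 - S %*% z)              # cosine distances from z to the sample
  k <- max(1, floor(beta * nrow(S)))  # neighbourhood size (our convention)
  2 - mean(sort(d)[seq_len(k)])
}

# DD-plot coordinates of the points in Z w.r.t. training samples X1 and X2.
dd_coords <- function(Z, X1, X2, beta) {
  cbind(D1 = apply(Z, 1, lcdd_wrt, S = X1, beta = beta),
        D2 = apply(Z, 1, lcdd_wrt, S = X2, beta = beta))
}
# plot(dd_coords(Z, X1, X2, 0.1)); abline(0, 1)  # 45-degree reference line
```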
Hence, given a locality level $\beta$, the proposed classifier is defined as
$$C_{\beta,s}(x) = \begin{cases} 2, & \text{if } \mathrm{LCDD}^{(\beta)}(x, \hat{F}_2) \geq s\big(\mathrm{LCDD}^{(\beta)}(x, \hat{F}_1)\big), \\ 1, & \text{otherwise.} \end{cases}$$
For any given $s(\cdot)$, we draw the curve $y = s(x)$ in the DD-plot, assign the observations above the curve to $F_1$ and those below it to $F_2$, and then calculate the empirical misclassification rate, that is,
$$\hat{R}_s = \frac{\pi_1}{m} \sum_{i=1}^m \mathbb{I}\big\{\mathrm{LCDD}^{(\beta)}(X_i, \hat{F}_1) \leq s\big(\mathrm{LCDD}^{(\beta)}(X_i, \hat{F}_2)\big)\big\} + \frac{\pi_2}{n} \sum_{i=1}^n \mathbb{I}\big\{\mathrm{LCDD}^{(\beta)}(Y_i, \hat{F}_1) > s\big(\mathrm{LCDD}^{(\beta)}(Y_i, \hat{F}_2)\big)\big\}, \tag{1}$$
where the $\pi_i$ are the prior probabilities of the two classes, $N = (m, n)$ collects the two sample sizes, and $\mathbb{I}_A$ is the indicator function, which takes the value 1 if $A$ is true and 0 otherwise. Hence, $s(\cdot)$ is estimated by minimizing $\hat{R}_s$. Following Li et al. (2012), here we consider
$$s(x) = \sum_{i=1}^{k_0} a_i x^i,$$
where $k_0$ is the given degree of the polynomial and $a = (a_1, \ldots, a_{k_0}) \in \mathbb{R}^{k_0}$ is its coefficient vector.
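A minimal R sketch of the fitting step: since the empirical risk (1) is piecewise constant in the coefficients, a derivative-free search such as Nelder–Mead (possibly with several restarts, in the spirit of Li et al., 2012) is a natural choice. Equal priors and our own helper names are assumed here.

```r
# Fit the polynomial separator s(x) = a_1 x + ... + a_{k0} x^{k0} by direct
# minimisation of the empirical misclassification rate in the DD space.
fit_separator <- function(D, y, k0 = 2) {
  # D: n x 2 matrix of depths (D1, D2); y: class labels in {1, 2}.
  risk <- function(a) {
    s    <- drop(outer(D[, 1], seq_len(k0), "^") %*% a)  # s(D1) for each point
    yhat <- ifelse(D[, 2] >= s, 2, 1)                    # classifier C_{beta,s}
    mean(yhat != y)                                      # 0-1 empirical risk
  }
  optim(rep(1, k0), risk, method = "Nelder-Mead")$par
}
```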
Theorem 5 (Bayes consistency of the LCDD-DD classifier). Let $F_1, F_2$ be distributions on $S^{q-1}$ with continuous densities $f_1, f_2$ bounded away from zero, and let $\pi_1, \pi_2 > 0$ be class priors with $\pi_1 + \pi_2 = 1$. Let $\mathcal{R}$ be a compact class of functions (pointwise compact) consisting of polynomials of degree at most $k_0$. Define the empirical LCDD depths $D_{\hat{F}_j}^{(\beta)}(z) := \mathrm{LCDD}^{(\beta)}(z, \hat{F}_j)$ for $j = 1, 2$, where $\hat{F}_1, \hat{F}_2$ are the empirical distributions from samples of sizes $n_1, n_2$, respectively, with total sample size $N = n_1 + n_2$. Consider the classifier
$$C_{\beta,s}(z) = \begin{cases} 2, & \text{if } D_{\hat{F}_2}^{(\beta)}(z) \geq s\big(D_{\hat{F}_1}^{(\beta)}(z)\big), \\ 1, & \text{otherwise.} \end{cases}$$
Assume the following conditions hold:

(A1) There exist a unique $\beta^* \in (0, 1]$ and $s_B \in \mathcal{R}$ such that the classifier $C_{\beta^*, s_B}$ equals the Bayes classifier almost everywhere, i.e., $C_{\beta^*, s_B}(z) = \mathbb{I}\{\pi_2 f_2(z) > \pi_1 f_1(z)\}$ a.e.

(A2) For $j = 1, 2$ and any $\beta \in (0, 1]$, $\sup_{z \in S^{q-1}} |D_{\hat{F}_j}^{(\beta)}(z) - D_{F_j}^{(\beta)}(z)| \xrightarrow{p} 0$ as $n_j \to \infty$.

(A3) $\mathcal{R}$ is compact in the topology of pointwise convergence.

(A4) For each $N$, let $B_N \subset (0, 1]$ be a finite grid whose mesh tends to zero as $N \to \infty$, so that $B_N$ becomes dense in $(0, 1]$. Define $\hat{s}_{N,\beta} = \operatorname*{argmin}_{s \in \mathcal{R}} \tilde{R}_N(s, \beta)$, where $\tilde{R}_N$ is the empirical risk, and select $\hat{\beta}_N = \operatorname*{argmin}_{\beta \in B_N} \mathrm{CV}(\hat{s}_{N,\beta}, \beta)$, where CV denotes the cross-validated error. Finally, set $\hat{s}_N = \hat{s}_{N, \hat{\beta}_N}$ and $\hat{C}_N = C_{\hat{\beta}_N, \hat{s}_N}$.

Then, as $N \to \infty$ with $n_j / N \to \lambda_j \in (0, 1)$ for $j = 1, 2$, we have: (i) $\hat{\beta}_N \xrightarrow{p} \beta^*$; (ii) $\hat{s}_N \xrightarrow{p} s_B$ pointwise; (iii) $R(\hat{C}_N) \xrightarrow{p} R_{\mathrm{Bayes}} = R(C_{\beta^*, s_B})$.

Assumption (A1) requires that there exists some locality level $\beta^*$ for which the LCDD-based classifier achieves the Bayes error. This is a reasonable assumption when the data have local structure that can be captured at an appropriate scale. In practice, even if the exact Bayes classifier is not achievable with polynomial separators in the LCDD space, the theorem guarantees that our data-driven procedure will approach the best possible performance within the class $\mathcal{R}$.

Proof. We adapt the framework of Li et al. (2012) to our LCDD-based classifier with data-driven selection of the locality parameter $\beta$.

Step 1. By Proposition 4, for any $\beta \in (0, 1)$ and $\Delta\beta > 0$ with $\beta + \Delta\beta \leq 1$,
$$\sup_{x \in S^{q-1}} \big| \mathrm{LCDD}^{(\beta)}(x, F_j) - \mathrm{LCDD}^{(\beta+\Delta\beta)}(x, F_j) \big| = O(\Delta\beta), \quad j = 1, 2,$$
uniformly in $x$. In particular, for each fixed $z$, the map $\beta \mapsto D_{F_j}^{(\beta)}(z)$ is continuous on $(0, 1]$. The misclassification error of the classifier $C_{\beta,s}$ is
$$R(\beta, s) = \pi_1 P_{F_1}(C_{\beta,s}(Z) = 2) + \pi_2 P_{F_2}(C_{\beta,s}(Z) = 1).$$
Since the classifier depends on $z$ only through $(D_{F_1}^{(\beta)}(z), D_{F_2}^{(\beta)}(z))$ and the indicator of the decision region, the dominated convergence theorem implies that, for each fixed $s \in \mathcal{R}$, the function $\beta \mapsto R(\beta, s)$ is continuous on $(0, 1]$.

Step 2. Fix $\beta \in (0, 1]$ and $j \in \{1, 2\}$. By Lemma 1 (which implies assumption (A2)),
$$\sup_{z \in S^{q-1}} \big| D_{\hat{F}_j}^{(\beta)}(z) - D_{F_j}^{(\beta)}(z) \big| \xrightarrow{a.s.} 0.$$
Since $B_N$ is finite for each $N$, a union bound yields
$$\max_{\beta \in B_N} \sup_{z \in S^{q-1}} \big| D_{\hat{F}_j}^{(\beta)}(z) - D_{F_j}^{(\beta)}(z) \big| \xrightarrow{a.s.} 0 \quad \text{as } n_j \to \infty.$$
Consider the class of decision sets in the depth space,
$$\mathcal{D} = \big\{ \{(u, v) \in [0, 2]^2 : v \leq s(u)\} : s \in \mathcal{R} \big\}.$$
Since $\mathcal{R}$ consists of polynomials of bounded degree, $\mathcal{D}$ has finite VC dimension (Li et al., 2012, Theorem 4). Therefore $\mathcal{D}$ is a Glivenko–Cantelli class, which implies
$$\sup_{\beta \in B_N} \sup_{s \in \mathcal{R}} \big| \hat{R}_N(s, \beta) - R(\beta, s) \big| \xrightarrow{p} 0,$$
where $\hat{R}_N(s, \beta)$ is the empirical risk of $C_{\beta,s}$.

Step 3. Define the optimal risk function $R^*(\beta) = \inf_{s \in \mathcal{R}} R(\beta, s)$, $\beta \in (0, 1]$. From Step 1, $R(\beta, s)$ is continuous in $\beta$ for each $s$ and, by compactness of $\mathcal{R}$, the infimum is attained and $R^*(\beta)$ is continuous. Assumption (A1) implies that $\beta^*$ is the unique minimizer of $R^*(\beta)$ over $(0, 1]$, with $R^*(\beta^*) = R_{\mathrm{Bayes}}$. For each $\beta \in B_N$, $\hat{s}_{N,\beta}$ minimizes the empirical risk. By the uniform convergence in Step 2 and standard M-estimation theory,
$$\sup_{\beta \in B_N} \big| R(\beta, \hat{s}_{N,\beta}) - R^*(\beta) \big| \xrightarrow{p} 0.$$
Since $R^*$ is continuous with unique minimizer $\beta^*$, and $B_N$ becomes dense in $(0, 1]$ as $N \to \infty$ by (A4), we obtain $\hat{\beta}_N = \operatorname*{argmin}_{\beta \in B_N} \mathrm{CV}(\hat{s}_{N,\beta}, \beta) \xrightarrow{p} \beta^*$.

Step 4. For fixed $\beta = \beta^*$, the empirical risk minimizer satisfies $\hat{s}_{N,\beta^*} \xrightarrow{p} s_B$ by standard consistency results for classification with VC classes. The convergence $\hat{\beta}_N \xrightarrow{p} \beta^*$ and the continuity of the risk function imply $\hat{s}_N = \hat{s}_{N, \hat{\beta}_N} \xrightarrow{p} s_B$. Finally, for the risk convergence,
$$|R(\hat{C}_N) - R_{\mathrm{Bayes}}| \leq |R(\hat{\beta}_N, \hat{s}_N) - R(\hat{\beta}_N, s_B)| + |R(\hat{\beta}_N, s_B) - R(\beta^*, s_B)|.$$
The first term converges to 0 because $\hat{s}_N \xrightarrow{p} s_B$ and $s \mapsto R(\beta, s)$ is continuous uniformly in $\beta$ on compacts. The second term converges to 0 because $\hat{\beta}_N \xrightarrow{p} \beta^*$ and $\beta \mapsto R(\beta, s_B)$ is continuous. Hence, $R(\hat{C}_N) \xrightarrow{p} R(\beta^*, s_B) = R_{\mathrm{Bayes}}$.

In the following sections, we investigate the practical performance of the LCDD-DD classifier through simulations (Section 6) and real-data applications (Section 7), comparing it with global depth-based classifiers and other directional classification methods.

6 Simulations

Among the various applications of data depth, supervised classification is the most prominent, particularly in the context of directional data, where the absence of a natural ordering poses specific challenges. In this study, the proposed local depth function is evaluated against its global counterpart through a simulation experiment in which both serve as the underlying measures for DD-classifier training.
This section presents the simulation study designed to compare the performance of the global and local CDD when incorporated into the DD-classifier. Two distinct simulation scenarios are considered, each described in detail in the following subsections and including three experimental setups.

6.1 The simulation design

The first simulation scenario aims to assess the potential classification improvement achieved by the local depth over its global counterpart when observations belonging to the same class are distributed across multiple clusters. The second simulation scenario investigates a different type of data distribution, characterized by a non-convex structure combined with strongly pronounced multimodality.

For each setup, we generated 100 datasets of size $n = 500$. The neighbourhood parameter, expressed as the proportion $\beta$ of nearest units within each class, takes values $\beta \in \{0.05, 0.10, 0.25\}$, while the data dimensionality varies across $d \in \{3, 10, 25\}$. In accordance with Guyon (1997), 70% of the observations were allocated to the training set and the remaining 30% to the test set.

In this study we focus on the binary classification setting with the DD-classifier. Let $W_{1i}$, $i = 1, \ldots, n_1$, and $W_{2i}$, $i = 1, \ldots, n_2$, denote independent random samples drawn from the distributions $F_1$ and $F_2$ on $S^{q-1}$, respectively, where $n_1$ and $n_2$ are the cardinalities of the two classes. The proportion between the two classes was randomly chosen to lie between 35% and 50%. The evaluation metric considered for each simulated dataset is the misclassification rate (MR), as defined in eq. (1). The simulation was fully implemented in R, using the ddalpha package (Pokotylo et al., 2019), which allows the DD-classifier training procedure to be customized so as to incorporate the proposed depth function and to automatically select the polynomial degree $p$ through a cross-validation scheme performed in the depth space.

6.1.1 Scenario 1

The data of the first simulated scenario were generated according to the von Mises–Fisher (vMF) distribution, which plays for data on the unit hypersphere $S^{q-1}$ the same role that the normal distribution does for unconstrained Euclidean data. A $(q-1)$-dimensional unit random vector $x \in S^{q-1}$ is said to follow a vMF distribution if its probability density function is
$$f_q(x \mid \mu, \kappa) = C_q(\kappa) \exp(\kappa \mu' x),$$
where $\|\mu\| = 1$, $\kappa \geq 0$, and $q \geq 2$. The normalizing constant is given by
$$C_q(\kappa) = \frac{\kappa^{q/2 - 1}}{(2\pi)^{q/2} I_{q/2 - 1}(\kappa)},$$
where $I_b$ denotes the modified Bessel function of the first kind and order $b$. The vMF distribution is characterized by the mean direction $\mu$ and the concentration parameter $\kappa$, which controls the degree of dispersion of the observations around the mean vector. In the limiting cases, when $\kappa = 0$ the distribution reduces to the uniform density on $S^{q-1}$, whereas as $\kappa \to \infty$ it degenerates into a point mass at $\mu$.

For the simulation study, three experimental setups were considered. For each setup, the concentration parameter $\kappa$ was randomly drawn to induce low, medium, and high noise levels in the data; specifically, $\kappa_{\mathrm{low}} \sim U[15, 17]$, $\kappa_{\mathrm{medium}} \sim U[10, 12]$, and $\kappa_{\mathrm{high}} \sim U[5, 7]$. Each class was generated as a mixture of vMF distributions, each one indicated as $F_{\mathrm{vMF}(\mu_{w_j}, \kappa)}$.
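The vMF density above can be evaluated directly in R via the base function besselI; a minimal sketch (valid for $\kappa > 0$; sampling from the vMF distribution is a separate matter, handled by dedicated packages):

```r
# von Mises-Fisher density on S^{q-1}: f(x | mu, kappa) = C_q(kappa) exp(kappa mu'x).
dvmf <- function(x, mu, kappa) {
  q  <- length(mu)
  Cq <- kappa^(q / 2 - 1) / ((2 * pi)^(q / 2) * besselI(kappa, nu = q / 2 - 1))
  Cq * exp(kappa * sum(mu * x))
}

dvmf(c(1, 0, 0), mu = c(1, 0, 0), kappa = 10)  # density at the mean direction
```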
The choice of each center $\mu_{w_j}$, where $j$ denotes the component within class $w$, was made randomly but constrained to lie at specific cosine distances $d_{\cos}(\cdot, \cdot)$ from the other centers. Starting from the same initial center $\mu_{1_1} = (\eta_1, \ldots, \eta_{q-1}, \eta_q)$, where $\eta_1 = 1$ and $\eta_t = 0$ for all $t = 2, \ldots, q$, the remaining centers were generated according to the following setups:

• Setup 1: The second class center $\mu_{2_1}$ is randomly generated and constrained to satisfy $d_{\cos}(\mu_{1_1}, \mu_{2_1}) \in [0.3, 0.5]$. Thus, points of each class are drawn from vMF distributions with different mean directions but equal concentration parameter $\kappa$: $F_1 = F_{\mathrm{vMF}(\mu_{1_1}, \kappa)}$ and $F_2 = F_{\mathrm{vMF}(\mu_{2_1}, \kappa)}$.

• Setup 2: The second component of the first class, $\mu_{1_2}$, is randomly generated under the constraint $d_{\cos}(\mu_{1_1}, \mu_{1_2}) \in [0.6, 0.8]$. Then, $\mu_{2_1}$ is generated so that $d_{\cos}(\mu_{1_1}, \mu_{2_1}) = d_{\cos}(\mu_{1_2}, \mu_{2_1}) \in [0.25, 0.45]$. Finally, $\mu_{2_2}$ is generated under the constraints $d_{\cos}(\mu_{2_2}, \mu_{2_1}) \in [d_{\cos}(\mu_{1_1}, \mu_{1_2}) - \epsilon,\ d_{\cos}(\mu_{1_1}, \mu_{1_2}) + \epsilon]$ and $d_{\cos}(\mu_{1_2}, \mu_{2_2}) \in [d_{\cos}(\mu_{1_1}, \mu_{2_1}) - \epsilon,\ d_{\cos}(\mu_{1_1}, \mu_{2_1}) + \epsilon]$, where $\epsilon = 0.1$. In this case, each class is modelled as an equally weighted mixture of two vMF distributions with different mean directions and the same concentration parameter $\kappa$: $F_1 = \frac{1}{2} F_{\mathrm{vMF}(\mu_{1_1}, \kappa)} + \frac{1}{2} F_{\mathrm{vMF}(\mu_{1_2}, \kappa)}$ and $F_2 = \frac{1}{2} F_{\mathrm{vMF}(\mu_{2_1}, \kappa)} + \frac{1}{2} F_{\mathrm{vMF}(\mu_{2_2}, \kappa)}$.

• Setup 3: The second component of the first class, $\mu_{1_2}$, is randomly generated such that $d_{\cos}(\mu_{1_1}, \mu_{1_2}) \in [0.4, 0.6]$. Then, $\mu_{2_1}$ is generated under the constraints $d_{\cos}(\mu_{1_1}, \mu_{2_1}) \in [0.4, 0.6]$ and $d_{\cos}(\mu_{1_2}, \mu_{2_1}) \in [0.8, 1]$. Finally, $\mu_{2_2}$ is generated so that $d_{\cos}(\mu_{1_1}, \mu_{2_2}) \in [0.4, 2]$, $d_{\cos}(\mu_{1_2}, \mu_{2_2}) \in [0.4, 2]$, and $d_{\cos}(\mu_{2_1}, \mu_{2_2}) \in [0.8, 2]$. Similarly to Setup 2, each class is represented by an equally weighted mixture of two vMF components: $F_1 = \frac{1}{2} F_{\mathrm{vMF}(\mu_{1_1}, \kappa)} + \frac{1}{2} F_{\mathrm{vMF}(\mu_{1_2}, \kappa)}$ and $F_2 = \frac{1}{2} F_{\mathrm{vMF}(\mu_{2_1}, \kappa)} + \frac{1}{2} F_{\mathrm{vMF}(\mu_{2_2}, \kappa)}$.

The outcomes of this scenario are displayed in Figure 2, which reports the distributions of the misclassification rates across setups, dimensions, and noise levels. A detailed interpretation of these findings is provided in Section 6.2.

6.1.2 Scenario 2

The Watson distribution was selected as the generating model for the second scenario of the simulation study. Although originally defined for axial data, the Watson distribution can effectively produce non-convex and bipolar structures, making it suitable for assessing the behaviour of local depth functions in complex directional settings. A random unit vector $x \in S^{q-1}$ follows a Watson distribution if its probability density function is given by
$$f(x \mid \mu, \kappa) = M\Big(\tfrac{1}{2}, \tfrac{q}{2}, \kappa\Big)^{-1} \exp\{\kappa (\mu' x)^2\},$$
where $M(1/2, q/2, \cdot)$ denotes Kummer's function. The parameter $\mu$ represents the mean axis, while $\kappa$ determines the axial concentration of the data. For $\kappa > 0$ the distribution is bipolar and, as $\kappa$ increases, it becomes increasingly concentrated around $\pm\mu$. For $\kappa < 0$ it becomes a symmetric girdle distribution, with data concentrated in the subspace orthogonal to $\mu$, and the degree of dispersion is governed by the magnitude of $\kappa$.
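Kummer's function has no base-R implementation, but its defining series $M(a, b, z) = \sum_{n \geq 0} (a)_n z^n / ((b)_n\, n!)$ converges quickly for the moderate $|\kappa|$ used here, so the Watson density can be sketched as follows (a truncated-series approximation, not production code):

```r
# Kummer's confluent hypergeometric function M(a, b, z), truncated series.
kummer_M <- function(a, b, z, tol = 1e-12, nmax = 10000) {
  term <- 1; s <- 1
  for (n in seq_len(nmax)) {
    term <- term * (a + n - 1) / (b + n - 1) * z / n  # ratio of consecutive terms
    s <- s + term
    if (abs(term) < tol * abs(s)) break
  }
  s
}

# Watson density on S^{q-1}: f(x | mu, kappa) = exp(kappa (mu'x)^2) / M(1/2, q/2, kappa).
dwatson <- function(x, mu, kappa) {
  exp(kappa * sum(mu * x)^2) / kummer_M(0.5, length(mu) / 2, kappa)
}
```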
As in the first scenario, three setups were built using the Watson distribution, denoted $F_{\mathrm{Wat}(\mu, \kappa)}$, and random values of $\kappa$ were used to produce low, medium, and high levels of noise in the data; specifically, $\kappa_{\mathrm{low}} \sim U[15, 17]$, $\kappa_{\mathrm{medium}} \sim U[10, 12]$, and $\kappa_{\mathrm{high}} \sim U[5, 7]$. In each setup, the mean axes of the two classes are randomly generated and constrained to have a cosine distance between 0.5 and 0.7, starting from an initial center defined as $\mu_1 = (\eta_1, \ldots, \eta_{q-1}, \eta_q)$, where $\eta_1 = 1$ and $\eta_t = 0$ for all $t = 2, \ldots, q$. To evaluate different data configurations, the sign of the concentration parameter $\kappa$ varies across the setups:

• Setup 1: $\kappa$ is positive for both classes, resulting in two bipolar distributions: $F_1 = F_{\mathrm{Wat}(\mu_1, \kappa)}$ and $F_2 = F_{\mathrm{Wat}(\mu_2, \kappa)}$.

• Setup 2: $\kappa$ is negative for both classes, producing two girdle-shaped distributions: $F_1 = F_{\mathrm{Wat}(\mu_1, -\kappa)}$ and $F_2 = F_{\mathrm{Wat}(\mu_2, -\kappa)}$.

• Setup 3: $\kappa$ is positive for one class and negative for the other, leading to two populations with different shapes: $F_1 = F_{\mathrm{Wat}(\mu_1, \kappa)}$ and $F_2 = F_{\mathrm{Wat}(\mu_2, -\kappa)}$.

The results of this scenario are reported in Figure 3, where the distributions of the misclassification rates are displayed, further conditioned on the setup, data dimension, and noise level. A detailed discussion of these results is provided in Section 6.2.

Figure 2: Simulation results for Scenario 1. The rows of the panel indicate the specific setup, the columns the number of dimensions, and the three colors indicate the noise levels: low (L), green; medium (M), blue; high (H), red.

6.2 Results

Considering the results shown in Figure 2 and Figure 3 for the first and second scenario, respectively, the prediction error consistently increases with both the data dimension and the noise level, regardless of the specific neighbourhood size (including the CDD). In both scenarios there is effectively no difference between the local and global approaches when the dimension is 25 and the noise level is high, as their classification errors become almost indistinguishable. Moreover, within the local framework, the three neighbourhood proportions considered (5%, 10%, and 25%) yield very similar performance.

Focusing on the first scenario, which relies on the vMF distribution: in the first setup, which involves neither multimodality nor non-convexity, all methods exhibit nearly identical performance. Nevertheless, when the dispersion becomes very high, the two classes tend to overlap, making it slightly more advantageous to use larger neighbourhood proportions for the depth computation, although the improvement is modest. As the structure becomes more complex, as in the second setup, the CDD begins to display a higher prediction error than the LCDD, indicating that a local perspective is preferable in this setting, particularly under low noise. For higher noise levels and 10 dimensions or more, however, the performance of all methods becomes essentially indistinguishable. In the third setup of the same scenario, except for the previously discussed behaviour in high dimensions and under high noise, the global approach consistently underperforms the LCDD.
In particular, for 3 and 10 dimensions under low noise, the local depth achieves a misclassification error below 1%, demonstrating excellent performance even in the presence of a highly structured and challenging data configuration. The classification performance of the CDD was previously investigated by Pandolfo and D'Ambrosio (2021), who compared it with other depth measures and showed that it performs very well in several settings involving the vMF distribution. In the context of the second scenario, which is based on the Watson distribution instead, the results are straightforward to interpret: the CDD consistently underperforms the LCDD under all imposed conditions. This indicates that the global CDD lacks the flexibility required to adapt to scenarios in which the classes are not well separated, as is the case in this simulation design.

Figure 3: Simulation results for Scenario 2. The rows of the panel indicate the specific setup, the columns the number of dimensions, and the three colors indicate the noise levels: low (L), green; medium (M), blue; high (H), red.

To further enrich the comparison between the CDD and the LCDD, the next section presents an application to real datasets. This additional analysis allows us to assess the practical performance of the local depth approach in real-world situations and highlights its potential as an effective and flexible strategy for depth-based classification of directional data.

7 Real data examples

In this section we compare the performance of the proposed classifier with that of its global version by means of two real-world datasets. We ran a 10-fold cross-validation repeated on 10 different training sets (Paindaveine and Van Bever, 2013) in order to select the best value of $\beta$ in the set $\{0.01, 0.05, 0.1, 0.25, 0.5, 1\}$ according to the lowest average misclassification rate (MR).

7.1 Wholesale customers

The first real dataset refers to clients of a wholesale distributor. In marketing applications, the target groups can themselves be composed of different segments of the population, which usually exhibit different spending habits. It can therefore be more efficient to introduce more flexibility when classifying these units through depth functions, allowing for a more local focus. There are a total of 440 observations and 8 variables. The first two are categorical; of these, we are interested in the variable Channel, which can be either Horeca (Hotel/Restaurant/Café) or Retail, and which defines the two classes for this problem. The remaining 6 variables record the annual spending, in monetary units, on diverse product categories: (1) fresh products, (2) milk products, (3) grocery products, (4) frozen products, (5) detergents and paper products, and (6) delicatessen products. We treat these data as compositional, exploiting the square-root transformation so that the points lie on a unit hypersphere. As apparent from Fig. 4, the repeated cross-validation selects $\beta = 0.05$, with an average MR of 0.15, and shows an increasing trend in the average misclassification rate from $\beta = 0.05$ to $\beta = 1$. In this example, the local approach brings an average improvement over the global approach of about 4.5 percentage points.

Figure 4: Repeated 10-fold cross-validation results for the Wholesale dataset. The $x$-axis reports the values of $\beta$, the $y$-axis the cross-validated MR, and the dotted red line highlights the value of $\beta$ achieving the minimum MR.
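Both real-data examples rely on the square-root transformation of Stephens (1982) to map compositions onto the unit hypersphere; in R this amounts to the following two-line helper (our own, for illustration):

```r
# Map compositions (non-negative rows) to the unit sphere: close each row
# so it sums to one, then take square roots; rows then have unit norm.
comp_to_sphere <- function(P) {
  sqrt(P / rowSums(P))
}
```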
7.1 Wholesale customers

The first dataset refers to the clients of a wholesale distributor. In marketing applications, the target groups can themselves be composed of different segments of the population, which usually exhibit different spending habits. It can therefore be more effective to introduce additional flexibility when classifying these units through depth functions, allowing for a more local focus. There are a total of 440 observations and 8 variables. The first two are categorical; of these, we are interested in the variable Channel, which is either Horeca (Hotel/Restaurant/Café) or Retail, and which defines the two classes for this problem. The remaining six variables record the annual spending in monetary units on diverse product categories: (1) fresh products, (2) milk products, (3) grocery products, (4) frozen products, (5) detergents and paper products, and (6) delicatessen products. We treat these data as compositional, exploiting the square-root transformation so that the points lie on a unit hypersphere (this preprocessing is sketched at the end of this section). As apparent from Fig. 4, the repeated cross-validation selects β = 0.05, with an average MR of 0.15, and shows an increasing trend in the average misclassification rate from β = 0.05 to β = 1. In this example, the local approach yields an average improvement over the global approach of about 4.5 percentage points.

Figure 4: Repeated 10-fold cross-validation results for the Wholesale dataset. The x-axis reports the values of β, the y-axis the cross-validated MR, and the dotted red line highlights the value of β achieving the minimum MR.

7.2 SPAM database

The second dataset contains 4601 emails classified as spam or non-spam. What interested us about this textual dataset was both the relevance of the application, since spam detection is an important issue and spam e-mails range from merely annoying to actually dangerous, and the complexity of the data, which are high-dimensional. There are 57 variables in total, of which we select the last one, which contains the class label, and the first 48, which contain the percentage of words in the e-mail matching a given word. Since the percentages of the 48 words chosen to classify the email do not sum to 1, we added one last variable, defined as the complement to 1 of the sum of these percentages. This ensures that the square-root transformation maps each point correctly onto the unit hypersphere; normalizing the data without this extra component would give a very different interpretation of the results. Treating these as compositional data allows us to overcome the biases induced by the different lengths of the emails, which is the usual strategy in text-mining applications (Dhillon and Modha, 2004). Again, Fig. 5 paints a clear picture of the choice of β. Quite interestingly, and unlike anything observed in our simulations, the best-performing β is 0.01, with an average MR of 0.12, possibly due to the larger number of observations. The figure also shows a steadily increasing trend up to β = 0.5, and an average difference between the chosen local depth and the global depth of 8 percentage points.
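To make the preprocessing of both examples concrete, the following sketch appends the complement-to-one column where needed and applies the square-root transformation, so that each observation becomes a unit vector on the hypersphere. It assumes the SPAM word percentages have already been rescaled to proportions summing to at most one; the array names used below are hypothetical.

```python
import numpy as np

def to_sphere(X, add_complement=False):
    """Map nonnegative rows of X to unit vectors via the square-root
    transformation of their compositional representation."""
    X = np.asarray(X, dtype=float)
    if add_complement:
        # Append the complement to 1 of each row sum (rows assumed to sum
        # to at most 1), as done for the 48 word-proportion variables.
        comp = np.clip(1.0 - X.sum(axis=1, keepdims=True), 0.0, None)
        X = np.hstack([X, comp])
    else:
        # Rescale each row so that its parts sum to one (spending data).
        X = X / X.sum(axis=1, keepdims=True)
    # Each row of sqrt(X) has unit norm: sum_j sqrt(p_j)^2 = sum_j p_j = 1.
    return np.sqrt(X)

# Wholesale: normalize the six spending variables, then take square roots.
# Z_wholesale = to_sphere(spending)
# SPAM: add the complement variable first, then take square roots.
# Z_spam = to_sphere(word_props, add_complement=True)
```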
8 Conclusion

This paper introduces the local cosine distance depth (LCDD) for directional data on the hypersphere. The LCDD extends the cosine distance depth to capture local centrality in multimodal distributions through a neighbourhood approach. When applied to the Depth vs. Depth (DD) classifier, the LCDD enables better class separation, even for non-convex class structures. The proposed LCDD-based classifier is compared with its global counterpart in an extensive simulation study. The results demonstrate the effectiveness of the local approach, regardless of the chosen β level, except in settings with high noise and high dimension, where the global and local depth functions produce similar results. These findings are further confirmed by two real-data examples. Future research will focus on extending the approach to multiclass settings and to different manifolds.

Figure 5: Repeated 10-fold cross-validation results for the Spam dataset. The x-axis reports the values of β, the y-axis the cross-validated MR, and the dotted red line highlights the value of β achieving the minimum MR.

References

Agostinelli, C. and Romanazzi, M. (2011). Local depth. Journal of Statistical Planning and Inference, 141(2):817–830.

Agostinelli, C. and Romanazzi, M. (2012). Depth analysis of directional data. In Proceedings of the 46th Scientific Meeting of the Italian Statistical Society (SIS 2012), Rome, Italy. Italian Statistical Society (SIS).

Agostinelli, C. and Romanazzi, M. (2013). Nonparametric analysis of directional data based on data depth. Environmental and Ecological Statistics, 20(2):253–270.

Cuesta-Albertos, J. A., Febrero-Bande, M., and Oviedo de la Fuente, M. (2017). The DD^G-classifier in the functional setting. Test, 26(1):119–142.

Demni, H., Messaoud, A., and Porzio, G. C. (2019). The cosine depth distribution classifier for directional data. In Applications in Statistical Computing: From Music Data Analysis to Industrial Quality Improvement, pages 49–60. Springer.

Dey, S. and Jana, N. (2025). Classification rules for axial data: Parametric and nonparametric approaches. Journal of Classification, pages 1–31.

Dhillon, I. S. and Modha, D. S. (2004). Concept decompositions for large sparse text data using clustering. Machine Learning, 42:143–175.

Guyon, I. M. (1997). A scaling law for the validation-set training-set size ratio.

Hlubinka, D. and Vencalek, O. (2013). Depth-based classification for distributions with nonconvex support. Journal of Probability and Statistics, 2013(1):629184.

Ley, C., Sabbah, C., and Verdebout, T. (2014). A new concept of quantiles for directional data and the angular Mahalanobis depth.

Ley, C. and Verdebout, T. (2017). Modern Directional Statistics. Chapman & Hall/CRC Interdisciplinary Statistics. CRC Press, Boca Raton, FL.

Li, J., Cuesta-Albertos, J. A., and Liu, R. Y. (2012). DD-classifier: Nonparametric classification procedure based on DD-plot. Journal of the American Statistical Association, 107(498):737–753.

Liu, R. Y., Parelius, J. M., and Singh, K. (1999). Multivariate analysis by data depth: descriptive statistics, graphics and inference (with discussion and a rejoinder by Liu and Singh). The Annals of Statistics, 27(3):783–858.

Liu, R. Y. and Singh, K. (1992). Ordering directional data: Concepts of data depth on circles and spheres. The Annals of Statistics, 20(3):1468–1484.

Mardia, K. V. and Jupp, P. E. (1999). Directional Statistics. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Chichester.

Paindaveine, D. and Van Bever, G. (2013). From depth to local depth: a focus on centrality. Journal of the American Statistical Association, 108(503):1105–1119.

Pandolfo, G. (2022). The GLD-plot: a depth-based graphical tool to investigate unimodality of directional data. Journal of Statistical Computation and Simulation, 92(11):2372–2385.

Pandolfo, G. and D'Ambrosio, A. (2021). Depth-based classification of directional data. Expert Systems with Applications, 169:114433.

Pandolfo, G., Paindaveine, D., and Porzio, G. C. (2018). Distance-based depths for directional data. Canadian Journal of Statistics, 46(4):593–609.

Pokotylo, O., Mozharovskyi, P., and Dyckerhoff, R. (2019). Depth and depth-based classification with R package ddalpha. Journal of Statistical Software, 91(5):1–46.

Small, C. G. (1987). Measures of centrality for multivariate and directional distributions. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 15(1):31–39.

Stephens, M. A. (1982). Use of the von Mises distribution to analyse continuous proportions. Biometrika, 69(1):197–203.

Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, volume 2, pages 523–531, Vancouver.
