The notion of data depth has long been in use to obtain robust location and scale estimates in a multivariate setting. The depth of an observation is a measure of its centrality, with respect to a data set or a distribution. The data depths of a set of multivariate observations translate to a center-outward ordering of the data. Thus, data depth provides a generalization of the median to a multivariate setting (the deepest observation), and can also be used to screen for extreme observations or outliers (the observations with low data depth). Data depth has been used in the development of a wide range of robust and non-parametric methods for multivariate data, such as non-parametric tests of location and scale [Li and Liu (2004)], multivariate rank tests [Liu and Singh (1993)], non-parametric classification and clustering [Jornsten (2004)], and robust regression [Rousseeuw and Hubert (1999)]. Many different notions of data depth have been developed for multivariate data. In contrast, data depth measures for functional data have only recently been proposed [Fraiman and Muniz (1999), López-Pintado and Romo (2006a)]. While the definitions of both of these data depth measures are motivated by the functional aspect of the data, the measures themselves are in fact invariant with respect to permutations of the domain (i.e. the compact interval on which the functions are defined). Thus, these measures are equally applicable to multivariate data where there is no explicit ordering of the data dimensions. In this paper we explore some extensions of functional data depths, so as to take the ordering of the data dimensions into account.
Deep Dive into Functional analysis via extensions of the band depth.
S. López-Pintado and R. Jornsten
In functional data analysis, each observation is a real function $x_i$, $i = 1, \ldots, n$, defined on a common interval in $\mathbb{R}$. Functional data are observed in many disciplines, such as medicine (e.g. EEG traces), biology (e.g. gene expression time course data), economics and engineering (e.g. financial trends, chemical processes). Many multivariate methods (e.g. analysis of variance and classification) have been extended to functional data (see Ramsay and Silverman [22]). A basic building block of such statistical analyses is a location estimate, i.e. the mean curve for a group of data objects, or for data objects within a class. When analyzing functional data, outliers can affect the location estimates in many different ways, e.g. by altering the shape and/or magnitude of the mean curve. Since measurements are frequently noisy, statistical analysis may thus be much improved by the use of robust location estimates, such as the median or trimmed mean curve. Data depth provides the tools for constructing these robust estimates.
We first review the concept of data depth in the multivariate setting, where data depth was introduced to generalize order statistics, e.g. the median, to higher dimensions. Given a distribution function $F$ in $\mathbb{R}^d$, a statistical depth assigns to each point $x$ a real, non-negative, bounded value $D(x|F)$, which measures the centrality of $x$ with respect to the distribution $F$. Given a sample of $n$ observations $X = \{x_1, \ldots, x_n\}$, we denote the sample version by $D(x|F_n)$ or $D(x|X)$. $D(x|X)$ is a measure of the centrality of a point $x$ with respect to the sample $X$ (or the empirical distribution function $F_n$). The point $x$ can be a sample observation, or constitute independent “test data”. For $x = x_i \in X$, $D(x_i|F_n)$ provides a center-outward ordering of the sample observations $x_1, \ldots, x_n$.
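As a rough illustration of how a sample depth $D(x|X)$ induces a center-outward ordering, the sketch below ranks a small sample from deepest to least deep. The depth used here is a Mahalanobis-type depth, $1/(1 + d^2)$ with $d$ the Mahalanobis distance to the sample mean; it is chosen only because it is short to write down and is a stand-in, not a measure studied in this paper. The function names and the toy data are hypothetical.

```python
import numpy as np


def mahalanobis_depth(x, sample):
    """Stand-in depth: 1 / (1 + squared Mahalanobis distance to the sample mean)."""
    mean = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)
    diff = np.asarray(x, dtype=float) - mean
    return 1.0 / (1.0 + diff @ np.linalg.solve(cov, diff))


def center_outward_order(sample, depth_fn):
    """Return (indices sorted from deepest to least deep, depth of each row)."""
    depths = np.array([depth_fn(x, sample) for x in sample])
    return np.argsort(-depths, kind="stable"), depths


# Toy sample (assumed data): five points near the origin and one far away.
sample = np.array([
    [0.0, 0.0], [1.0, 0.2], [-1.0, -0.1],
    [0.1, 1.0], [-0.2, -1.1], [4.0, 4.0],
])
order, depths = center_outward_order(sample, mahalanobis_depth)
print("center-outward order:", order)
print("depths:", np.round(depths, 3))
```

Any other depth with the same `(x, sample)` signature can be passed as `depth_fn`. The first index in `order` plays the role of a multivariate median, and the last indices are the candidates for outlier screening.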
Many depth definitions have been proposed for multivariate data (e.g. Mahalanobis [19], Tukey [26], Oja [20], Liu [12], Singh [25], Fraiman and Meloche [3], Vardi and Zhang [27] and Zuo [31]). To illustrate the data depth principle and the variety of depth measures, we will briefly review two very different notions of depth: the simplicial depth of Liu [12], and the $L_1$ depth of Vardi and Zhang [27] (a detailed discussion of the different types of data depths can be found in Liu, Parelius and Singh [14] and Zuo and Serfling [32]). To compute the simplicial depth of a point $x \in \mathbb{R}^d$ with respect to the sample $X = \{x_1, \ldots, x_n\}$, we start by partitioning the sample into a set of $\binom{n}{d+1}$ unique $(d+1)$-simplices. Consider the two-dimensional case illustrated in Figure 1a. We depict a subset of the $\binom{n}{3}$ 3-simplices (triangles) in $\mathbb{R}^2$ defined by a set of objects $(x_1, x_2, x_3) \in X$. A point $x$ is considered deep within the sample $X$ if many simplices contain it, and vice versa. Formally, the simplicial depth of a point $x$ is defined as
$$SD(x|X) = \binom{n}{d+1}^{-1} \sum_{1 \le i_1 < \cdots < i_{d+1} \le n} I\{x \in \mathrm{simplex}(x_{i_1}, \ldots, x_{i_{d+1}})\},$$
where $I\{A\}$ is an indicator of the event $A$, equal to 1 if $A$ is true and 0 otherwise. We can see from Figure 1a that the point marked with a triangle is covered by many simplices, resulting in a high depth measure, whereas the point marked with a plus attains the minimum depth measure (i.e. $\binom{n}{3}^{-1}$ if the point is a sample observation, and 0 otherwise). To calculate the $L_1$ data depth of a point $x$ with respect to the sample $X$, we start by forming the unit vectors $e(x, x_i)$ that point from $x$ to $x_i \in X$ (Figure 1b). The $L_1$ depth of $x$ is defined as
$$LD(x|X) = 1 - \|\bar{e}(x)\|.$$
Here $\bar{e}(x)$ is the sample average of the unit vectors $e(x, x_i)$. If $x$ is on the periphery of the sample, all unit vectors $e(x, x_i)$ point in an almost identical direction, such that $\|\bar{e}(x)\| \simeq 1$ and $LD(x|X) \simeq 1 - 1 = 0$ (point marked with a plus in Figure 1b). If $x$ is in the center of the sample, the unit vectors $e(x, x_i)$ point in many different directions and almost cancel out in the computation of $\bar{e}(x)$, resulting in a high depth measure $LD(x|X) \simeq 1 - 0 = 1$ (point marked with a triangle in Figure 1b).
Focusing on the case when $x$ is a sample observation, $x \in X$, we see from the above examples that data depth can be used to rank-order the data set $X$, from the deepest to the least deep. We classify $x$ with high $D(x|X)$ as the most representative of the sample $X$, and $x$ with low $D(x|X)$ as the most extreme observations, which may be considered outliers. The deepest observation is a generalization of the median to a multivariate setting, and the center-outward ordering can be used to construct trimmed mean estimates. Robust multivariate estimates based on data depth have been used in a wide range of non-parametric analyses, such as non-parametric testing of location and/or scale (Li and Liu [11]), multivariate rank tests (Liu and Singh [15]), non-parametric classification and clustering (Jornsten [10]), and robust regression (Rousseeuw and Hubert [23]).
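The two depth measures reviewed above, and the depth-based median and trimmed mean, can be sketched in a few lines for the planar case $d = 2$. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy sample are hypothetical, triangles are treated as closed (so a sample point counts as contained in every triangle of which it is a vertex), degenerate (collinear) triangles get no special handling, and in the $L_1$ depth the average of unit vectors is taken over the sample points that do not coincide with $x$.

```python
from itertools import combinations

import numpy as np


def _cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(x, a, b, c):
    """True if x lies in the closed triangle with vertices a, b, c."""
    signs = (_cross(a, b, x), _cross(b, c, x), _cross(c, a, x))
    # x is inside (or on the boundary) unless the three signs disagree.
    return not (min(signs) < 0 and max(signs) > 0)


def simplicial_depth_2d(x, sample):
    """Fraction of the C(n, 3) sample triangles that contain x (d = 2 only)."""
    triangles = list(combinations(sample, 3))
    return sum(in_triangle(x, a, b, c) for a, b, c in triangles) / len(triangles)


def l1_depth(x, sample):
    """1 - ||e_bar(x)||, with e_bar the mean unit vector from x to the sample."""
    diffs = sample - x
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 0  # assumption: sample points equal to x are skipped
    if not keep.any():
        return 1.0
    e_bar = (diffs[keep] / norms[keep, None]).mean(axis=0)
    return 1.0 - float(np.linalg.norm(e_bar))


def depth_trimmed_mean(sample, depths, keep_fraction=0.8):
    """Mean of the deepest keep_fraction of the sample (at least one point)."""
    n_keep = max(1, int(round(keep_fraction * len(sample))))
    deepest_first = np.argsort(-np.asarray(depths), kind="stable")
    return sample[deepest_first[:n_keep]].mean(axis=0)


# Toy sample (assumed data): four corners of a square plus one interior point.
sample = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0], [2.0, 1.0]])
sd = np.array([simplicial_depth_2d(x, sample) for x in sample])
ld = np.array([l1_depth(x, sample) for x in sample])
deepest_sd = int(np.argmax(sd))  # index of the simplicial-depth median
deepest_ld = int(np.argmax(ld))  # index of the L1-depth median
trimmed = depth_trimmed_mean(sample, ld, keep_fraction=0.6)
print("simplicial depths:", sd)
print("L1 depths:", np.round(ld, 3))
print("trimmed mean:", trimmed)
```

In this toy sample both measures pick the interior point as the deepest observation, while the corners share the lowest depths. The brute-force enumeration of all triangles costs $O(n^3)$ per evaluated point, so the sketch is meant for small samples only.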
In this paper we discuss data depth measures for functional data. We review the band depths of López-Pintado and Ro
…(Full text truncated)…