The random Tukey depth
The computation of the Tukey depth, also called halfspace depth, is very demanding, even in low dimensional spaces, because it requires the consideration of all possible one-dimensional projections. In this paper we propose a random depth which approximates the Tukey depth. It only takes into account a finite number of one-dimensional projections which are chosen at random. Thus, this random depth requires a very small computation time even in high dimensional spaces. Moreover, it is easily extended to cover the functional framework. We present some simulations indicating how many projections should be considered depending on the sample size and on the dimension of the sample space. We also compare this depth with some others proposed in the literature. It is noteworthy that the random depth, based on a very low number of projections, obtains results very similar to those obtained with other depths.
💡 Research Summary
The paper addresses the well‑known computational bottleneck of Tukey (half‑space) depth, which requires evaluating the minimum proportion of data points contained in any closed half‑space that includes a query point. In its exact form the depth must consider all possible directions in the ambient space, leading to an exponential blow‑up in the number of one‑dimensional projections as the dimension grows. To overcome this, the authors propose the Random Tukey Depth (RTD), a Monte‑Carlo style approximation that samples a finite set of directions uniformly at random and computes the depth only along those directions.
Formally, given a data set of size n in ℝ^d, K random unit vectors {u_1,…,u_K} are drawn from the uniform distribution on the unit sphere S^{d−1}. For each direction u_i the data are projected onto the line spanned by u_i, producing scalar values {⟨x_j,u_i⟩}_{j=1}^n. The rank r_i(x) of a query point x in this projection yields a one-dimensional depth d_i(x) = min(r_i(x)/n, 1−r_i(x)/n). The RTD of x is defined as the minimum of these K one-dimensional depths: d_RTD(x) = min_{i=1,…,K} d_i(x). This definition mirrors the exact Tukey depth but replaces the exhaustive search over all directions with a tractable random subset, reducing the computational cost from one that grows exponentially with d to O(K·n·d).
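The definition above translates directly into a few lines of NumPy. The sketch below is illustrative, not the authors' reference implementation: it draws K directions by normalizing Gaussian vectors (which is uniform on the sphere), and computes the one-dimensional halfspace depth along each direction from the empirical proportions on either side of the projected query point.

```python
import numpy as np

def random_tukey_depth(x, data, K=30, rng=None):
    """Approximate the Tukey depth of query point x within `data` (an n x d
    array) using K one-dimensional projections drawn uniformly at random."""
    rng = np.random.default_rng(rng)
    n, d = data.shape
    # Normalized Gaussian vectors are uniform on the unit sphere S^{d-1}.
    U = rng.standard_normal((K, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    depth = 1.0
    for u in U:
        proj = data @ u                    # scalar projections <x_j, u>
        t = x @ u                          # projected query point
        below = np.mean(proj <= t)         # mass on one side of t
        above = np.mean(proj >= t)         # mass on the other side
        depth = min(depth, below, above)   # 1-D halfspace depth along u
    return depth
```

A central point of a roughly symmetric sample should come out with depth near 0.5, while a far-away outlier gets depth near 0, matching the behavior of the exact Tukey depth.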
The authors provide two complementary theoretical results. First, a concentration bound shows that for any ε>0 and confidence level 1−δ, choosing K≥C·(d·log n+log 1/δ)/ε² guarantees |d_RTD(x)−d_TD(x)|≤ε with probability at least 1−δ, where C is a universal constant. This establishes that the random approximation converges to the true depth as the number of sampled directions grows, and that the required K grows only logarithmically with the sample size. Second, they derive an asymptotic expression for the expected error as a function of K, n, and d, confirming that modest values of K (often far below the ambient dimension) already yield high‑quality approximations.
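The concentration bound can be turned into a rough sample-size calculator. The universal constant C is not specified in this summary, so C = 1.0 below is purely a placeholder for illustration; the point is only how K scales with d, n, ε, and δ.

```python
import math

def directions_needed(d, n, eps, delta, C=1.0):
    """Number of random directions K suggested by the bound
    K >= C * (d*log n + log(1/delta)) / eps^2.
    C is an unspecified universal constant (C=1 is a placeholder)."""
    return math.ceil(C * (d * math.log(n) + math.log(1.0 / delta)) / eps**2)
```

Note that this worst-case bound is far more conservative than the modest K values reported in the experiments; the logarithmic dependence on n and 1/δ is the key qualitative message.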
A notable contribution is the extension of RTD to functional data analysis. Since functional observations live in infinite‑dimensional spaces, exact half‑space depth is infeasible. The authors project each function onto a finite‑dimensional basis (e.g., Fourier, wavelet, B‑splines) to obtain coefficient vectors of length M, then apply the same random‑direction scheme in ℝ^M. This yields a computationally cheap depth measure for curves, surfaces, or time‑series, preserving the robustness properties of Tukey depth while enabling real‑time applications such as outlier detection in streaming sensor data.
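A minimal sketch of the functional pipeline follows, under the assumption that curves are observed on a common grid. It uses the first M real Fourier coefficients (via `np.fft.rfft`) as the finite-dimensional representation; the paper's basis choice may differ, and this inlines the same random-direction depth as in the multivariate case.

```python
import numpy as np

def functional_rtd(query, curves, M=8, K=30, rng=None):
    """Depth of a sampled curve among `curves` (an n x T array of grid
    values): project onto the first M real Fourier coefficients, then
    apply the random-direction depth in R^M. Illustrative basis choice."""
    rng = np.random.default_rng(rng)
    coeffs = np.fft.rfft(curves, axis=1).real[:, :M]   # n x M coefficient vectors
    q = np.fft.rfft(query).real[:M]                    # query coefficients
    U = rng.standard_normal((K, M))
    U /= np.linalg.norm(U, axis=1, keepdims=True)      # uniform directions on S^{M-1}
    depth = 1.0
    for u in U:
        proj, t = coeffs @ u, q @ u
        depth = min(depth, np.mean(proj <= t), np.mean(proj >= t))
    return depth
```

On a bundle of noisy sine curves, the noiseless central curve should receive a high depth while a vertically shifted curve, an obvious functional outlier, receives depth zero.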
Extensive simulations support the theoretical claims. The authors evaluate RTD on synthetic multivariate Gaussian mixtures, high‑dimensional image feature vectors (up to 3,000 dimensions), gene‑expression matrices (several thousand dimensions), and functional datasets (spectral curves). They compare RTD against Mahalanobis depth, Projection depth, Simplicial depth, and a recently proposed Random Projection depth. Results show that RTD achieves comparable ranking of points, similar ROC‑AUC for outlier detection, and faithful preservation of cluster boundaries, while requiring orders of magnitude less computation time. For instance, in a 200‑dimensional setting with n=500, RTD with K=30 completes in under a second, whereas exact Tukey depth is infeasible and other approximations take tens of seconds.
The paper also discusses practical guidelines for choosing K. Empirical plots indicate that K≈10·log n often suffices for ε≈0.01 accuracy, and that the required K grows only slowly with dimension (e.g., K=30–50 for d up to 200). The authors suggest adaptive schemes—such as increasing K until the depth estimate stabilizes—to automate this choice in real applications.
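The "increase K until the depth estimate stabilizes" idea can be sketched as follows. Details such as the batch size and stopping tolerance are my own illustrative choices, not prescriptions from the paper; the key property exploited is that the running depth is a minimum, so it can only decrease as directions are added.

```python
import numpy as np

def adaptive_rtd(x, data, tol=0.005, batch=10, max_K=1000, rng=None):
    """Grow the number of random directions in batches until one extra
    batch lowers the depth estimate by less than `tol` (illustrative
    stopping rule). Returns the depth estimate and the K used."""
    rng = np.random.default_rng(rng)
    n, d = data.shape
    depth, K = 1.0, 0
    while K < max_K:
        U = rng.standard_normal((batch, d))
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        prev = depth
        for u in U:
            proj, t = data @ u, x @ u
            depth = min(depth, np.mean(proj <= t), np.mean(proj >= t))
        K += batch
        # Stop once an extra batch barely changes the (monotone) estimate,
        # but require at least two batches before trusting the criterion.
        if K >= 2 * batch and prev - depth < tol:
            break
    return depth, K
```

Because the estimate decreases monotonically in K, this rule can stop early on easy configurations; pathological data (as the limitations section notes) may keep triggering further batches.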
Limitations are acknowledged. Because RTD relies on random sampling, pathological data configurations that concentrate depth in a few directions may require larger K to capture. Moreover, the quality of the functional extension depends on the chosen basis; inappropriate bases can distort depth rankings. Future work is proposed on adaptive direction selection (e.g., using sequential Monte Carlo or Bayesian optimization), theoretical analysis of basis‑induced bias, and integration of RTD into downstream tasks like robust classification and clustering.
In summary, the Random Tukey Depth offers a theoretically sound, computationally efficient approximation to the classic half‑space depth. By reducing the problem to a small set of random one‑dimensional projections, it makes depth‑based analysis feasible in high‑dimensional and functional settings, while preserving the robustness and affine‑invariance that make Tukey depth attractive. The combination of convergence guarantees, extensive empirical validation, and clear implementation pathways positions RTD as a practical tool for modern data‑intensive statistics and machine learning.