First versus full or first versus last: U-statistic change-point tests under fixed and local alternatives

📝 Original Info

  • Title: First versus full or first versus last: U-statistic change-point tests under fixed and local alternatives
  • ArXiv ID: 2602.16789
  • Date: 2026-02-18
  • Authors: Author information is not provided in the source. (Related researchers such as Dehling, Bücher, and Kojadinovic are mentioned, but the exact author list should be checked against the original paper.)

📝 Abstract

The use of U-statistics in the change-point context has received considerable attention in the literature. We compare two approaches of constructing CUSUM-type change-point tests, which we call the first-vs-full and first-vs-last approach. Both have been pursued by different authors. The question naturally arises if the two tests substantially differ and, if so, which of them is better in which data situation. In large samples, both tests are similar: they are asymptotically equivalent under the null hypothesis and under sequences of local alternatives. In small samples, there may be quite noticeable differences, which is in line with a different asymptotic behavior under fixed alternatives. We derive a simple criterion for deciding which test is more powerful. We examine the examples Gini's mean difference, the sample variance, and Kendall's tau in detail. Particularly, when testing for changes in scale by Gini's mean difference, we show that the first-vs-full approach has a higher power if and only if the scale changes from a smaller to a larger value -- regardless of the population distribution or the location of the change. The asymptotic derivations are under weak dependence. The results are illustrated by numerical simulations and data examples.

📄 Full Content

The classical approach for testing the constancy of the mean of a time series $X_1, \ldots, X_n$ is the CUSUM test based on the test statistic

$$T_n = \max_{1 \le k \le n} \frac{k}{\sqrt{n}} \left| \bar X_{1:k} - \bar X_{1:n} \right|, \tag{1.1}$$

where $\bar X_{i:j}$ denotes the sample mean computed from $X_i, \ldots, X_j$ for any two integers $i, j$ such that $1 \le i < j \le n$. The test statistic may equally be written as

$$T_n = \max_{1 \le k \le n} \frac{k(n-k)}{n^{3/2}} \left| \bar X_{1:k} - \bar X_{(k+1):n} \right|. \tag{1.2}$$
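The agreement between the first-vs-full form and the first-vs-last form of the mean CUSUM statistic rests on a decomposition of the full-sample mean. The short derivation below is a check written out for convenience; it assumes $1 \le k < n$ so that both partial means exist.

$$\bar X_{1:n} = \frac{k}{n}\,\bar X_{1:k} + \frac{n-k}{n}\,\bar X_{(k+1):n}
\quad\Longrightarrow\quad
\bar X_{1:k} - \bar X_{1:n} = \frac{n-k}{n}\left( \bar X_{1:k} - \bar X_{(k+1):n} \right),$$

$$\frac{k}{\sqrt{n}} \left| \bar X_{1:k} - \bar X_{1:n} \right| = \frac{k(n-k)}{n^{3/2}} \left| \bar X_{1:k} - \bar X_{(k+1):n} \right|.$$

The step uses only that the sample mean is linear in the observations, which is why the identity can fail for nonlinear estimators.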

These two representations (1.1) and (1.2) suggest two different views of the CUSUM test statistic: for each $k$, one may view it either as a comparison of the sample mean of the first part of the sample, $X_1, \ldots, X_k$, with the sample mean of the whole sample, or as a comparison of the mean of the first part with the mean of the remaining part, $X_{k+1}, \ldots, X_n$. When the CUSUM approach is extended to a general setting, i.e., when we want to test the constancy of some parameter $\theta \in \mathbb{R}$ of the marginal distribution of the observed process, for which an estimator $\hat\theta_n$ is available, we may consider either of the test statistics

$$T_1(\hat\theta_n) = \max_{1 \le k \le n} \frac{k}{\sqrt{n}} \left| \hat\theta_{1:k} - \hat\theta_{1:n} \right| \quad\text{and}\quad T_2(\hat\theta_n) = \max_{1 \le k \le n} \frac{k(n-k)}{n^{3/2}} \left| \hat\theta_{1:k} - \hat\theta_{(k+1):n} \right|,$$

with $\hat\theta_{i:j}$ being defined analogously to $\bar X_{i:j}$. If $\hat\theta_n$ is a linear statistic, i.e., if

$$\hat\theta_n = \frac{1}{n} \sum_{i=1}^{n} \xi(X_i)$$

for some function $\xi$, both test statistics are identical, but in general they are not. Both routes to generalizing the CUSUM test statistic, first-vs-full and first-vs-last, have been taken by different authors. In the present paper, we analyze the difference between $T_1(\hat\theta_n)$ and $T_2(\hat\theta_n)$ in the case where $\hat\theta_n$ is a U-statistic. Let $(X_i)_{i \in \mathbb{Z}}$ be a $p$-dimensional, stationary stochastic process. A one-sample U-statistic of order 2 is defined as

$$U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} h(X_i, X_j),$$

where $h : \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}$ is a kernel function satisfying $h(x, y) = h(y, x)$. In the case of i.i.d. data, $U_n$ is an unbiased estimator of the parameter $\theta = E(h(X, Y))$, where $X, Y$ are two independent random variables with the same marginal distribution as $X_1$. If the underlying data are weakly dependent, $U_n$ is still a consistent estimator of $\theta$ as long as the conditions of the U-statistics ergodic theorem are satisfied, see, e.g., Aaronson et al. (1996). Thus $U_n$ can be used to test for stationarity of a process against the alternative of a change in the parameter $\theta$.

Change-point tests based on U-statistics have been studied by many authors, e.g., Sen (1983), Hawkins (1989), Gombay and Horváth (1995), Gombay (2001), Gombay and Horváth (2002), Chen and Qin (2010), and Matteson and James (2014). Tests for weakly dependent series have been considered by Bücher and Kojadinovic (2016) and Dehling et al. (2017). Recently, Liu et al. (2020), Wang et al. (2022), Boniece et al. (2024), and Zhao et al. (2024) have considered U-statistic-based change-point tests in high-dimensional settings. Other lines of research are devoted to sequential (or online) change-point tests (e.g., Gombay, 2000; Kirch and Stoehr, 2022) and the use of anti-symmetric (or two-sample) U-statistics (e.g., Yu and Chen, 2022; Dehling et al., 2022; Wegner and Wendler, 2024). However, these are not pursued here. We restrict our attention to symmetric kernels in the offline (or retrospective) setting.
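As a concrete illustration of a one-sample U-statistic of order 2, the following minimal sketch averages a symmetric kernel over all unordered pairs of observations. The function names `u_statistic`, `gmd_kernel`, and `variance_kernel` are hypothetical, the data are assumed univariate, and the brute-force pair loop is chosen for clarity rather than speed.

```python
from itertools import combinations

import numpy as np


def gmd_kernel(a, b):
    """Kernel of Gini's mean difference: h(x, y) = |x - y|."""
    return abs(a - b)


def variance_kernel(a, b):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2


def u_statistic(x, kernel):
    """Average of kernel(x_i, x_j) over all pairs i < j (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    if x.shape[0] < 2:
        raise ValueError("a U-statistic of order 2 needs at least two observations")
    values = [kernel(a, b) for a, b in combinations(x, 2)]
    return float(np.mean(values))


sample = [1.0, 2.0, 4.0]
print(u_statistic(sample, gmd_kernel))       # pairwise distances 1, 3, 2 average to 2.0
print(u_statistic(sample, variance_kernel))  # agrees with np.var(sample, ddof=1)
```

The symmetry requirement $h(x, y) = h(y, x)$ is what makes it sufficient to visit each unordered pair once.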

We define the first-vs-full and the first-vs-last U-statistic change-point tests based on the kernel h as

respectively, where

for any $1 \le k < l \le n$. The former test statistic has been considered, e.g., by Dehling et al. (2017) and the latter by Bücher and Kojadinovic (2016).
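Read together with $T_1(\hat\theta_n)$ and $T_2(\hat\theta_n)$ above, the two tests amount to replacing $\hat\theta_{i:j}$ by a U-statistic computed from $X_i, \ldots, X_j$. The following is a sketch of that substitution, not a quotation of the paper's displays: the symbols $T_1(U_n)$ and $T_2(U_n)$ and the index ranges (chosen so that every U-statistic involves at least two observations) are assumptions.

$$T_1(U_n) = \max_{2 \le k \le n} \frac{k}{\sqrt{n}} \left| U_{1:k} - U_{1:n} \right|,
\qquad
T_2(U_n) = \max_{2 \le k \le n-2} \frac{k(n-k)}{n^{3/2}} \left| U_{1:k} - U_{(k+1):n} \right|,$$

$$U_{k:l} = \binom{l-k+1}{2}^{-1} \sum_{k \le i < j \le l} h(X_i, X_j).$$

Unlike the sample mean, $U_{1:n}$ is not a weighted average of $U_{1:k}$ and $U_{(k+1):n}$, because it also contains the cross pairs with $i \le k < j$; this is the source of the difference between the two statistics.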

A prominent example of a U-statistic is Gini’s mean difference, with the kernel $h(x, y) = |x - y|$. The corresponding first-vs-full CUSUM test has been considered by Gerstenberger et al. (2020) and was found to be a quite competitive change-point test for scale. Another popular U-statistic is Kendall’s tau. Its use in the change-point context has been studied in the first-vs-last version by Quessy et al. (2013) and in the first-vs-full version by Dehling et al. (2017). In fact, a discussion emerging during the review process of the latter paper has, to a large degree, prompted the research of the present paper. It has been argued that the first-vs-last version is presumably uniformly more powerful against one-sudden-change alternatives. However, this is not the case.
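To make the two approaches tangible for Gini's mean difference, the sketch below computes unstudentized first-vs-full and first-vs-last CUSUM-type statistics from prefix and suffix sums of the pairwise kernel matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `cusum_u_statistics` is hypothetical, the maxima run over the split points for which every U-statistic has at least two observations, and no variance estimator or critical value is included.

```python
import numpy as np


def gmd_kernel(x, y):
    """Kernel of Gini's mean difference, vectorized: h(x, y) = |x - y|."""
    return np.abs(x - y)


def cusum_u_statistics(x, kernel=gmd_kernel):
    """Return (first_vs_full, first_vs_last), both unstudentized.

    Assumed index ranges: k = 2..n for first-vs-full and
    k = 2..n-2 for first-vs-last.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least four observations")

    # Kernel values for pairs i < j only (strict upper triangle).
    upper = np.triu(kernel(x[:, None], x[None, :]), k=1)

    # prefix[m]: sum of kernel values over pairs within the first m observations.
    prefix = np.concatenate(([0.0], np.cumsum(upper.sum(axis=0))))
    # suffix[m]: sum of kernel values over pairs within x[m:].
    suffix = np.concatenate((np.cumsum(upper.sum(axis=1)[::-1])[::-1], [0.0]))

    def n_pairs(m):
        return m * (m - 1) / 2.0

    # First versus full: compare U_{1:k} with U_{1:n}.
    k = np.arange(2, n + 1)
    u_first = prefix[k] / n_pairs(k)
    first_vs_full = np.max(k / np.sqrt(n) * np.abs(u_first - u_first[-1]))

    # First versus last: compare U_{1:k} with U_{(k+1):n}.
    k = np.arange(2, n - 1)
    u_head = prefix[k] / n_pairs(k)
    u_tail = suffix[k] / n_pairs(n - k)
    first_vs_last = np.max(k * (n - k) / n**1.5 * np.abs(u_head - u_tail))

    return float(first_vs_full), float(first_vs_last)


# Illustrative data: a scale change from 1 to 2 after the first third.
rng = np.random.default_rng(1)
data = np.concatenate((rng.normal(0, 1, 100), rng.normal(0, 2, 200)))
print(cusum_u_statistics(data))
```

The full kernel matrix costs quadratic memory in $n$, which is acceptable for a sketch but would need an incremental update for long series.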

To illustrate the situation and to motivate the derivations that follow, we quote simulated powers for Gini’s mean difference (GMD) in a simple example setting. Consider a series $X_1, \ldots, X_n$ of univariate, independent, centered normal variables. At the beginning of the sequence, the observations have standard deviation 1, and at time $[n/3]$ the standard deviation changes to $\sigma > 0$. We apply the studentized versions of the first-vs-full and the first-vs-last GMD tests at the 5% significance level, using the asymptotic null distribution and the variance estimator

where $g_n$ is the sample GMD of $X_1, \ldots, X_n$. We observe the following: (S1) For sample size $n = 4000$ and $\sigma = 1.08$, the first-vs-full test has a power of 0.79 and the first-vs-last test has a power of 0.79. Their relative difference in power is less

Reference

This content is AI-processed based on open access ArXiv data.
