Effect of extreme data loss on long-range correlated and anti-correlated signals quantified by detrended fluctuation analysis

We investigate how extreme loss of data affects the scaling behavior of long-range power-law correlated and anti-correlated signals using detrended fluctuation analysis (DFA). We introduce a segmentation approach to generate surrogate signals by randomly removing data segments from stationary signals with different types of correlations. These surrogate signals are characterized by: (i) the DFA scaling exponent $\alpha$ of the original correlated signal, (ii) the percentage $p$ of the data removed, (iii) the average length $\mu$ of the removed (or remaining) data segments, and (iv) the functional form of the distribution of the lengths of the removed (or remaining) data segments. We find that the global scaling exponent of positively correlated signals remains practically unchanged even for extreme data loss of up to 90%. In contrast, the global scaling of anti-correlated signals changes to uncorrelated behavior even when a very small fraction of the data is lost. These observations are confirmed using two empirical examples: human gait and commodity price fluctuations. We systematically study the local scaling behavior of signals with missing data to reveal deviations across scales. We find that for anti-correlated signals even 10% data loss shifts the local scaling at large scales from the original anti-correlated behavior towards uncorrelated behavior. In contrast, positively correlated signals show no observable changes in the local scaling for up to 65% data loss, while for larger percentages the local scaling shows overestimated regions (with higher local exponent) at small scales, followed by underestimated regions (with lower local exponent) at large scales. Finally, we investigate how the scaling is affected by the statistics of the remaining data segments in comparison to the removed segments.
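Items (ii) and (iii) are linked. As a simplification not stated in the abstract, assume that removed and remaining segments alternate, so that a long signal contains about as many of each. Writing $\mu_r$ and $\mu_k$ (notation introduced here) for the mean lengths of the removed and kept segments:

$$p \approx \frac{\mu_r}{\mu_r + \mu_k} \quad\Longrightarrow\quad \mu_k \approx \mu_r\,\frac{1-p}{p}$$

For example, $p = 0.9$ with $\mu_r = 10$ gives $\mu_k \approx 1.1$, so at extreme loss the surviving pieces are very short. Fixing $p$ and either mean therefore fixes the other. This is why $\mu$ can refer to either the removed or the remaining segments.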

💡 Research Summary

The paper investigates how extreme data loss influences the scaling behavior of long‑range correlated (LRC) and anti‑correlated (AC) time series when analyzed with Detrended Fluctuation Analysis (DFA). The authors devise a segmentation procedure that creates surrogate signals by randomly removing contiguous data segments from stationary synthetic series with prescribed DFA exponents (α). Four parameters characterize each surrogate: (i) the original scaling exponent α of the intact signal, (ii) the percentage p of data removed, (iii) the average length μ of the removed (or remaining) segments, and (iv) the functional form of the segment‑length distribution (exponential, power‑law, or uniform). By systematically varying these parameters, the study examines both global and local scaling properties.
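The Python sketch below illustrates such a surrogate pipeline under stated assumptions; it is not the authors' code. Correlated series are generated with the Fourier filtering method, which imposes a power spectrum S(f) ∝ f^(−β) with β = 2α − 1. For 0 < α < 1, this gives a DFA exponent close to α. Removed and kept segments are assumed to alternate, with lengths drawn from geometric distributions, the discrete analogue of the exponential. The names `fourier_filtered_noise` and `remove_segments` are hypothetical.

```python
import numpy as np

def fourier_filtered_noise(n, alpha, rng):
    """Stationary noise with power spectrum S(f) ~ f^(-beta), beta = 2*alpha - 1.

    For 0 < alpha < 1 the DFA exponent of the output is close to alpha.
    """
    beta = 2.0 * alpha - 1.0
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                  # placeholder so f = 0 is not raised to a negative power
    spectrum *= freqs ** (-beta / 2.0)   # amplitude ~ f^(-beta/2), so power ~ f^(-beta)
    spectrum[0] = 0.0                    # drop the f = 0 (mean) component
    x = np.fft.irfft(spectrum, n)
    return (x - x.mean()) / x.std()

def remove_segments(x, p, mu_removed, rng):
    """Cut out random segments so that roughly a fraction p of x is lost,
    then concatenate the remaining segments (hypothetical helper).

    Removed and kept segments alternate. Their lengths are geometric
    (discrete exponential) with means mu_removed and
    mu_kept = mu_removed * (1 - p) / p, so the expected lost fraction is p.
    """
    mu_kept = mu_removed * (1.0 - p) / p
    if mu_removed < 1 or mu_kept < 1:
        raise ValueError("both mean segment lengths must be >= 1")
    keep = np.ones(len(x), dtype=bool)
    pos = 0
    removing = rng.random() < p          # random initial state
    while pos < len(x):
        length = rng.geometric(1.0 / (mu_removed if removing else mu_kept))
        if removing:
            keep[pos:pos + length] = False
        pos += length
        removing = not removing
    return x[keep]                       # order of the surviving points is preserved
```

For instance, `remove_segments(fourier_filtered_noise(2**16, 0.3, rng), 0.1, 20, rng)` would produce an anti-correlated surrogate with about 10 % of its points removed, in gaps averaging 20 points. Its DFA exponent can then be compared with α = 0.3. Power-law gap lengths could be swapped in by replacing the `rng.geometric` draw.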

Global scaling results show a striking asymmetry between LRC and AC signals. For positively correlated signals (α > 0.5), the global DFA exponent remains essentially unchanged even when up to 90 % of the data are removed. This robustness indicates that the intrinsic power‑law correlations survive the random removal of even most of the data. In contrast, anti‑correlated signals (α < 0.5) are highly vulnerable. Even a modest loss of 10 % of the points drives the global exponent toward 0.5, i.e., toward the uncorrelated (white‑noise) regime. Thus, the sign of the correlation determines the sensitivity to missing data.
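The global exponent is the slope of log F(n) against log n. Here F(n) is the root‑mean‑square fluctuation of the integrated signal (the profile) around a linear fit within windows of size n. The following is a compact sketch of first‑order DFA (DFA‑1). It assumes linear detrending, non‑overlapping windows, and 20 log‑spaced window sizes between 10 and N/4. These choices are ours and are not necessarily the paper's.

```python
import numpy as np

def dfa(x, scales):
    """First-order DFA: return F(n) for each window size n in `scales`."""
    profile = np.cumsum(x - np.mean(x))                 # integrated signal
    fluct = []
    for n in scales:
        n_win = len(profile) // n
        segs = profile[: n_win * n].reshape(n_win, n)   # non-overlapping windows
        t = np.arange(n)
        slope, intercept = np.polyfit(t, segs.T, 1)     # one linear fit per window
        trend = np.outer(slope, t) + intercept[:, None]
        fluct.append(np.sqrt(np.mean((segs - trend) ** 2)))
    return np.array(fluct)

def global_alpha(x, min_scale=10, n_scales=20):
    """Global DFA exponent: slope of log F(n) vs log n over the full scale range."""
    max_scale = len(x) // 4
    scales = np.unique(np.logspace(np.log10(min_scale), np.log10(max_scale),
                                   n_scales).astype(int))
    f = dfa(x, scales)
    alpha, _ = np.polyfit(np.log10(scales), np.log10(f), 1)
    return alpha

rng = np.random.default_rng(1)
white = rng.standard_normal(2 ** 14)
print(round(global_alpha(white), 2))             # close to 0.5 (uncorrelated)
print(round(global_alpha(np.cumsum(white)), 2))  # close to 1.5 (random walk)
```

The comparison described above amounts to computing `global_alpha` for a signal before and after segment removal and then checking whether the fitted slope has moved toward 0.5.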

Local scaling analysis reveals scale‑dependent deviations. Here the local exponent α_loc(n) is the slope of the DFA fluctuation function F(n) in log‑log coordinates around window size n. For AC signals, a loss of just 10 % already produces a noticeable upward drift of α_loc at large scales (n > 10³), indicating a transition from anti‑correlated toward uncorrelated behavior. For LRC signals, the local exponent remains stable up to about 65 % data loss. Beyond this threshold, a characteristic pattern emerges. At small scales the exponent is overestimated (α_loc > α_original), while at large scales it is underestimated (α_loc < α_original). This “crossover” reflects two competing effects. Short intact fragments preserve strong local correlations, while the long gaps between them dilute the correlation structure at large scales.
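Local exponents can be estimated by fitting the same log‑log relation over a short sliding window of consecutive scales instead of the full range. The sketch below assumes a window of five log‑spaced scales, which is an arbitrary choice. Narrower windows resolve crossovers more sharply but are noisier, especially at the largest n, where few DFA windows are available. `dfa_fluct` repeats a plain DFA‑1 so that the block runs on its own. Both function names are hypothetical.

```python
import numpy as np

def dfa_fluct(x, scales):
    """DFA-1 fluctuation function F(n) for each window size n."""
    y = np.cumsum(x - x.mean())
    out = []
    for n in scales:
        segs = y[: (len(y) // n) * n].reshape(-1, n)
        t = np.arange(n)
        slope, icpt = np.polyfit(t, segs.T, 1)          # linear trend per window
        resid = segs - (np.outer(slope, t) + icpt[:, None])
        out.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(out)

def local_alpha(scales, fluct, width=5):
    """Local exponent: slope of log F vs log n over `width` consecutive scales.

    Returns the centre scale of each window and the corresponding local slope.
    """
    logn, logf = np.log10(scales), np.log10(fluct)
    centers, slopes = [], []
    for i in range(len(scales) - width + 1):
        s, _ = np.polyfit(logn[i:i + width], logf[i:i + width], 1)
        centers.append(scales[i + width // 2])
        slopes.append(s)
    return np.array(centers), np.array(slopes)

rng = np.random.default_rng(3)
x = rng.standard_normal(2 ** 15)
scales = np.unique(np.logspace(np.log10(8), np.log10(len(x) // 8), 25).astype(int))
centers, slopes = local_alpha(scales, dfa_fluct(x, scales))
# For uncorrelated noise the local slopes scatter around 0.5 at all scales.
print(np.round(np.median(slopes), 2))
```

Running this on a surrogate with heavy data loss, rather than on intact noise, should expose the small‑scale overestimation and large‑scale underestimation as a systematic trend in `slopes` versus `centers`.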

The authors also explore how the statistics of the removed (or retained) segments affect scaling. When segment lengths follow a power‑law distribution, the presence of occasional long gaps produces larger distortions in α_loc than when lengths are exponentially distributed, even for the same p and μ. Long gaps are interpreted by DFA as trends, leading to systematic bias.
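The following sketch draws segment lengths with the same mean μ from an exponential‑type distribution and from a power‑law distribution, to show why a power law produces occasional very long gaps. Assumptions, not taken from the paper: the exponential case uses a geometric distribution, and the power‑law case uses a rounded continuous Pareto form P(l) ∝ l^(−γ) for l ≥ l_min. The minimum length l_min = μ(γ − 2)/(γ − 1) is chosen so that the mean equals μ (γ > 2 is required). The value γ = 3.5 in the usage lines is illustrative only.

```python
import numpy as np

def exponential_lengths(mu, size, rng):
    """Integer lengths >= 1 with mean mu (geometric = discrete exponential)."""
    return rng.geometric(1.0 / mu, size)

def power_law_lengths(mu, gamma, size, rng):
    """Integer lengths with tail P(l) ~ l^(-gamma), gamma > 2, mean approximately mu.

    Inverse-CDF sampling of a Pareto law on [l_min, inf); its mean is
    l_min * (gamma - 1) / (gamma - 2), so l_min is set to match mu.
    """
    l_min = mu * (gamma - 2.0) / (gamma - 1.0)
    u = rng.random(size)
    lengths = l_min * (1.0 - u) ** (-1.0 / (gamma - 1.0))
    return np.maximum(1, np.rint(lengths)).astype(int)

rng = np.random.default_rng(42)
exp_len = exponential_lengths(20, 200_000, rng)
pl_len = power_law_lengths(20, 3.5, 200_000, rng)
# Same mean, very different tails: compare how often a gap exceeds 10 * mu.
print(exp_len.mean(), pl_len.mean())
print((exp_len > 200).mean(), (pl_len > 200).mean())
```

With matched p and μ, the power‑law sample contains far more gaps longer than 10μ. These are the rare long gaps to which the text attributes the stronger distortion of α_loc.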

To validate the simulation findings, the methodology is applied to two real‑world data sets: human gait stride intervals and commodity price fluctuations. Gait data, which exhibit strong positive correlations (α ≈ 0.9), retain their global exponent after up to 80 % artificial data removal, confirming the robustness observed in synthetic LRC series. Commodity prices, displaying anti‑correlation (α ≈ 0.4), lose this property after only a 5 % data loss, rapidly converging to α ≈ 0.5, in line with the AC simulation results.

Implications: The study provides practical guidance for researchers handling incomplete time series. For processes known or suspected to be positively correlated, DFA can be applied without extensive preprocessing, as the global scaling exponent is resilient to substantial missing data. Conversely, for anti‑correlated processes, even modest gaps can invalidate DFA results; therefore, one should either minimize data loss, employ gap‑filling or interpolation techniques, or complement DFA with methods that explicitly account for missing observations. Moreover, knowledge of the segment‑length distribution is crucial: long, sparsely occurring gaps can introduce pronounced bias, suggesting that preprocessing steps that break up long gaps (e.g., segment concatenation) may improve DFA reliability.

In summary, the paper quantifies the differential impact of extreme data loss on LRC and AC signals, demonstrates that global DFA exponents are robust for positively correlated series but fragile for anti‑correlated ones, and elucidates the scale‑dependent nature of these effects. These insights advance the methodological toolkit for analyzing real‑world, often incomplete, time‑series data across physics, physiology, finance, and related fields.


📜 Original Paper Content