Poissonian bursts in e-mail correspondence

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Recent work has shown that the distribution of inter-event times for e-mail communication exhibits a heavy tail which is statistically consistent with a cascading Poisson process. In this work we extend the analysis to higher-order statistics, using the Fano and Allan factors to quantify the extent to which the empirical data depart from the known correlations of Poissonian statistics. The analysis shows that the higher-order statistics from the empirical data are indistinguishable from those of randomly reordered time series, thus demonstrating that e-mail correspondence is no more bursty or correlated than a Poisson process. Furthermore, synthetic data sets generated by a cascading Poisson process replicate the burstiness and correlations observed in the empirical data. Finally, a simple rescaling analysis using the best-estimate rate of activity confirms that the empirically observed correlations arise from a non-homogeneous Poisson process.


💡 Research Summary

This paper revisits the long‑standing debate on whether email communication exhibits genuine burstiness and long‑range correlations beyond what a simple Poisson process can explain. Earlier studies reported heavy‑tailed inter‑event time distributions and interpreted them as evidence of priority‑queue dynamics or other complex mechanisms. More recent work, however, suggested that a non‑homogeneous (time‑varying) Poisson process—specifically a cascading Poisson model—might already reproduce the observed heavy tails.

To move beyond the inter‑event time distribution, the authors employ two higher‑order measures: the Fano factor and the Allan factor. The Fano factor quantifies over‑dispersion as the ratio of the variance of event counts in windows of fixed length to the mean count; for a homogeneous Poisson process this ratio equals one. The Allan factor is the mean squared difference between the counts in successive windows, divided by twice the mean count, and thereby probes temporal correlations; a homogeneous Poisson process again yields a value of one for all window sizes.
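The two measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the window length, and the constant‑rate test series are all assumptions chosen for the example.

```python
import random
from statistics import mean, pvariance

def window_counts(event_times, window, t_start, t_end):
    """Count events in consecutive, non-overlapping windows of length `window`."""
    n_windows = int((t_end - t_start) // window)
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    return counts

def fano_factor(counts):
    """Variance of the window counts divided by their mean."""
    return pvariance(counts) / mean(counts)

def allan_factor(counts):
    """Mean squared difference of successive counts over twice the mean count."""
    diffs_sq = [(b - a) ** 2 for a, b in zip(counts, counts[1:])]
    return mean(diffs_sq) / (2 * mean(counts))

def homogeneous_poisson(rate, t_end, rng):
    """Event times of a constant-rate Poisson process on [0, t_end)."""
    times, t = [], rng.expovariate(rate)
    while t < t_end:
        times.append(t)
        t += rng.expovariate(rate)
    return times

# Illustrative check on a constant-rate series (parameters are arbitrary).
rng = random.Random(0)
events = homogeneous_poisson(rate=1.0, t_end=20000.0, rng=rng)
counts = window_counts(events, window=10.0, t_start=0.0, t_end=20000.0)
print(fano_factor(counts), allan_factor(counts))  # both should be near 1
```

Repeating the last three lines for a range of `window` values gives the Fano and Allan curves as functions of window size.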

The empirical dataset consists of email logs from over three thousand users, spanning one year. For each user the raw event time series is extracted, and a surrogate series is created by randomly reordering the inter‑event intervals, which preserves the total number of events. This reordering destroys any genuine temporal ordering of the intervals but retains their distribution, providing a baseline in which correlations between successive intervals are absent.
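A sketch of one reading of the reordering step, assuming that what gets shuffled is the sequence of inter‑event intervals (shuffling the timestamps themselves would leave the point process unchanged). The function name and the toy timestamps are hypothetical.

```python
import random

def shuffle_surrogate(event_times, rng):
    """Rebuild event times from randomly reordered inter-event intervals."""
    times = sorted(event_times)
    intervals = [b - a for a, b in zip(times, times[1:])]
    rng.shuffle(intervals)              # destroys the order, keeps the values
    surrogate = [times[0]]
    for dt in intervals:
        surrogate.append(surrogate[-1] + dt)
    return surrogate

# Toy example with made-up timestamps (arbitrary units).
rng = random.Random(1)
original = [0.0, 1.0, 1.5, 1.7, 9.0, 9.2, 30.0]
surrogate = shuffle_surrogate(original, rng)
print(surrogate)
```

The surrogate has the same number of events, the same first and last timestamps (up to rounding), and the same set of intervals as the original; only the order of the intervals differs.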

Both the original and the reordered series are analyzed across a wide range of window lengths (from minutes to days). The resulting Fano and Allan curves are virtually indistinguishable and remain close to the Poisson benchmark of one, indicating that the empirical data do not display excess variability or long‑range dependence beyond what is expected from a time‑varying Poisson rate.

To further validate the hypothesis, synthetic data are generated using a cascading Poisson process whose rate function λ(t) is calibrated to each user’s observed activity pattern. These synthetic series reproduce the same Fano and Allan behavior as the real data, confirming that the model captures the essential statistical structure.
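For intuition, a time‑varying Poisson process can be sampled by thinning. The sketch below is a generic non‑homogeneous Poisson sampler, not the cascading model used in the paper, and the working‑hours rate function is a hypothetical stand‑in for a calibrated λ(t).

```python
import random

def thinning(rate_fn, rate_max, t_end, rng):
    """Sample a non-homogeneous Poisson process on [0, t_end) by thinning.

    Requires rate_fn(t) <= rate_max for all t.
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)            # candidate at the maximal rate
        if t >= t_end:
            return times
        if rng.random() < rate_fn(t) / rate_max:  # keep with prob. rate(t)/rate_max
            times.append(t)

def daily_rate(t_hours):
    """Hypothetical rate in events/hour: active 9:00-17:00, silent otherwise."""
    return 2.0 if 9.0 <= t_hours % 24.0 < 17.0 else 0.0

rng = random.Random(2)
events = thinning(daily_rate, rate_max=2.0, t_end=24.0 * 200, rng=rng)
print(len(events))
```

Within the active hours the events are memoryless, yet counts taken over windows that mix active and silent periods are over‑dispersed relative to a constant‑rate process, purely because the rate varies.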

A final rescaling test is performed: each user’s estimated instantaneous rate $\hat r(t)$ is integrated to produce a transformed time axis $R(t)=\int_0^t \hat r(s)\,ds$. When event times are expressed in this rescaled domain, all users’ Fano and Allan factors collapse onto the universal Poisson line, demonstrating that the apparent burstiness originates solely from the non‑homogeneous rate rather than from intrinsic correlations.
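The rescaling idea can be illustrated on synthetic data. In this sketch the rate is a hypothetical working‑hours step function that is known exactly, so R(t) has a closed form; with real logs the rate would have to be estimated, which this example does not attempt.

```python
import random
from statistics import mean, pvariance

RATE_ON, START, STOP = 2.0, 9.0, 17.0    # hypothetical: 2 events/hour, 9:00-17:00
T_END = 24.0 * 500                       # 500 days, in hours

def rate(t):
    return RATE_ON if START <= t % 24.0 < STOP else 0.0

def cumulative_rate(t):
    """R(t) = integral of rate(s) from 0 to t, in closed form for this rate."""
    days, hour = divmod(t, 24.0)
    active_today = min(max(hour - START, 0.0), STOP - START)
    return RATE_ON * (days * (STOP - START) + active_today)

def sample(t_end, rng):
    """Thinning sampler for the rate above."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(RATE_ON)
        if t >= t_end:
            return times
        if rng.random() < rate(t) / RATE_ON:
            times.append(t)

def fano(times, window, t_end):
    """Fano factor of counts in non-overlapping windows on [0, t_end)."""
    n = int(t_end // window)
    counts = [0] * n
    for t in times:
        k = int(t // window)
        if k < n:
            counts[k] += 1
    return pvariance(counts) / mean(counts)

rng = random.Random(3)
events = sample(T_END, rng)
rescaled = [cumulative_rate(t) for t in events]   # event times on the R(t) axis

fano_raw = fano(events, window=6.0, t_end=T_END)
fano_rescaled = fano(rescaled, window=10.0, t_end=cumulative_rate(T_END))
print(fano_raw, fano_rescaled)  # raw is well above 1, rescaled is near 1
```

On the rescaled axis the process has unit rate, so any remaining departure of the Fano or Allan factor from one would point to correlations that the rate alone cannot explain.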

The authors discuss the implications of these findings. While email activity is clearly modulated by external factors such as work schedules, meetings, and deadlines, these modulations manifest as deterministic or slowly varying changes in the underlying Poisson rate. No additional stochastic clustering or memory effects are required to explain the data. This contrasts with other digital communication media (e.g., instant messaging, social media) where genuine long‑range correlations may be present.

In conclusion, the study provides robust statistical evidence that email correspondence is no more bursty or correlated than a non‑homogeneous Poisson process. The use of Fano and Allan factors offers a powerful framework for distinguishing true dynamical correlations from simple rate variability, and the methodology can be extended to other forms of human‑generated time series to assess the necessity of more elaborate generative models.

