Contrasting pre-vaccine COVID-19 waves in Italy through Functional Data Analysis
We use data from 107 Italian provinces to characterize and compare mortality patterns in the first two COVID-19 epidemic waves, which occurred prior to the introduction of vaccines. We also associate these patterns with mobility, timing of government restrictions, and socio-demographic, infrastructural, and environmental covariates. Notwithstanding limitations in the accuracy and reliability of publicly available data, we are able to exploit information in curves and shapes through Functional Data Analysis techniques. Specifically, we document differences in magnitude and variability between the two waves; while both were characterized by a co-occurrence of ‘exponential’ and ‘mild’ mortality patterns, the second spread much more broadly and asynchronously through the country. Moreover, we find evidence of a significant positive association between local mobility and mortality in both epidemic waves and corroborate the effectiveness of timely restrictions in curbing mortality. The techniques we describe could capture additional signals of interest if applied, for instance, to data on cases and positivity rates. However, we show that the quality of such data, at least in the case of Italian provinces, was too poor to support meaningful analyses.
💡 Research Summary
This paper applies Functional Data Analysis (FDA) to mortality data from 107 Italian provinces in order to characterize and contrast the first two COVID‑19 epidemic waves that occurred before the introduction of vaccines. The authors compute differential mortality—daily excess deaths relative to the 2015‑2019 average—using ISTAT all‑cause death counts, then smooth each province’s 150‑day time series (first wave: 25 Feb 2020 – 23 Jul 2020; second wave: 1 Oct 2020 – 27 Feb 2021) with cubic B‑splines (one knot per week, 21 knots total). Smoothing parameters are chosen by minimizing generalized cross‑validation (GCV) across all curves. To align temporal dynamics, landmark registration is performed: curves are shifted so that their highest peak (between day 10 and day 100) coincides with the earliest observed peak (day 20 for wave 1, day 33 for wave 2). This produces “peak‑aligned” functional representations that can be directly compared across provinces and waves.
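The preprocessing pipeline described above (excess mortality, a shared-penalty spline fit chosen by GCV, then shift registration to the earliest peak) can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration under stated assumptions, not the authors' code: the second-order difference penalty on the B-spline coefficients (a P-spline) stands in for whatever roughness penalty the paper used, the 21 knots are assumed to be interior knots, days are indexed from 0, and every function name here is hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline


def differential_mortality(deaths, baseline):
    """Daily deaths minus the day-by-day mean of the baseline years.

    deaths:   shape (n_days,) or (n_days, n_provinces)
    baseline: shape (n_years, n_days, ...), e.g. the 2015-2019 counts
    """
    return np.asarray(deaths, dtype=float) - np.asarray(baseline, dtype=float).mean(axis=0)


def bspline_basis(t, n_knots=21, degree=3):
    """Cubic B-spline basis on [t.min(), t.max()] with equally spaced interior knots."""
    breaks = np.linspace(t.min(), t.max(), n_knots + 2)  # endpoints + interior knots
    knots = np.r_[[breaks[0]] * degree, breaks, [breaks[-1]] * degree]  # clamped
    n_basis = len(knots) - degree - 1
    # A spline with identity coefficients evaluates every basis function at once.
    return BSpline(knots, np.eye(n_basis), degree)(t)


def smooth_all(t, Y, lambdas=np.logspace(-3, 4, 30)):
    """Penalized B-spline fit of every column of Y with one shared lambda.

    Assumption: a second-order difference penalty (P-spline); lambda is the
    grid value minimizing the GCV score summed over all curves.
    """
    B = bspline_basis(t)
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)  # second-order differences
    P = D.T @ D
    n = len(t)
    best_score, best_lam, best_fit = np.inf, None, None
    for lam in lambdas:
        H = B @ np.linalg.solve(B.T @ B + lam * P, B.T)   # hat matrix
        rss = ((Y - H @ Y) ** 2).sum(axis=0)               # one RSS per curve
        score = (n * rss / (n - np.trace(H)) ** 2).sum()   # GCV summed over curves
        if score < best_score:
            best_score, best_lam, best_fit = score, lam, H @ Y
    return best_lam, best_fit


def register_peaks(t, X, lo=10, hi=100):
    """Shift each curve so its highest peak in [lo, hi] lands on the earliest such peak."""
    window = (t >= lo) & (t <= hi)
    peaks = t[window][np.argmax(X[window], axis=0)]  # peak day of every curve
    target = peaks.min()
    # Registered curve j is x_j(t + shift_j); days shifted past the data become NaN.
    shifted = np.column_stack([
        np.interp(t + (p - target), t, X[:, j], left=np.nan, right=np.nan)
        for j, p in enumerate(peaks)
    ])
    return shifted, peaks, target


# Synthetic example: two noisy bumps peaking on days 30 and 55.
rng = np.random.default_rng(0)
t = np.arange(150, dtype=float)
truth = np.column_stack([50 * np.exp(-((t - c) / 15) ** 2) for c in (30, 55)])
lam, X = smooth_all(t, truth + rng.normal(0, 3, truth.shape))
R, peaks, target = register_peaks(t, X)
```

Sharing one lambda across all provinces mirrors the "GCV across all curves" choice in the text: every curve receives the same amount of smoothing, so differences between provinces are not artifacts of province-specific tuning.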
The analysis reveals distinct wave‑level patterns. Wave 1 exhibits a sharp, high‑amplitude peak concentrated in a limited set of provinces, especially in Lombardy (e.g., Bergamo). Mortality curves in these hotspots are highly synchronous and rise steeply, consistent with an exponential growth phase. Wave 2, by contrast, shows a lower‑amplitude but more spatially dispersed peak, with substantial asynchrony among provinces. Notably, provinces most severely hit in wave 1 tend to experience milder mortality in wave 2, which may reflect behavioral adaptation, depletion of the most vulnerable population, or partial herd immunity.
To explore drivers of these patterns, the authors assemble six scalar covariates at the provincial level: proportion of population over 65, adults per family doctor (proxy for primary‑care capacity), average beds per hospital, average students per classroom, average employees per firm, and annual PM10 concentrations. Functional linear regression (FLR) models incorporating these covariates indicate that, while each covariate has a statistically significant effect, its explanatory power is modest compared with that of the mobility and policy‑timing variables.
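A regression of mortality curves on scalar covariates (a function-on-scalar model, x_i(t) = a(t) + Σ_k z_ik b_k(t) + e_i(t)) can be illustrated with pointwise least squares on the common daily grid. This is a simplified sketch, not the estimator used in the paper: it applies no smoothness penalty to the coefficient curves, and it assumes the curves have no NaN values (for example, registered tails trimmed to a common domain). The global functional R² sums squared errors over the grid, which approximates the ratio of integrals on an equally spaced grid. The function name is hypothetical.

```python
import numpy as np


def function_on_scalar_fit(X, Z):
    """Pointwise least-squares fit of X_i(t) = a(t) + sum_k z_ik b_k(t) + e_i(t).

    X: (n_days, n_provinces) smoothed curves on a common grid, no NaNs
    Z: (n_provinces, p) scalar covariates (e.g. share over 65, PM10, ...)
    Returns coefficient curves of shape (n_days, p + 1), intercept first,
    and a global functional R^2.
    """
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)           # standardize so b_k(t) are comparable
    design = np.column_stack([np.ones(len(Zs)), Zs])    # intercept + covariates
    coefs, *_ = np.linalg.lstsq(design, X.T, rcond=None)  # one OLS fit per day: (p + 1, n_days)
    fitted = design @ coefs                              # (n_provinces, n_days)
    sse = ((X.T - fitted) ** 2).sum()
    sst = ((X.T - X.T.mean(axis=0)) ** 2).sum()
    return coefs.T, 1.0 - sse / sst
```

Standardizing the covariates puts all coefficient curves on the same scale, so a low global R² reflects the modest joint explanatory power described above rather than differences in units.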
Mobility is measured using Google’s “Grocery & Pharmacy” and “Workplace” categories, which capture short‑range movements that remained permissible even under restrictions. These daily mobility series are smoothed with the same FDA pipeline as mortality. Across both waves, higher mobility is significantly and positively associated with higher mortality, consistent with the intuitive link between movement and viral spread. Importantly, the variability of mobility curves differs markedly between waves: wave 1 shows a relatively uniform, steep decline, whereas wave 2 displays heterogeneous patterns reflecting region‑specific restriction regimes (color‑coded zones) and differing public compliance.
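One simple way to inspect the mobility and mortality association once both are smoothed on the same daily grid is a cross-sectional correlation across provinces, day by day, with mortality lagged behind mobility. This is an exploratory sketch, not the paper's regression model; the lag value and the function name are hypothetical choices introduced for illustration.

```python
import numpy as np


def lagged_crosssectional_corr(mortality, mobility, lag):
    """Correlation across provinces between mortality on day t and mobility on day t - lag.

    mortality, mobility: (n_days, n_provinces) smoothed curves on the same daily grid.
    lag: assumed delay in days between movement and death (hypothetical parameter).
    Returns an array of length n_days - lag; entry k corresponds to mortality day k + lag.
    """
    y = mortality[lag:]
    x = mobility[: mobility.shape[0] - lag]  # this form also works for lag == 0
    yc = y - y.mean(axis=1, keepdims=True)   # center across provinces, day by day
    xc = x - x.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
```

Values consistently above zero would indicate that provinces moving more tend to record higher mortality some days later; unlike a regression, this summary does not adjust for the covariates or restriction timing discussed elsewhere.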
A novel policy‑timing variable is constructed: for each province, the area under the mortality curve up to the date when restrictions were first imposed (the “pre‑restriction cumulative mortality”). This variable emerges as the most powerful predictor of subsequent mortality in both waves, outperforming all socio‑demographic, infrastructural, and environmental factors. The result underscores that earlier implementation of restrictions curtails the epidemic’s momentum, while delayed action allows mortality to accumulate rapidly.
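The pre-restriction cumulative mortality is an area under a curve up to a cut-off date, so it can be computed with the trapezoidal rule on the smoothed curve. A minimal sketch, assuming each province has one restriction day expressed on the same day index as the curve (the helper names are hypothetical): the value at the cut-off is linearly interpolated when the restriction falls between grid points, and cut-offs past the last observed day are clipped.

```python
import numpy as np


def pre_restriction_area(t, x, restriction_day):
    """Trapezoidal area under the curve x(t) from t[0] up to restriction_day."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    restriction_day = min(float(restriction_day), t[-1])  # never integrate past the data
    grid = np.append(t[t < restriction_day], restriction_day)
    values = np.interp(grid, t, x)  # includes the interpolated value at the cut-off
    return float(((values[1:] + values[:-1]) * np.diff(grid)).sum() / 2)


def policy_timing_covariate(t, X, restriction_days):
    """One pre-restriction area per province (column of X)."""
    return np.array([pre_restriction_area(t, X[:, j], d)
                     for j, d in enumerate(restriction_days)])
```

Because the curves are differential mortality, the area can be negative for a province whose deaths stayed below the baseline before restrictions; a later restriction date mechanically accumulates more area when mortality is rising.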
The paper also conducts a thorough data‑quality assessment. Official COVID‑19 death counts from the Italian Civil Protection (DPC) are compared with ISTAT differential mortality at the regional level. DPC figures substantially under‑report deaths, especially during wave 1, though consistency improves in wave 2 for several regions. Case counts suffer from severe inconsistencies: provincial aggregates do not match reported regional totals, with discrepancies as large as a factor of seven in Sicily. Consequently, the authors exclude case data from their main analyses, focusing solely on excess mortality as a more reliable proxy.
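The consistency check between provincial case counts and regional totals amounts to summing provinces within each region and comparing the result with the reported regional figure. A minimal sketch, assuming counts aggregated over a common period and a province-to-region lookup; the function name and the example keys are hypothetical, not actual DPC identifiers.

```python
from collections import defaultdict


def aggregation_ratios(provincial_counts, province_region, regional_totals):
    """Ratio of each reported regional total to the sum of its provinces' counts.

    provincial_counts: {province: count over the period}
    province_region:   {province: region}
    regional_totals:   {region: reported count over the same period}
    A ratio of 1 means the two sources agree; regions with no provincial data are skipped.
    """
    sums = defaultdict(float)
    for province, count in provincial_counts.items():
        sums[province_region[province]] += count
    return {region: total / sums[region]
            for region, total in regional_totals.items() if sums[region] > 0}


# Hypothetical example: region "R2" reports seven times its provincial sum.
counts = {"P1": 10, "P2": 20, "P3": 5}
regions = {"P1": "R1", "P2": "R1", "P3": "R2"}
ratios = aggregation_ratios(counts, regions, {"R1": 30, "R2": 35})
```

Ratios far from 1 in either direction flag regions where case data cannot be trusted at the provincial level, which is the kind of inconsistency that led the authors to rely on excess mortality instead.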
Methodologically, the study demonstrates the utility of FDA for epidemiological time‑series: by treating each province’s mortality trajectory as a smooth function, one can perform curve clustering, functional regression, and landmark alignment, thereby extracting information that would be obscured in point‑wise analyses. The approach also mitigates over‑fitting risks that arise when modeling high‑dimensional scalar covariates across many small spatial units.
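Curve clustering of the kind mentioned above can be illustrated with k-means applied to curves sampled on a common grid, using the discretized L2 distance between curves. This is one common choice shown under assumptions, not necessarily the clustering method used in the paper; it could, for example, separate steep from mild mortality shapes. The function name is hypothetical, and the deterministic farthest-first initialization is a design choice made here to keep results reproducible.

```python
import numpy as np


def functional_kmeans(X, k, n_iter=100):
    """k-means on curves sampled on a common, equally spaced grid (columns of X).

    The squared L2 distance between two curves is approximated by the sum of
    squared pointwise differences on the grid.
    """
    curves = np.asarray(X, dtype=float).T  # (n_curves, n_days)
    # Deterministic farthest-first initialization.
    centers = [curves[0]]
    for _ in range(1, k):
        d2 = np.min([((curves - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(curves[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Squared discretized L2 distance from every curve to every center.
        d2 = ((curves[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean curve of its members (keep empty clusters fixed).
        new = np.array([curves[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

The cluster centers are themselves curves, so each cluster can be read as a representative mortality trajectory; running this on registered curves clusters by shape, while running it on unregistered curves also mixes in timing differences.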
In conclusion, the authors provide empirical evidence that (1) the two pre‑vaccine COVID‑19 waves in Italy differed fundamentally in magnitude, spatial spread, and synchrony; (2) mobility is a robust driver of mortality across both waves; (3) the timing of restrictive measures is critically important, with earlier interventions dramatically reducing subsequent deaths; and (4) socio‑demographic and environmental factors play secondary roles. The study highlights persistent data quality challenges in official COVID‑19 reporting and advocates for the broader adoption of functional data techniques in pandemic monitoring and policy evaluation.