A Bayesian spatio-temporal model of panel design data: airborne particle number concentration in Brisbane, Australia

This paper outlines a methodology for semi-parametric spatio-temporal modelling of data that are dense in time but sparse in space, obtained from a split panel design, the most feasible approach to covering space and time with limited equipment. The data are hourly averaged particle number concentration (PNC) and were collected as part of the Ultrafine Particles from Transport Emissions and Child Health (UPTECH) project. Two weeks of continuous measurements were taken at each of a number of government primary schools in the Brisbane Metropolitan Area. The monitoring equipment was taken to each school sequentially. The school data are augmented by data from long-term monitoring stations at three locations in Brisbane, Australia. Fitting the model helps describe the spatial and temporal variability at a subset of the UPTECH schools and the long-term monitoring sites. The temporal variation is modelled hierarchically with penalised random walk terms, one common to all sites and a term accounting for the remaining temporal trend at each site. Parameter estimates and their uncertainty are computed in a computationally efficient approximate Bayesian inference environment, R-INLA. The temporal part of the model explains daily and weekly cycles in PNC at the schools, which can be used to estimate the exposure of school children to ultrafine particles (UFPs) emitted by vehicles. At each school and long-term monitoring site, peaks in PNC can be attributed to the morning and afternoon rush hour traffic and new particle formation events. The spatial component of the model describes the school-to-school variation in mean PNC and the variation within each school ground. It is shown how the spatial model can be expanded to identify spatial patterns at the city scale with the inclusion of more spatial locations.


💡 Research Summary

This paper presents a Bayesian semi‑parametric spatio‑temporal modeling framework tailored to data that are dense in time but sparse in space, a situation that commonly arises when researchers employ a split‑panel design to monitor air quality with limited instrumentation. The case study focuses on hourly averaged particle number concentration (PNC), an indicator of ultrafine particles (UFPs), collected as part of the Ultrafine Particles from Transport Emissions and Child Health (UPTECH) project in the Brisbane metropolitan area. Two weeks of continuous measurements were taken at each of several government primary schools, with monitoring equipment moved sequentially from one school to the next. To augment the sparse school network, long‑term monitoring data from three fixed stations distributed across the city were incorporated, yielding a combined dataset that captures high‑resolution temporal dynamics but only a handful of spatial locations.

The authors address the statistical challenges of such an unbalanced design by constructing a hierarchical model with two distinct components. The temporal component uses penalised random walk (RW2) terms in a hierarchical fashion: a global RW2 captures common daily and weekly cycles present at all sites, while site‑specific RW2 terms model residual temporal variation unique to each school or monitoring station. This structure enables the model to separate the ubiquitous rush‑hour peaks (morning and afternoon) and weekend‑weekday differences from site‑level idiosyncrasies such as local traffic patterns or episodic new‑particle formation events.
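To make the penalised random walk idea concrete, the following is a minimal sketch of the standard second-order random walk (RW2) penalty, assuming that is what "RW2" denotes here. It is not the authors' R-INLA code: the function names, the 24-point series, and the precision scale `tau` are illustrative assumptions, and the cyclic variants that may be used for daily cycles are not shown.

```python
def second_difference_matrix(n):
    """Return the (n-2) x n second-difference operator D as nested lists."""
    rows = []
    for i in range(n - 2):
        row = [0.0] * n
        row[i], row[i + 1], row[i + 2] = 1.0, -2.0, 1.0
        rows.append(row)
    return rows


def rw2_structure_matrix(n):
    """Return R = D^T D, the banded RW2 structure matrix (precision up to a scale)."""
    d = second_difference_matrix(n)
    return [
        [sum(d[k][i] * d[k][j] for k in range(n - 2)) for j in range(n)]
        for i in range(n)
    ]


def rw2_penalty(x, tau=1.0):
    """tau times the sum of squared second differences of x (equals tau * x^T R x)."""
    return tau * sum(
        (x[i] - 2.0 * x[i + 1] + x[i + 2]) ** 2 for i in range(len(x) - 2)
    )


def quadratic_form(r, x):
    """Evaluate x^T R x for a square matrix r given as nested lists."""
    n = len(x)
    return sum(x[i] * r[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical 24-hour series: a straight line and a rapidly oscillating signal.
linear_trend = [2.0 + 0.5 * t for t in range(24)]
oscillation = [(-1.0) ** t for t in range(24)]

print(rw2_penalty(linear_trend))  # 0.0: linear trends are not penalised
print(rw2_penalty(oscillation))   # 352.0: rough signals are penalised heavily
```

The penalty is zero for any straight line and grows with curvature, which is why an RW2 term yields smooth trends. The structure matrix has at most five non-zero entries per row, the kind of sparsity that GMRF-based inference exploits.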

The spatial component is modeled with a Gaussian Markov Random Field (GMRF) that represents the mean PNC at each location and, when data are available, within‑site variation (e.g., different points on a school campus). The GMRF is constructed via a stochastic partial differential equation (SPDE) approximation to a Matérn covariance, allowing the spatial field to be expressed as a sparse precision matrix suitable for fast computation.
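For reference, the Matérn covariance and its SPDE representation can be written as follows. This is the standard textbook form rather than notation taken from the paper: the symbols $\sigma^2$ (marginal variance), $\kappa$ (scale), $\nu$ (smoothness), $\tau$, $\alpha$, and the spatial dimension $d$ are assumptions introduced here for illustration.

```latex
% Matérn covariance between two locations separated by distance h
C(h) = \sigma^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)} \, (\kappa h)^{\nu} K_{\nu}(\kappa h), \qquad h > 0,

% SPDE whose stationary solutions x(s) have this covariance
(\kappa^2 - \Delta)^{\alpha/2} \bigl( \tau \, x(\mathbf{s}) \bigr) = \mathcal{W}(\mathbf{s}),
\qquad \alpha = \nu + d/2,

% Empirical range: distance at which the correlation falls to roughly 0.1
\rho \approx \frac{\sqrt{8\nu}}{\kappa}.
```

Here $K_{\nu}$ is the modified Bessel function of the second kind, $\Delta$ is the Laplacian, and $\mathcal{W}$ is Gaussian white noise. Discretising the SPDE on a triangulated mesh gives a Gaussian Markov random field with a sparse precision matrix.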

Inference is performed using the Integrated Nested Laplace Approximation (INLA) framework in R. INLA provides deterministic approximations to the posterior marginals of the latent fields and hyper‑parameters, and for latent Gaussian models of this kind it is typically much faster than conventional Markov chain Monte Carlo (MCMC) while retaining a Bayesian treatment of uncertainty. Weakly informative priors are placed on the RW2 smoothness parameters and the GMRF range and marginal variance, so that the data drive the posterior while the priors guard against over‑fitting.

Model validation proceeds through several complementary diagnostics: (1) cross‑validation of predictive performance (mean absolute error, root mean squared error) against held‑out observations, (2) posterior predictive checks comparing simulated PNC distributions to the observed data, and (3) information criteria (DIC, WAIC) to compare alternative specifications. The proposed hierarchical model consistently outperforms simpler alternatives such as independent site‑wise time series or a single global temporal trend.
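The two predictive error measures named in diagnostic (1) can be sketched as below. The held-out values are hypothetical, and the summary does not say whether errors are computed on raw or log-transformed PNC, so the scale used here is an assumption.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between held-out observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed, predicted):
    """Square root of the average squared prediction error."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


# Hypothetical held-out values (arbitrary units), not data from the study.
held_out = [1.0, 2.0, 3.0, 4.0]
model_prediction = [1.0, 2.0, 3.0, 8.0]

print(mean_absolute_error(held_out, model_prediction))      # 1.0
print(root_mean_squared_error(held_out, model_prediction))  # 2.0
```

RMSE weights the single large miss more heavily than MAE does, so reporting both indicates whether the errors are dominated by a few poorly predicted hours.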

Key findings include:

  1. Temporal dynamics – The global RW2 reveals a robust 24‑hour cycle and a distinct weekly pattern, with higher weekday peaks corresponding to commuter traffic. Site‑specific RW2 terms capture additional peaks that align with local rush‑hour timing and occasional spikes attributable to new‑particle formation, especially at schools located near major arterial roads.

  2. Spatial variability – The GMRF estimates indicate that mean PNC varies between schools by roughly 10–30 %, reflecting heterogeneous exposure environments across the city. Within‑school variation is also detectable, suggesting that micro‑scale factors (e.g., proximity to school entrances, on‑site traffic, vegetation cover) influence local particle concentrations.

  3. Exposure implications – By decomposing the temporal component into common and site‑specific parts, the model provides refined exposure estimates for schoolchildren, allowing public health researchers to attribute a portion of the UFP burden to city‑wide traffic trends and another portion to localized sources.

  4. Scalability – The authors demonstrate that the spatial framework can be readily expanded to a city‑wide network if additional monitoring locations become available. The sparse precision structure of the GMRF keeps the computational cost well below that of a dense covariance matrix (roughly O(n^{3/2}) rather than O(n^3) for a two‑dimensional field with n nodes), so INLA remains efficient for larger datasets.

In conclusion, the study presents a computationally efficient Bayesian approach for analyzing split‑panel air‑quality data. By jointly modeling a common temporal backbone, site‑specific temporal deviations, and a spatially correlated mean field, the method captures the main temporal and spatial structure of ultrafine particle concentrations in an urban environment. The use of R‑INLA makes the approach accessible to practitioners who need to produce timely, uncertainty‑quantified exposure assessments for epidemiological studies or policy‑making. The framework should also apply to other environmental monitoring scenarios where high‑frequency temporal data are collected at a limited set of spatial locations.