Random effects compound Poisson model to represent data with extra zeros
This paper describes a compound Poisson-based random effects structure for modeling zero-inflated data. Data with a large proportion of zeros arise in many fields of applied statistics, for example in ecology when modeling and predicting species counts (discrete data) or abundance distributions (continuous data). Standard methods for modeling such data include mixture and two-part conditional models. In contrast to these methods, the stochastic models proposed here behave coherently under a change of scale, since they mimic the harvesting of a marked Poisson process in the modeling steps. Random effects are used to account for inhomogeneity. In this paper, both model design and inference rely on conditional thinking to understand the links between the various layers of quantities: parameters, latent variables (including random effects), and zero-inflated observations. The potential of these parsimonious hierarchical models for zero-inflated data is exemplified using two marine macroinvertebrate abundance datasets from a large-scale scientific bottom-trawl survey. The EM algorithm with a Monte Carlo step based on importance sampling is checked for this model structure on a simulated dataset: it works well for parameter estimation, but parameter values matter when re-assessing the actual coverage level of the confidence regions far from the asymptotic conditions.
💡 Research Summary
This paper introduces a novel hierarchical model for data sets that contain a large proportion of zero observations, a situation commonly referred to as zero‑inflation. Traditional approaches such as zero‑inflated Poisson (ZIP), zero‑inflated negative binomial (ZINB), or two‑part (hurdle) models treat the probability of a zero as a separate parameter and often lose coherence when the observational scale changes (e.g., aggregating counts over larger spatial or temporal units). To overcome this limitation, the authors build on the concept of a marked Poisson process: a homogeneous Poisson process with intensity λ generates “events,” and each event carries an independent binary mark Z. If Z = 1 the event contributes a positive measurement (drawn from a secondary distribution); if Z = 0 the event contributes nothing, resulting in a zero observation. In this construction zeros arise naturally as “unharvested” events, and a change of scale merely rescales λ, preserving the model’s structural integrity.
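The marked Poisson construction can be sketched in a few lines. The following is a minimal simulation for a single sampling unit, assuming an exponential weight distribution for the positive measurements (the paper does not specify this choice; it is an illustrative assumption): N events arrive at Poisson rate λ, each carries an independent Bernoulli(p) mark, and only marked ("harvested") events contribute to the total, so zeros arise whenever no event is harvested.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_unit(lam, p, n_rep, weight_mean=1.0):
    """Sketch of the marked Poisson construction for one sampling unit.
    N ~ Poisson(lam) latent events occur; each carries an independent
    Bernoulli(p) mark. Marked events contribute a positive weight
    (assumed exponential here, not taken from the paper); unmarked
    events contribute nothing, so Y = 0 whenever no event is harvested."""
    n_events = rng.poisson(lam, size=n_rep)    # latent Poisson events
    n_marked = rng.binomial(n_events, p)       # harvested (marked) events
    # compound Poisson total: sum of n_marked positive weights
    return np.array([rng.exponential(weight_mean, k).sum() for k in n_marked])

y = simulate_unit(lam=2.0, p=0.3, n_rep=50_000)
# By Poisson thinning, harvested events are Poisson(lam * p), so the
# zero mass P(Y = 0) = exp(-lam * p) is built into the process itself.
print((y == 0).mean())  # ≈ exp(-0.6) ≈ 0.549
```

A change of observational scale (e.g., aggregating units) simply rescales λ in this construction, which is the coherence property the abstract emphasizes.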
To capture heterogeneity across sampling units (e.g., different trawl stations), the model introduces random effects on both the log‑intensity and the log‑odds of the mark probability. Specifically, log λ_i = μ_λ + u_i and logit p_i = μ_p + v_i, where (u_i, v_i) follow a bivariate normal distribution with covariance matrix Σ. This hierarchical specification allows each unit to have its own mean occurrence rate and its own propensity to generate non‑zero values, while borrowing strength across units through the shared hyper‑parameters (μ_λ, μ_p, Σ).
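The random-effects layer can be sketched as follows; the hyper-parameter values (μ_λ, μ_p, Σ) below are chosen purely for illustration and are not estimates from the paper. Each unit draws a correlated pair (u_i, v_i), which jointly shifts its intensity and its mark probability.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hyper-parameter values, for illustration only.
mu_lam, mu_p = np.log(2.0), 0.0          # means of log-intensity and logit of mark probability
Sigma = np.array([[0.5, 0.2],
                  [0.2, 0.3]])           # covariance of the random effects (u_i, v_i)

n_units = 8
u, v = rng.multivariate_normal([0.0, 0.0], Sigma, size=n_units).T
lam_i = np.exp(mu_lam + u)               # log lam_i = mu_lam + u_i
p_i = 1.0 / (1.0 + np.exp(-(mu_p + v)))  # logit p_i = mu_p + v_i
```

The off-diagonal term of Σ lets units with unusually high event rates also have unusually high (or low) harvesting probabilities, which is how the model borrows strength across stations while allowing unit-level heterogeneity.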
Parameter estimation is performed via an Expectation–Maximization (EM) algorithm augmented with a Monte Carlo (MC) E‑step. The complete‑data likelihood involves the latent total number of Poisson events N_i and the latent marks Z_i for each observed count Y_i. Because N_i can be arbitrarily large, the authors employ importance sampling to approximate the required conditional expectations in the E‑step.
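One such E-step quantity can be approximated by importance sampling as sketched below. This is not the authors' exact algorithm: it computes E[N | Y = y] for a single unit in the count-data version (Y | N ~ Binomial(N, p), N ~ Poisson(λ)), using the assumed proposal N = y + M with M ~ Poisson(λ), which guarantees N ≥ y.

```python
import numpy as np
from scipy.stats import binom, poisson

rng = np.random.default_rng(2)

def mc_estep_mean(y, lam, p, n_draws=20_000):
    """Self-normalised importance-sampling estimate of E[N | Y = y],
    where N ~ Poisson(lam) is the latent event count and
    Y | N ~ Binomial(N, p) is the harvested count.
    Proposal: N = y + M, M ~ Poisson(lam)."""
    m = rng.poisson(lam, size=n_draws)
    n = y + m
    # unnormalised log weights: target p(N) p(y | N) over proposal q(N)
    log_w = (poisson.logpmf(n, lam) + binom.logpmf(y, n, p)
             - poisson.logpmf(m, lam))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                      # self-normalised weights
    return float(np.sum(w * n))

# In this thinned-Poisson case the answer is available in closed form,
# E[N | Y = y] = y + lam * (1 - p), which lets us check the sampler:
# 3 + 2.0 * 0.7 = 4.4.
print(mc_estep_mean(y=3, lam=2.0, p=0.3))  # ≈ 4.4
```

In the actual model the expectations are taken jointly over (N_i, Z_i) and the random effects, so no closed form is available and the MC approximation does real work; the closed-form check here is only possible because the toy case is a simple thinned Poisson.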