Tests for zero-inflation and overdispersion
We propose a new methodology to detect zero-inflation and overdispersion based on the comparison of the expected sample extremes among convexly ordered distributions. The method is very flexible and includes tests for the proportion of structural zeros in zero-inflated models, tests to distinguish between two ordered parametric families and a new general test to detect overdispersion. The performance of the proposed tests is evaluated via some simulation studies. For the well-known fetal lamb data, we conclude that the zero-inflated Poisson model should be rejected against other more disperse models, but we cannot reject the negative binomial model.
💡 Research Summary
The paper introduces a novel statistical testing framework for detecting zero‑inflation and overdispersion in count data by exploiting the concept of convex order. Convex order is a partial ordering of probability distributions: X is smaller than Y in convex order (that is, X is less dispersed than Y) if E[φ(X)] ≤ E[φ(Y)] for every convex function φ for which both expectations exist. Two distributions ordered in this way share the same mean, and the ordering translates directly into inequalities for the expected sample extremes: the more dispersed distribution has the larger expected sample maximum and the smaller expected sample minimum. The authors use these inequalities as the basis for constructing test statistics.
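To make the ordering of expected extremes concrete, the following sketch compares a Poisson and a negative binomial distribution with the same mean. It is an illustration only, not code from the paper: the function names, the truncation point `kmax`, and the parameter values (mean 2, dispersion `size` 1, sample size 20) are assumptions chosen for the example. It relies on the identities E[max] = Σ_k (1 − F(k)^n) and E[min] = Σ_k (1 − F(k))^n, which hold for non‑negative integer variables.

```python
import math

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax, built recursively."""
    p = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        p.append(p[-1] * lam / k)
    return p

def negbin_pmf(mean, size, kmax):
    """Negative binomial probabilities for k = 0..kmax, parametrized by
    its mean and dispersion `size` (variance = mean + mean**2 / size)."""
    q = mean / (mean + size)
    p = [(1.0 - q) ** size]
    for k in range(1, kmax + 1):
        p.append(p[-1] * q * (k + size - 1) / k)
    return p

def expected_max(pmf, n):
    """E[max of n iid draws] = sum over k >= 0 of (1 - F(k)**n)."""
    total, cdf = 0.0, 0.0
    for pk in pmf:
        cdf = min(cdf + pk, 1.0)  # guard against rounding above 1
        total += 1.0 - cdf ** n
    return total

def expected_min(pmf, n):
    """E[min of n iid draws] = sum over k >= 0 of (1 - F(k))**n."""
    total, cdf = 0.0, 0.0
    for pk in pmf:
        cdf = min(cdf + pk, 1.0)
        total += (1.0 - cdf) ** n
    return total

if __name__ == "__main__":
    # Illustrative values (assumptions): common mean 2, sample size 20.
    pois = poisson_pmf(2.0, 400)
    nb = negbin_pmf(2.0, 1.0, 400)
    print("E[max]  Poisson:", expected_max(pois, 20), " NegBin:", expected_max(nb, 20))
    print("E[min]  Poisson:", expected_min(pois, 20), " NegBin:", expected_min(nb, 20))
```

Because the negative binomial is a gamma mixture of Poisson distributions with the same mean, it dominates the Poisson in convex order, so its expected maximum should come out larger and its expected minimum smaller.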
The methodology proceeds in three main directions. First, for zero‑inflated models such as the zero‑inflated Poisson (ZIP), the authors compare the expected sample extremes under the null hypothesis of a standard Poisson model (no structural zeros) with those under the alternative ZIP model. By estimating the structural‑zero proportion π and evaluating whether the convex‑order inequality is violated, a test of π = 0 is obtained. Second, the framework is extended to compare two ordered parametric families (e.g., Poisson versus Negative Binomial). When the families share a common mean, the more dispersed family dominates the other in convex order; the test checks this dominance by examining how far the observed means of sample extremes deviate from their theoretical counterparts. Third, a general overdispersion test is derived by treating the Poisson distribution as the baseline and assessing whether the data exhibit a convex‑order violation, indicating that the distribution is more spread out than the Poisson.
The authors provide rigorous asymptotic results: the test statistics are shown to be asymptotically normal under the null, and the procedures are consistent against fixed alternatives. They also discuss practical implementation, recommending bootstrap or randomization techniques to approximate the sampling distribution of the extreme‑value based statistics and to compute p‑values.
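The bootstrap idea can be sketched as follows for the Poisson baseline. This is a rough illustration under stated assumptions, not the authors' test: the statistic (average maximum over consecutive blocks of size `m`, minus the expected maximum of `m` Poisson draws at the estimated mean), the function names, and the defaults `m=5` and `n_boot=200` are all hypothetical choices made for the example.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_expected_max(lam, m, tol=1e-12):
    """E[max of m iid Poisson(lam)] = sum over k >= 0 of (1 - F(k)**m)."""
    pk = math.exp(-lam)
    cdf, k, total = pk, 0, 0.0
    while True:
        term = 1.0 - min(cdf, 1.0) ** m
        total += term
        if term < tol:
            return total
        k += 1
        pk *= lam / k
        cdf += pk

def statistic(x, m):
    """Hypothetical statistic: mean block maximum minus its Poisson expectation.
    Requires len(x) >= m so that at least one full block exists."""
    blocks = [x[i:i + m] for i in range(0, len(x) - m + 1, m)]
    observed = sum(max(b) for b in blocks) / len(blocks)
    return observed - poisson_expected_max(sum(x) / len(x), m)

def bootstrap_pvalue(x, m=5, n_boot=200, seed=0):
    """One-sided parametric-bootstrap p-value for H0: data are Poisson,
    against alternatives that are more dispersed in convex order."""
    rng = random.Random(seed)
    lam_hat = sum(x) / len(x)
    t_obs = statistic(x, m)
    exceed = 0
    for _ in range(n_boot):
        xb = [rpois(lam_hat, rng) for _ in x]  # resample under the fitted null
        if statistic(xb, m) >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

if __name__ == "__main__":
    rng = random.Random(1)
    # Overdispersed counts: a gamma-Poisson mixture with mean 3 (illustrative).
    x = [rpois(rng.gammavariate(0.5, 6.0), rng) for _ in range(300)]
    print("bootstrap p-value:", bootstrap_pvalue(x))
```

Large positive values of the statistic point toward a distribution more dispersed than the Poisson, which is why the p‑value counts bootstrap replicates at or above the observed value. Recomputing the Poisson expectation inside each replicate keeps the statistic centered at the re‑estimated mean.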
Simulation studies evaluate performance across a range of scenarios. In the zero‑inflation setting, the proposed test achieves near‑perfect power even when the structural‑zero proportion is as low as 5 %. In the overdispersion setting, the method outperforms traditional Pearson χ²‑based dispersion tests and likelihood‑ratio tests, especially when the overdispersion is modest. Comparisons between Poisson, Negative Binomial, and Poisson‑Gamma mixture models illustrate that the convex‑order test reliably distinguishes the more dispersed model while maintaining the nominal type‑I error rate.
The empirical illustration uses the well‑known fetal lamb data, which contains many zeros and exhibits extra‑Poisson variability. Applying the zero‑inflation test rejects the ZIP model in favor of more dispersed alternatives (p < 0.001). When comparing Poisson and Negative Binomial models, the test fails to reject the Negative Binomial, indicating that it adequately captures the observed dispersion. The general overdispersion test similarly confirms that the Poisson model is insufficient, while the Negative Binomial model passes the test.
In summary, the paper contributes a flexible, theoretically grounded, and computationally straightforward approach to detecting zero‑inflation and overdispersion. By leveraging convex order and sample extremes, the authors circumvent many limitations of existing methods that rely on specific likelihood forms or variance‑to‑mean ratios. The framework is applicable to a broad class of count models, can be extended to multivariate or time‑dependent settings, and opens avenues for further research, such as integration with Bayesian hierarchical models or adaptation to high‑dimensional count data.