Accounting for self-protective responses in randomized response data from a social security survey using the zero-inflated Poisson model
In 2004 the Dutch Department of Social Affairs conducted a survey to assess the extent of noncompliance with social security regulations. The survey was conducted among 870 recipients of social security benefits and included a series of sensitive questions about regulatory noncompliance. Because of the sensitive nature of the questions, a randomized response design was used. Although randomized response protects the privacy of the respondent, it is unlikely that all respondents followed the design. In this paper we introduce a model that allows for respondents who display self-protective response behavior by consistently giving the nonincriminating response, irrespective of the outcome of the randomizing device. The dependent variable, the total number of incriminating responses, is assumed to be generated by applying randomized response to a latent Poisson variable denoting the true number of rule violations. Since self-protective responses produce more observed zeros than the Poisson randomized response distribution predicts, they are modeled as zero-inflation of the observed counts. The model includes predictors of the Poisson parameters, as well as predictors of the probability of self-protective response behavior.
💡 Research Summary
In 2004 the Dutch Department of Social Affairs carried out a nationwide survey of 870 recipients of unemployment‑insurance benefits to assess the prevalence of non‑compliance with the Unemployment Insurance Act. Because the five questions asked about potentially illegal behaviour, a forced‑response randomized response (RR) technique was employed: respondents rolled two virtual dice and answered “yes” or “no” according to a pre‑specified rule. A programming error caused the true probabilities of a “yes” response to be 0.9329 for non‑compliant respondents and 0.18678 for compliant respondents, rather than the intended 11/12 and 1/6.
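The intended values 11/12 and 1/6 follow from a common two-dice forced-response rule. The summary does not spell the rule out, so the sketch below assumes one (hypothetical): a dice sum of 2-4 forces "yes", 11-12 forces "no", and 5-10 requires an honest answer. It also includes the standard moment estimator for the prevalence of a single item, which shows why the actual rather than the intended probabilities have to be used.

```python
from fractions import Fraction
from itertools import product

# Assumed forced-response rule (hypothetical; not stated in the summary):
# sum 2-4 -> forced "yes", sum 11-12 -> forced "no", sum 5-10 -> honest answer.
FORCED_YES = {2, 3, 4}
FORCED_NO = {11, 12}


def forced_response_probs():
    """Return (P(yes | violation), P(yes | compliance)) for two fair dice."""
    outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
    n = len(outcomes)  # 36 equally likely outcomes
    p_forced_yes = Fraction(sum(s in FORCED_YES for s in outcomes), n)
    p_forced_no = Fraction(sum(s in FORCED_NO for s in outcomes), n)
    p_honest = 1 - p_forced_yes - p_forced_no
    # A violator says "yes" when forced to or when answering honestly;
    # a compliant respondent says "yes" only when forced to.
    return p_forced_yes + p_honest, p_forced_yes


def moment_prevalence(yes_rate, p1, p0):
    """Invert P(yes) = p0 + pi * (p1 - p0) for the prevalence pi of one item."""
    return (yes_rate - p0) / (p1 - p0)


p1, p0 = forced_response_probs()
print(p1, p0)  # 11/12 1/6
```

An observed yes-rate can be inverted the same way with the actual probabilities 0.9329 and 0.18678; plugging in the intended 11/12 and 1/6 instead would bias the prevalence estimate.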
Standard RR models assume that each respondent’s true status (violation or not) is misclassified according to known probabilities determined by the randomizing device. If the true number of violations across the five items, denoted S, follows a multinomial distribution over the values 0, …, 5, then the observed number of “yes” answers, S*, follows a mixture: P(S* = k) = Σ_s P(S = s) · P(S* = k | S = s), where the misclassification probabilities P(S* = k | S = s) are fixed by the design. However, empirical RR data often contain more zero counts than this multinomial‑RR model can explain.
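The mixture over S can be made concrete. The summary does not describe the item-level mechanism, so the sketch below assumes that each of the M items is randomized independently with the same probabilities p1 = P(yes | violation) and p0 = P(yes | compliance). Under that assumption, P(S* = k | S = s) is the convolution of Binomial(s, p1) and Binomial(M - s, p0). The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k; zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def misclassification_matrix(M, p1, p0):
    """Return P[s][k] = P(S* = k | S = s) for M independently randomized items.

    Of the s violated items, the number answered "yes" is Binomial(s, p1);
    of the M - s complied items it is Binomial(M - s, p0). S* is their sum,
    so each row is the convolution of the two binomials.
    """
    return [
        [
            sum(binom_pmf(j, s, p1) * binom_pmf(k - j, M - s, p0)
                for j in range(min(s, k) + 1))
            for k in range(M + 1)
        ]
        for s in range(M + 1)
    ]


def observed_distribution(latent, P):
    """Mixture over the latent count: P(S* = k) = sum_s P(S = s) P(S* = k | S = s)."""
    M = len(P) - 1
    return [sum(latent[s] * P[s][k] for s in range(M + 1)) for k in range(M + 1)]
```

Under this assumption, and with the actual probabilities 0.9329 and 0.18678, a respondent with no violations answers "no" to all five items only with probability (1 - 0.18678)^5, roughly 0.36. A share of zeros well above what the fitted RR mixture predicts is therefore informative about self-protective answering.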
The authors argue that this excess stems from “self‑protective” (SP) behaviour: some respondents ignore the randomizing device and always give the non‑incriminating answer (i.e., they always answer “no”). Such behaviour forces the observed sum score to be zero regardless of the true underlying violations, creating a zero‑inflation problem. To address this, they propose a zero‑inflated Poisson randomized response (ZIP‑RR) regression model.
The model consists of two components: (1) a Poisson‑RR part, which assumes that the latent true violation count S follows a Poisson distribution with parameter λ, truncated at the maximum possible count M = 5 (because of the truncation, λ is the parameter of the untruncated Poisson rather than the exact mean of S); (2) a zero‑inflation part, which models the probability θ that a respondent belongs to the SP class, i.e., that the observed sum score is forced to zero. λ and θ are linked to covariates through a log link and a logit link, respectively.
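The two components combine into one distribution for the observed sum score: P(S* = 0) = θ + (1 - θ) Σ_s P(S = s) P(S* = 0 | S = s), and P(S* = k) = (1 - θ) Σ_s P(S = s) P(S* = k | S = s) for k ≥ 1. The standard-library sketch below codes this, assuming two things the summary does not spell out: truncation means renormalising the Poisson probabilities over 0..M, and items are randomized independently with common probabilities p1 and p0. All helper names are hypothetical.

```python
from math import comb, exp, factorial


def truncated_poisson(lam, M):
    """P(S = s) for s = 0..M: Poisson(lam) renormalised to the support 0..M."""
    w = [lam**s / factorial(s) for s in range(M + 1)]
    total = sum(w)
    return [x / total for x in w]


def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k; zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rr_kernel(k, s, M, p1, p0):
    """P(S* = k | S = s): "yes" answers from s violated and M - s complied items."""
    return sum(binom_pmf(j, s, p1) * binom_pmf(k - j, M - s, p0)
               for j in range(min(s, k) + 1))


def zip_rr_pmf(lam, theta, M, p1, p0):
    """Distribution of the observed sum score S* under the ZIP-RR model."""
    latent = truncated_poisson(lam, M)
    rr = [sum(latent[s] * rr_kernel(k, s, M, p1, p0) for s in range(M + 1))
          for k in range(M + 1)]
    # With probability theta the respondent is self-protective and S* = 0.
    return [theta * (k == 0) + (1 - theta) * rr[k] for k in range(M + 1)]


def link(x, beta, z, gamma):
    """Log link for lambda and logit link for theta, given covariate vectors x and z."""
    lam = exp(sum(a * b for a, b in zip(x, beta)))
    theta = 1.0 / (1.0 + exp(-sum(a * b for a, b in zip(z, gamma))))
    return lam, theta


print(zip_rr_pmf(0.5, 0.2, 5, 0.9329, 0.18678))
```

Setting θ = 0 recovers the plain Poisson‑RR distribution, so the zero-inflation only ever adds probability mass at S* = 0.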
Covariates for λ include demographic variables (gender, age, year of unemployment, education level, knowledge of the regulations) that are traditionally associated with rule‑breaking. Covariates for θ are two scales measuring respondents’ trust in the confidentiality of the forced‑response design and their understanding of how the dice rule works.
Maximum‑likelihood estimation is used to fit the model. The authors evaluate model fit with AIC, BIC, Pearson chi‑square statistics, and checks for over‑dispersion. They also test the Poisson assumption for the latent S by examining residuals and comparing the fitted distribution to the observed frequencies.
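The maximum-likelihood step can be sketched as follows, assuming NumPy and SciPy are available. The sketch is illustrative only. The summary does not say which optimiser the authors used, so BFGS with numerical gradients is our choice. The misclassification step again assumes independent items with common probabilities p1 and p0. The data are simulated with hypothetical parameter values, not taken from the survey. AIC and BIC are computed from the minimised negative log-likelihood (NLL) as 2·NLL + 2p and 2·NLL + p·ln n, where p is the number of parameters.

```python
from math import comb, lgamma

import numpy as np
from scipy.optimize import minimize


def binom_pmf_vec(n, p):
    """Binomial(n, p) probabilities for 0..n."""
    return np.array([comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)])


def rr_matrix(M, p1, p0):
    """K[s, k] = P(S* = k | S = s), assuming independent items with equal p1, p0."""
    K = np.zeros((M + 1, M + 1))
    for s in range(M + 1):
        K[s] = np.convolve(binom_pmf_vec(s, p1), binom_pmf_vec(M - s, p0))
    return K


def log_expit(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -np.logaddexp(0.0, -x)


def zip_rr_negloglik(params, y, X, Z, K):
    """Negative log-likelihood of the ZIP-RR regression for sum scores y."""
    M = K.shape[0] - 1
    beta, gamma = params[: X.shape[1]], params[X.shape[1]:]
    eta, zeta = X @ beta, Z @ gamma  # log(lambda) and logit(theta)
    s = np.arange(M + 1)
    log_fact = np.array([lgamma(k + 1) for k in range(M + 1)])
    # Poisson truncated to 0..M, normalised in log space for stability.
    logw = s[None, :] * eta[:, None] - log_fact[None, :]
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    latent = w / w.sum(axis=1, keepdims=True)
    log_rr = np.log((latent @ K)[np.arange(len(y)), y])  # log P(S* = y | not SP)
    log_not_sp = log_expit(-zeta) + log_rr
    # A zero can come from the SP class (prob. theta) or from the RR process.
    ll = np.where(y == 0, np.logaddexp(log_expit(zeta), log_not_sp), log_not_sp)
    return -ll.sum()


def fit_zip_rr(y, X, Z, p1, p0, M=5):
    """ML fit with BFGS; returns the optimiser result, AIC and BIC."""
    K = rr_matrix(M, p1, p0)
    start = np.zeros(X.shape[1] + Z.shape[1])
    res = minimize(zip_rr_negloglik, start, args=(y, X, Z, K), method="BFGS")
    n_par, n_obs = len(start), len(y)
    return res, 2 * res.fun + 2 * n_par, 2 * res.fun + n_par * np.log(n_obs)


# Synthetic demonstration data with hypothetical parameters (not the survey data).
rng = np.random.default_rng(1)
n, M, p1, p0 = 870, 5, 0.9329, 0.18678
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
Z = np.ones((n, 1))  # intercept-only model for theta
lam = np.exp(X @ np.array([-0.5, 0.4]))
w = lam[:, None] ** np.arange(M + 1) / np.exp([lgamma(k + 1) for k in range(M + 1)])
cdf = np.cumsum(w / w.sum(axis=1, keepdims=True), axis=1)
S = np.minimum((rng.random(n)[:, None] > cdf).sum(axis=1), M)  # latent counts
y = rng.binomial(S, p1) + rng.binomial(M - S, p0)  # randomized-response sum scores
y[rng.random(n) < 0.2] = 0  # about 20% self-protective respondents answer all "no"

res, aic, bic = fit_zip_rr(y, X, Z, p1, p0, M)
print(res.x, aic, bic)
```

The plain Poisson‑RR model is the limiting case θ = 0. Fitting it separately and comparing the information criteria would mirror the model comparison described in the summary.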
Results show that higher education and greater knowledge of the regulations are associated with larger λ, indicating a higher expected number of violations. Older respondents (> 26 years) and those who became unemployed in 2004 have higher probabilities of zero observed scores, reflecting either genuine compliance or SP behaviour. Trust in the survey design significantly reduces θ (the probability of SP), whereas the understanding score has a weaker effect. The ZIP‑RR model fits the data substantially better than a standard Poisson‑RR model, confirming that accounting for SP‑induced zero‑inflation improves prevalence estimation.
The paper’s contributions are threefold: (1) it formalizes self‑protective response behaviour as a zero‑inflation mechanism within the RR framework; (2) it integrates Poisson‑RR and zero‑inflated Poisson models into a unified regression structure that can accommodate covariate effects on both the true violation count and the SP probability; (3) it demonstrates empirically that respondents’ trust in the confidentiality of the randomizing device influences the degree of SP, highlighting the importance of design perception in sensitive surveys.
Overall, the study provides a robust statistical tool for researchers using randomized response techniques in fields such as public health, criminology, and social policy, where respondents may be motivated to conceal incriminating information. By jointly modelling the latent count of sensitive behaviours and the propensity to engage in self‑protective answering, the ZIP‑RR approach yields more accurate prevalence estimates and richer insight into the determinants of both rule‑breaking and response bias.