Bayesian Estimation of Inequalities with Non-Rectangular Censored Survey Data


Synthetic indices are used in economics to measure various aspects of monetary inequality. These scalar indices take as input a distribution over a finite population, for example the population of a specific country. In this article we consider the case of the French 2004 Wealth Survey. We have at hand a partial measurement of the distribution of interest, consisting of bracketed and sometimes missing data over a subsample of the population of interest. We present the statistical methodology used to obtain point and interval estimates that take the various uncertainties into account. Because the inequality indices are nonlinear in the input distribution, we rely on a simulation-based approach in which the model for wealth per household is multivariate. Using the survey data together with matched auxiliary tax-declaration data, we face a rather intricate non-rectangular multidimensional censoring pattern. For practical reasons we adopt a Bayesian approach. Inference based on Monte Carlo approximations relies on a Markov chain Monte Carlo algorithm, namely the Gibbs sampler. The quantities of interest to the decision maker are the various inequality indices for the French population. Their distributions conditional on the subsample data are assumed to be normal, centered on the design-based estimates, with variances computed through linearization that account for the sample design and total nonresponse. Exogenous selection of the subsample, in particular of the nonresponse mechanism, is assumed, and we condition on the adequate covariates.


💡 Research Summary

The paper tackles the problem of estimating wealth inequality measures from the French 2004 Household Wealth Survey, where the available data are heavily censored, bracketed, and partially missing. In addition to the survey responses, the authors have access to matched tax‑declaration records, which introduce further multidimensional, non‑rectangular censoring because each variable may be observed only within different intervals or not at all. Traditional design‑based estimators, which rely on point estimates and linearized variances, cannot adequately handle such complex censoring structures.

To overcome these limitations, the authors adopt a Bayesian framework. They model household wealth as a multivariate (log‑)normal distribution conditional on a set of covariates common to both the survey and tax data (e.g., age, education, region). The censoring mechanism is encoded as a set of multidimensional interval constraints that form a non‑rectangular feasible region for each observation. Because the likelihood under such constraints is analytically intractable, the authors employ a data‑augmentation strategy within a Gibbs sampler. At each iteration, the latent (uncensored) wealth components are drawn from their conditional truncated normal distributions given the current parameter values, and then the model parameters (mean vector, covariance matrix) are updated conditional on the augmented data. Convergence diagnostics (Gelman‑Rubin statistics, effective sample size) are reported to ensure reliable posterior draws.
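The data-augmentation step described above can be illustrated with a deliberately simplified sketch: a univariate normal model for log-wealth observed only through brackets, with a Gibbs sampler that alternates between imputing latent values from truncated normals and updating the parameters from their conjugate conditionals (under a Jeffreys prior). The bracket edges, sample size, and parameter values here are illustrative, not the paper's; the actual model is multivariate with non-rectangular constraints.

```python
import numpy as np
from scipy.stats import truncnorm, invgamma

rng = np.random.default_rng(0)

# Hypothetical bracketed data: log-wealth reported only as an interval [lo, hi).
true_mu, true_sigma = 11.0, 1.5
y = rng.normal(true_mu, true_sigma, size=500)
edges = np.array([-np.inf, 9.0, 10.0, 11.0, 12.0, 13.0, np.inf])
idx = np.searchsorted(edges, y, side="right")
lo, hi = edges[idx - 1], edges[idx]          # observed brackets only

mu, sig2 = 0.0, 1.0                          # arbitrary starting values
draws = []
for it in range(2000):
    # 1) Data augmentation: draw latent values from truncated normals
    #    restricted to each observation's bracket, given current (mu, sig2).
    sig = np.sqrt(sig2)
    a, b = (lo - mu) / sig, (hi - mu) / sig
    z = truncnorm.rvs(a, b, loc=mu, scale=sig, random_state=rng)
    # 2) Parameter updates from conjugate conditionals (Jeffreys prior):
    #    mu | sig2, z ~ N(z-bar, sig2/n);  sig2 | mu, z ~ InvGamma(n/2, S/2).
    n = len(z)
    mu = rng.normal(z.mean(), sig / np.sqrt(n))
    sig2 = invgamma.rvs(n / 2, scale=((z - mu) ** 2).sum() / 2,
                        random_state=rng)
    if it >= 500:                            # discard burn-in
        draws.append((mu, sig2))

post = np.array(draws)                       # posterior sample of (mu, sig2)
```

Even though no exact value is ever observed, the brackets carry enough information for the posterior to concentrate near the generating parameters; in the paper's multivariate setting the same two-step scheme applies, with the truncation region being the non-rectangular feasible set of each household.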

Inequality indices—such as the Gini coefficient, the share of wealth held by the top 1%, and the Palma ratio—are nonlinear functions of the wealth distribution. Consequently, after each Gibbs iteration the authors compute these indices on the simulated full population distribution, thereby generating a posterior sample of each index. The posterior mean provides a point estimate, while quantiles of the posterior sample yield credible intervals that fully reflect sampling variability, censoring uncertainty, and model uncertainty.
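A minimal sketch of this per-draw index computation, assuming (hypothetically) that the posterior draws describe a log-normal wealth model: for each draw a population is simulated, the Gini coefficient is computed, and the resulting sample of index values gives a point estimate and a credible interval. The parameter values below are illustrative, not taken from the paper.

```python
import numpy as np

def gini(x):
    """Gini coefficient via the sorted-values formula
    G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with i = 1..n."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return 2.0 * np.sum(i * x) / (n * np.sum(x)) - (n + 1) / n

rng = np.random.default_rng(1)

# Hypothetical posterior draws for a log-normal wealth model (mu, sigma).
post_mu = rng.normal(11.0, 0.05, size=200)
post_sigma = rng.normal(1.0, 0.02, size=200)

# One simulated population, and one Gini value, per posterior draw.
ginis = np.array([
    gini(rng.lognormal(m, s, size=5000)) for m, s in zip(post_mu, post_sigma)
])

point = ginis.mean()                          # posterior-mean point estimate
lo95, hi95 = np.quantile(ginis, [0.025, 0.975])  # 95% credible interval
```

Because the index is computed anew on each simulated population, the spread of `ginis` reflects both parameter uncertainty and the nonlinearity of the index, which a linearized standard error would only approximate.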

For the design‑based component, the authors assume that the survey’s design‑based estimates of the indices are normally distributed with a variance obtained by linearization that incorporates the complex sample design and total non‑response. This normal approximation serves as a prior‑like centering for the Bayesian analysis, effectively blending design‑based information with model‑based inference. The non‑response mechanism is treated as exogenous; conditioning on observed covariates is assumed to remove any selection bias.
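The normal centering described above can be sketched in a few lines: draws of an index are generated from a normal distribution centered on the design-based estimate, with the linearized variance as spread. The numbers below are placeholders; in practice the variance would come from influence-function linearization under the actual sample design and nonresponse adjustment.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design-based Gini estimate and its linearized variance.
theta_hat = 0.65      # design-based point estimate (illustrative)
v_lin = 0.0004        # linearized variance estimate (illustrative)

# Normal approximation for the index conditional on the subsample data.
theta_draws = rng.normal(theta_hat, np.sqrt(v_lin), size=10_000)

ci_lo, ci_hi = np.quantile(theta_draws, [0.025, 0.975])
```

These normal draws are what ties the design-based machinery to the model-based simulation: the Bayesian analysis is anchored on the design-based estimate while still propagating its sampling variance.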

The empirical application demonstrates that the Bayesian estimates differ noticeably from the raw design‑based figures, especially for the upper tail measures. The posterior credible intervals are wider than the design‑based standard errors, reflecting the additional uncertainty introduced by the censoring and missing data. The methodology also allows the authors to perform counterfactual analyses, such as assessing how changes in the non‑response pattern would affect inequality estimates.

In summary, the paper presents a rigorous Bayesian solution for estimating inequality measures from surveys with intricate, non‑rectangular censoring. By integrating auxiliary tax data, employing Gibbs sampling with data augmentation, and respecting the complex survey design, the authors produce point and interval estimates that are statistically coherent and practically useful for policymakers concerned with wealth distribution and redistribution policies. Future work could extend the framework to longitudinal surveys, incorporate alternative distributional assumptions, or apply the method to other countries with similar data challenges.

