Robust estimation of polyserial correlation coefficients: A density power divergence approach


The association between a continuous and an ordinal variable is commonly modeled through the polyserial correlation model. However, this model, which is based on a partially-latent normality assumption, may be misspecified in practice due to, for example (but not limited to), outliers or careless responses. The typically used maximum likelihood (ML) estimator is highly susceptible to such misspecification: a single observation not generated by partially-latent normality can suffice to produce arbitrarily poor estimates. As a remedy, we propose a novel estimator of the polyserial correlation model designed to be robust against the adverse effects of observations discrepant with that model. The estimator leverages density power divergence estimation to achieve robustness by implicitly downweighting such observations; the ensuing weights constitute a useful tool for pinpointing potential sources of model misspecification. The proposed estimator generalizes ML and is consistent as well as asymptotically Gaussian. As the price for robustness, some efficiency must be sacrificed, but substantial robustness can be gained while maintaining more than 98% of ML efficiency. We demonstrate our estimator’s robustness and practical usefulness in simulation experiments and an empirical application in personality psychology, where our estimator helps identify outliers. Finally, the proposed methodology is implemented in free open-source software.


💡 Research Summary

This paper addresses a fundamental vulnerability in the estimation of polyserial correlation, the measure of association between a continuous variable and an ordinal variable. The conventional approach relies on a partially‑latent normality assumption: the observed continuous variable X and an unobserved latent continuous variable η are jointly normally distributed, and η is discretized by a set of thresholds to produce the ordinal response Y. Under this model the polyserial correlation ρ = Corr(X, η) is the parameter of interest. Maximum‑likelihood (ML) estimation, the standard method, is extremely sensitive to violations of the latent normality assumption. Even a single observation that does not arise from the assumed bivariate normal distribution can cause the ML estimator to diverge or become arbitrarily biased. This situation is formally described as a “partial contamination” problem, analogous to Huber’s contamination model, where only an unknown subset of the data may be generated by an unspecified non‑normal process (e.g., outliers in X, careless responses in Y).
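The data-generating process just described can be sketched in a few lines (an illustrative simulation, not code from the paper; the sample size, ρ = 0.5, and the threshold values are arbitrary choices):

```python
import numpy as np

# Illustrative simulation of the partially-latent normality model:
# (X, eta) bivariate normal, Y obtained by discretizing eta at thresholds.
rng = np.random.default_rng(0)

n = 1000
rho = 0.5                                 # polyserial correlation Corr(X, eta)
thresholds = np.array([-1.0, 0.0, 1.0])   # cut points -> 4-category ordinal Y

cov = np.array([[1.0, rho], [rho, 1.0]])
x, eta = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Only Y = #{thresholds below eta} is observed, never eta itself
y = np.searchsorted(thresholds, eta)
```

With these draws Y takes values 0–3, and η is discarded exactly as in the model: only (X, Y) would be available to the analyst.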

To obtain an estimator that remains reliable under such partial misspecification, the authors adopt the density power divergence (DPD) framework introduced by Basu et al. (1998). DPD defines a family of objective functions indexed by a tuning parameter α ≥ 0. When α = 0 the DPD objective reduces to the usual log‑likelihood, so the DPD estimator coincides with the ML estimator. For α > 0 each observation receives a weight w_i = f_θ(x_i, y_i)^α, where f_θ denotes the model density evaluated at the current parameter vector θ. Observations that are poorly fitted by the model (i.e., have low density) receive correspondingly smaller weights, thereby limiting their influence on the final estimate. The authors show that the DPD estimator is consistent and asymptotically normal for any fixed α, and that it inherits the robustness properties of the underlying divergence: under partial contamination it converges to a parameter value that is closer to the true θ than the ML limit.
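The implicit downweighting can be made concrete. Assuming a standardized X and known thresholds (simplifications not made in the paper), the model density factors as f_θ(x, y) = φ(x) · P(Y = y | X = x), and the raw weights are its α-th power:

```python
import numpy as np
from scipy.stats import norm

def polyserial_density(x, y, rho, thresholds):
    """f_theta(x, y) = phi(x) * P(Y = y | X = x) for standardized X.
    Conditional on X = x, the latent eta is N(rho*x, 1 - rho^2)."""
    tau = np.concatenate(([-np.inf], thresholds, [np.inf]))
    s = np.sqrt(1.0 - rho ** 2)
    p_y_given_x = (norm.cdf((tau[y + 1] - rho * x) / s)
                   - norm.cdf((tau[y] - rho * x) / s))
    return norm.pdf(x) * p_y_given_x

def dpd_weights(x, y, rho, thresholds, alpha):
    """Raw DPD weights w_i = f_theta(x_i, y_i)^alpha;
    alpha = 0 recovers ML (all weights equal to 1)."""
    return polyserial_density(x, y, rho, thresholds) ** alpha
```

For instance, with thresholds (−1, 0, 1) and ρ = 0.5, an outlying pair such as (x, y) = (5, 0) has a far smaller density, and hence a far smaller weight, than a central pair such as (0, 1).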

The paper provides a thorough theoretical development. Section 5 introduces the DPD objective for the polyserial model, derives the estimating equations, and proposes a simple rescaling of the raw weights to the unit interval, which makes the weights directly interpretable as diagnostic measures of outlyingness. Section 6 proves consistency and asymptotic normality and derives the influence function, confirming that the estimator’s robustness increases with α while its asymptotic efficiency relative to ML decreases. The authors also establish that for modest α (e.g., 0.1–0.2) the efficiency loss relative to ML remains below 2 %, yet robustness to contamination improves substantially.

From a computational standpoint, the authors adapt an EM‑type algorithm. In the E‑step they compute the conditional expectations of the latent η given the observed (X, Y) and the current parameter values. In the M‑step they update μ, σ², the thresholds τ, and the correlation ρ by solving the weighted DPD estimating equations. The algorithm converges rapidly; in the authors’ implementation (the R package robcat, available on CRAN) typical data sets with a few thousand observations are processed in under two seconds on a standard laptop.
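The EM-type algorithm of the paper is not reproduced here. As a compact, runnable substitute, the sketch below minimizes the Basu et al. (1998) DPD objective for the *conditional* distribution of Y given X over ρ alone, assuming a standardized X and known thresholds; for a discrete outcome, the integral term of the DPD objective reduces to a sum over categories:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def cond_probs(x, rho, tau):
    """P(Y = k | X = x) for all categories k;
    eta | X = x ~ N(rho*x, 1 - rho^2)."""
    t = np.concatenate(([-np.inf], tau, [np.inf]))
    s = np.sqrt(1.0 - rho ** 2)
    c = norm.cdf((t[None, :] - rho * x[:, None]) / s)
    return np.clip(c[:, 1:] - c[:, :-1], 1e-300, 1.0)

def dpd_objective(rho, x, y, tau, alpha):
    """Empirical DPD objective for the conditional model (alpha > 0):
    mean_i [ sum_k P(k|x_i)^(1+alpha) - (1 + 1/alpha) P(y_i|x_i)^alpha ]."""
    p = cond_probs(x, rho, tau)
    sum_term = np.sum(p ** (1.0 + alpha), axis=1)
    data_term = (1.0 + 1.0 / alpha) * p[np.arange(len(y)), y] ** alpha
    return np.mean(sum_term - data_term)

def fit_rho_dpd(x, y, tau, alpha=0.15):
    """Estimate rho by one-dimensional minimization of the DPD objective."""
    res = minimize_scalar(lambda r: dpd_objective(r, x, y, tau, alpha),
                          bounds=(-0.99, 0.99), method="bounded")
    return res.x
```

On clean simulated data this recovers ρ closely; under contamination the α-powers bound the influence of low-probability responses, illustrating the same downweighting mechanism that the full joint-model estimator exploits.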

Simulation studies explore four scenarios: (i) pure normal data (no contamination), (ii) contamination in X (mean/variance shifts), (iii) contamination in Y (threshold shifts mimicking careless responding), and (iv) combined contamination. Across all settings the DPD estimator outperforms ML in terms of bias and mean‑squared error. When 5 % of the data are contaminated, ML can produce correlation estimates that deviate by more than 0.2 from the truth, whereas the DPD estimator with α = 0.15 typically stays within 0.03. Efficiency loss relative to ML remains below 2 % when α is kept modest, confirming the practical viability of the method.
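The contamination designs can be mimicked as follows (an assumed form of the scenarios, not the paper's exact simulation protocol): a fraction ε of cases receives mean-shifted X values and responses pinned to the lowest category.

```python
import numpy as np

# Sketch of a combined-contamination scenario (scenario iv above), with
# assumed contamination forms: 5% of cases get an outlying X and a
# careless Y pinned to the lowest category.
rng = np.random.default_rng(1)

n = 1000
x = rng.normal(size=n)                     # clean continuous scores
y = rng.integers(0, 4, size=n)             # placeholder clean ordinal responses

eps = 0.05
idx = rng.choice(n, size=int(eps * n), replace=False)
x[idx] = rng.normal(loc=6.0, scale=1.0, size=idx.size)  # mean-shift outliers in X
y[idx] = 0                                              # careless lowest-category Y
```

Feeding such data to the ML and DPD estimators reproduces the qualitative contrast described above: ML is dragged far from the truth by the contaminated 5 %, while the DPD fit is barely perturbed.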

An empirical application uses a personality‑psychology questionnaire containing a continuous stress score and several 5‑point Likert items. The ML estimate of the polyserial correlation is negative (≈ −0.12), driven by a handful of respondents with extreme stress scores paired with the lowest Likert category—a pattern suggestive of careless responding. The DPD estimator (α = 0.15) automatically down‑weights these cases (weights ≈ 0.1) and yields a positive, plausible correlation (≈ 0.46). Plotting the observation‑specific weights provides a transparent tool for researchers to flag and investigate potential outliers.
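Given the rescaled observation weights, flagging suspect cases reduces to a thresholding step (the cutoff is a hypothetical choice, and the weight values are illustrative, echoing the ≈ 0.1 weights reported for the flagged respondents):

```python
import numpy as np

# Hypothetical diagnostic step: weights rescaled to [0, 1] (as the paper
# proposes), with a user-chosen cutoff below which cases are inspected.
weights = np.array([0.97, 0.92, 0.08, 0.88, 0.11])  # illustrative values
cutoff = 0.5
flagged = np.flatnonzero(weights < cutoff)
print(flagged)  # indices of suspected outliers / careless responders
```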

The authors position their contribution relative to prior work. Earlier studies on polyserial correlation have examined full‑distribution misspecification (e.g., non‑normal latent variables) but have not addressed partial contamination, nor have they offered a robust estimator for the joint model. The present work fills this gap, extending the authors’ earlier robust approach for polychoric correlation (Welz et al., 2026) to the mixed‑type polyserial setting. By leveraging DPD, the method remains fully parametric, requires no prior knowledge of the contamination proportion or form, and integrates seamlessly with existing SEM software that relies on polyserial correlations.

In conclusion, the paper delivers a theoretically sound, computationally efficient, and practically useful robust estimator for polyserial correlation. It demonstrates that a modest sacrifice in asymptotic efficiency (≤ 2 %) yields substantial gains in robustness against a wide range of realistic data problems, including outliers and careless responses. The accompanying open‑source implementation makes the method readily accessible to applied researchers, and the diagnostic weights offer an added benefit for data cleaning and model validation. Future extensions could explore multivariate polyserial systems, Bayesian DPD formulations, and applications to other mixed‑type data structures.

