A Kernelization-Based Approach to Nonparametric Binary Choice Models
We propose a new estimator for nonparametric binary choice models that does not impose a parametric structure on either the systematic function of covariates or the distribution of the error term. A key advantage of our approach is its computational scalability in the number of covariates. For instance, even when a normal error distribution is assumed, as in probit models, commonly used sieves for approximating an unknown function of covariates can lead to a large-dimensional optimization problem when the number of covariates is moderate. Our approach, motivated by kernel methods in machine learning, views certain reproducing kernel Hilbert spaces as special sieve spaces, coupled with spectral cut-off regularization for dimension reduction. We establish the consistency of the proposed estimator and asymptotic normality of the plug-in estimator for weighted average partial derivatives. Simulation studies show that, compared to parametric estimation methods, the proposed method substantially improves finite-sample performance under misspecification, with only a mild efficiency loss when the model is correctly specified. Using administrative data on the grant decisions of US asylum applications to immigration courts, along with nine case-day variables on weather and pollution, we re-examine the effect of outdoor temperature on court judges' "mood" and, thus, their grant decisions.
💡 Research Summary
This paper introduces a novel estimator for fully non‑parametric binary choice models that avoids imposing any parametric form on either the systematic component G₀(·) or the error distribution F₀(·). The key innovation is to treat a reproducing kernel Hilbert space (RKHS) as a special sieve space and to apply spectral cut‑off regularization for dimensionality reduction. By representing G₀ as a linear combination of kernel functions k(·,Xᵢ) centered at the observed covariates, the number of parameters to be estimated is equal to the sample size n and does not grow with the covariate dimension d. To keep computation feasible when n is large, the authors perform an eigen‑decomposition of the n × n kernel matrix K and retain only the leading m eigenvectors, effectively projecting the problem onto an m‑dimensional subspace. They prove that the loss incurred by this truncation vanishes asymptotically, so the regularized solution is asymptotically equivalent to the full‑kernel solution.
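The truncation step described above can be sketched in a few lines. Everything below is illustrative, not the authors' implementation: the Gaussian kernel, the bandwidth, and the function names (`gaussian_kernel`, `spectral_cutoff_basis`) are assumptions made for the sketch.

```python
import numpy as np

def gaussian_kernel(X, bandwidth=1.0):
    """Gram matrix K with K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth**2))

def spectral_cutoff_basis(K, m):
    """Return the leading m eigenpairs of the symmetric kernel matrix K.

    Any candidate G = sum_i c_i k(., x_i) with coefficient vector c in R^n
    is projected onto the span of the top-m eigenvectors, reducing the
    optimization from n to m free parameters (the spectral cut-off)."""
    eigvals, eigvecs = np.linalg.eigh(K)        # eigh returns ascending order
    idx = np.argsort(eigvals)[::-1][:m]         # keep the m largest
    return eigvals[idx], eigvecs[:, idx]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))              # n = 200 observations, d = 10 covariates
K = gaussian_kernel(X, bandwidth=3.0)
lam, V = spectral_cutoff_basis(K, m=25)
# Fitted values of G at the sample points are now parameterized as V @ b
# with b in R^25, instead of K @ c with c in R^200.
```

The key point of the sketch is that the cost of the dimension reduction is one symmetric eigendecomposition of an n × n matrix, independent of the covariate dimension d.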
For the error term, the paper follows Gallant and Nychka (1987) by approximating its density with squared Hermite polynomials, which yields closed‑form approximations of the cumulative distribution function (CDF) and eliminates the need for numerical integration even when a log‑likelihood loss is used.
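A minimal sketch of a Gallant–Nychka style squared-Hermite density, using the orthogonality of the probabilists' Hermite polynomials (∫ Heⱼ Heₖ φ de = k! if j = k, else 0) to get the normalizing constant in closed form. The coefficient values and the function name are illustrative, not taken from the paper:

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He  # probabilists' Hermite polynomials

def squared_hermite_density(theta):
    """Return a normalized density e -> P(e)^2 * phi(e) / c, where
    P(e) = sum_k theta_k He_k(e) and phi is the standard normal density.

    By Hermite orthogonality under phi, the normalizing constant is
    c = sum_k theta_k^2 * k!, so no numerical integration is needed."""
    theta = np.asarray(theta, dtype=float)
    c = sum(t**2 * math.factorial(k) for k, t in enumerate(theta))
    def f(e):
        phi = np.exp(-np.asarray(e)**2 / 2.0) / np.sqrt(2.0 * np.pi)
        return He.hermeval(e, theta)**2 * phi / c
    return f

f = squared_hermite_density([1.0, 0.3, 0.2])    # illustrative coefficients
grid = np.linspace(-8.0, 8.0, 4001)
vals = f(grid)
# Trapezoid check that the closed-form constant indeed normalizes f:
mass = np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0
```

Because the density is a polynomial times the normal density, its CDF is likewise available in closed form (a combination of Φ and φ terms), which is the property the paper exploits.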
The estimator, called the kernelized non‑parametric (KNP) estimator, minimizes a sample average loss (either least‑squares or negative log‑likelihood) over a set Gₙ of RKHS functions constrained to lie in a ball of radius Rₙ and a set Fₙ of Hermite‑based density approximations. The authors establish consistency of both Ĝ and F̂ by showing that, with appropriately growing RKHS radii and increasing Hermite order, the approximation errors shrink at rates n^{‑α} and n^{‑β} respectively.
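A minimal sketch of the least-squares variant of this fit on the truncated eigenbasis. For illustration only, the error CDF is fixed at the standard normal Φ instead of being estimated with the Hermite sieve, and plain gradient descent stands in for whatever optimizer one would actually use; all names and tuning values are assumptions of the sketch:

```python
import numpy as np
from scipy.stats import norm

def fit_knp_ls(V, y, steps=1000, lr=25.0):
    """Gradient descent on L(b) = mean_i (y_i - Phi(v_i' b))^2 over b in R^m.

    V : (n, m) leading kernel eigenvectors (orthonormal columns),
    y : (n,) binary outcomes."""
    b = np.zeros(V.shape[1])
    for _ in range(steps):
        z = V @ b
        resid = y - norm.cdf(z)
        grad = -2.0 * (V.T @ (resid * norm.pdf(z))) / len(y)
        b -= lr * grad
    return b

rng = np.random.default_rng(1)
n, d, m = 200, 5, 15
X = rng.standard_normal((n, d))
y = (X[:, 0] + rng.standard_normal(n) > 0).astype(float)  # true index is x_1

# Gaussian kernel matrix and its top-m eigenvectors (the spectral cut-off).
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / (2.0 * 3.0**2))
w, U = np.linalg.eigh(K)
V = U[:, np.argsort(w)[::-1][:m]]

b_hat = fit_knp_ls(V, y)
loss = np.mean((y - norm.cdf(V @ b_hat))**2)  # b = 0 gives a loss of exactly 0.25
```

The only free parameters in the fit are the m coordinates of b, which is the computational payoff of the spectral cut-off described earlier.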
A major contribution is the derivation of asymptotic normality for plug‑in estimators of weighted average partial derivatives (WAPDs) and conditional average partial effects (conditional APEs), which are the primary policy parameters in binary choice contexts. Using the functional delta method and a central limit theorem for empirical processes, they prove that √n (θ̂ – θ₀) converges to a normal distribution with a covariance matrix that can be consistently estimated.
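The plug-in WAPD itself is simple to compute once a fitted choice probability is in hand. In this sketch a closed-form logistic stand-in replaces the estimated F̂(Ĝ(·)), the derivative is taken by central finite differences, and the function name `wapd` is illustrative:

```python
import numpy as np

def wapd(p, X, j, weights=None, h=1e-5):
    """Plug-in weighted average partial derivative in coordinate j:
    mean_i w(x_i) * d p(x_i) / dx_j, via a central finite difference."""
    if weights is None:
        weights = np.ones(len(X))
    Xp, Xm = X.copy(), X.copy()
    Xp[:, j] += h
    Xm[:, j] -= h
    deriv = (p(Xp) - p(Xm)) / (2.0 * h)
    return np.mean(weights * deriv)

rng = np.random.default_rng(2)
X = rng.standard_normal((5000, 3))
p = lambda X: 1.0 / (1.0 + np.exp(-X[:, 0]))  # stand-in for the fitted F(G(x))
theta1 = wapd(p, X, j=0)                      # average slope in the first covariate
theta2 = wapd(p, X, j=1)                      # zero: p does not depend on x_2
```

With an estimated F̂(Ĝ(·)) in place of the stand-in, this is the functional whose √n-normality the paper establishes, and whose standard errors come from the estimated asymptotic covariance rather than from the naive sample variance of the finite differences.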
Monte‑Carlo experiments explore three dimensions of misspecification: (i) non‑Gaussian error distributions (t‑distribution, mixture normals), (ii) highly non‑linear systematic functions (sinusoids, higher‑order polynomials), and (iii) increasing covariate dimensions (d = 10, 20, 30). Across all settings, the KNP estimator outperforms traditional probit and logit estimators in terms of mean‑squared error, reducing bias dramatically when the model is misspecified and incurring only a modest (≤5 %) efficiency loss when the model is correctly specified.
The empirical application revisits the “judges’ mood” literature by analyzing U.S. asylum‑grant decisions. Using administrative data on grant outcomes together with nine daily weather and pollution variables, the authors estimate the effect of outdoor temperature on judges’ propensity to grant asylum. The KNP estimator finds a statistically significant positive temperature effect, whereas parametric probit estimates are unstable and have inflated standard errors. This illustrates the practical advantage of the proposed method for policy‑relevant inference.
In summary, the paper makes three substantive advances: (1) it provides a computationally scalable framework for high‑dimensional non‑parametric binary choice models by leveraging RKHS sieves and spectral regularization; (2) it delivers a rigorous asymptotic theory covering consistency, convergence rates, and normality of key policy‑relevant functionals; and (3) it demonstrates through simulations and a real‑world case study that the method is robust to misspecification while retaining near‑optimal efficiency under correct specification. Remaining challenges include the selection of kernel bandwidth and the truncation dimension m in practice, as well as the computational burden of eigen‑decompositions for extremely large samples, which are natural avenues for future research.