Using SVM to Estimate and Predict Binary Choice Models
The support vector machine (SVM) has asymptotic behavior that parallels that of the quasi-maximum likelihood estimator (QMLE) for binary outcomes generated by a binary choice model (BCM), although it is not itself a QMLE. We show that, under the linear conditional mean condition for covariates given the systematic component used in the QMLE slope-consistency literature, the slope of the separating hyperplane given by the SVM consistently estimates the BCM slope parameter, provided class weights are applied when the binary outcomes are severely imbalanced. In this sense the SVM slope estimator is asymptotically equivalent to that of logistic regression. The finite-sample performance of the two estimators can differ substantially depending on the distributions of covariates and errors, but neither dominates the other. Once a consistent estimator of the slope parameter is obtained, the intercept parameter of the BCM can be consistently estimated.
💡 Research Summary
This paper investigates the relationship between the support vector machine (SVM) and the binary choice model (BCM), showing that under standard regularity conditions the SVM’s separating hyperplane provides a consistent estimator of the BCM’s slope parameters. The authors begin by noting that the optimal classifier for a BCM—derived from the sign of a linear index α₀ + X′β₀—is exactly the same functional form used by the SVM decision rule. They formalize this connection by defining the SVM’s soft‑margin objective, which minimizes the empirical hinge loss plus a ridge penalty, and then study its population counterpart. Under a “linear conditional mean” assumption—namely that the conditional expectation of covariates given the systematic component α₀ + X′β₀ is an affine function—the population risk has a unique minimizer θ* that coincides with the true parameter vector θ₀ up to a positive scalar. Consequently, the SVM estimator θ̂ converges in probability to θ* and, because the direction of β̂ is invariant to scaling, β̂ consistently estimates the direction of β₀. This condition is satisfied for a wide class of covariate distributions, including elliptical families, and can be enforced by appropriate weighting schemes.
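The setup above can be illustrated with a minimal simulation sketch. It uses scikit-learn's `LinearSVC` as a stand-in for the soft-margin SVM (hinge loss plus ridge penalty); the Gaussian covariate design, logistic errors, sample size, and parameter values are illustrative assumptions, not the paper's own experiments. Because the slope is identified only up to a positive scale, the comparison is between unit-norm directions.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, alpha0, beta0 = 20_000, 0.5, np.array([1.0, -2.0, 0.5])

# Gaussian covariates satisfy the linear-conditional-mean condition
X = rng.standard_normal((n, 3))
eps = rng.logistic(size=n)                      # logistic errors -> logit BCM
y = (alpha0 + X @ beta0 + eps > 0).astype(int)

# Soft-margin linear SVM: empirical hinge loss + ridge penalty
svm = LinearSVC(C=1.0, loss="hinge", max_iter=50_000).fit(X, y)

# The slope is identified only up to a positive scalar, so compare directions
b_hat = svm.coef_.ravel()
dir_hat = b_hat / np.linalg.norm(b_hat)
dir_true = beta0 / np.linalg.norm(beta0)
print(np.round(dir_hat, 2), np.round(dir_true, 2))
```

With a large sample and elliptically distributed covariates, the estimated direction lies close to the true one, consistent with the θ* = c·θ₀ result described above.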
A major contribution of the paper is the treatment of severely imbalanced binary outcomes. In the absence of class weighting, the SVM collapses to predicting the majority class, rendering the slope estimator inconsistent. By incorporating class-specific penalty weights into the hinge loss, the authors restore consistency even when the minority class is extremely rare. They contrast this with the quasi-maximum likelihood estimator (QMLE), notably logistic regression, which remains consistent under the same linear-conditional-mean condition without explicit class weighting. However, the QMLE can suffer from numerical instability in highly imbalanced settings, whereas the weighted SVM remains robust.
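A hedged sketch of the imbalance issue, again using `LinearSVC` with its `class_weight` option as an illustrative proxy for the paper's class-specific penalty weights (the intercept value that induces rarity is an assumption chosen so that positives are only a few percent of the sample):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n, beta0 = 50_000, np.array([1.0, -1.0])
X = rng.standard_normal((n, 2))
# A large negative intercept makes y = 1 rare (severe imbalance)
y = (-4.0 + X @ beta0 + rng.logistic(size=n) > 0).astype(int)

plain = LinearSVC(loss="hinge", max_iter=100_000).fit(X, y)
weighted = LinearSVC(loss="hinge", class_weight="balanced",
                     max_iter=100_000).fit(X, y)

# The unweighted SVM leans heavily toward the majority class;
# class weighting restores a nontrivial decision rule
print("minority share:               ", y.mean())
print("unweighted positive rate:     ", plain.predict(X).mean())
print("class-weighted positive rate: ", weighted.predict(X).mean())
```

The weighted fit predicts the minority class at a sensible rate and its slope direction stays aligned with β₀, mirroring the consistency result the authors prove for the weighted hinge loss.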
Through extensive Monte‑Carlo simulations, the authors compare finite‑sample performance of the SVM estimator and the logistic‑regression estimator across a variety of covariate and error distributions. When covariates are Gaussian and errors follow a logistic distribution, the logit estimator typically yields slightly lower mean‑squared error. Conversely, with non‑Gaussian, skewed, or heavy‑tailed covariates (e.g., mixtures, t‑distributions) the SVM often outperforms the logit estimator. The simulations also reveal that neither estimator uniformly dominates the other; performance hinges on the underlying data‑generating process.
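A small-scale version of such a comparison can be sketched as follows; the number of replications, sample size, and data-generating process are illustrative assumptions (far smaller than a real Monte Carlo study), and `LogisticRegression` stands in for the logit QMLE:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

def direction_error(b_hat, b0):
    """Squared Euclidean distance between unit-norm direction estimates."""
    return float(np.sum((b_hat / np.linalg.norm(b_hat)
                         - b0 / np.linalg.norm(b0)) ** 2))

rng = np.random.default_rng(2)
beta0, n, reps = np.array([1.0, -1.0]), 2_000, 20
err_svm, err_logit = [], []
for _ in range(reps):
    X = rng.standard_normal((n, 2))    # swap in t or mixture draws to
    eps = rng.logistic(size=n)         # explore other designs
    y = (0.5 + X @ beta0 + eps > 0).astype(int)
    err_svm.append(direction_error(
        LinearSVC(loss="hinge", max_iter=50_000).fit(X, y).coef_.ravel(),
        beta0))
    err_logit.append(direction_error(
        LogisticRegression().fit(X, y).coef_.ravel(), beta0))

print(f"mean squared direction error  SVM: {np.mean(err_svm):.4f}  "
      f"logit: {np.mean(err_logit):.4f}")
```

Under this Gaussian-covariate, logistic-error design the logit estimator should be at least competitive; replacing the covariate or error draws changes which method wins, which is the paper's point that neither estimator uniformly dominates.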
After establishing slope consistency, the paper addresses estimation of the intercept α₀. Leveraging the fact that β̂ is a consistent direction estimator, the authors apply Manski’s maximum‑score estimator and Horowitz’s smoothed version to recover α₀. They prove that substituting β̂ for β₀ does not affect the asymptotic distribution of the intercept estimator, thereby providing a practical two‑step procedure: first obtain β̂ via SVM, then estimate α₀ using a maximum‑score type method.
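The two-step procedure can be sketched with a simple grid-search version of a Manski-type maximum-score step (the grid, sample size, and parameter values are illustrative assumptions; the paper's smoothed variant is not implemented here). Since the SVM pins down only the direction of β₀, normalizing β̂ to unit length means the intercept target becomes α₀/‖β₀‖:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
n, alpha0, beta0 = 50_000, 0.5, np.array([1.0, -1.0])
X = rng.standard_normal((n, 2))
y = (alpha0 + X @ beta0 + rng.logistic(size=n) > 0).astype(int)

# Step 1: slope direction from the SVM (scale is not identified)
b = LinearSVC(loss="hinge", max_iter=50_000).fit(X, y).coef_.ravel()
b /= np.linalg.norm(b)
idx = X @ b

# Step 2: Manski-type maximum score in one dimension -- choose the
# intercept that maximizes the share of correctly signed predictions
grid = np.linspace(-2.0, 2.0, 801)
score = [np.mean(np.where(idx + a > 0, y, 1 - y)) for a in grid]
a_hat = grid[np.argmax(score)]

# With a unit-norm direction, the target is alpha0 / ||beta0||
print(a_hat, alpha0 / np.linalg.norm(beta0))
```

Maximum score converges slowly (cube-root rate), so the intercept estimate is noisier than the slope; the paper's result is that plugging in β̂ for β₀ leaves the intercept estimator's asymptotic distribution unchanged.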
The authors emphasize that, despite the asymptotic equivalence in slope estimation, the SVM cannot be expressed as a QMLE; the two methods differ in objective functions, limit distributions, and sensitivity to dimensionality. The SVM’s margin‑maximizing nature offers advantages in high‑dimensional, low‑sample contexts by controlling over‑fitting, while the QMLE’s likelihood‑based approach is asymptotically efficient when sample sizes are large and the model is correctly specified.
In conclusion, the paper provides a rigorous bridge between machine‑learning classification algorithms and classical econometric binary‑choice estimation. It demonstrates that, under a modest linear‑conditional‑mean condition and with appropriate class weighting, the SVM yields a consistent estimator of the BCM slope, comparable to logistic regression, and can be combined with established maximum‑score techniques to estimate the intercept. This result expands the toolkit for applied researchers dealing with high‑dimensional or imbalanced binary outcome data, offering a theoretically justified alternative to traditional likelihood‑based methods.