Voting with Random Classifiers (VORACE): Theoretical and Experimental Analysis
In many machine learning scenarios, searching for the best classifier that fits a particular dataset can be very costly in terms of time and resources. Moreover, it can require deep knowledge of the specific domain. We propose a new technique which does not require profound expertise in the domain and avoids the commonly used strategy of hyper-parameter tuning and model selection. Our method is an innovative ensemble technique that uses voting rules over a set of randomly-generated classifiers. Given a new input sample, we interpret the output of each classifier as a ranking over the set of possible classes. We then aggregate these output rankings using a voting rule, which treats them as preferences over the classes. We show that our approach obtains good results compared to the state of the art, providing both a theoretical analysis and an empirical evaluation on several datasets.
💡 Research Summary
The paper introduces VORACE (VOting with RAndom Classifiers), a novel ensemble learning framework that sidesteps the costly processes of hyper‑parameter tuning and domain‑specific expertise. Instead of carefully designing or selecting a single high‑performing model, VORACE randomly generates a pool of n base classifiers drawn from a predefined set (e.g., decision trees, neural networks, support vector machines). For each classifier, random values are sampled for its hyper‑parameters, and all classifiers are trained on the same training data. When a test instance arrives, each classifier outputs a probability distribution over the m classes; sorting these probabilities yields a full ranking of the classes. The collection of rankings constitutes a voting profile, which is then aggregated using a social‑choice voting rule. The paper experiments with four classic voting rules:
- Plurality – each classifier casts a single vote for its top‑ranked class.
- Borda – points are assigned based on the full ranking (m‑i points for the i‑th position).
- Copeland – pairwise comparisons are performed; the class winning the most head‑to‑head contests wins.
- Kemeny – the ranking that maximizes total agreement with the individual rankings (a computationally hard problem approximated in practice).
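The aggregation step can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' code: all function names are ours, and the probability vectors are synthetic stand-ins for real classifier outputs. It converts each classifier's class-probability vector into a ranking and applies plurality, Borda, and Copeland as described above (Kemeny is omitted, being NP-hard in general).

```python
import numpy as np

def rankings_from_probs(probs):
    """Turn each classifier's class-probability vector into a full
    ranking of classes, best first (ties broken by class index)."""
    return [list(np.argsort(-p, kind="stable")) for p in probs]

def plurality(rankings, m):
    """Each ranking casts one vote for its top-ranked class."""
    scores = np.zeros(m)
    for r in rankings:
        scores[r[0]] += 1
    return int(np.argmax(scores))

def borda(rankings, m):
    """The class in (1-indexed) position i receives m - i points,
    so the top class of each ranking earns m - 1 points."""
    scores = np.zeros(m)
    for r in rankings:
        for pos, cls in enumerate(r):
            scores[cls] += m - (pos + 1)
    return int(np.argmax(scores))

def copeland(rankings, m):
    """A class scores one point per rival it beats in a strict
    majority of pairwise comparisons; most wins takes the contest."""
    wins = np.zeros(m)
    for a in range(m):
        for b in range(m):
            if a != b:
                a_over_b = sum(r.index(a) < r.index(b) for r in rankings)
                if a_over_b > len(rankings) / 2:
                    wins[a] += 1
    return int(np.argmax(wins))

# Hypothetical outputs of three classifiers over m = 3 classes.
probs = np.array([
    [0.50, 0.30, 0.20],
    [0.10, 0.60, 0.30],
    [0.40, 0.35, 0.25],
])
rankings = rankings_from_probs(probs)
print("plurality winner:", plurality(rankings, 3))  # → 0 (two top votes)
print("copeland winner:", copeland(rankings, 3))
```

Note that the rules can disagree: here class 0 tops two of the three rankings, so plurality picks it immediately, while Borda and Copeland weigh the lower positions as well.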
The authors provide a thorough theoretical analysis. First, under the simplifying assumption that all base classifiers have identical accuracy p and are statistically independent, they derive a closed‑form expression for the probability that the plurality vote selects the correct class. This expression is a cumulative binomial probability and reproduces the classic Condorcet Jury Theorem: if p > 0.5, the ensemble accuracy approaches 1 as n grows. Second, they relax the equal‑accuracy assumption by allowing each classifier i to have its own accuracy p_i and introduce pairwise correlation coefficients ρ_ij to model dependence. Using a multivariate normal approximation of the joint vote counts, they obtain analytic approximations for the expected vote share and variance, leading to a generalized probability of correct selection that accounts for heterogeneity and dependence. Third, they discuss how the axiomatic properties of voting rules (anonymity, neutrality, monotonicity, non‑dictatorship) translate into desirable ensemble characteristics such as fairness and robustness to outliers.
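The closed-form expression in the equal-accuracy, independent case is easy to evaluate directly. The sketch below (function name ours, not from the paper) computes the cumulative binomial probability that a strict majority of n independent classifiers, each correct with probability p, picks the right class in the binary setting, illustrating the Condorcet Jury Theorem behavior described above.

```python
from math import comb

def majority_correct(n, p):
    """Probability that a strict majority of n independent voters,
    each individually correct with probability p, selects the
    correct class (binary case; odd n avoids ties)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# With p > 0.5, ensemble accuracy climbs toward 1 as n grows.
for n in (1, 11, 101):
    print(n, round(majority_correct(n, 0.6), 4))
```

With p = 0.6 the ensemble accuracy rises from 0.6 at n = 1 to well above 0.95 by n = 101; symmetrically, for p < 0.5 it decays toward 0, which is why the theorem's p > 0.5 condition matters.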
Empirically, the method is evaluated on twelve publicly available datasets spanning image, text, and biomedical domains. For each dataset, ensembles of size n = 50, 100, 200 are constructed, and the four voting rules are applied. Baselines include state‑of‑the‑art ensembles: XGBoost (gradient boosting), Random Forest (bagging), and standard bagging/boosting variants. Results show that:
- Plurality and Borda consistently achieve the highest accuracy across most datasets; Copeland and Kemeny are advantageous in highly imbalanced settings.
- VORACE’s accuracy is comparable to XGBoost and often exceeds Random Forest, despite the absence of any hyper‑parameter search.
- Training time and memory consumption are substantially lower (≈30‑50 % reduction) because the method avoids exhaustive grid or random searches.
- Even when the base classifiers are weak (e.g., shallow trees), increasing n yields a marked performance boost, echoing the “weak learners → strong learner” principle of boosting.
The paper also highlights the flexibility offered by voting theory. By selecting or weighting voting rules, practitioners can enforce fairness constraints, mitigate bias toward majority classes, or ensure monotonic behavior when classifier confidences change. This opens a new research avenue where ensemble design is guided by well‑studied social‑choice axioms rather than ad‑hoc heuristics.
In conclusion, VORACE demonstrates that a simple random‑generation of diverse classifiers combined with principled voting aggregation can deliver competitive predictive performance while dramatically reducing the engineering effort required for model selection and tuning. Future work is suggested on adaptive voting rule selection, class‑specific weighting schemes, and scalable distributed implementations.