Selecting Hyperparameters for Tree-Boosting
Tree-boosting is a widely used machine learning technique for tabular data. However, its out-of-sample accuracy depends critically on multiple hyperparameters. In this article, we empirically compare several popular methods for hyperparameter optimization for tree-boosting, including random grid search, the tree-structured Parzen estimator (TPE), Gaussian-process-based Bayesian optimization (GP-BO), Hyperband, the sequential model-based algorithm configuration (SMAC) method, and deterministic full grid search, on $59$ regression and classification data sets. We find that SMAC clearly outperforms all the other considered methods. We further observe that (i) a relatively large number of trials, more than $100$, is required for accurate tuning, (ii) default hyperparameter values yield very inaccurate models, (iii) all considered hyperparameters can have a material effect on the accuracy of tree-boosting, i.e., there is no small subset of hyperparameters that is more important than the others, and (iv) for regression tasks, choosing the number of boosting iterations via early stopping yields more accurate results than including it in the search space.
💡 Research Summary
This paper presents a comprehensive empirical comparison of six hyperparameter optimization (HPO) techniques for tree‑boosting models, specifically using the GPBoost implementation that mirrors LightGBM. The authors evaluate deterministic full grid search, random grid search, the Tree‑structured Parzen Estimator (TPE), Gaussian‑process‑based Bayesian optimization (GP‑BO), Hyperband, and Sequential Model‑based Algorithm Configuration (SMAC) across 59 publicly available OpenML data sets (36 regression, 23 classification).
Experimental design follows a rigorous protocol: each data set undergoes 5‑fold cross‑validation; within each training fold, an inner 80/20 split creates validation data for hyperparameter selection. To capture stochastic variability, every HPO method is repeated 20 times with different random seeds for each of the five outer folds, yielding 100 runs per method per data set. The hyperparameter search space includes eight key LightGBM parameters—learning rate, number of leaves, max depth, min data per leaf, L2 regularization, max bin, bagging fraction, and feature fraction—spanning both continuous and discrete ranges. All methods are allocated the same computational budget: 135 trials (the size of the deterministic full grid) for all methods except Hyperband, which uses a configuration of R = 2^150 and η = 2, resulting in 9 recorded successive‑halving rungs and a comparable total number of model evaluations.
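To make the setup concrete, the inner-loop trial budget can be sketched as a random search over the eight parameters listed above. This is a minimal illustration only: the parameter bounds below are assumptions for demonstration, not the ranges used in the paper, and `random_search` is a hypothetical helper standing in for whichever HPO method is plugged in.

```python
import random

# Illustrative search space for the eight LightGBM-style hyperparameters named
# above. NOTE: these bounds are assumptions for demonstration; the paper's
# exact ranges are not reproduced here.
SEARCH_SPACE = {
    "learning_rate":    lambda: 10 ** random.uniform(-3, 0),   # log-uniform
    "num_leaves":       lambda: random.randint(2, 1024),
    "max_depth":        lambda: random.randint(1, 20),
    "min_data_in_leaf": lambda: random.randint(1, 100),
    "lambda_l2":        lambda: 10 ** random.uniform(-3, 2),   # L2 regularization
    "max_bin":          lambda: random.choice([63, 127, 255, 511]),
    "bagging_fraction": lambda: random.uniform(0.5, 1.0),
    "feature_fraction": lambda: random.uniform(0.5, 1.0),
}

def random_search(objective, n_trials=135, seed=0):
    """Draw n_trials configurations and return the best one by validation loss.

    `objective(cfg)` would, in the paper's protocol, train on the inner 80%
    split and return the loss on the 20% validation split (lower is better).
    """
    random.seed(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: sample() for name, sample in SEARCH_SPACE.items()}
        loss = objective(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

In the study's protocol this inner search would run once per outer fold and random seed (5 folds × 20 seeds), with the 135-trial budget matching the size of the deterministic full grid.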
Performance metrics are RMSE and R² for regression, accuracy and log‑loss for classification. To enable fair cross‑dataset comparison, the authors apply ADTM (average distance to minimum/maximum) normalization, followed by aggregation of normalized scores, ranks, and relative differences to the best method.
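The ADTM aggregation can be sketched as follows. This is a minimal interpretation of the normalization described above, assuming a lower-is-better loss rescaled per data set so that the best method maps to 0 and the worst to 1; the paper's exact implementation details (e.g. handling of ties) may differ.

```python
def adtm(scores_by_dataset):
    """Average distance to the minimum (ADTM), a sketch.

    scores_by_dataset: dict mapping dataset -> {method: loss}, lower is better.
    Each loss is min-max rescaled within its data set (0 = best method on that
    data set, 1 = worst), then the rescaled scores are averaged per method.
    """
    methods = next(iter(scores_by_dataset.values())).keys()
    totals = {m: 0.0 for m in methods}
    for losses in scores_by_dataset.values():
        lo, hi = min(losses.values()), max(losses.values())
        span = (hi - lo) or 1.0  # guard: all methods tied on this data set
        for m, s in losses.items():
            totals[m] += (s - lo) / span
    n = len(scores_by_dataset)
    return {m: t / n for m, t in totals.items()}
```

Because every data set contributes a value in [0, 1], ADTM lets losses on very different scales (RMSE on one data set, log-loss on another) be averaged meaningfully.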
Results show that SMAC consistently outperforms all other approaches in terms of average normalized score and rank, especially when the budget is limited (fewer than 100 trials). Random grid search and default hyperparameters perform worst, often lagging 10–15 % behind the best method. The analysis of hyperparameter importance reveals that while learning rate, number of leaves, min data per leaf, and L2 regularization have the strongest individual effects, max depth and max bin also contribute non‑negligibly, indicating that tuning the full set of parameters is beneficial.
A notable finding for regression tasks is that determining the number of boosting iterations via early stopping yields better predictive performance than treating the iteration count as a tunable hyperparameter; early stopping improves average RMSE by roughly 3 % relative to the explicit‑search alternative. Hyperband, despite its efficient resource allocation, converges more slowly than SMAC in this setting. GP‑BO demonstrates strong early‑stage performance but struggles with the high‑dimensional discrete space, while TPE outperforms random grid search but still falls short of SMAC’s model‑based scheduling and multi‑fidelity capabilities.
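For readers unfamiliar with Hyperband's resource allocation, its inner loop is built from successive-halving brackets. The sketch below shows one such bracket with η = 2 (matching the configuration described above); it is a generic illustration of the algorithm, not the paper's or any library's implementation, and `evaluate` is a hypothetical callback.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """One successive-halving bracket, as used inside Hyperband.

    `evaluate(cfg, budget)` returns a validation loss for `cfg` trained with
    the given budget (e.g. number of boosting iterations); lower is better.
    """
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Score every surviving configuration at the current budget...
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        # ...keep the best 1/eta fraction, then train the rest eta times longer.
        survivors = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return survivors[0]
```

Each halving step is one "rung"; the slow convergence noted above reflects that poor configurations still consume budget at the early rungs before being discarded.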
The authors conclude that SMAC is the most reliable and efficient HPO method for tree‑boosting models, provided that a sufficient number of trials (≥ 100) is allocated. They also emphasize the practical advantage of using early stopping for iteration control. Future work is suggested to explore meta‑learning‑based initializations, multi‑fidelity extensions of SMAC, and scalability to larger data sets.
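The early-stopping rule for iteration control emphasized above can be sketched as follows: monitor the validation loss curve and keep the best iteration seen before `patience` consecutive non-improving rounds. This is a generic illustration of the standard patience-based rule (as implemented in LightGBM-style libraries), with assumed function and parameter names.

```python
def early_stopping_round(val_losses, patience=10):
    """Choose the number of boosting iterations by early stopping.

    `val_losses` is the per-iteration validation loss curve. Training stops
    once the loss has not improved for `patience` consecutive iterations;
    the best (1-based) iteration count seen so far is returned.
    """
    best_loss, best_round, since_improve = float("inf"), 0, 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_round, since_improve = loss, i, 0
        else:
            since_improve += 1
            if since_improve >= patience:
                break
    return best_round
```

Handling the iteration count this way removes one dimension from the search space, which is one plausible reason it outperformed explicitly searching over the number of iterations in the regression experiments.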