Default Machine Learning Hyperparameters Do Not Provide Informative Initialization for Bayesian Optimization

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Bayesian Optimization (BO) is a standard tool for hyperparameter tuning thanks to its sample efficiency on expensive black-box functions. While most BO pipelines begin with uniform random initialization, default hyperparameter values shipped with popular ML libraries such as scikit-learn encode implicit expert knowledge and could serve as informative starting points that accelerate convergence. This hypothesis, despite its intuitive appeal, has remained largely unexamined. We formalize the idea by initializing BO with points drawn from truncated Gaussian distributions centered at library defaults and compare the resulting trajectories against a uniform-random baseline. We conduct an extensive empirical evaluation spanning three BO back-ends (BoTorch, Optuna, Scikit-Optimize), three model families (Random Forests, Support Vector Machines, Multilayer Perceptrons), and five benchmark datasets covering classification and regression tasks. Performance is assessed through convergence speed and final predictive quality, and statistical significance is determined via one-sided binomial tests. Across all conditions, default-informed initialization yields no statistically significant advantage over purely random sampling, with p-values ranging from 0.141 to 0.908. A sensitivity analysis on the prior variance confirms that, while tighter concentration around the defaults improves early evaluations, this transient benefit vanishes as optimization progresses, leaving final performance unchanged. Our results provide no evidence that default hyperparameters encode useful directional information for optimization. We therefore recommend that practitioners treat hyperparameter tuning as an integral part of model development and favor principled, data-driven search strategies over heuristic reliance on library defaults.


💡 Research Summary

Bayesian Optimization (BO) has become the de‑facto method for tuning machine‑learning hyperparameters because it can locate high‑performing configurations with very few expensive model‑training evaluations. In practice, BO pipelines usually begin by drawing a small set of initial points uniformly at random from the entire search space, implicitly assuming that no prior knowledge about where good configurations lie is available. At the same time, popular ML libraries such as scikit‑learn, XGBoost, and LightGBM ship with carefully chosen default hyperparameter values that are meant to work reasonably well across many datasets. This raises a natural hypothesis: if these defaults encode useful prior information, initializing BO with points sampled from a distribution centered on the defaults should accelerate convergence and improve final performance, all without any extra data or meta‑learning infrastructure.

The authors formalize this idea by constructing a truncated Gaussian distribution whose mean equals the library default for each hyperparameter and whose support respects the allowed bounds. Initial BO points are drawn from this distribution (the “default‑centered” strategy) and compared against the conventional uniform‑random initialization (the “random” strategy). The comparison is carried out across a comprehensive experimental grid: three BO back‑ends (BoTorch with a GP surrogate, Optuna using the Tree‑Parzen Estimator, and Scikit‑Optimize with a GP surrogate), three model families (Random Forests, Support Vector Machines, Multilayer Perceptrons), and five benchmark datasets covering both classification and regression tasks. For each combination, the authors run 30 independent BO trials with a fixed evaluation budget (e.g., 50 function calls) and record both early‑iteration loss reduction (convergence speed) and the final validation metric (accuracy or RMSE).
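The default-centered sampling step can be sketched as follows. This is a minimal illustration, not the paper's actual code: the parameter names, bounds, and scale below are assumptions chosen for the example.

```python
# Sketch of default-centered initialization via a truncated Gaussian.
# The hyperparameter, its bounds, and the scale are illustrative assumptions.
import numpy as np
from scipy.stats import truncnorm

def default_centered_samples(default, lower, upper, scale, n, seed=0):
    """Draw n initial values from a Gaussian centered at the library
    default, truncated to the allowed bounds [lower, upper]."""
    # truncnorm expects truncation bounds in standardized units
    a = (lower - default) / scale
    b = (upper - default) / scale
    return truncnorm.rvs(a, b, loc=default, scale=scale, size=n,
                         random_state=np.random.default_rng(seed))

# Example: an SVM-style regularization parameter C with scikit-learn's
# default C=1.0 and an assumed search range of [1e-3, 1e3].
init_points = default_centered_samples(default=1.0, lower=1e-3,
                                       upper=1e3, scale=10.0, n=5)
```

The uniform-random baseline simply replaces this draw with `rng.uniform(lower, upper, n)`; everything downstream of initialization stays identical.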

Statistical significance is assessed with one‑sided binomial tests that count how often the default‑centered strategy outperforms the random baseline across repetitions. Across all 45 experimental conditions, the resulting p‑values range from 0.141 to 0.908, far above conventional thresholds (α = 0.05). In other words, there is no evidence that default‑informed initialization yields a higher probability of beating random initialization.
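The test itself is simple to reproduce. The sketch below uses a hypothetical win count (17 of 30 repetitions) purely to show the mechanics; it is not a number from the paper.

```python
# Sketch of the win-count significance test (one-sided binomial).
# The win count here is a made-up illustration, not a reported result.
from scipy.stats import binomtest

n_trials = 30   # repetitions per experimental condition
wins = 17       # times default-centered beat random (hypothetical)

# H0: P(default-centered wins) = 0.5
# H1: default-centered wins more than half the time
result = binomtest(wins, n=n_trials, p=0.5, alternative="greater")
print(result.pvalue)
```

With win rates near chance, as reported across all 45 conditions, this p-value stays well above 0.05.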

A sensitivity analysis varies the Gaussian prior variance to explore how tightly the initial points are clustered around the defaults. With very small variance, the first few BO iterations show a modest advantage because the defaults are indeed “safe” single‑shot configurations. However, this advantage quickly disappears; after roughly ten iterations the convergence curves of both strategies are indistinguishable, and the final best‑found hyperparameters have comparable predictive quality. This pattern confirms that while defaults provide a convenient starting point, they do not guide the optimizer toward the global optimum in a lasting way.

The paper also provides a theoretical perspective on why such an effect might be limited. In GP‑based BO, the posterior variance collapses near observed points. If the initial observations are densely packed around a single default, the surrogate becomes over‑confident locally and remains highly uncertain elsewhere, potentially suppressing exploration of regions that actually contain the optimum. The empirical results validate this risk: default‑centered initialization can lead to a “conformist” behavior where the optimizer spends too much time near the default before eventually escaping.
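This variance-collapse effect is easy to see in a toy one-dimensional GP. The setup below is an illustrative assumption (a fixed RBF kernel on a synthetic objective), not the paper's experiment: fitting on points packed near a single "default" leaves the posterior confident there and uncertain everywhere else.

```python
# Toy illustration of posterior-variance collapse near clustered
# initial points (assumed setup, not the paper's experiment).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = 0.5 + 0.02 * rng.standard_normal((5, 1))  # initial points packed near default 0.5
y = np.sin(6 * X).ravel()                     # arbitrary synthetic objective

# Fixed kernel (optimizer=None) keeps the illustration deterministic;
# small alpha regularizes the nearly-duplicate inputs.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1),
                              optimizer=None, alpha=1e-6).fit(X, y)
_, std_near = gp.predict(np.array([[0.5]]), return_std=True)
_, std_far = gp.predict(np.array([[0.95]]), return_std=True)

# Posterior std collapses near the cluster but stays high far away,
# which can bias acquisition toward the already-explored region.
print(std_near[0] < std_far[0])
```

An acquisition function built on this surrogate sees low uncertainty around the default and must rely on the still-high variance elsewhere to eventually escape, which matches the "conformist" behavior described above.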

Overall, the study concludes that library default hyperparameters should be viewed as deployment shortcuts rather than as informative priors for hyperparameter search. Practitioners seeking to accelerate BO should instead rely on data‑driven priors—such as transfer learning from previous optimization runs, meta‑models that predict promising configurations, or domain‑expert knowledge—rather than assuming that the defaults encode useful directional information. The findings discourage the uncritical use of defaults as an initialization heuristic and reinforce the importance of principled, uncertainty‑aware exploration from the very first BO iteration.

