Learning to Choose Branching Rules for Nonconvex MINLPs
Outer-approximation-based branch-and-bound is a common algorithmic framework for solving MINLPs (mixed-integer nonlinear programs) to global optimality, with branching variable selection critically influencing overall performance. In modern global MINLP solvers, it is unclear whether branching on fractional integer variables should be prioritized over spatial branching on (potentially continuous) variables that show constraint violations, and different solvers follow different defaults. We address this question using a data-driven approach. Based on a test set of hundreds of heterogeneous public and industrial MINLP instances, we train linear and random forest regression models to predict the relative speedup of the FICO® Xpress Global solver when using a branching rule that always prioritizes variables with violated integralities over a mixed rule that allows early spatial branches. We introduce a practical evaluation methodology that measures the effect of the learned model directly in terms of the shifted geometric mean runtime. Using only four features derived from strong branching and the nonlinear structure, our linear regression model achieves an 8–9% reduction in geometric-mean solving time for the Xpress solver, with over 10% improvement on hard instances. We also analyze a random-forest regression model. Experiments across solver versions show that a model trained on Xpress 9.6 still yields significant improvements on Xpress 9.8 without retraining. Our results demonstrate how regression models can successfully guide the branching-rule selection and improve the performance of a state-of-the-art commercial MINLP solver.
💡 Research Summary
The paper investigates a fundamental algorithmic decision in global mixed‑integer nonlinear programming (MINLP): whether to prioritize branching on fractional integer variables (“PreferInt”) or to allow early spatial branching on continuous variables that exhibit constraint violations (“Mixed”). While modern commercial MINLP solvers such as FICO® Xpress Global implement both strategies, they typically fix a default rule and do not adapt the choice to the specific problem instance. The authors address this gap with a data‑driven approach that treats the selection as a regression problem, predicting the relative speed‑up (or slow‑down) factor obtained when using PreferInt instead of Mixed.
A benchmark of 683 heterogeneous public and industrial MINLP instances was assembled. For each instance the authors ran Xpress 9.6 twice—once with PreferInt and once with Mixed—recording runtimes and extracting a rich set of 17 features at the root node. These features include strong‑branching statistics (average relative bound change and computational work for integer and spatial strong branching), counts of variables fixed by strong branching, and structural descriptors derived from the directed acyclic graph (DAG) representation of factorable nonlinear expressions (e.g., percentage of variables appearing in the DAG, node‑to‑nonzero ratios, percentages of integer and unbounded variables in the DAG, and the proportion of quadratic operators). Additional problem‑level statistics such as the fraction of integer variables, equality constraints, quadratic elements, and nonlinear constraints are also captured.
The learning task is to predict the speed‑up factor of PreferInt relative to Mixed. By framing it as regression, the model directly captures the magnitude of performance differences, focusing learning effort on instances where the choice matters most. Two regression families are explored: a linear regression model and a random‑forest regressor. To assess robustness, each model type is trained and evaluated across 100 random seeds, with an 80/20 train‑test split. Performance metrics comprise overall prediction accuracy, accuracy on “LargeLabel” instances (those with a speed‑up factor greater than four), and the shifted geometric mean runtime (sgm_runtime), a standard aggregation used in optimization benchmarking. The shift of 10 seconds mitigates the impact of very short runs.
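The regression target and the shifted geometric mean described above can be sketched in a few lines of Python. This is a minimal illustration with made-up runtimes, not the paper's code; only the 10-second shift and the label definition (speed-up of PreferInt relative to Mixed) follow the text.

```python
import math

def sgm(runtimes, shift=10.0):
    """Shifted geometric mean: exp(mean(log(t + shift))) - shift.
    The 10 s shift damps the influence of very short runs."""
    logs = [math.log(t + shift) for t in runtimes]
    return math.exp(sum(logs) / len(logs)) - shift

# Toy labels (made-up runtimes from the two runs per instance):
runtimes_mixed     = [120.0, 30.0, 800.0]
runtimes_preferint = [ 90.0, 45.0, 500.0]

# Speed-up factor of PreferInt relative to Mixed on each instance;
# a value > 1 means PreferInt was faster there.
speedup = [m / p for m, p in zip(runtimes_mixed, runtimes_preferint)]
```

Framing the target as this ratio (rather than a binary win/loss label) is what lets the regression loss concentrate on instances where the two rules differ the most.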
Feature importance analysis reveals that the average relative bound change from spatial strong branching (AvgRelBndChngSBLPSpat) is the most predictive variable for both model families, underscoring the relevance of how much spatial branching can tighten the outer‑approximation relaxation. The fraction of integer variables (%IntVars) and the proportion of nonlinear constraints (%NonlinCons) also rank highly, indicating that problem structure strongly influences the optimal branching policy.
A systematic feature‑reduction experiment shows that performance remains stable as long as at least four of the most informative features are retained. Using only the top four features for each model type (for the linear model: AvgRelBndChngSBLPSpat, %IntVars, %NonlinCons, %VarsDAGInt; for the random forest: AvgRelBndChngSBLPSpat, AvgCoeffSpreadConvCuts, %IntVars, #NonlinViols) yields an overall accuracy of about 84 % (over 90 % on LargeLabel instances) and an sgm_runtime of roughly 0.91 on the test set. This corresponds to an 8–9 % reduction in the shifted geometric mean runtime across the full benchmark and more than a 10 % improvement on the hardest instances.
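The evaluation methodology, measuring the learned selection directly by the shifted geometric mean runtime it realizes, can be sketched as follows. The runtimes and model choices below are invented toy data; the paper's reported test-set value of roughly 0.91 comes from its 683-instance benchmark, not from this snippet.

```python
import math

def sgm(runtimes, shift=10.0):
    """Shifted geometric mean runtime with the paper's 10 s shift."""
    return math.exp(sum(math.log(t + shift) for t in runtimes)
                    / len(runtimes)) - shift

def relative_sgm(realized, baseline):
    """sgm of the runtimes realized by the model's per-instance rule
    choice, divided by the sgm under a fixed baseline rule; a value
    below 1 means the learned selection is faster overall."""
    return sgm(realized) / sgm(baseline)

# Toy data (hypothetical runtimes and model choices):
mixed     = [200.0, 15.0, 600.0]   # baseline rule on each instance
preferint = [ 50.0, 25.0, 450.0]
choice    = ["PreferInt", "Mixed", "PreferInt"]
realized  = [p if c == "PreferInt" else m
             for m, p, c in zip(mixed, preferint, choice)]
print(round(relative_sgm(realized, mixed), 3))
```

Reporting the ratio of shifted geometric means, rather than raw prediction error, ties the metric directly to what a solver user experiences.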
Importantly, the model trained on Xpress 9.6 continues to deliver significant speed‑ups when applied unchanged to Xpress 9.8, demonstrating that the learned policy is robust to solver version updates and does not require retraining for each new release.
In summary, the study shows that a lightweight regression model, built from a handful of strong‑branching and structural features, can effectively guide the choice between integer‑first and mixed branching strategies in a state‑of‑the‑art commercial MINLP solver. The approach yields measurable runtime reductions without altering the underlying solver code beyond a simple rule‑selection hook, and it generalizes across solver versions. The work contributes a practical blueprint for integrating machine‑learning‑based algorithm selection into global optimization software, and it highlights the value of strong‑branching information and problem‑structure descriptors for predictive performance modeling in nonconvex MINLP contexts.
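The "simple rule-selection hook" mentioned above might look like the following sketch. The four feature names are the paper's top features for the linear model, but the coefficients, intercept, and log-linear form are invented here for illustration and are not the paper's fitted model.

```python
import math

# Hypothetical linear model over the paper's top-4 features for the
# linear regressor; the coefficients below are made up.
COEFFS = {
    "AvgRelBndChngSBLPSpat": -1.2,
    "%IntVars": 0.8,
    "%NonlinCons": -0.5,
    "%VarsDAGInt": 0.3,
}
INTERCEPT = 0.1

def predicted_speedup(features):
    """Predicted speed-up factor of PreferInt over Mixed,
    modeled log-linearly in the root-node features."""
    score = INTERCEPT + sum(COEFFS[k] * features[k] for k in COEFFS)
    return math.exp(score)

def select_branching_rule(features):
    """Rule-selection hook: prefer integer branching only when the
    model predicts it to be faster, otherwise keep the mixed rule."""
    return "PreferInt" if predicted_speedup(features) > 1.0 else "Mixed"

example = {"AvgRelBndChngSBLPSpat": 0.05, "%IntVars": 0.6,
           "%NonlinCons": 0.1, "%VarsDAGInt": 0.4}
print(select_branching_rule(example))
```

Because the features are computed once at the root node, a hook of this shape adds negligible overhead to the solve and leaves the rest of the solver untouched.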