Divide and Learn: Multi-Objective Combinatorial Optimization at Scale

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

Multi-objective combinatorial optimization seeks Pareto-optimal solutions over exponentially large discrete spaces, yet existing methods sacrifice generality, scalability, or theoretical guarantees. We reformulate it as an online learning problem over a decomposed decision space, solving position-wise bandit subproblems via adaptive expert-guided sequential construction. This formulation admits regret bounds of $O(d\sqrt{T \log T})$ that depend on the subproblem dimensionality $d$ rather than on the size of the combinatorial space. On standard benchmarks, our method achieves 80–98% of specialized solvers' performance while improving sample and computational efficiency by two to three orders of magnitude over Bayesian optimization methods. On real-world hardware-software co-design for AI accelerators with expensive simulations, we outperform competing methods under fixed evaluation budgets. The advantage grows with problem scale and objective count, establishing bandit optimization over decomposed decision spaces as a principled alternative to surrogate modeling or offline training for multi-objective optimization.


💡 Research Summary

The paper introduces Divide & Learn (D&L), a novel framework for multi‑objective combinatorial optimization (MOCO) that treats the problem as an online learning task with full‑bandit feedback. Traditional approaches to MOCO—such as multi‑objective Bayesian optimization, evolutionary heuristics, or neural combinatorial methods—either struggle with the exponential size of discrete search spaces, require costly surrogate modeling, need massive offline training data, or lack theoretical convergence guarantees. D&L overcomes these limitations by decomposing the decision variables into K overlapping sub‑problems, each containing only d ≈ n/K variables, and by solving each sub‑problem with a position‑wise multi‑armed bandit.

At every iteration t, a full solution xₜ is assembled by selecting an action for each variable (position) from a set of “experts” (e.g., UCB, FTRL, EXP3, Thompson Sampling). The experts share global statistics (visit counts, estimated values) and are sampled according to a mixture distribution πₜ that adapts to local uncertainty. After evaluating the scalarized reward rₜ = ϕ(f(xₜ)) + εₜ (where ϕ is any scalarization such as a weighted sum), the observed reward is propagated to all position‑action pairs in the chosen solution, allowing each expert to update its estimates without requiring per‑arm feedback.
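The iteration described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it mixes only two "experts" (UCB on shared statistics plus uniform exploration) as a stand-in for the full expert set, and all names (`construct_solution`, `propagate_reward`, `stats`) are hypothetical.

```python
import math
import random

def ucb_score(value_sum, count, total_pulls, c=1.0):
    """UCB1 index for one (position, action) pair; unvisited pairs get +inf."""
    if count == 0:
        return float("inf")
    return value_sum / count + c * math.sqrt(math.log(total_pulls) / count)

def construct_solution(n_positions, actions_per_position, stats, t):
    """Assemble a full solution x_t by picking an action at every position.

    stats[p][a] = (value_sum, count) is shared across experts; here a UCB
    expert and a uniform-exploration expert are mixed with fixed weights."""
    x = []
    for p in range(n_positions):
        if random.random() < 0.1:   # exploration expert
            x.append(random.randrange(actions_per_position))
        else:                       # UCB expert over shared statistics
            scores = [ucb_score(*stats[p][a], t + 1)
                      for a in range(actions_per_position)]
            x.append(max(range(actions_per_position), key=scores.__getitem__))
    return x

def propagate_reward(x, reward, stats):
    """Full-bandit update: credit the scalarized reward r_t to every
    (position, action) pair used in the chosen solution."""
    for p, a in enumerate(x):
        value_sum, count = stats[p][a]
        stats[p][a] = (value_sum + reward, count + 1)
```

Note that a single scalar evaluation updates every position at once, which is what lets the method operate without per-arm (semi-bandit) feedback.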

Cross‑sub‑problem consistency is enforced through Lagrangian dual variables λ. When sub‑problems share variables, λ penalizes inconsistent assignments, and a simple DualUpdate step performs a gradient‑ascent style adjustment of λ after each iteration. This mechanism preserves the independence of sub‑problem optimization while guaranteeing that the global solution respects the overlapping structure.
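A projected gradient-ascent step of this kind might look as follows. This is a sketch under assumed bookkeeping, not the paper's DualUpdate: the triple-keyed multiplier dictionary and the names `dual_update`, `x_sub`, and `eta` are illustrative.

```python
def dual_update(lmbda, x_sub, overlaps, eta=0.1):
    """One gradient-ascent-style step on the Lagrangian dual variables.

    `overlaps` lists triples (i, j, v) meaning sub-problems i and j both
    assign variable v; x_sub[i][v] is sub-problem i's current value for v.
    A multiplier grows while the two copies disagree (strengthening the
    consistency penalty each sub-problem sees) and decays toward zero
    once they agree."""
    for key in overlaps:
        i, j, v = key
        if x_sub[i][v] != x_sub[j][v]:
            lmbda[key] += eta                        # subgradient = 1 (violated)
        else:
            lmbda[key] = max(0.0, lmbda[key] - eta)  # project onto lambda >= 0
    return lmbda
```

Because each multiplier only touches one shared variable, the sub-problems can still be optimized independently between dual updates.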

The authors prove that, under mild assumptions (decomposability of the variable space, discrete Lipschitz continuity of objectives, bounded coupling across overlapping sub‑problems, and bounded objective ranges), each position‑wise bandit expert enjoys a standard O(√(T log T)) regret bound. Because there are only d positions per sub‑problem, the total regret scales as O(d√(T log T)), independent of the exponential size of the original action space |X|. This is a substantial theoretical improvement over existing combinatorial bandit analyses, which typically depend on log |X| or on semi‑bandit feedback that is unavailable in black‑box MOCO.

Empirically, D&L is evaluated on two fronts: (1) standard MOCO benchmarks with up to 10⁶⁰ configurations (binary, integer, or categorical variables) and multiple objectives; (2) a real‑world hardware‑software co‑design problem for AI accelerators, where each evaluation requires a costly cycle‑accurate simulation. The performance metrics include hypervolume, inverted generational distance, sample efficiency (number of evaluations), and wall‑clock time. Results show that D&L attains 80–98% of the Pareto front quality achieved by specialized solvers (e.g., MOEA/D, problem‑specific heuristics) while using roughly 90% fewer evaluations and 2–3 orders of magnitude less computational time than state‑of‑the‑art Bayesian optimization methods. Notably, under fixed evaluation budgets (e.g., 200–500 simulations), D&L quickly identifies high‑utility regions and, thanks to the Lagrangian coordination, avoids conflicting assignments across sub‑problems, leading to steadily improving solutions throughout the run.
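For readers unfamiliar with the metrics above, the hypervolume of a front can be computed directly in two dimensions. The sketch below assumes minimization in both objectives and a standard sweep over the non-dominated points; it is an illustration of the metric, not the paper's evaluation code.

```python
def pareto_front(points):
    """Non-dominated subset of `points` (minimization in every objective)."""
    return [p for p in points
            if not any(all(q[i] <= p[i] for i in range(len(p))) and q != p
                       for q in points)]

def hypervolume_2d(points, ref):
    """Area dominated by a 2-D front with respect to reference point `ref`
    (minimization): sweep the front in increasing order of the first
    objective, adding one rectangle per point."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(pareto_front(points)):
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv
```

A larger hypervolume means the front covers more of the objective space relative to the reference point, which is why it serves as a single-number quality score for comparing Pareto fronts.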

Key contributions of the work are:

  1. A fully online MOCO algorithm that requires no offline training or surrogate modeling.
  2. Regret guarantees that depend only on sub‑problem dimensionality, enabling scalability to extremely large combinatorial spaces.
  3. A multi‑expert bandit architecture that leverages heterogeneous exploration strategies while sharing statistics, thus reducing variance and accelerating convergence.
  4. Lagrangian dual coordination for overlapping sub‑problems, preserving global feasibility without sacrificing the decomposition benefits.

The paper also discusses potential extensions, such as adaptive sub‑problem partitioning, non‑linear Lagrangian updates, and alternative scalarizations (e.g., ε‑constraint methods). Overall, Divide & Learn provides a principled, theoretically grounded, and practically efficient alternative to surrogate‑based or offline‑trained methods for multi‑objective combinatorial optimization at scale.

