Distance-Based Bias in Model-Directed Optimization of Additively Decomposable Problems
For many optimization problems it is possible to define a distance metric between problem variables that correlates with the likelihood and strength of interactions between the variables. For example, one may define a metric so that the dependencies between variables that are closer to each other with respect to the metric are expected to be stronger than the dependencies between variables that are further apart. The purpose of this paper is to describe a method that combines such a problem-specific distance metric with information mined from probabilistic models obtained in previous runs of estimation of distribution algorithms with the goal of solving future problem instances of similar type with increased speed, accuracy and reliability. While the focus of the paper is on additively decomposable problems and the hierarchical Bayesian optimization algorithm, it should be straightforward to generalize the approach to other model-directed optimization techniques and other problem classes. Compared to other techniques for learning from experience put forward in the past, the proposed technique is both more practical and more broadly applicable.
💡 Research Summary
The paper introduces a novel “distance‑based bias” technique that leverages a problem‑specific distance metric together with experience gathered from previous runs of estimation‑of‑distribution algorithms (EDAs) to accelerate the solution of future instances of the same class. The authors focus on additively decomposable problems (ADPs), where the objective function can be expressed as a sum of sub‑functions, each depending on a small subset of variables. In such problems, variables that are “close” according to a suitably defined metric tend to appear together in the same sub‑function, and therefore exhibit stronger statistical dependencies than variables that are far apart.
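The additive structure can be made concrete with a small sketch; all names here are illustrative, not taken from the paper:

```python
# Minimal sketch of an additively decomposable problem (ADP):
# f(x) = sum over k of f_k(x restricted to a small subset S_k of variables).

def make_adp(subsets, subfunctions):
    """Build an ADP evaluator from subsets S_k and sub-functions f_k."""
    def f(x):
        return sum(fk(tuple(x[i] for i in S))
                   for S, fk in zip(subsets, subfunctions))
    return f

# Toy example: a 6-variable problem decomposed into three 2-variable XORs.
subsets = [(0, 1), (2, 3), (4, 5)]
subfunctions = [lambda b: b[0] ^ b[1]] * 3   # each sub-function is XOR

f = make_adp(subsets, subfunctions)
print(f([1, 0, 1, 1, 0, 1]))  # 1 + 0 + 1 = 2
```

Variables 0 and 1 appear in the same sub-function and therefore interact directly, whereas variables 0 and 5 never do, which is exactly the locality that a distance metric should capture.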
To exploit this regularity, the authors first define a distance function d(i, j) on the variable set. They then run the Hierarchical Bayesian Optimization Algorithm (HBOA), a model‑directed EDA that learns a Bayesian network each generation, on a collection of training instances of the same ADP class. For every learned network they record the edges (i.e., conditional dependencies) and the distance spanned by each edge. Aggregating across runs yields empirical conditional probabilities P(e | d): the likelihood that an edge connecting variables at distance d appears in a good model. This distribution constitutes a compact, distance‑aware prior over network structure.
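The mining step can be sketched as follows; `edge_distance_prior` and the toy models are hypothetical stand-ins, and a real system would aggregate many networks from many runs:

```python
from collections import Counter
from itertools import combinations

def edge_distance_prior(models, dist):
    """Empirical P(edge | distance) mined from previously learned networks.

    models: list of (n, edges) pairs, where n is the number of variables
            and edges is a set of (i, j) tuples from one learned network.
    dist:   problem-specific distance function d(i, j).
    (Illustrative sketch; names are not from the paper.)
    """
    present, possible = Counter(), Counter()
    for n, edges in models:
        for i, j in combinations(range(n), 2):
            d = dist(i, j)
            possible[d] += 1            # candidate pair at this distance
            if (i, j) in edges or (j, i) in edges:
                present[d] += 1         # pair actually linked in the model
    return {d: present[d] / possible[d] for d in possible}

# Toy run: variables on a line, distance = |i - j|; two "learned" models
# whose edges mostly connect nearby variables.
dist = lambda i, j: abs(i - j)
models = [(4, {(0, 1), (1, 2), (0, 2)}),
          (4, {(0, 1), (2, 3)})]
prior = edge_distance_prior(models, dist)
# Short-distance edges come out far more probable than long-distance ones.
```

On this toy input the prior assigns probability 2/3 to distance-1 edges, 1/4 to distance-2 edges, and 0 to distance-3 edges, matching the expected locality.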
When a new instance is tackled, the prior is injected into the structure‑learning phase of HBOA: edges that span small distances receive higher prior probability, biasing the search toward network topologies that reflect the expected locality of interactions. The prior does not fix the structure; it merely steers the greedy, score‑based structure search, allowing the data from the current run to override it when the evidence is strong enough. After the biased network is built, the standard HBOA loop (sampling, selection, model update) proceeds unchanged.
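One simple way to steer a score-based structure search with such a prior, sketched below, is to add a log-prior term to the data-driven gain of each candidate edge; this is an assumption about the mechanism, not the paper's exact formulation:

```python
import math

def biased_edge_gain(base_gain, i, j, dist, prior, kappa=1.0, eps=1e-6):
    """Distance-biased score gain for adding edge (i, j) during greedy
    network construction.

    base_gain: data-driven score improvement (e.g. a BD or BIC gain).
    prior:     empirical P(edge | distance) mined from earlier runs.
    kappa:     bias strength; kappa = 0 recovers the unbiased search.
    (Illustrative sketch; names are hypothetical.)
    """
    p = max(prior.get(dist(i, j), eps), eps)  # floor avoids log(0)
    return base_gain + kappa * math.log(p)
```

Because the prior enters only as an additive penalty or bonus, a sufficiently large data-driven gain can still add a long-distance edge, which matches the summary's point that the current run's data can override the bias.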
Empirical evaluation is performed on two representative ADP families: NK‑landscapes and a MAX‑SAT variant, with problem sizes ranging from 100 to 500 variables and varying sub‑function widths. The authors compare three configurations: (1) vanilla HBOA with no bias, (2) HBOA equipped with the distance‑based bias, and (3) several previously proposed transfer‑learning approaches (e.g., model reuse, parameter seeding). Results show that the biased version consistently reduces the number of generations required to reach the optimum by 20%–35% and improves the probability of finding the optimum by 5%–12% across all tested sizes. The benefit is especially pronounced for larger instances (≥400 variables), where the reduction in sample complexity translates into a substantial wall‑clock speed‑up. Moreover, the bias is robust to variations in the specific composition of sub‑functions: even when the new instance has different sub‑function weights or a different arrangement of variables, the distance‑based prior still yields faster convergence, demonstrating good generalization.
The paper’s contributions can be summarized as follows:
- Formalization of a distance metric that captures expected interaction strength in ADPs.
- A practical method for mining edge‑frequency statistics from previously learned Bayesian networks and converting them into a distance‑conditioned prior P(e | d).
- Integration of this prior into HBOA with minimal algorithmic changes, showing that the approach is readily applicable to other EDAs or model‑based metaheuristics.
- Comprehensive experimental validation that quantifies both convergence speed and solution quality improvements, and an analysis of how the strength of the bias interacts with the accuracy of the distance metric.
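For intuition on the first contribution, one natural realization of such a metric, under the locality assumption stated earlier, is shortest-path distance in the ADP's variable-interaction graph; this is a hypothetical sketch, not necessarily the paper's exact definition:

```python
from collections import deque

def interaction_distance(n, subsets):
    """Shortest-path distance in the variable-interaction graph of an ADP:
    two variables are adjacent iff they share a sub-function.
    Returns a function d(i, j); unconnected pairs get float('inf').
    (Illustrative sketch; names are not from the paper.)"""
    adj = [set() for _ in range(n)]
    for S in subsets:
        for i in S:
            adj[i].update(v for v in S if v != i)

    def d(i, j):
        if i == j:
            return 0
        seen, frontier, depth = {i}, deque([i]), 0
        while frontier:               # breadth-first search from i
            depth += 1
            for _ in range(len(frontier)):
                for v in adj[frontier.popleft()]:
                    if v == j:
                        return depth
                    if v not in seen:
                        seen.add(v)
                        frontier.append(v)
        return float('inf')
    return d

# Variables 0-2 form a chain of overlapping sub-functions; 3-4 are separate.
d = interaction_distance(5, [(0, 1), (1, 2), (3, 4)])
```

Here d(0, 1) = 1 because the variables share a sub-function, d(0, 2) = 2 because they interact only through variable 1, and d(0, 3) is infinite because the two components never interact.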
The authors acknowledge two main limitations. First, the quality of the bias depends on the choice of distance metric; a poorly chosen metric can mislead the search. Second, the current implementation only biases edge presence, not the conditional probability tables themselves, which may leave additional performance gains untapped. Future work is outlined to address these issues: automatic learning of distance metrics (e.g., via clustering of variable co‑occurrence), extending the prior to include edge weights, applying the technique to non‑additive or continuous domains, and investigating combinations of multiple distance measures.
In conclusion, distance‑based bias offers a principled and inexpensive way to transfer structural knowledge across problem instances. By marrying domain‑specific locality information with empirically derived model statistics, it substantially improves the efficiency and reliability of model‑directed optimization algorithms, representing a significant step toward “learning to learn” in evolutionary computation.