LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI


Large language models have enabled automated algorithm design (AAD) by generating optimization algorithms directly from natural-language prompts. While evolutionary frameworks such as LLaMEA demonstrate strong exploratory capabilities across the algorithm design space, their search dynamics are entirely driven by fitness feedback, leaving substantial information about the generated code unused. We propose a mechanism for guiding AAD using feedback constructed from graph-theoretic and complexity features extracted from the abstract syntax trees of the generated algorithms, based on a surrogate model learned over an archive of evaluated solutions. Using explainable AI techniques, we identify features that substantially affect performance and translate them into natural-language mutation instructions that steer subsequent LLM-based code generation without restricting expressivity. We propose LLaMEA-SAGE, which integrates this feature-driven guidance into LLaMEA, and evaluate it across several benchmarks. We show that the proposed structured guidance achieves the same performance faster than vanilla LLaMEA in a small controlled experiment. In a larger-scale experiment using the MA-BBOB suite from the GECCO-MA-BBOB competition, our guided approach achieves superior performance compared to state-of-the-art AAD methods. These results demonstrate that signals derived from code can effectively bias LLM-driven algorithm evolution, bridging the gap between code structure and human-understandable performance feedback in automated algorithm design.


💡 Research Summary

The paper introduces LLaMEA‑SAGE, an extension of the LLaMEA framework that incorporates structural feedback derived from the source code of generated optimization algorithms. While traditional LLaMEA relies solely on fitness evaluations to guide evolution, LLaMEA‑SAGE extracts a rich set of graph‑theoretic and static‑analysis features from each algorithm’s abstract syntax tree (AST). These features include node/edge counts, degree statistics, depth metrics, clustering coefficients, assortativity, diameter, average shortest‑path length, cyclomatic complexity, function token counts, and parameter counts. All evaluated solutions are stored in an archive together with their feature vectors and fitness values.
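Assuming the extractor operates on single-file Python sources (as L21 suggests), a minimal sketch of a few of these features using only the standard-library `ast` module might look like the following. The feature names and the simplified cyclomatic-complexity count are illustrative, not the paper's exact definitions; graph metrics such as clustering coefficients or assortativity would additionally require a graph library.

```python
import ast

def extract_features(source: str) -> dict:
    """Compute a small subset of the structural features described above.

    The AST is a tree, so edge count = node count - 1. Feature names are
    illustrative; the paper's exact feature set is richer.
    """
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))

    def depth(node):
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(c) for c in children), default=0)

    funcs = [n for n in nodes
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    # Simplified cyclomatic complexity: 1 + number of branching constructs.
    branch_types = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)
    return {
        "num_nodes": len(nodes),
        "num_edges": len(nodes) - 1,  # tree property
        "max_depth": depth(tree),
        "num_functions": len(funcs),
        # Positional args only, for brevity; ignores *args / keyword-only args.
        "num_parameters": sum(len(f.args.args) for f in funcs),
        "cyclomatic_complexity": 1 + sum(isinstance(n, branch_types)
                                         for n in nodes),
    }

features = extract_features(
    "def f(x, y):\n    if x > y:\n        return x\n    return y"
)
```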

When the archive reaches a predefined size, a gradient‑boosted regression tree (XGBoost) surrogate model is trained to predict fitness from the extracted features. The surrogate captures non‑linear relationships between code structure and performance. To turn this relationship into actionable guidance, the authors apply SHAP (Shapley Additive exPlanations) to the surrogate, obtaining an importance value for each feature. The feature with the largest absolute SHAP value is selected, and its sign determines whether the feature should be increased or decreased in the next generation.
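The surrogate-plus-attribution step can be sketched with lightweight stand-ins: the paper uses XGBoost with SHAP, whereas the snippet below substitutes scikit-learn's gradient-boosted trees and permutation importance, which ranks features by predictive impact but, unlike SHAP, carries no sign (in the paper the SHAP sign supplies the increase/decrease direction). The synthetic archive is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
feature_names = ["num_nodes", "max_depth", "cyclomatic_complexity"]

# Toy archive: fitness degrades as cyclomatic complexity grows (synthetic).
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = -2.0 * X[:, 2] + 0.1 * rng.normal(size=200)

# Stand-in for the XGBoost surrogate trained on the archive.
surrogate = GradientBoostingRegressor(random_state=0).fit(X, y)

# Stand-in for mean |SHAP|: permutation importance over the archive.
imp = permutation_importance(surrogate, X, y, n_repeats=5, random_state=0)
top = int(np.argmax(imp.importances_mean))
print(feature_names[top])
```

With this synthetic archive the surrogate should single out `cyclomatic_complexity` as the dominant feature, mirroring the paper's selection of the feature with the largest absolute SHAP value.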

The selected feature‑action pair is translated into a natural‑language instruction that augments the standard mutation prompt sent to the large language model (LLM). For example, if the SHAP analysis indicates that high cyclomatic complexity harms performance, the instruction reads “Based on archive analysis, try to decrease the cyclomatic complexity of the solution.” This guidance does not constrain the LLM’s syntax; it merely biases generation toward structural changes that are empirically associated with better performance.
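The selection-and-translation step can be sketched in a few lines of Python. The dictionary input and the exact instruction wording are assumptions; the summary only specifies the general template quoted above.

```python
def build_guidance(shap_values: dict) -> str:
    """Translate per-feature SHAP values into a mutation instruction.

    shap_values maps feature name -> signed mean SHAP value of the
    surrogate's fitness prediction (positive means raising the feature is
    predicted to raise fitness, assuming maximization). The wording is
    illustrative, not the paper's exact prompt.
    """
    feature = max(shap_values, key=lambda f: abs(shap_values[f]))
    action = "increase" if shap_values[feature] > 0 else "decrease"
    return (f"Based on archive analysis, try to {action} "
            f"the {feature.replace('_', ' ')} of the solution.")

msg = build_guidance({"cyclomatic_complexity": -0.8, "num_functions": 0.3})
# → "Based on archive analysis, try to decrease the cyclomatic complexity
#    of the solution."
```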

The evolutionary loop otherwise mirrors vanilla LLaMEA: a (μ, λ) strategy selects parents, applies the guided mutation prompt to the LLM (GPT‑5‑mini in the experiments), evaluates the offspring, updates the archive, and repeats with elitist selection.
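A compact, framework-agnostic sketch of this loop, with the LLM call, benchmark evaluation, feature extraction, and surrogate/SHAP analysis stubbed out as caller-supplied functions (all names and defaults here are hypothetical, not the framework's API):

```python
import random

def llamea_sage_loop(llm_mutate, evaluate, extract_features,
                     guidance_from_archive, mu=2, lam=4, generations=3,
                     seed_code="def solve(): return 0"):
    """Skeleton of the guided evolutionary loop described above.

    llm_mutate(code, instruction), evaluate(code), extract_features(code),
    and guidance_from_archive(archive) are placeholders for the LLM call,
    benchmark run, AST analysis, and surrogate/SHAP step, respectively.
    """
    archive = []  # (code, features, fitness) for every evaluated solution
    population = [(seed_code, evaluate(seed_code))]
    for _ in range(generations):
        instruction = guidance_from_archive(archive)  # "" before warm-up
        parents = sorted(population, key=lambda p: p[1], reverse=True)[:mu]
        offspring = []
        for _ in range(lam):
            code = llm_mutate(random.choice(parents)[0], instruction)
            fit = evaluate(code)
            archive.append((code, extract_features(code), fit))
            offspring.append((code, fit))
        # Elitist survivor selection: best mu among parents and offspring.
        population = sorted(parents + offspring,
                            key=lambda p: p[1], reverse=True)[:mu]
    return max(population, key=lambda p: p[1])

# Toy run with trivial stubs: "fitness" is just code length, and each
# mutation appends a character, so fitness grows by one per generation.
best = llamea_sage_loop(
    llm_mutate=lambda code, instruction: code + "#",
    evaluate=len,
    extract_features=lambda code: {"num_nodes": len(code)},
    guidance_from_archive=lambda archive: "",
)
```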

Two experiments evaluate the approach. Experiment 1 uses a small SBOX‑COST suite (five 10‑dimensional separable functions) with a budget of 200 evaluations. Five independent runs compare vanilla LLaMEA and LLaMEA‑SAGE. Results show that LLaMEA‑SAGE reaches comparable A‑OCC scores roughly 12 % faster and achieves a 6 %–9 % higher final A‑OCC. Experiment 2 scales to the MA‑BBOB benchmark (24 functions, 10 dimensions, 5 000·d evaluations per algorithm). Five runs each of LLaMEA‑SAGE, vanilla LLaMEA, MCTS‑AHD, and LHNS are performed. LLaMEA‑SAGE attains the highest average A‑OCC, outperforming vanilla LLaMEA by 8 %–15 % and surpassing the two state‑of‑the‑art baselines. The surrogate model maintains an R² of about 0.62 after 200 archive entries, and the proportion of offspring that improve fitness rises from 45 % (no guidance) to 68 % with SHAP‑driven guidance.

The authors discuss limitations: the surrogate’s training cost grows with archive size, SHAP values can be noisy, and the current implementation handles only single‑file Python algorithms. Future work includes online surrogate updates, ensemble explanations for more robust guidance, extending to multi‑module codebases and other programming languages, and applying the same feedback loop to alternative AAD frameworks such as EoH, MCTS‑AHD, or LHNS.

In conclusion, LLaMEA‑SAGE demonstrates that code‑level structural information can be transformed into effective, language‑model‑compatible guidance, substantially improving the efficiency and final quality of automatically designed optimization algorithms. This work opens a new direction where the “code itself” becomes a source of feedback, bridging the gap between raw performance metrics and human‑interpretable structural cues in automated algorithm design.

