Evo* 2025 -- Late-Breaking Abstracts Volume

📝 Abstract

Volume containing the Late-Breaking Abstracts submitted to the Evo* 2025 Conference, held in Trieste (Italy) from April 23rd to 25th. These extended abstracts showcase ongoing research and preliminary findings exploring the application of various Bioinspired Methods (primarily Evolutionary Computation) to a range of problems, many of which address real-world scenarios.

📄 Content

Exploratory Landscape Analysis (ELA) [4] is a versatile approach to characterizing the fitness landscapes (FLs) of optimization problems. It comprises a series of procedures designed to map the hypersurface formed by the fitness values, along with other key characteristics of the problem's solutions. Studies have shown that the values of landscape features are affected by the choice of sampling strategy [5].

Besides the robust ELA-based feature sets [2], a novel and straightforward feature based on the distribution of fitness values has been introduced [6]. Each test problem (i.e., its FL) is represented by a normalized histogram of the fitness values obtained by evaluating a sample set with the problem's objective function. This feature showed good results in characterizing and classifying the BBOB single-objective problems [6].
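The histogram feature can be illustrated with a short Python sketch. It is not the implementation from [6]; it assumes h equal-width bins spanning the minimum and maximum of the sample fitness values (with the maximum counted in the last bin, as `numpy.histogram` does), uses a toy sphere function in place of a BBOB problem, and draws uniform random points instead of a Sobol design (in practice, `scipy.stats.qmc.Sobol(d).random_base2(m=14)` could generate a 2^14-point Sobol sample).

```python
import random
from typing import List, Sequence


def normalized_fitness_histogram(values: Sequence[float], h: int) -> List[float]:
    """Relative frequencies of `values` in h equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / h
    counts = [0] * h
    for v in values:
        # A constant sample (width == 0) goes into the first bin;
        # the maximum value is clamped into the last bin.
        j = 0 if width == 0 else min(int((v - lo) / width), h - 1)
        counts[j] += 1
    n = len(values)
    return [c / n for c in counts]  # sums to 1 up to floating-point rounding


def sphere(x: Sequence[float]) -> float:
    """Toy stand-in objective, not the COCO/BBOB implementation."""
    return sum(xi * xi for xi in x)


rng = random.Random(0)
d, n = 5, 1024
# Uniform random points in [-5, 5]^d stand in for a Sobol design.
points = [[rng.uniform(-5.0, 5.0) for _ in range(d)] for _ in range(n)]
hist = normalized_fitness_histogram([sphere(p) for p in points], h=10)
print([round(b, 3) for b in hist])
```

The bin count h is left as a free parameter here, since the value used in [6] is not stated in this section.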

The normalized fitness histogram effectively captures the distribution of sample fitness values; however, it does not account for the relative importance of individual bins. The main aim of this short paper is to investigate several weighting methods that make fitness histograms more distinctive, in particular methods based on the TF-IDF statistic [3]. The impact of the weighting methods is evaluated using standard cluster analysis, and the experiments show promising results, with improved silhouette scores.

This section explains the proposed pipeline and briefly summarizes the related methods. The pipeline can be outlined as follows:
1) Test problems (functions) are chosen.
2) Sets of random samples (solutions) are generated with a selected sampling strategy.
3) The samples are evaluated by the test functions.
4) A feature vector is computed for each sample set in the form of a normalized histogram of fitness values.
5) The histograms are weighted using the proposed methods based on relative frequencies.
6) The weighted histograms are used for further problem characterization and clustering.
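The six steps above can be sketched as a small Python skeleton. It is a hypothetical outline rather than the authors' code: `run_pipeline`, `sample_points`, and the toy objectives are illustrative names, uniform random sampling stands in for the Sobol design, one shared sample set is evaluated by every problem, and step 5 is a pluggable `weight` function that defaults to leaving the histograms unchanged.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def sample_points(n: int, d: int, rng: random.Random) -> List[Vector]:
    """Step 2 (stand-in): uniform random points in [-5, 5]^d instead of Sobol."""
    return [[rng.uniform(-5.0, 5.0) for _ in range(d)] for _ in range(n)]


def normalized_histogram(values: Sequence[float], h: int) -> Vector:
    """Step 4: h equal-width bins over [min, max], divided by the sample size."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / h
    counts = [0] * h
    for v in values:
        j = 0 if width == 0 else min(int((v - lo) / width), h - 1)
        counts[j] += 1
    return [c / len(values) for c in counts]


def run_pipeline(
    problems: Dict[str, Callable[[Vector], float]],  # step 1: chosen test functions
    d: int,
    n: int,
    h: int,
    weight: Callable[[List[Vector]], List[Vector]] = lambda hs: hs,
    seed: int = 0,
) -> Dict[str, Vector]:
    rng = random.Random(seed)
    points = sample_points(n, d, rng)  # step 2
    names = list(problems)
    # Steps 3-4: evaluate the shared sample set and build one histogram per problem.
    hists = [normalized_histogram([problems[k](p) for p in points], h) for k in names]
    # Step 5: weighting; the returned vectors are the input to step 6 (clustering).
    return dict(zip(names, weight(hists)))


# Hypothetical toy problems standing in for BBOB functions.
problems = {
    "sphere": lambda x: sum(v * v for v in x),
    "rastrigin": lambda x: 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x),
}
features = run_pipeline(problems, d=5, n=512, h=20)
```

Passing the weighting as a function keeps steps 1 to 4 fixed while different schemes (such as TF-IDF or TF-pIDF) are compared in step 5.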

This section briefly summarizes the test suite and the methods used. The well-known 24 BBOB single-objective benchmark problems from the COmparing Continuous Optimizers (COCO) platform [1] were selected. This study uses only the first instance of each problem, in dimensions $d \in \{5, 10, 20\}$, on the supported domain $[-5, 5]^d$. The problem landscapes are sampled using Sobol sequences, which have good space-filling properties [5]. The standard Euclidean distance is used for the cluster analysis, and the compactness of the histograms of different BBOB problems is assessed with the silhouette score [6].
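The silhouette score can be computed directly from Euclidean distances. The sketch below follows the standard definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)) averaged over all points, where a(i) is the mean distance from point i to the other members of its cluster and b(i) is the smallest mean distance to the members of another cluster. Singleton clusters score 0, matching scikit-learn's convention; in practice a library routine such as `sklearn.metrics.silhouette_score` would normally be used, and the 2-D points here are made-up stand-ins for histogram vectors.

```python
import math
from typing import List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_dist(p: Sequence[float], others: List[Sequence[float]]) -> float:
    return sum(euclidean(p, q) for q in others) / len(others)


def silhouette_score(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient with Euclidean distance; singletons score 0."""
    clusters = sorted(set(labels))
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("silhouette needs 2 <= number of clusters <= n - 1")
    members = {c: [p for p, l in zip(points, labels) if l == c] for c in clusters}
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a(i): mean distance to the other members of its own cluster.
        own = [q for j, q in enumerate(points) if labels[j] == lab and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean_dist(p, own)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(mean_dist(p, members[c]) for c in clusters if c != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated groups of 2-D stand-ins for histograms.
pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(silhouette_score(pts, [0, 0, 1, 1]))
```

Scores close to 1 indicate compact, well-separated groups; labelling the same points across the two groups (e.g. `[0, 1, 0, 1]`) yields a negative score.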

A set of points $P = \{p_1, \ldots, p_n\}$ of size $n = 2^m$, with $m = 14$, is generated in the $d$-dimensional space $J^d = [0, 1)^d$, $p_i \in J^d$, using Sobol sampling. Given a fitness function $f: \mathbb{R}^d \to \mathbb{R}$ and the set $P$, the set of fitness values is computed as $V = \{f(p_i) \mid p_i \in P\}$. The set $V$ is used to compute a histogram of $h$ bins, $\mathbf{c} = \{c_1, \ldots, c_h\}$, over the range $[\min(V), \max(V)]$, subject to $\sum_{j=1}^{h} c_j = n$, where $c_j$ is the number of fitness values falling into the $j$-th bin. The normalized histogram represents a discrete probability distribution of fitness values, $\mathbf{n} = \{c_1/n, \ldots, c_h/n\}$, with $\sum_{j=1}^{h} n_j = 1$.

The normalized fitness histogram accurately represents the distribution of the sample fitness values, but it does not reflect the significance of individual bins. Multiple fitness landscapes can share partially similar distributions of fitness values; the bins covering these shared values can, on average, be heavily populated and therefore have a strong impact on the distance measures.

In text mining, the TF-IDF statistic [3] is commonly used to weight terms by their importance within a set of documents. It combines the Term Frequency (TF, the frequency of a term within a document) and the Inverse Document Frequency (IDF, which reflects how rare a term is across the set of documents).
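For reference, the text-mining form of TF-IDF can be sketched as follows. This uses one common variant (term frequency normalized by document length, and idf = log(N / df), where df counts the documents containing a term); [3] may define the details differently, and the toy documents are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Classic TF-IDF with tf = count / len(doc) and idf = log(N / df)."""
    n_docs = len(docs)
    # df: number of documents in which each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
weights = tf_idf(docs)
print(weights[1])
# "the" occurs in all three documents, so its weight is (1/3) * log(3/3) = 0.0
```

If histogram bins are read as "terms", a bin that is non-empty in every histogram behaves like "the": its occurrence-based IDF is log(1) = 0.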

In our case, a histogram represents a distribution of real-valued fitness values, and most bins contain at least some samples. An IDF based on raw occurrence in the histograms therefore cannot be used: nearly every bin would occur in every histogram, making such an IDF almost uninformative. This paper proposes a solution based on the relative frequencies in the bins. Given a problem $g \in G$, where $G$ is the set of all problems, and the corresponding set of normalized histograms $N = \{\mathbf{n}_1, \ldots, \mathbf{n}_{|G|}\}$, the TF can simply be reformulated as

$$\mathrm{tf}_H(j, g) = n_{g,j} = \frac{c_{g,j}}{n}, \tag{1}$$

which represents the relative frequency of the j-th bin in the normalized histogram representing the problem g. The IDF is then formalized as

which determines the weight of the $j$-th bin across all histograms, and the TF-IDF is

$$\mathrm{tfidf}_H(j, g) = \mathrm{tf}_H(j, g) \cdot \mathrm{idf}_H(j). \tag{3}$$

The probabilistic inverse document frequency (pIDF) [3] is added here for comparison, and it is defined as

and it can simply replace $\mathrm{idf}_H$ in Equation (3). The final weighted histogram then has to be normalized again, as described at the beginning of this section.
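The weighting step can be sketched in Python under a stated assumption. The TF is the bin's relative frequency n_{g,j}, each bin is weighted by tf times idf, the IDF function is swappable (mirroring how pIDF replaces the IDF), and every weighted histogram is renormalized to sum to 1. The IDF itself is hypothetical: `idf_relative` computes log(|G| / sum_g n_{g,j}), a relative-frequency analogue of log(N / df), and is not the paper's IDF formula; a pIDF-style function with the same signature could be passed instead.

```python
import math
from typing import Callable, List, Sequence

Histogram = List[float]
IdfFn = Callable[[Sequence[Histogram], int], float]


def idf_relative(histograms: Sequence[Histogram], j: int) -> float:
    """HYPOTHETICAL IDF for bin j: log(|G| / sum_g n_{g,j}).

    A relative-frequency analogue of log(N / df); not the paper's formula.
    Bins that are empty in every histogram get weight 0.
    """
    total = sum(hist[j] for hist in histograms)
    return math.log(len(histograms) / total) if total > 0 else 0.0


def tf_idf_histograms(histograms: Sequence[Histogram],
                      idf: IdfFn = idf_relative) -> List[Histogram]:
    """Weight each bin by tf * idf, then renormalize every histogram to sum to 1."""
    h = len(histograms[0])
    idf_weights = [idf(histograms, j) for j in range(h)]  # one weight per bin
    result = []
    for hist in histograms:
        # tf_H(j, g) is simply the relative frequency n_{g,j}.
        weighted = [tf * w for tf, w in zip(hist, idf_weights)]
        total = sum(weighted)
        # Renormalize; an all-zero weighted histogram is returned unchanged.
        result.append([x / total for x in weighted] if total > 0 else weighted)
    return result


# Three toy normalized histograms (|G| = 3, h = 3); bin 0 is shared by all of them.
hists = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.5, 0.25, 0.25]]
for row in tf_idf_histograms(hists):
    print([round(x, 3) for x in row])
# prints [0.333, 0.667, 0.0], then [0.333, 0.0, 0.667], then [0.333, 0.333, 0.333]
```

In the toy example, bin 0, which carries half the mass in every histogram, drops from 0.5 to 1/3 after weighting, reducing its influence on Euclidean distances. With this particular IDF, identical histograms receive all-zero weights, so the sketch returns zero vectors rather than dividing by zero.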

The basic question of this paper was whether TF-IDF weighting applied to normalized fitness histograms affects how well they represent the underlying landscapes, which is evaluated in terms of clustering. The TF-IDF and TF-pIDF weighting schemes were tested for the configurations named above, and the preliminary results can
