CMA-ES with Two-Point Step-Size Adaptation


📝 Abstract

We combine a refined version of two-point step-size adaptation with the covariance matrix adaptation evolution strategy (CMA-ES). Additionally, we suggest polished formulae for the learning rate of the covariance matrix and the recombination weights. In contrast to cumulative step-size adaptation or to the 1/5-th success rule, the refined two-point adaptation (TPA) does not rely on any internal model of optimality. In contrast to conventional self-adaptation, the TPA achieves a better target step-size, in particular with large populations. The disadvantage of TPA is that it relies on two additional objective function evaluations.


📄 Content

arXiv:0805.0231v4 [cs.NE] 18 May 2008

CMA-ES with Two-Point Step-Size Adaptation
Nikolaus Hansen*
INRIA Research Report n° 6527, May 2008, 9 pages. Centre de recherche INRIA Saclay – Île-de-France; Équipes-Projets Adaptive Combinatorial Search et TAO.

Key-words: optimization, evolutionary algorithms, covariance matrix adaptation, step-size control, self-adaptation, two-point adaptation

* Adaptive Combinatorial Search Team, Microsoft Research–INRIA Joint Centre. 28, rue Jean Rostand, 91893 Orsay Cedex, France. email: forename.name@inria.fr

1 Introduction

In the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [8], two separate adaptation mechanisms determine the variances and covariances of the search distribution: one for (overall) step-size control, and a second for the adaptation of a covariance matrix. The mechanisms are largely independent and can therefore, in principle, be replaced separately.
While the standard step-size control is cumulative step-size adaptation (CSA), a success-based control was also successfully introduced for the (1+λ)-CMA-ES in [9]. The CSA has a few drawbacks.

- For very large noise levels the target step-size becomes zero, while the optimal step-size is still positive [3].
- For large population sizes (λ > 10n) the original parameter setting seemed not to work properly [6]; the notion of tracking a (long) path history does not mate well with a population size that is large compared to the search space dimension. An improved parameter setting introduced in [5] shortens the backward time horizon for the cumulation and performs well also with large population sizes [5, 2].
- The expected size of the displacement of the population mean under random selection is required. To compute a useful measurement independent of the coordinate system, the principal axes of the search distribution are needed. They are more expensive to acquire (at least by a constant factor) than a simple matrix decomposition, which is in any case necessary to sample a multivariate normal distribution with a given covariance matrix.
- Because the length of an evolution path is compared to its expected length, the measurement is sensitive to the specific sampling procedure for new candidate solutions and also, for example, to repair mechanisms for solutions.

Despite these disadvantages, CSA is regarded as the first choice for step-size control in the (µ/µw, λ)-ES. Nonetheless, these disadvantages motivate the search for alternatives. Here, we suggest two-point step-size adaptation (TPA) as one such alternative. Two-point self-adaptation was introduced for backpropagation in [11] and later applied in Evolutionary Gradient Search [10]. In evolutionary search, two-point adaptation resembles self-adaptation on the population level.
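To picture what such a two-point alternative does, the following is a minimal sketch: the objective is evaluated at the mean displaced by a shorter and a longer variant of a given mean shift, and the step-size is nudged toward whichever test step was better. The function name and the constants `alpha` and `damping` are illustrative assumptions, not the paper's formulae.

```python
import numpy as np

def two_point_sigma_update(f, m, shift, sigma, alpha=1.3, damping=2.0):
    """Illustrative two-point step-size test (names/constants assumed).

    Evaluates f at the mean displaced by a shorter and a longer variant
    of the given mean shift, then moves sigma toward the better one.
    """
    f_short = f(m + (1.0 / alpha) * shift)  # shorter test step
    f_long = f(m + alpha * shift)           # longer test step
    if f_long < f_short:
        sigma *= np.exp(1.0 / damping)      # longer step won: grow sigma
    else:
        sigma *= np.exp(-1.0 / damping)     # shorter step won: shrink sigma
    return sigma
```

On a sphere function, a shift that undershoots the optimum favors the longer test step (sigma grows), while an overshooting shift favors the shorter one (sigma shrinks), which is exactly the success-comparison behavior described above.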
The principle is utterly simple: two different step lengths are tested for the mean displacement and the better one is chosen. In the next section, we integrate a slightly refined TPA into the CMA-ES and additionally introduce polished formulae for the recombination weights and the learning rates of the covariance matrix.

2 The Algorithm: CMA-ES with TPA

Our description of the CMA-ES closely follows [4, 5, 7] and replaces CSA with TPA. Given an initial mean value $m \in \mathbb{R}^n$, the initial covariance matrix $C = I$, and the initial step-size $\sigma \in \mathbb{R}_+$, the new candidate solutions $x_k$ obey

    x_k = m + \sigma\, y_k , \quad \text{for } k = 1, \dots, \lambda \qquad (1)

where $y_k \sim \mathcal{N}(0, C)$ denotes the realization of a normally distributed random vector with zero mean and covariance matrix $C$. The solutions $x_k$ are evaluated and ranked such that $x_{i:\lambda}$ becomes the $i$-th best solution vector and $y_{i:\lambda}$ the corresponding random vector realization. For $\mu < \lambda$, let

    \langle y \rangle = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda} , \quad w_1 \ge \dots \ge w_\mu > 0 , \quad \sum_{i=1}^{\mu} w_i = 1 \qquad (2)

be the weighted mean of the $\mu$ best ranked $y_k$ vectors. The reco…
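Equations (1) and (2) can be sketched as one sampling-and-recombination step. This is a minimal illustration, not the full CMA-ES update; the weight formula below (log-rank weights) is one common choice satisfying the constraints in Eq. (2), assumed here rather than taken from this paper.

```python
import numpy as np

def sample_and_recombine(f, m, C, sigma, lam, mu, rng):
    """One CMA-style sampling/recombination step following Eqs. (1)-(2)."""
    n = m.size
    # Eq. (1): x_k = m + sigma * y_k with y_k ~ N(0, C);
    # a Cholesky factor of C suffices for sampling.
    A = np.linalg.cholesky(C)
    y = rng.standard_normal((lam, n)) @ A.T
    x = m + sigma * y
    # Rank solutions by objective value (minimization).
    order = np.argsort([f(xi) for xi in x])
    # Eq. (2): positive, decreasing weights summing to one
    # (log-rank weights, an assumed common choice).
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    y_mean = w @ y[order[:mu]]   # <y> = sum_i w_i y_{i:lambda}
    return m + sigma * y_mean, y_mean
```

Starting from a mean away from the optimum of a sphere function, the selected-and-recombined mean moves toward the optimum in expectation.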
