CMA-ES with Two-Point Step-Size Adaptation
arXiv:0805.0231v4 [cs.NE] 18 May 2008
CMA-ES with Two-Point Step-Size Adaptation
Nikolaus Hansen
N° 6527
May 2008
Centre de recherche INRIA Saclay – Île-de-France
Theme COG — Cognitive Systems
Project-Teams: Adaptive Combinatorial Search and TAO
Research Report n° 6527 — May 2008 — 9 pages
Abstract: We combine a refined version of two-point step-size adaptation with the covariance matrix adaptation evolution strategy (CMA-ES). Additionally, we suggest polished formulae for the learning rate of the covariance matrix and the recombination weights. In contrast to cumulative step-size adaptation or to the 1/5-th success rule, the refined two-point adaptation (TPA) does not rely on any internal model of optimality. In contrast to conventional self-adaptation, the TPA will achieve a better target step-size in particular with large populations. The disadvantage of TPA is that it relies on two additional objective function evaluations.
Key-words: optimization, evolutionary algorithms, covariance matrix adaptation, step-size control, self-adaptation, two-point adaptation

∗ Adaptive Combinatorial Search Team, Microsoft Research–INRIA Joint Centre, 28, rue Jean Rostand, 91893 Orsay Cedex, France. email: forename.name@inria.fr
1  Introduction
In the covariance matrix adaptation evolution strategy (CMA-ES) [8], two separate adaptation mechanisms determine the variances and covariances of the search distribution: one controls the (overall) step-size, the second adapts the covariance matrix. The mechanisms are mainly independent and can therefore, in principle, be replaced separately. While the standard step-size control is cumulative step-size adaptation (CSA), a success-based control was also successfully introduced for the (1+λ)-CMA-ES in [9].
The CSA has a few drawbacks.

- For very large noise levels the target step-size becomes zero, while the optimal step-size is still positive [3].
- For large population sizes (λ > 10n) the original parameter setting seemed not to work properly [6]: the notion of tracking a (long) path history seems not to mate well with a population size that is large compared to the search space dimension. An improved parameter setting introduced in [5] shortens the backward time horizon for the cumulation and performs well also with large population sizes [5, 2].
- The expected length of the displacement of the population mean under random selection is required. To compute a useful measurement independent of the coordinate system, the principal axes of the search distribution are needed. They are more expensive to acquire (at least by a constant factor) than a simple matrix decomposition, which is in any case necessary to sample a multivariate normal distribution with given covariance matrix.
- Because the length of an evolution path is compared to its expected length, the measurement is sensitive to the specific sampling procedure for new candidate solutions and also, for example, to repair mechanisms for solutions.
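The cost argument can be made concrete in a short NumPy sketch: drawing a sample from N(0, C) only needs some factor A with A Aᵀ = C, e.g. a Cholesky factor, whereas the principal axes require a full eigendecomposition. The matrix and sampler below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
C = np.array([[4.0, 1.0], [1.0, 2.0]])   # example covariance matrix

# Sampling y ~ N(0, C) needs only a factor A with A A^T = C,
# e.g. the Cholesky factor:
A = np.linalg.cholesky(C)
y = A @ rng.standard_normal(2)

# The principal axes (eigenvectors) require a full eigendecomposition,
# which is more expensive by a constant factor:
eigvals, B = np.linalg.eigh(C)           # C = B diag(eigvals) B^T
y2 = B @ (np.sqrt(eigvals) * rng.standard_normal(2))  # also ~ N(0, C)
```

Both routes yield valid samples; only the eigendecomposition additionally exposes the axes that CSA's coordinate-system-independent measurement needs.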
Despite these disadvantages, CSA is regarded as the first choice for step-size control in the (µ/µw, λ)-ES, due to its advantages. Nonetheless, the disadvantages motivate a search for alternatives. Here, we suggest two-point step-size adaptation (TPA) as one such alternative.

Two-point self-adaptation was introduced for backpropagation in [11] and later applied in Evolutionary Gradient Search [10]. In evolutionary search, two-point adaptation resembles self-adaptation on the population level. The principle is utterly simple: two different step lengths are tested for the mean displacement and the better one is chosen. In the next section, we integrate a slightly refined TPA into the CMA-ES and additionally introduce polished formulae for the recombination weights and the learning rates of the covariance matrix.
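The two-point principle can be sketched in a few lines: probe the objective along the last mean displacement with a shorter and a longer step length and nudge σ toward the better one. The function name, the scaling factor alpha, and the damping c_sigma below are illustrative choices, not the paper's tuned values.

```python
import numpy as np

def tpa_update(f, m, dm, sigma, alpha=1.3, c_sigma=0.3):
    """Sketch of two-point step-size adaptation: evaluate the mean
    shifted by the displacement direction dm with two step lengths
    and adjust sigma toward the better one (two extra evaluations)."""
    f_short = f(m + (1.0 / alpha) * sigma * dm)  # shorter test step
    f_long = f(m + alpha * sigma * dm)           # longer test step
    # Increase sigma if the longer step was better, else decrease it
    if f_long < f_short:
        return sigma * np.exp(c_sigma)
    return sigma * np.exp(-c_sigma)
```

The two probes are exactly the "two additional objective function evaluations" named as TPA's disadvantage in the abstract.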
2  The Algorithm: CMA-ES with TPA
Our description of the CMA-ES closely follows [4, 5, 7] and replaces CSA with TPA. Given an initial mean value m ∈ R^n, the initial covariance matrix C = I, and the initial step-size σ ∈ R_+, the new candidate solutions x_k obey

    x_k = m + σ y_k,    for k = 1, . . . , λ,    (1)

where y_k ∼ N(0, C) denotes the realization of a normally distributed random vector with zero mean and covariance matrix C. The solutions x_k are evaluated and ranked such that x_{i:λ} becomes the i-th best solution vector and y_{i:λ} the corresponding random vector realization.
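Equation (1) together with the ranking step can be sketched as follows; the function name and interface are illustrative, not from the paper.

```python
import numpy as np

def sample_and_rank(f, m, sigma, C, lam, rng):
    """Sample lam candidates x_k = m + sigma * y_k with y_k ~ N(0, C),
    evaluate f, and return candidates sorted so index i holds the
    (i+1)-th best solution x_{i:lambda} and its vector y_{i:lambda}."""
    A = np.linalg.cholesky(C)            # factor with A A^T = C
    z = rng.standard_normal((lam, len(m)))
    y = z @ A.T                          # rows y_k ~ N(0, C)
    x = m + sigma * y                    # candidate solutions, Eq. (1)
    fvals = np.array([f(xi) for xi in x])
    order = np.argsort(fvals)            # best (smallest f) first
    return x[order], y[order], fvals[order]
```

Keeping the ranked y_{i:λ} alongside x_{i:λ} matters because the subsequent recombination and covariance updates operate on the y vectors, not on the solutions themselves.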
For µ < λ, let

    ⟨y⟩ = Σ_{i=1}^{µ} w_i y_{i:λ},    w_1 ≥ · · · ≥ w_µ > 0,    Σ_{i=1}^{µ} w_i = 1    (2)

be the weighted mean of the µ best ranked y_k vectors.
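A minimal sketch of Equation (2) follows. The log-rank weights used here, w_i ∝ ln(µ+1) − ln(i), are one common choice satisfying the stated constraints; they are not necessarily the polished formulae the paper goes on to suggest.

```python
import numpy as np

def weighted_mean(y_sorted, mu):
    """Weighted mean <y> of the mu best ranked vectors y_{i:lambda},
    Eq. (2), with decreasing positive weights that sum to one."""
    w = np.log(mu + 1) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                         # normalize: sum_i w_i = 1
    return w @ y_sorted[:mu], w          # <y> and the weights
```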
The reco