| UP (The CMA Evolution Strategy) |
This page provides implementations of the CMA-ES and links to libraries that contain such implementations. All implementations provided in this page follow very closely Hansen (2006). They support small and large population sizes, the latter by implementing the rank-µ-update (Hansen et al, 2003) with non-uniform recombination weights (Hansen and Kern, 2004) and an improved parameter setting for the step-size adaptation (Hansen and Kern, 2004). The default population size is small (Hansen and Ostermeier, 2001). Further (optionally) supported features:
Noise.on='yes').
sigma will be ≤ 2.
The natural encoding of (some of) the parameters can also be "logarithmic". That is, for a parameter that must always be positive, with a ratio between typical upper and lower value larger than 100, we might use 10x instead of x to call the objective function. More specifically, with a typical range of x in [10–4,10–1], we use 10–4×103x/10 with x in [0; 10]. Again, the idea is to have similar sensitivity: this makes sense if we expect a change from 10–4 to 10–3 to a have potentially a similar impact as a change from 10–2 to 10–1. In order to avoid the problem that changes of very small values have too less an impact, an alternative is to choose 10–1 × (x/10)2 ≥ 0. In summary, to map the values [0;10] into [a;b] we have the alternative transformations a + (b-a) × x/10 or a + (b-a) × (x/10)2 ≥ a or a × (b/a)x/10 ≥ 0.
The favorit method for handling boundaries (box-constraints) in connection with CMA-ES is described in A Method for Handling Uncertainty in Evolutionary Optimization...(2009) (pdf), see the prepended Addendum and Section IV B. This kind of method, a penalty that is a quadratic function of the distance to the feasible domain with adaptive penalty weights, is implemented in Matlab and Python codes below. A method to address non-linear constraints is described in Multidisciplinary Optimisation in the Design of Future Space Launchers (2010) (pdf), Section 12.4.1, but not (yet) available in any of the provided source codes.
cmaes_c.tar.gz, cmaes_c.tar provides a pure ANSI C implementation in a fairly object-oriented way, where the iteration step can be explicitly controlled. Also several independent optimization instances can be used at the same time. Therefore, the integration into any existing program in C or C++, should be fairly easy. The option "diagonalCovarianceMatrix" does not provide linear space complexity.
purecmaes.m is a minimalistic implementation running in Matlab and Octave. It is mainly provided for educational purpose: reading, understanding and running basic experiments. It contains roughly 100 lines, with 20 lines reflecting the core implementation of the algorithm. This implementation has only rudimentary termination conditions implemented, under some circumstances it might be numerically inefficient and it should probably not be used for running "real" experiments.
cmaes.m (version 3, revised June 2008, former versions) provides an interface similar to Matlab's optimization routines, e.g. fminsearch and fminunc. The following (optional) variants/extensions are implemented.
cmaesV255.m (version 2.55, July 2007) is the former version which saves the data record history internally. plotcmaes.m and dispcmaes.m are stand alone functions to plot/display data output from cmaes.m before version 3. They can be used to investigate the output data while the optimization is still running.
For Octave the package octave-forge is needed (thanks to Luís Correia for the advise).
fmin(), and the object-oriented ask-and-tell interface via an optimizer class (see Collette et al 2010 and below).
numpy. For this reason it is slow. If the Python module matplotlib is available, plots are provided.
invsqrtC incorrectly. In case using a version previous 1.1 (or unversioned), please replace
invsqrtC[i][j] = sum([B[k][i] * B[j][k] / D[k] for k in iN])
with
invsqrtC[i][j] = sum([B[i][k] * B[j][k] / D[k] for k in iN]) or respectively
self.invsqrtC[i][j] = sum([self.B[k][i] * self.B[j][k] / self.D[k] for k in iN])
with
self.invsqrtC[i][j] = sum([self.B[i][k] * self.B[j][k] / self.D[k] for k in iN]).
cma.fmin() and the class cma.CMAEvolutionStrategy with ask-and-tell interface. This code executes much faster than the minimalistic implementation. An improved boundary handling is implemented and more features that are useful in practice: a variety of termination and verbosity options, variable scaling, variable shading.
ipython -pylab (see also here) provides a superset of the python shell and allows, for example, to access the functionality of matplotlib easily. Two distributions provide an easy installation of all necessary (and many more useful) tools:
sage -ipython -pylab is also likely to be "the faster ipython shell". For general infos see the Sage documentation, for a possibly useful install info see sage-matlab. By June 2011, Sage is based on Python 2.6, but an upgrade to Python 2.7 (likely to be the standard Python version for a few years) should be "soon" available.
Some more useful "scientific Python" documentation for starters can be found at the Mathesaurus.
Integer versus true division: in Python, the "//" operator denotes integer division, while in Python 2.x, unfortunately, the division operator "/" behaves depending on the type of the operants (integer, 1/2 == 0, versus float, 1./2 == 0.5, division). This weird and bug-prone behavior (inherited from C) can be changed: to get the new ("true") division where 1/2 == 0.5 in Python 2.x (it is default in Python 3.x), use python -Qnew instead of python. Alternatively, type from __future__ import division at the python or ipython shell. The same can be used as first line in a python script or package. To change ipythons default behavior put "execute exec __IP.compile('from __future__ import division', '<input>', 'single') in __IP.user_ns" into the ~/.ipython/ipythonrc file.
A example output from a run on the 12-dimensional
Rosenbrock function, using plotcmaesdat.m for plotting.
The lower figures show the square root of eigenvalues (left) and of
diagonal elements (right) of the covariance matrix C. The actual sample
distribution has the covariance matrix σ2 C.
def abstract(): raise NotImplementedError class OOOptimizer(object): def __init__(self, xstart, maxiter=1000, **more_kwargs): """take an initial point to start with""" self.xstart = xstart self.xcurrent = xstart[:] # a copy self.maxiter = maxiter # however a general default does not make much sense self.iteration = 0 self.best_solution = BestSolution() # for keeping track def ask(self): """deliver one ore more candidate solution(s)""" abstract() def tell(self, solutions, function_values): """update internal state, prepare for next iteration""" self.iteration += 1 ibest = function_values.index(min(function_values)) self.xcurrent = solutions[ibest] self.best_solution.update(solutions[ibest], function_values[ibest], self.iteration * len(function_values)) # here, more work needs to be done in a derived class def stop(self): """check whether termination is in order, prevent infinite loop""" if self.maxiter and self.iteration > self.maxiter: return {'maxiter':self.maxiter} return {} def result(self): """get best solution, e.g. (x, f, possibly_more)""" return self.best_solution.get() + (self.iteration, self.xcurrent, self) def optimize(self, objectivefct): """find minimizer of objectivefct""" while not self.stop(): X = self.ask() # deliver candidate solutions fitvals = [objectivefct(x) for x in X] self.tell(X, fitvals) # all the work is done here return self.result() class BestSolution(object): def __init__(self, x=None, f=None, evaluation=0): self.x = x self.f = f self.evaluation = evaluation def update(self, x, f, evaluations=None): if self.f is None or f < self.f: self.x, self.f, self.evaluation = x[:], f, evaluations def get(self): return self.x, self.f, self.evaluationAn implementation of an optimization class must (re-)implement at least the first three abstract methods of
OOOptimizer: __init__, ask, tell. Here, only the interface to __init__ might change, depending on the derived class. An exception is pure random search, because it does not depend on the iteration and implements like
class PureRandomSearch(OOOptimizer): def ask(self): """delivers a single candidate solution in ask()[0], distributed with mean xstart and unit variance """ import random return [[random.normalvariate(self.xstart[i], 1) for i in range(len(self.xstart))]]The simplest use case is
res = PureRandomSearch(my_xstart).optimize(my_function)A more typical implementation of an
OOOptimizer subclass rewrites the following methods.
class CMAES(OOOptimizer): def __init__(self, xstart, sigma, popsize=10, maxiter=None): """initial point and sigma are mandatory, default popsize is 10""" ... def ask(self): """delivers popsize candidate solutions""" ... def tell(solutions, function_values): """update multivariate normal sample distribution for next iteration""" ... def stop(self): """check whether termination is in order, prevent infinite loop""" ...A typical use case resembles the
optimize method from above.
objectivefct = ... optim_choice = 1 long_use_case = True if optim_choice == 0: optim = PureRandomSearch(xstart, maxiter=1000) else: optim = barecmaes2.Cmaes(xstart, 1, maxiter=100) if long_use_case: while not optim.stop(): X = optim.ask() # deliver candidate solutions fitvals = [objectivefct(x) for x in X] # or do something more complicated to come up with fitvals # or ask for more solutions optim.tell(X, fitvals) # all the work is done here result = optim.result() else: result = optim.optimize(objectivefct) print resultSee the minimalistic Python code above for a fully working example.
Testing a stochastic algorithm is intricate, because the exact desired outcome is hard to determine and the algorithm
might be "robust" with respect to bugs. The code might run well on a few test functions despite a bug that might later compromise its performance. Additionally, the precise outcome depends on tiny details (as tiny as writing c+b+a instead of a+b+c can give slightly different values. The difference is irrelevant in almost all cases, but makes testing harder).
An effective way of testing is to compare two codes one-to-one, furnished with the same "random" numbers. One can use a simple (random) number generator. The numbers need not to be normal, but should be continuous and symmetric about zero. Tracking the internal state variables for at least two iterations must lead to identical results (apart from numerical effects that are usually in the order of 10-15). Internal state variables are the distribution mean, the step-size and the covariance matrix, and furthermore the two evolution paths.
For comparing outputs, at the minimum the fitness plots should be available. The plots generated in purecmaes.m define a more comprehensive set of reasonable output variables: fitness, sigma, sqrt(eigenvalues(C)), xmean.
The code should be tested on the following objective/fitness functions that can be found, for example, in purecmaes.m, and the results should be compared to trusted output, e.g. the one below ☺ The default test case can be 10-D. Besides the default population size, it is advisable to test also with a much larger population size, e.g. 10 times dimension. I usually use the following functions, more or less in this order. If not stated otherwise, I assume X0=0.1*ones and sigma0=0.1 in 10-D and default parameters.
Ideally, more dimensions should be tested, e.g. also 3 and 30, and different population sizes.
I appreciate any feedback regarding the provided source code and regarding successful and unsuccessful applications or modifications of the CMA-ES.
Mail to hansen AT lri.fr .