UP (The CMA Evolution Strategy)

stopsoftwarepatents.eu petition banner

CMA-ES Source Code

This page provides implementations of the CMA-ES and links to libraries that contain such implementations. All implementations provided in this page follow very closely Hansen (2006). They support small and large population sizes, the latter by implementing the rank-µ-update (Hansen et al, 2003) with non-uniform recombination weights (Hansen and Kern, 2004) and an improved parameter setting for the step-size adaptation (Hansen and Kern, 2004). The default population size is small (Hansen and Ostermeier, 2001). Further (optionally) supported features:

  1. All implementations, except for the minimalistic codes for reading, provide an independent restart procedure with increasing population size (Auger and Hansen, 2005) and standardized data output for visualization.
  2. All implementations, except for the minimalistic code for reading, optionally support a "separable/diagonal" initial phase, where the covariance matrix remains diagonal (Ros and Hansen, 2008). The latter permits a faster learning of the diagonal elements and reduces the internal time complexity from quadratic to linear which might be useful for dimensionalities, say, larger than a hundred.
  3. Matlab since version 3.0 and Python since version 0.9.92 support an additional uncertainty handling (Hansen et al, 2009, set option Noise.on='yes').
  4. Matlab since version 3.40.beta and Python since version 0.9.92 implement the idea of active CMA (Jastrebski and Arnold, 2006, see option CMA.active or CMA_active), where a "negative" covariance matrix update explicitly reduces the variance in directions where bad samples were found. The implementations differ and also deviate considerably from the original paper.
  5. Python since version 0.9.92 implements selective mirrored sampling (Auger et al, 2011), where a few of the worst solutions are mirrored.
Minimalistic code for reading and education is provided in Matlab/Octave and Python.

Practical Hints

Encoding of Variables

The specific formulation of a (real) optimization problem has a tremendous impact on the optimization performance. In particular, a reasonable parameter encoding is essential. All parameters should be rescaled such that they have presumably similar sensitivity (this makes the identity as initial covariance matrix the right choice). Probably the best way to do this is to write a wrapper around the objective function that transforms the parameters before the actual function call. The wrapper scales, for example, in each parameter/coordinate the value [0; 10] into the typical actual domain of the parameter/coordinate. To achieve this on-the-fly, a linear scaling of variables is provided in the Scilab and Python codes below. With the proposed transformation, a typical initial sigma will be ≤ 2.

The natural encoding of (some of) the parameters can also be "logarithmic". That is, for a parameter that must always be positive, with a ratio between typical upper and lower value larger than 100, we might use 10x instead of x to call the objective function. More specifically, with a typical range of x in [10–4,10–1], we use 10–4×103x/10 with x in [0; 10]. Again, the idea is to have similar sensitivity: this makes sense if we expect a change from 10–4 to 10–3 to a have potentially a similar impact as a change from 10–2 to 10–1. In order to avoid the problem that changes of very small values have too less an impact, an alternative is to choose 10–1 × (x/10)2 ≥ 0. In summary, to map the values [0;10] into [a;b] we have the alternative transformations a + (b-a) × x/10 or a + (b-a) × (x/10)2a or a × (b/a)x/10 ≥ 0.

Boundary Handling

For a very short and general overview on boundary and constraints handling, in connection with CMA-ES, see the appendix of The CMA Evolution Strategy: A Tutorial (pdf), p.29f. Another simple way to achieve, for example, a lower bound a for a variable x is to use a + x2 instead of x in the objective function call (for |x| > 1/2 one can use a + |x| - 1/4 instead of a + x2), see also above. Variables that consistently end up on the (same) boundary can be removed. The Python and Scilab codes provide the means to (tentatively) remove variables from the optimization.

The favorit method for handling boundaries (box-constraints) in connection with CMA-ES is described in A Method for Handling Uncertainty in Evolutionary Optimization...(2009) (pdf), see the prepended Addendum and Section IV B. This kind of method, a penalty that is a quadratic function of the distance to the feasible domain with adaptive penalty weights, is implemented in Matlab and Python codes below. A method to address non-linear constraints is described in Multidisciplinary Optimisation in the Design of Future Space Launchers (2010) (pdf), Section 12.4.1, but not (yet) available in any of the provided source codes.


Overview


C (ANSI)

cmaes_c.tar.gz, cmaes_c.tar provides a pure ANSI C implementation in a fairly object-oriented way, where the iteration step can be explicitly controlled. Also several independent optimization instances can be used at the same time. Therefore, the integration into any existing program in C or C++, should be fairly easy. The option "diagonalCovarianceMatrix" does not provide linear space complexity.

C++

A few libraries have implemented the CMA-ES in C++:

Fortran

The pCMALib parallel library in FORTRAN 90.

Java

A Java version is provided, where originally the core procedures from the ANSI C version were used: Also the following libraries implement the CMA-ES in Java.

Matlab and Octave

Code for Reading

purecmaes.m is a minimalistic implementation running in Matlab and Octave. It is mainly provided for educational purpose: reading, understanding and running basic experiments. It contains roughly 100 lines, with 20 lines reflecting the core implementation of the algorithm. This implementation has only rudimentary termination conditions implemented, under some circumstances it might be numerically inefficient and it should probably not be used for running "real" experiments.

Production Code

cmaes.m (version 3, revised June 2008, former versions) provides an interface similar to Matlab's optimization routines, e.g. fminsearch and fminunc. The following (optional) variants/extensions are implemented.

The verbosity options have been rectified as Disp Log and Save options. By default (LogModulo=1), output data is written to files (ASCII) and plotcmaesdat.m is a stand alone function to plots these data. The function can be used to investigate the output data from cmaes.m while the optimization is running. Under Octave plotcmaesdat.m works since Version 3, for earlier versions option LogPlot of cmaes.m must be set to zero (package octave-forge is needed in any case).

cmaesV255.m (version 2.55, July 2007) is the former version which saves the data record history internally. plotcmaes.m and dispcmaes.m are stand alone functions to plot/display data output from cmaes.m before version 3. They can be used to investigate the output data while the optimization is still running.

For Octave the package octave-forge is needed (thanks to Luís Correia for the advise).

Python

Two Python codes are available, both support a functional interface, fmin(), and the object-oriented ask-and-tell interface via an optimizer class (see Collette et al 2010 and below).

Code for Reading and Education

A somewhat minimalistic, object oriented code for reading and education. The code has also a minimal number of dependencies, in particular it does not need the python module numpy. For this reason it is slow. If the Python module matplotlib is available, plots are provided.

Production Code

A production code with a functional interface cma.fmin() and the class cma.CMAEvolutionStrategy with ask-and-tell interface. This code executes much faster than the minimalistic implementation. An improved boundary handling is implemented and more features that are useful in practice: a variety of termination and verbosity options, variable scaling, variable shading.

Remarks about Python

As of 2011, Python is available on almost every computer and provides, based on the numpy package, an excellent (and free) environment for scientific computing. In particular the powerful shell  ipython -pylab  (see also here) provides a superset of the python shell and allows, for example, to access the functionality of matplotlib easily. Two distributions provide an easy installation of all necessary (and many more useful) tools:

Some more useful "scientific Python" documentation for starters can be found at the Mathesaurus.

Integer versus true division: in Python, the "//" operator denotes integer division, while in Python 2.x, unfortunately, the division operator "/" behaves depending on the type of the operants (integer, 1/2 == 0, versus float, 1./2 == 0.5, division). This weird and bug-prone behavior (inherited from C) can be changed: to get the new ("true") division where 1/2 == 0.5 in Python 2.x (it is default in Python 3.x), use python -Qnew instead of python. Alternatively, type from __future__ import division at the python or ipython shell. The same can be used as first line in a python script or package. To change ipythons default behavior put "execute exec __IP.compile('from __future__ import division', '<input>', 'single') in __IP.user_ns" into the ~/.ipython/ipythonrc file.

R

CMA-ES in R by Olaf Mersmann.

Scilab

Two interfaces are provided. An object oriented ask-and-tell interface (cma_new etc.), where the loop control remains with the user, and a functional interface (cma_optim).


Example Output

A example output from a run on the 12-dimensional Rosenbrock function, using plotcmaesdat.m for plotting.
example output
The lower figures show the square root of eigenvalues (left) and of diagonal elements (right) of the covariance matrix C. The actual sample distribution has the covariance matrix σ2 C.


Object-Oriented Implementation (under construction)

The principle of the object-oriented ask-and-tell interface for optimization (see Collette et al 2010) is shown in the following abstract class in Python pseudocode (therefore untested).
def abstract():
    raise NotImplementedError

class OOOptimizer(object):
    def __init__(self, xstart, maxiter=1000, **more_kwargs):
    """take an initial point to start with"""
        self.xstart = xstart
        self.xcurrent = xstart[:]  # a copy 
        self.maxiter = maxiter  # however a general default does not make much sense
        self.iteration = 0
        self.best_solution = BestSolution() # for keeping track
    def ask(self):
    """deliver one ore more candidate solution(s)"""
        abstract()
    def tell(self, solutions, function_values):
    """update internal state, prepare for next iteration"""
        self.iteration += 1
        ibest = function_values.index(min(function_values))
        self.xcurrent = solutions[ibest]
        self.best_solution.update(solutions[ibest], function_values[ibest],
                                    self.iteration * len(function_values))
        # here, more work needs to be done in a derived class
    def stop(self):
    """check whether termination is in order, prevent infinite loop"""
        if self.maxiter and self.iteration > self.maxiter:
            return {'maxiter':self.maxiter}
        return {}
    def result(self):
    """get best solution, e.g. (x, f, possibly_more)"""
        return self.best_solution.get() + (self.iteration, self.xcurrent, self)
    def optimize(self, objectivefct):
    """find minimizer of objectivefct"""
        while not self.stop():
            X = self.ask()         # deliver candidate solutions
            fitvals = [objectivefct(x) for x in X]
            self.tell(X, fitvals)  # all the work is done here
        return self.result()

class BestSolution(object):
    def __init__(self, x=None, f=None, evaluation=0):
        self.x = x
        self.f = f
        self.evaluation = evaluation
    def update(self, x, f, evaluations=None):
        if self.f is None or f < self.f:
            self.x, self.f, self.evaluation = x[:], f, evaluations
    def get(self):
        return self.x, self.f, self.evaluation
An implementation of an optimization class must (re-)implement at least the first three abstract methods of OOOptimizer: __init__, ask, tell. Here, only the interface to __init__ might change, depending on the derived class. An exception is pure random search, because it does not depend on the iteration and implements like
class PureRandomSearch(OOOptimizer):
    def ask(self):
        """delivers a single candidate solution in ask()[0], 
           distributed with mean xstart and unit variance
        """
        import random
        return [[random.normalvariate(self.xstart[i], 1) for i in range(len(self.xstart))]]
The simplest use case is
res = PureRandomSearch(my_xstart).optimize(my_function)
A more typical implementation of an OOOptimizer subclass rewrites the following methods.
class CMAES(OOOptimizer):
    def __init__(self, xstart, sigma, popsize=10, maxiter=None):
        """initial point and sigma are mandatory, default popsize is 10"""
        ...
    def ask(self):
        """delivers popsize candidate solutions"""
        ...
    def tell(solutions, function_values):
        """update multivariate normal sample distribution for next iteration"""
        ...
    def stop(self):
        """check whether termination is in order, prevent infinite loop"""
        ...
A typical use case resembles the optimize method from above.
objectivefct = ...
optim_choice = 1
long_use_case = True

if optim_choice == 0:
    optim = PureRandomSearch(xstart, maxiter=1000)
else:
    optim = barecmaes2.Cmaes(xstart, 1, maxiter=100)

if long_use_case:
    while not optim.stop():
        X = optim.ask()         # deliver candidate solutions
        fitvals = [objectivefct(x) for x in X]
        # or do something more complicated to come up with fitvals
        # or ask for more solutions
        optim.tell(X, fitvals)  # all the work is done here
    result = optim.result()
else:
    result = optim.optimize(objectivefct)

print result
See the minimalistic Python code above for a fully working example.

Testing your own source code (under construction)

Testing a stochastic algorithm is intricate, because the exact desired outcome is hard to determine and the algorithm might be "robust" with respect to bugs. The code might run well on a few test functions despite a bug that might later compromise its performance. Additionally, the precise outcome depends on tiny details (as tiny as writing c+b+a instead of a+b+c can give slightly different values. The difference is irrelevant in almost all cases, but makes testing harder).

Comparing to a trusted code

An effective way of testing is to compare two codes one-to-one, furnished with the same "random" numbers. One can use a simple (random) number generator. The numbers need not to be normal, but should be continuous and symmetric about zero. Tracking the internal state variables for at least two iterations must lead to identical results (apart from numerical effects that are usually in the order of 10-15). Internal state variables are the distribution mean, the step-size and the covariance matrix, and furthermore the two evolution paths.

Comparing to trusted output

For comparing outputs, at the minimum the fitness plots should be available. The plots generated in purecmaes.m define a more comprehensive set of reasonable output variables: fitness, sigma, sqrt(eigenvalues(C)), xmean.

The code should be tested on the following objective/fitness functions that can be found, for example, in purecmaes.m, and the results should be compared to trusted output, e.g. the one below ☺ The default test case can be 10-D. Besides the default population size, it is advisable to test also with a much larger population size, e.g. 10 times dimension. I usually use the following functions, more or less in this order. If not stated otherwise, I assume X0=0.1*ones and sigma0=0.1 in 10-D and default parameters.

  1. On a random objective function frand (e.g. f(x) is uniform in [0,1] and independent of its input x). Step-size σ and the eigenvalues of C should be tracked and plotted. log(σ) must conduct a symmetric (unbiased) random walk. The iteration-wise sorted log(eigenvalues) will slowly drift apart, quite symmetrically but with the overall tendency to decrease. Eventually, the condition number of C will exceed numerically feasible limits (say 1e14, that is, axis ratio 1e7), which should be covered in the code as termination criterion.
    Random function

    Output on random function

  2. The sphere function fsphere
  3. The Ellipsoid felli, where it takes between 5000 and 6000 function evaluations to reach function value 1e-8. The eigenvalues (Scaling plot) get a nice regular configuration on this function. Note that both subplots to the left are invariant under any rigid transformation of the search space (including a rotation) if the initial point is transformed accordingly. This makes a rotated Ellipsoid another excellent test case.
    Ellipsoid function

    Output on Ellipsoid function

  4. The Rosenbrock function frosen, the first non-separable test case where the variables are not independent. It takes between 5000 and 6000 function evaluations to reach a function value of 1e-8. With a larger initial step-size sigma0 in about 20% of the trials the local minimum will be discovered with a function value close under four.
    Rosenbrock function

    Output on Rosenbrock function

  5. The multimodal Rastrigin function. Here, X0 = 5*ones (the point 0.1*ones is in the attraction basin of the global optimum), sigma0 = 5 and population size 200. With smaller population sizes it becomes increasingly difficult to find the global optimum, see Hansen & Kern 2004.
    Rastrigin function

    Output on Rastrigin function

  6. The Cigar fcigar.

Ideally, more dimensions should be tested, e.g. also 3 and 30, and different population sizes.

I appreciate any feedback regarding the provided source code and regarding successful and unsuccessful applications or modifications of the CMA-ES.

Mail to hansen AT lri.fr .


last change $Date: 2012-01-17 09:12:17 +0100 (mar. 17 janv. 2012) $

eXTReMe Tracker