Recall that in basic ES, we do Gaussian mutation:
$$x' = x + \sigma \cdot \mathcal{N}(0, 1)$$
where $\sigma$ controls the mutation strength. The distribution of the steps is Gaussian, with variance $\sigma^2$.
Note that it’s important that we first mutate the step size $\sigma$, and then mutate the solution $x$ using the new step size $\sigma'$. This means that the new individual $(x', \sigma')$ is evaluated twice: the primary evaluation is direct, since we can tell $x'$ is good if $f(x')$ is good, and the second evaluation is indirect, since we can tell $\sigma'$ is good if it produced a good $x'$.

We can see that with a wider $\sigma$, we get wider exploration. At different stages of optimization, the search needs different behavior: early on, larger steps help exploration. Later on, when we get close to a good solution, smaller steps help fine-tune.
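The 1/5 rule is a very basic version of this, where the success rate is the fraction of mutations that improve on the parent:
- If the success rate is too high (above 1/5), $\sigma$ is too small, so increase it
- If the success rate is too low (below 1/5), $\sigma$ is too large, so decrease it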
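The 1/5 success rule can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `one_fifth_rule` and the shrink factor `c = 0.9` are assumptions made for the example.

```python
def one_fifth_rule(sigma, success_rate, c=0.9):
    """Rescale sigma from the fraction of recent mutations that improved fitness.

    c is an assumed shrink factor in (0, 1); it is not fixed by the notes.
    """
    if success_rate > 1 / 5:
        return sigma / c   # succeeding too often: steps are too small, so widen
    if success_rate < 1 / 5:
        return sigma * c   # succeeding too rarely: steps are too large, so narrow
    return sigma           # exactly 1/5: leave the step size unchanged


print(one_fifth_rule(1.0, 0.5))   # larger than 1.0
print(one_fifth_rule(1.0, 0.05))  # smaller than 1.0
```

In practice the success rate would be measured over a window of recent generations before each adjustment.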
Case 1: Global Step Size
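The simplest case uses a single $\sigma$ for all variables. The step size mutation is given as
$$\sigma' = \sigma \cdot e^{\tau \, \mathcal{N}(0, 1)}$$
and the solution mutation is given as
$$x_i' = x_i + \sigma' \cdot \mathcal{N}_i(0, 1)$$
where $\tau$ is a learning rate and $\mathcal{N}_i(0, 1)$ is a fresh standard normal draw for each coordinate. The exponential form is used so that $\sigma$ always stays positive.
In this case, the covariance is
$$C = \sigma^2 I$$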
Every direction has the same variance.
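A minimal sketch of the global step-size mutation follows. The function name `mutate_global` and the default $\tau = 1/\sqrt{n}$ are assumptions for the example; the key point it shows is the order of operations (step size first, then the solution with the new step size).

```python
import math
import random


def mutate_global(x, sigma, tau=None, rng=random):
    """Self-adaptive mutation with one step size shared by all coordinates."""
    n = len(x)
    if tau is None:
        tau = 1.0 / math.sqrt(n)  # assumed default learning rate, ~ 1/sqrt(n)
    # 1) Mutate the step size first; exp(...) keeps it strictly positive.
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    # 2) Mutate the solution using the NEW step size.
    new_x = [xi + new_sigma * rng.gauss(0.0, 1.0) for xi in x]
    return new_x, new_sigma


rng = random.Random(0)
child, child_sigma = mutate_global([1.0, 2.0, 3.0], sigma=0.5, rng=rng)
print(child, child_sigma)
```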
Geometry of Gaussian Mutation
Why can’t we always just use our global step size above?
Consider a problem in a higher dimension $n$, where our isotropic model becomes $x' = x + \sigma z$ with $z \sim \mathcal{N}(0, I)$:
- $z$ is an $n$-dimensional standard normal vector.
- $I$ is the $n \times n$ identity matrix
- $\sigma$ is the global step size
Viewing this as a distribution, we can also say that
$$x' \sim \mathcal{N}(x, \sigma^2 I)$$
This is radially symmetric around $x$, with equal variance in all directions. Thus, there is no preferred search direction, and the sampling cloud is spherical.

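The expected step length is given as:
$$\mathbb{E}\left[\|x' - x\|\right] \approx \sigma \sqrt{n}$$
However, remember that high-dimensional spaces behave weirdly. Most probability mass lies on a thin shell, $\|x' - x\| \approx \sigma \sqrt{n}$. Thus, mutations are rarely small in high $n$.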
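The thin-shell effect is easy to check empirically. The sketch below samples isotropic steps and compares their lengths to $\sigma\sqrt{n}$; the helper `step_lengths`, the sample count, and the chosen dimensions are assumptions for the demo.

```python
import math
import random
import statistics


def step_lengths(n, sigma, samples, rng):
    """Lengths of `samples` isotropic steps sigma * z with z ~ N(0, I_n)."""
    return [
        sigma * math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
        for _ in range(samples)
    ]


rng = random.Random(0)
sigma = 0.5
for n in (2, 100):
    lengths = step_lengths(n, sigma, samples=2000, rng=rng)
    mean = statistics.mean(lengths)
    spread = statistics.stdev(lengths) / mean  # relative width of the shell
    # Print n, mean length relative to sigma*sqrt(n), and relative spread.
    print(n, round(mean / (sigma * math.sqrt(n)), 3), round(spread, 3))
```

For large $n$ the ratio should sit close to 1 and the relative spread should be small, which is what "thin shell" means here; for $n = 2$ the lengths vary much more.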
Ill-conditioned landscapes
Consider an objective like $f(x) = x_1^2 + 1000\,x_2^2$. The level sets are ellipses, with anisotropic curvature. Curvature along $x_2$ is 1000 times steeper than along $x_1$. Thus, there’s a condition number of $10^3$.
Mutation samples are spherical, but the objective is anisotropic.
- Too large steps in steep direction ($x_2$) $\to$ rejected moves
- Too small steps in flat direction ($x_1$) $\to$ slow progress
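To make the mismatch concrete, the sketch below estimates how often a mutation improves the objective when the cloud is spherical versus when it is scaled per axis. The objective `f` (flat along `x[0]`, steep along `x[1]`, condition number 1000), the starting point, and the step sizes are all assumptions chosen for the demo.

```python
import random


def f(x):
    """Assumed ill-conditioned objective: flat along x[0], steep along x[1]."""
    return x[0] ** 2 + 1000.0 * x[1] ** 2


def success_rate(x, sigmas, trials, rng):
    """Fraction of mutations x + sigmas * z that improve on f(x)."""
    fx = f(x)
    wins = 0
    for _ in range(trials):
        child = [xi + s * rng.gauss(0.0, 1.0) for xi, s in zip(x, sigmas)]
        wins += f(child) < fx
    return wins / trials


rng = random.Random(0)
x = [1.0, 0.0]
# Spherical cloud: same step size in both directions.
iso = success_rate(x, [0.1, 0.1], 5000, rng)
# Axis-scaled cloud: step along the steep axis shrunk by sqrt(1000).
scaled = success_rate(x, [0.1, 0.1 / 1000 ** 0.5], 5000, rng)
print(iso, scaled)
```

With the same step size along the flat axis, the spherical cloud wastes most of its samples on moves that are too large in the steep direction, so its success rate is far lower.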

Case 2: Uncorrelated Mutation
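For the ill-conditioned landscapes, a single global $\sigma$ is too crude, as some variables need larger mutations than others. Thus, we use one step size per coordinate:
$$x_i' = x_i + \sigma_i' \cdot \mathcal{N}_i(0, 1)$$
where each coordinate mutates its own scale:
$$\sigma_i' = \sigma_i \cdot \exp\left(\tau' \, \mathcal{N}(0, 1) + \tau \, \mathcal{N}_i(0, 1)\right)$$
with
- $\tau' \propto 1/\sqrt{2n}$ (global)
- $\tau \propto 1/\sqrt{2\sqrt{n}}$ (per-coordinate)
- Once again, $\exp(\cdot)$ guarantees positivity.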
This creates an axis-aligned ellipsoid instead of a sphere.
$\tau'$ and $\tau$ are learning rate parameters:
- $\tau'$ is the global adaptation strength (shared across coordinates)
- $\tau$ is the coordinate-wise adaptation strength
This dimension-dependent scaling controls the variance of $\ln \sigma_i'$, preventing unstable step-size explosions in high dimensions. Thus, adaptation speed is comparable across problem sizes, and self-adaptation remains stable as the dimension grows. Larger $n$ means smaller learning rates, which prevents erratic changes in the covariance and ensures smooth adaptation of the search geometry.
Here, our covariance has become
$$C = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2)$$
This is still uncorrelated, because the covariance matrix is diagonal; the mutation cloud is an ellipsoid aligned with the coordinate axes. Note that we can write the update as $x' = x + D z$, with $D = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$ and $z \sim \mathcal{N}(0, I)$.
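The uncorrelated case can be sketched as follows. The function name `mutate_uncorrelated` is hypothetical, and the proportionality constants of the learning rates are assumed to be 1.

```python
import math
import random


def mutate_uncorrelated(x, sigmas, rng=random):
    """Self-adaptive mutation with one step size per coordinate."""
    n = len(x)
    tau_global = 1.0 / math.sqrt(2.0 * n)            # tau' ~ 1/sqrt(2n)
    tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(n))  # tau  ~ 1/sqrt(2*sqrt(n))
    shared = rng.gauss(0.0, 1.0)  # a single draw shared by every coordinate
    # Log-normal update: shared global term plus a fresh per-coordinate term.
    new_sigmas = [
        s * math.exp(tau_global * shared + tau_local * rng.gauss(0.0, 1.0))
        for s in sigmas
    ]
    # Each coordinate is perturbed with its own NEW step size.
    new_x = [xi + s * rng.gauss(0.0, 1.0) for xi, s in zip(x, new_sigmas)]
    return new_x, new_sigmas


rng = random.Random(0)
child, child_sigmas = mutate_uncorrelated([0.0, 0.0, 0.0], [1.0, 0.1, 0.01], rng=rng)
print(child, child_sigmas)
```

The shared draw moves all step sizes up or down together, while the per-coordinate draws let the axis ratios of the ellipsoid change.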
Case 3: Correlated Mutation
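Even multiple coordinate-wise step sizes are not enough if the important search directions are rotated relative to the coordinate axes. Thus, we want to generalize the mutation to $x' = x + \mathcal{N}(0, C)$, where $C$ is a learned covariance matrix.
Specifically, we use:
$$C = R D^2 R^\top$$
where:
- $D = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$ controls axis lengths
- $R$ is an orthogonal rotation matrix constructed from angles $\alpha_j$
Then:
$$x' = x + R D z, \quad z \sim \mathcal{N}(0, I)$$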
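As a quick check of this parameterization, assuming the step has the form $R D z$ with $z \sim \mathcal{N}(0, I)$, the covariance of the step works out to $R D^2 R^\top$. The 2D case with a single angle $\alpha$ (an illustrative special case) makes the off-diagonal correlation term explicit:

$$
\begin{aligned}
\operatorname{Cov}(R D z) &= R D \operatorname{Cov}(z) D^\top R^\top = R D^2 R^\top = C, \\
R(\alpha) &= \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}, \qquad
D = \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix}, \\
C &= \begin{pmatrix}
\sigma_1^2 \cos^2\alpha + \sigma_2^2 \sin^2\alpha & (\sigma_1^2 - \sigma_2^2) \sin\alpha \cos\alpha \\
(\sigma_1^2 - \sigma_2^2) \sin\alpha \cos\alpha & \sigma_1^2 \sin^2\alpha + \sigma_2^2 \cos^2\alpha
\end{pmatrix}.
\end{aligned}
$$

When $\alpha = 0$ or $\sigma_1 = \sigma_2$, the off-diagonal terms vanish and the mutation reduces to the uncorrelated case.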
Let’s walk through the full flow.
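First, our chromosome is now $(x, \sigma, \alpha)$. We first mutate step sizes:
$$\sigma_i' = \sigma_i \cdot \exp\left(\tau' \, \mathcal{N}(0, 1) + \tau \, \mathcal{N}_i(0, 1)\right)$$
Then mutate rotation angles:
$$\alpha_j' = \alpha_j + \beta \cdot \mathcal{N}_j(0, 1)$$
$\beta$ controls rotation-angle mutation; small angle updates ensure gradual geometry change. Finally, the solution is mutated by sampling from the new covariance, $x' = x + \mathcal{N}(0, C')$, where $C'$ is built from $\sigma'$ and $\alpha'$.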
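We often also use a lower bound on the step sizes, such as resetting any $\sigma_i' < \epsilon_0$ to $\sigma_i' = \epsilon_0$, which prevents collapse of mutation strength and avoids premature convergence. We also do angle wrapping, mapping any $|\alpha_j'| > \pi$ to $\alpha_j' - 2\pi \operatorname{sign}(\alpha_j')$, to ensure numerical stability and uniqueness. We can generalize this into CMA-ES.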
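The full flow can be sketched for general $n$ without building $R$ explicitly, by applying one plane rotation per coordinate pair to the scaled vector. This is a sketch under assumptions: the name `mutate_correlated`, the order in which the plane rotations are applied, $\beta \approx 5^\circ$, the floor `eps0`, and learning-rate constants of 1 are all choices made for the example.

```python
import math
import random


def mutate_correlated(x, sigmas, alphas, rng=random,
                      beta=math.radians(5.0), eps0=1e-8):
    """Correlated self-adaptive mutation on a chromosome (x, sigmas, alphas).

    alphas holds one angle per coordinate pair (i < j): n * (n - 1) / 2 values.
    beta and eps0 are assumed settings, not values fixed by the notes.
    """
    n = len(x)
    tau_global = 1.0 / math.sqrt(2.0 * n)
    tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    shared = rng.gauss(0.0, 1.0)
    # 1) Step sizes: log-normal update, floored at eps0.
    new_sigmas = [
        max(eps0, s * math.exp(tau_global * shared + tau_local * rng.gauss(0.0, 1.0)))
        for s in sigmas
    ]
    # 2) Angles: additive Gaussian update, wrapped back into [-pi, pi].
    new_alphas = []
    for a in alphas:
        a += beta * rng.gauss(0.0, 1.0)
        if abs(a) > math.pi:
            a -= 2.0 * math.pi * math.copysign(1.0, a)
        new_alphas.append(a)
    # 3) Step: scale z by the new sigmas (D z), then rotate it (R D z)
    #    with one plane rotation per coordinate pair.
    step = [s * rng.gauss(0.0, 1.0) for s in new_sigmas]
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            cs, sn = math.cos(new_alphas[k]), math.sin(new_alphas[k])
            step[i], step[j] = cs * step[i] - sn * step[j], sn * step[i] + cs * step[j]
            k += 1
    new_x = [xi + di for xi, di in zip(x, step)]
    return new_x, new_sigmas, new_alphas


rng = random.Random(0)
child = mutate_correlated([0.0, 0.0, 0.0], [1.0, 0.5, 0.1], [0.0, 0.0, 0.0], rng=rng)
print(child)
```

Applying the rotations to the vector directly costs $O(n^2)$ per mutation and avoids forming the $n \times n$ matrix $R$.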
Correlated ES Example
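As one possible illustration, the sketch below runs a small $(\mu, \lambda)$-ES with correlated mutation on a 2D ellipse rotated 45 degrees away from the coordinate axes. The objective, population sizes, starting point, seed, and the values of `beta` and `eps0` are all assumptions chosen for the demo, and no recombination is used.

```python
import math
import random


def f(x):
    """Assumed test objective: an ellipse rotated 45 degrees off the axes."""
    u = (x[0] + x[1]) / math.sqrt(2.0)
    v = (x[0] - x[1]) / math.sqrt(2.0)
    return u * u + 100.0 * v * v


def mutate(ind, rng, beta=math.radians(5.0), eps0=1e-8):
    """Correlated mutation of a 2D individual (x, sigmas, alpha)."""
    x, sigmas, alpha = ind
    tau_global = 1.0 / math.sqrt(2.0 * 2)             # n = 2
    tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(2))
    shared = rng.gauss(0.0, 1.0)
    new_sigmas = [
        max(eps0, s * math.exp(tau_global * shared + tau_local * rng.gauss(0.0, 1.0)))
        for s in sigmas
    ]
    new_alpha = alpha + beta * rng.gauss(0.0, 1.0)
    if abs(new_alpha) > math.pi:                       # wrap into [-pi, pi]
        new_alpha -= 2.0 * math.pi * math.copysign(1.0, new_alpha)
    d0 = new_sigmas[0] * rng.gauss(0.0, 1.0)           # D z
    d1 = new_sigmas[1] * rng.gauss(0.0, 1.0)
    cs, sn = math.cos(new_alpha), math.sin(new_alpha)
    new_x = [x[0] + cs * d0 - sn * d1, x[1] + sn * d0 + cs * d1]  # x + R D z
    return (new_x, new_sigmas, new_alpha)


def run_es(generations=100, mu=3, lam=21, seed=0):
    """(mu, lambda)-ES; returns (starting fitness, best fitness seen)."""
    rng = random.Random(seed)
    parents = [([3.0, -2.0], [1.0, 1.0], 0.0) for _ in range(mu)]
    start = best = f(parents[0][0])
    for _ in range(generations):
        children = [mutate(rng.choice(parents), rng) for _ in range(lam)]
        children.sort(key=lambda ind: f(ind[0]))
        parents = children[:mu]      # comma selection: old parents are discarded
        best = min(best, f(parents[0][0]))
    return start, best


start, best = run_es()
print(start, best)
```

Because the step sizes and the angle travel with each individual, selection on $f(x')$ indirectly favors mutation ellipses that line up with the valley of the objective.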








