Memetic Algorithms

Memetic algorithms combine global evolutionary search with local improvement.

The motivation is that evolutionary algorithms are more random than necessary near an optimum, because mutation and recombination steps may be too coarse for fine convergence. Local search methods such as gradient descent refine well inside a basin, but they can get stuck in local optima.

For a differentiable problem, the simple abstract form of memetic algorithms is:

x^{'} = x + ϵ - η \nabla f (x)

where $ϵ$ provides stochastic exploration and $- η \nabla f (x)$ provides local descent.

Formally, if $E$ denotes the evolutionary variation operator, $L$ the local search operator, and $R$ the selection (replacement) step, a pure evolutionary algorithm would be:

P^{(t + 1)} = R (E (P^{(t)}))

A memetic algorithm inserts the local search:

P^{(t + 1)} = R (L (E (P^{(t)})))

We can also think of the evolutionary algorithm as deciding which basins to enter, and of local search as locating the lowest point inside each basin.
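The composition $R (L (E (P)))$ can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the objective `f`, the choice to let `E` return the parents plus one mutated copy of each, the single gradient step in `L`, and truncation selection in `R` are all assumptions made for the example.

```python
import random

def f(x):
    # Hypothetical smooth objective with its minimum at x = 2.
    return (x - 2.0) ** 2

def grad_f(x):
    return 2.0 * (x - 2.0)

def E(pop, rng, sigma=0.5):
    # Variation: keep the parents and add one Gaussian-mutated copy of each.
    return pop + [x + rng.gauss(0.0, sigma) for x in pop]

def L(pop, eta=0.1):
    # Local search: one gradient-descent step per candidate.
    return [x - eta * grad_f(x) for x in pop]

def R(pop, n):
    # Replacement: truncation selection down to n individuals.
    return sorted(pop, key=f)[:n]

def memetic_generation(pop, rng):
    # P(t+1) = R(L(E(P(t))))
    return R(L(E(pop, rng)), len(pop))

rng = random.Random(0)
population = [rng.uniform(-10.0, 10.0) for _ in range(20)]
initial_best = min(f(x) for x in population)
for _ in range(50):
    population = memetic_generation(population, rng)
final_best = min(f(x) for x in population)
```

Dropping the call to `L` in `memetic_generation` recovers the pure evolutionary update $R (E (P))$, which makes the two variants easy to compare on the same seed.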

Analysis

Consider the simplified memetic update:

x_{t + 1} = x_{t} + ϵ_{t} - η \nabla f (x_{t})

The algorithm evolves candidate solutions $x_{t}$ , but convergence should be analyzed relative to an optimum $x^{*}$ . Define the error state:

e_{t} = x_{t} - x^{*}

Then:

e_{t + 1} = x_{t + 1} - x^{*} = e_{t} + ϵ_{t} - η \nabla f (x_{t})

So, for convergence we want:

e_{t} \to 0 ⟺ x_{t} \to x^{*}

Near a smooth local optimum $x^{*}$ , the gradient can be linearized:

\nabla f (x_{t}) \approx H (x_{t} - x^{*}) = H e_{t}

where $H$ is the Hessian at $x^{*}$ .

So the local dynamics become:

e_{t + 1} = (I - ηH) e_{t} + ϵ_{t}

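This update has two parts. The first is a deterministic contraction toward the optimum, which shrinks the error when every eigenvalue of $I - ηH$ has magnitude below 1:

(I - ηH) e_{t}

The second is the exploratory forcing $ϵ_{t}$ , which re-injects diversity and allows basin transitions. This is the local dynamical interpretation of memetic search.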

Quadratic Example and Analysis
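One way to make the linearized dynamics concrete is a scalar quadratic. The following is a sketch under assumptions not stated elsewhere in these notes: the objective is $f (x) = \frac{a}{2} x^{2}$ with curvature $a > 0$ (so $x^{*} = 0$ and $H = a$), and the noise $ϵ_{t}$ is i.i.d. Gaussian with variance $σ^{2}$ .

```latex
\begin{aligned}
e_{t+1} &= (1 - \eta a)\, e_{t} + \epsilon_{t}, \qquad \epsilon_{t} \sim \mathcal{N}(0, \sigma^{2}) \\
\operatorname{Var}(e_{t+1}) &= (1 - \eta a)^{2} \operatorname{Var}(e_{t}) + \sigma^{2} \\
\operatorname{Var}(e_{\infty}) &= \frac{\sigma^{2}}{1 - (1 - \eta a)^{2}} = \frac{\sigma^{2}}{\eta a (2 - \eta a)}, \qquad 0 < \eta < \frac{2}{a}
\end{aligned}
```

Under these assumptions the mean error decays geometrically with factor $|1 - ηa|$ , the stationary variance is smallest (equal to $σ^{2}$ ) at $η = 1/a$ , and no stationary variance exists once $η \geq 2/a$ .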

The parameters $σ$ and $η$ define the search regime: $σ$ , the standard deviation of the noise $ϵ_{t}$ , sets the noise level (exploration), and $η$ controls contraction (refinement).

Large $σ$ , small $η$ gives highly exploratory behavior with weak refinement. This results in large steady-state variance.
Small $σ$ , moderate $η$ gives stronger local exploitation. This gives tighter concentration, with variance bounded below by $σ^{2}$ .
If $η$ is too large (beyond $2 / λ_{max} (H)$ in the linearized model), the contraction factor exceeds 1 in magnitude and the iteration becomes unstable.
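These regimes can be checked numerically. The sketch below assumes a scalar quadratic $f (x) = \frac{a}{2} x^{2}$ and Gaussian noise, for which the linearized model predicts a stationary variance of $σ^{2} / (1 - (1 - ηa)^{2})$ ; the function names and the parameter values are illustrative choices.

```python
import random

def predicted_variance(sigma, eta, a):
    # Stationary variance of e_{t+1} = (1 - eta*a) e_t + eps_t, eps_t ~ N(0, sigma^2).
    rho = 1.0 - eta * a
    if abs(rho) >= 1.0:
        raise ValueError("unstable: need 0 < eta < 2/a")
    return sigma ** 2 / (1.0 - rho ** 2)

def simulated_variance(sigma, eta, a, steps=100_000, burn_in=1_000, seed=0):
    # Run the noisy linear recursion and estimate the variance after burn-in.
    rng = random.Random(seed)
    e, total, total_sq, n = 0.0, 0.0, 0.0, 0
    for t in range(steps):
        e = (1.0 - eta * a) * e + rng.gauss(0.0, sigma)
        if t >= burn_in:
            total += e
            total_sq += e * e
            n += 1
    mean = total / n
    return total_sq / n - mean ** 2

# (large sigma, small eta), (small sigma, moderate eta), (small sigma, eta = 1/a)
for sigma, eta in [(1.0, 0.1), (0.1, 0.5), (0.1, 1.0)]:
    print(sigma, eta,
          predicted_variance(sigma, eta, a=1.0),
          simulated_variance(sigma, eta, a=1.0))
```

Calling `predicted_variance` with `eta >= 2 / a` raises an error, which mirrors the unstable regime described above.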

Scheduling

Naively, we can apply local search every $k$ generations:

L_{k} (x) = \begin{cases} L (x), & t \bmod k = 0 \\ x, & \text{otherwise} \end{cases}

Frequent local search gives strong exploitation but is expensive. Sparser scheduling means that more exploration is retained, with lower cost, but with the trade-off of slower local improvement.
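The fixed schedule above amounts to a one-line gate. In this sketch, `halve` is a hypothetical stand-in for a real local search operator, and the generation index `t` is assumed to start at 0.

```python
def scheduled_local_search(x, t, k, local_search):
    # Apply local_search only when the generation index t is a multiple of k.
    return local_search(x) if t % k == 0 else x

def halve(x):
    # Hypothetical local search: one step toward an optimum at 0.
    return 0.5 * x

x = 16.0
trace = []
for t in range(6):
    x = scheduled_local_search(x, t, k=3, local_search=halve)
    trace.append(x)
# With k = 3, local search fires at t = 0 and t = 3 only.
```

Raising `k` reduces the number of local-search calls in proportion, which is the cost side of the trade-off.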

Adaptive schedules

A fixed local-search policy is unlikely to be optimal throughout the run. We can adapt the memetic intensity over time, so that early generations emphasize exploration and later generations increase local refinement. Examples:

Increase local-search depth as diversity decreases
Decrease memetic intensity when the population starts collapsing too quickly
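Both rules can be expressed as small functions of a diversity measure. The sketch below is one possible reading, with several assumptions: individuals are scalars, diversity is the population standard deviation, depth is interpolated linearly between hypothetical bounds `d_min` and `d_max`, and "collapsing too quickly" means diversity fell by more than a fraction `max_drop` in one generation.

```python
import statistics

def diversity(pop):
    # Population standard deviation as a simple diversity measure.
    return statistics.pstdev(pop)

def adaptive_depth(pop, d_min=1, d_max=10, div_ref=1.0):
    # Low diversity gives depth near d_max; diversity >= div_ref gives d_min.
    ratio = min(diversity(pop) / div_ref, 1.0)
    return round(d_max - ratio * (d_max - d_min))

def damp_if_collapsing(depth, prev_div, curr_div, max_drop=0.5):
    # Halve the depth if diversity dropped by more than max_drop in one generation.
    if prev_div > 0 and curr_div < (1.0 - max_drop) * prev_div:
        return max(1, depth // 2)
    return depth
```

The reference scale `div_ref` is problem dependent; one option is to set it from the diversity of the initial population.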

Selection

What individuals should receive local search?

Let $S \subseteq P$ denote the selected subset of individuals that receive local search.

If we want this to be all offspring, we set $S = P$ . This gives the strongest exploitation but also has the strongest collapse risk.
If we want elites only, we set $S = \text{top-}k (P)$ . This concentrates effort on promising regions.
We can also do a random subset $S \sim sample (P)$ . This spreads improvement more broadly.
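The three selection policies can be sketched as follows. The function names are illustrative, and minimization of `f` is assumed when ranking elites.

```python
import random

def select_all(pop):
    # S = P: every individual receives local search.
    return list(pop)

def select_elites(pop, f, k):
    # S = top-k: the k individuals with the lowest objective value.
    return sorted(pop, key=f)[:k]

def select_random(pop, k, rng):
    # S ~ sample(P): k individuals drawn uniformly without replacement.
    return rng.sample(pop, k)

pop = [3.0, -1.0, 0.5, 2.0]
elites = select_elites(pop, lambda x: x * x, k=2)
subset = select_random(pop, k=2, rng=random.Random(0))
```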

Depth

If local search is applied repeatedly, we can write:

x^{(d)} = L^{d} (x)

where $d$ is the local-search depth.

If we have $d = 1$ , that’s just one corrective local step. Moderate $d$ gives more meaningful refinement, while very large $d$ means near-full local convergence.

But deeper local search means higher cost:

\text{cost} = O (d \cdot \text{local-eval cost})

Thus, depth controls a trade-off between better local precision and greater computational burden.
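Repeated application $L^{d} (x)$ is a plain loop. In this sketch the local step is a gradient step on the hypothetical objective $f (x) = x^{2}$ , and the returned counter assumes each step costs one local evaluation.

```python
def iterate_local_search(x, step, d):
    # Compute L^d(x) by applying the local step d times; also count the steps.
    evaluations = 0
    for _ in range(d):
        x = step(x)
        evaluations += 1
    return x, evaluations

def step(x, eta=0.25):
    # Gradient step on the hypothetical objective f(x) = x^2 (gradient 2x).
    return x - eta * 2.0 * x

shallow = iterate_local_search(8.0, step, d=1)
deeper = iterate_local_search(8.0, step, d=3)
```

With this step size each application halves the distance to the optimum, so precision improves geometrically in `d` while the cost grows only linearly.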

Lamarckian vs. Baldwinian

There are two conceptually different ways that local search can be integrated into evolution.

In Lamarckian learning, the improved solution directly replaces the genotype.

x \leftarrow L (x)

This transfers acquired improvement directly.

In Baldwinian learning, the genotype is unchanged but its evaluated quality reflects local improvement:

f^{'} (x) = f (L (x))

This rewards genotypes that can improve well.
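The difference comes down to what is written back after local search. In this sketch each function returns a (genotype, fitness) pair; the objective and the local search operator are hypothetical.

```python
def lamarckian(x, f, local_search):
    # The improved solution replaces the genotype; fitness is that of the new genotype.
    x_new = local_search(x)
    return x_new, f(x_new)

def baldwinian(x, f, local_search):
    # The genotype is kept; only the assigned fitness reflects the improvement.
    return x, f(local_search(x))

f = lambda x: x * x              # hypothetical objective
local_search = lambda x: 0.5 * x  # hypothetical one-step improvement

lam = lamarckian(4.0, f, local_search)
bal = baldwinian(4.0, f, local_search)
```

Both variants assign the same fitness here, but only the Lamarckian one passes the improved point on to variation in the next generation.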

Applications/Examples

One-Dimensional Minimization
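A possible worked example for this heading, offered as a sketch rather than a tuned implementation: minimizing the one-dimensional Rastrigin function, which has many local minima. The test function, the population size, the mutation scale, the step size, and the elitist replacement are all assumptions chosen for illustration.

```python
import math
import random

def f(x):
    # 1-D Rastrigin function: global minimum f(0) = 0, local minima near every integer.
    return x * x - 10.0 * math.cos(2.0 * math.pi * x) + 10.0

def grad_f(x):
    return 2.0 * x + 20.0 * math.pi * math.sin(2.0 * math.pi * x)

def local_search(x, eta=0.002, depth=20):
    # A few gradient-descent steps; eta is kept below 2 / max|f''| (about 0.005 here).
    for _ in range(depth):
        x -= eta * grad_f(x)
    return x

def memetic_minimize(pop_size=20, generations=100, sigma=1.0, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Mutation explores across basins; local search refines inside each basin.
        offspring = [local_search(x + rng.gauss(0.0, sigma)) for x in pop]
        # Elitist replacement: keep the best pop_size of parents and offspring.
        pop = sorted(pop + offspring, key=f)[:pop_size]
    return pop[0]

best = memetic_minimize()
print(best, f(best))
```

Gradient descent alone would stop at whichever local minimum is nearest to its starting point, while mutation alone would struggle to resolve the minimum precisely; the combination handles both.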

Combinatorial Example: TSP
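A possible sketch for this heading pairs a simple evolutionary loop with 2-opt as the local search, a common combination for the travelling salesman problem. Everything here is an assumption for illustration: the simplified order crossover, the steady-state replacement of the worst tour, the Lamarckian write-back, and the toy instance of 8 cities on a circle.

```python
import math
import random

def tour_length(tour, pts):
    # Length of the closed tour visiting pts in the given order.
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))

def two_opt(tour, pts):
    # Local search: reverse segments for as long as a reversal shortens the tour.
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(candidate, pts) < tour_length(tour, pts) - 1e-12:
                    tour, improved = candidate, True
    return tour

def order_crossover(p1, p2, rng):
    # Simplified order crossover: keep a slice of p1, fill the rest in p2's order.
    a, b = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[a:b + 1]
    rest = [city for city in p2 if city not in middle]
    return rest[:a] + middle + rest[a:]

def memetic_tsp(pts, pop_size=10, generations=20, seed=0):
    rng = random.Random(seed)
    n = len(pts)
    length = lambda t: tour_length(t, pts)
    # Every initial tour is refined by local search before evolution starts.
    pop = [two_opt(rng.sample(range(n), n), pts) for _ in range(pop_size)]
    for _ in range(generations):
        p1, p2 = rng.sample(pop, 2)
        child = two_opt(order_crossover(p1, p2, rng), pts)  # Lamarckian refinement
        worst = max(range(pop_size), key=lambda k: length(pop[k]))
        if length(child) < length(pop[worst]):
            pop[worst] = child
    return min(pop, key=length)

# Hypothetical instance: 8 cities on a unit circle.
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
best = memetic_tsp(pts)
print(best, tour_length(best, pts))
```

This `two_opt` recomputes full tour lengths for clarity, which is quadratic work per candidate move; practical implementations usually evaluate only the change in the two affected edges.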

Neural Networks with Memetic Search

Cost

Let $N$ be the population size, $d$ be the local search depth, and $c_{f}$ be the cost of one fitness evaluation. A rough per-generation model is:

O (N c_{f} + ∣ S ∣ d c_{f})

where $S$ is the subset receiving local search.

This highlights a practical fact: memetic algorithms often use fewer generations, but each generation is more expensive. They are therefore attractive when better solutions justify more effort per candidate.
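The cost model is easy to evaluate for concrete settings. This sketch assumes that each local-search step costs one fitness evaluation, which is a simplification; the numbers are illustrative.

```python
def generation_cost(N, S_size, d, c_f):
    # N fitness evaluations plus d local-search evaluations
    # for each of the S_size individuals selected for local search.
    return N * c_f + S_size * d * c_f

plain_ea = generation_cost(N=100, S_size=0, d=0, c_f=1.0)      # no local search
elites_only = generation_cost(N=100, S_size=10, d=5, c_f=1.0)  # top 10 refined
everyone = generation_cost(N=100, S_size=100, d=5, c_f=1.0)    # all refined
```

With these numbers, refining only the elites adds 50% to the per-generation cost, while refining everyone multiplies it by six.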
