GP for Control

Control problems are attractive for GP because the controller is naturally a program. A controller maps state to action:

$$u_t = f(s_t)$$

Thus, we can use GP to evolve threshold logic, arithmetic combinations of sensor readings, nonlinear feedback laws, and conditional policies.

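Consider the classic cart-pole problem. A pole is hinged to a cart that moves along a finite track, and the controller applies a horizontal force to the cart to keep the pole balanced.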

Failure occurs if $|\theta_t| > \theta_{\max}$ or $|x_t| > x_{\max}$. In other words, the pole must stay near upright and the cart must stay on the track.

The state consists of the cart position and velocity and the pole angle and angular velocity:

$$s = (x, \dot{x}, \theta, \dot{\theta})$$
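To make the state vector and the failure test concrete, here is a minimal cart-pole simulator sketch. The physical constants, the 0.02 s Euler timestep, and the limits $\theta_{\max} = 12°$ and $x_{\max} = 2.4$ are assumptions taken from the common Gym/Gymnasium `CartPole` setup; these notes do not fix them. The names `State`, `step`, and `failed` are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumed constants, following the common Gym/Gymnasium CartPole setup.
GRAVITY = 9.8                    # m/s^2
MASS_CART = 1.0                  # kg
MASS_POLE = 0.1                  # kg
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                # half the pole length, m
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
DT = 0.02                        # Euler timestep, s
THETA_MAX = 12 * math.pi / 180   # about 0.21 rad
X_MAX = 2.4                      # m from track center


@dataclass
class State:
    x: float = 0.0          # cart position
    x_dot: float = 0.0      # cart velocity
    theta: float = 0.0      # pole angle from vertical (rad)
    theta_dot: float = 0.0  # pole angular velocity


def step(s: State, force: float) -> State:
    """Advance the cart-pole by one explicit Euler step under `force`."""
    sin_t, cos_t = math.sin(s.theta), math.cos(s.theta)
    temp = (force + POLE_MASS_LENGTH * s.theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    return State(
        x=s.x + DT * s.x_dot,
        x_dot=s.x_dot + DT * x_acc,
        theta=s.theta + DT * s.theta_dot,
        theta_dot=s.theta_dot + DT * theta_acc,
    )


def failed(s: State) -> bool:
    """Failure test: |theta| > theta_max or |x| > x_max."""
    return abs(s.theta) > THETA_MAX or abs(s.x) > X_MAX
```

With no force, a pole at rest exactly upright stays put, but any small tilt grows until `failed` fires, which is what makes this a control problem.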

GP evolves a controller of the form:

$$u = f(x, \dot{x}, \theta, \dot{\theta})$$
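One way to hold such a controller is as an expression tree whose terminals are the four state variables and numeric constants. The sketch below assumes a hypothetical nested-tuple encoding and a small, hypothetical primitive set (`add`, `sub`, `mul`, protected division `pdiv`, and a conditional `if_gt`); real GP systems choose their own primitives.

```python
from typing import Union

# Hypothetical encoding: a terminal is a state-variable name (str) or a
# numeric constant; an internal node is a tuple (primitive, child, ...).
Tree = Union[str, float, tuple]


def evaluate(tree: Tree, state: dict[str, float]) -> float:
    """Recursively evaluate a GP expression tree on one state."""
    if isinstance(tree, str):              # terminal: state variable
        return state[tree]
    if isinstance(tree, (int, float)):     # terminal: constant
        return float(tree)
    op, *args = tree
    if op == "add":
        return evaluate(args[0], state) + evaluate(args[1], state)
    if op == "sub":
        return evaluate(args[0], state) - evaluate(args[1], state)
    if op == "mul":
        return evaluate(args[0], state) * evaluate(args[1], state)
    if op == "pdiv":
        # Protected division, a common GP convention: return 1.0 instead of
        # dividing by (nearly) zero so every tree evaluates to a number.
        den = evaluate(args[1], state)
        return 1.0 if abs(den) < 1e-9 else evaluate(args[0], state) / den
    if op == "if_gt":
        # Conditional: if a > b then c else d (only one branch is evaluated).
        a, b, then_branch, else_branch = args
        if evaluate(a, state) > evaluate(b, state):
            return evaluate(then_branch, state)
        return evaluate(else_branch, state)
    raise ValueError(f"unknown primitive: {op!r}")


# Example individual: push right (+10) if theta + 0.5 * theta_dot > 0, else left.
policy = ("if_gt", ("add", "theta", ("mul", 0.5, "theta_dot")), 0.0, 10.0, -10.0)
state = {"x": 0.0, "x_dot": 0.0, "theta": 0.05, "theta_dot": 0.0}
print(evaluate(policy, state))  # 10.0
```

Threshold logic, arithmetic combinations, and conditional policies are all just different shapes of the same tree, which is why the representation suits GP: crossover and mutation can swap subtrees without ever producing an invalid controller.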

The output can be either a discrete force $u \in \{-10, +10\}$ or a continuous force restricted to a bounded interval.
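An evolved tree returns an unbounded real number, so it has to be mapped onto the action space. A minimal sketch, assuming a force magnitude of 10 as in the text: thresholding at zero gives the discrete bang-bang force, while `tanh` squashing or hard clipping gives a bounded continuous force. The function names and the zero threshold are illustrative choices, not fixed by the notes.

```python
import math

FORCE_MAG = 10.0  # force magnitude from the discrete set {-10, +10}


def discrete_action(raw: float) -> float:
    """Bang-bang output: +FORCE_MAG if the program output is >= 0, else -FORCE_MAG."""
    return FORCE_MAG if raw >= 0.0 else -FORCE_MAG


def continuous_action(raw: float) -> float:
    """Smoothly squash an unbounded output into [-FORCE_MAG, FORCE_MAG] with tanh."""
    return FORCE_MAG * math.tanh(raw)


def clipped_action(raw: float) -> float:
    """Hard-clip an unbounded output to [-FORCE_MAG, FORCE_MAG]."""
    return max(-FORCE_MAG, min(FORCE_MAG, raw))
```

Clipping leaves outputs inside the interval untouched, so an evolved linear law acts linearly until it saturates; `tanh` saturates gradually instead.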

We can then define a survival-based fitness:

$$F = \text{number of timesteps before failure}$$
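The survival fitness can be computed by rolling the controller out until the failure test fires. This sketch is generic: it takes the policy, a one-step dynamics function, and a failure predicate as arguments (all hypothetical names), and it caps the episode at `max_steps` because a good controller may never fail. The default of 500 is an arbitrary illustrative horizon.

```python
from typing import Callable, TypeVar

S = TypeVar("S")  # any state representation


def survival_fitness(
    policy: Callable[[S], float],
    step: Callable[[S, float], S],
    failed: Callable[[S], bool],
    s0: S,
    max_steps: int = 500,
) -> int:
    """Return the number of timesteps survived before failure, capped at max_steps."""
    s = s0
    for t in range(max_steps):
        if failed(s):
            return t
        s = step(s, policy(s))
    return max_steps
```

Plugging in any cart-pole simulator for `step` and `failed` gives $F$ directly; every controller that survives the full horizon ties at `max_steps`, which is one reason to prefer a shaped reward.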

or a reward-based fitness:

$$F = \sum_{t=0}^{T} r_t, \qquad r_t = 1 - \alpha\,|\theta_t| - \beta\,|x_t|$$

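where each step the system stays alive earns a reward of 1, minus penalties for large pole angles and cart displacements weighted by $\alpha, \beta \ge 0$, and $T$ is the last timestep before failure (or a fixed horizon). This rewards controllers that not only survive but also keep the pole close to upright and the cart close to the center of the track.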
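The reward-based fitness follows the same rollout pattern, summing $r_t$ over the steps before failure. A sketch, assuming the state is a tuple `(x, x_dot, theta, theta_dot)` and using hypothetical default weights `alpha=1.0`, `beta=0.1` and a hypothetical 500-step horizon:

```python
from typing import Callable, Tuple

# Assumed state layout: (x, x_dot, theta, theta_dot).
CartPoleState = Tuple[float, float, float, float]


def reward(s: CartPoleState, alpha: float, beta: float) -> float:
    """r_t = 1 - alpha * |theta_t| - beta * |x_t|."""
    x, _, theta, _ = s
    return 1.0 - alpha * abs(theta) - beta * abs(x)


def reward_fitness(
    policy: Callable[[CartPoleState], float],
    step: Callable[[CartPoleState, float], CartPoleState],
    failed: Callable[[CartPoleState], bool],
    s0: CartPoleState,
    alpha: float = 1.0,
    beta: float = 0.1,
    max_steps: int = 500,
) -> float:
    """Sum r_t over every timestep the system is still alive, up to max_steps."""
    total = 0.0
    s = s0
    for _ in range(max_steps):
        if failed(s):
            break
        total += reward(s, alpha, beta)
        s = step(s, policy(s))
    return total
```

One design point: if $\alpha|\theta_t| + \beta|x_t|$ can exceed 1 inside the failure bounds, $r_t$ goes negative and a controller can raise its fitness by failing early. Keeping the penalties small relative to the per-step survival bonus avoids that.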

Control is a hard problem!
