Learning Decoders

So far, we’ve built neural networks by prescribing connection weights. We decide what computation we want, and then choose decoders to implement the required transformation.

Most brains don’t have an oracle to solve for connection weights. It turns out that all we need to learn the decoders is an error signal.

How can that error be used to update the connection weights?

Let’s start with a simple model:

Consider the decoding error:

$\epsilon = \lVert y - f(x) \rVert^{2} = \lVert A D - f(x) \rVert^{2}$

Here $A = G(J(x))$ is the neural activity produced by the input $x$, and $y = A D$ is the decoded estimate.

To minimize this error iteratively, we can use gradient descent.

$\frac{\partial \epsilon}{\partial D} = 2 A^{T} (A D - f(x))$

This is the gradient vector, which points “uphill” on the error function, in the direction of the greatest rate of increase. To move to a position ($D$) with lower error, we step in the direction opposite the gradient vector.
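The gradient expression can be sanity-checked numerically. The sketch below is not from the original notes: it fills $A$ and $f(x)$ with random stand-in values (the sizes and the names `A`, `target`, `error`, and `gradient` are assumptions made for illustration) and compares the analytic gradient $2 A^{T}(A D - f(x))$ against central finite differences of the error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 50 sample points, 8 neurons, 1 output dimension.
n_samples, n_neurons = 50, 8
A = rng.random((n_samples, n_neurons))         # stand-in for the activities G(J(x))
target = rng.standard_normal((n_samples, 1))   # stand-in for f(x)
D = rng.standard_normal((n_neurons, 1))        # arbitrary decoders to test at


def error(D):
    """Squared decoding error ||A D - f(x)||^2."""
    return np.sum((A @ D - target) ** 2)


def gradient(D):
    """Analytic gradient 2 A^T (A D - f(x))."""
    return 2 * A.T @ (A @ D - target)


# Central finite differences, perturbing one decoder at a time.
h = 1e-6
numeric = np.zeros_like(D)
for i in range(n_neurons):
    step = np.zeros_like(D)
    step[i] = h
    numeric[i] = (error(D + step) - error(D - step)) / (2 * h)

# Largest disagreement between the numeric and analytic gradients.
max_diff = np.max(np.abs(numeric - gradient(D)))
print(max_diff)
```

Because the error is quadratic in $D$, the two gradients should agree up to floating-point rounding.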

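$\Delta D = -\tilde{\kappa} \cdot 2 A^{T} (A D - f(x)) = \kappa A^{T} (f(x) - A D)$

Here $\tilde{\kappa}$ is the step size, and the factor of 2 from the gradient is absorbed into $\kappa = 2\tilde{\kappa}$.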

More generally:

$\Delta D = \kappa A^{T} (\text{error})$

$κ$ is the learning rate
$A^{T}$ is the pre-synaptic neural activity
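The update rule can be sketched as a short training loop. Everything specific here is an assumption for illustration rather than something from these notes: the rectified-linear tuning curves standing in for $G(J(x))$, the target $f(x) = x^{2}$, the population size, and the choice of $\kappa$ (set conservatively from the largest singular value of $A$ so that the batch update stays stable).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy population: rectified-linear tuning curves stand in for G(J(x)).
n_neurons, n_samples = 30, 200
x = np.linspace(-1, 1, n_samples).reshape(-1, 1)
encoders = rng.choice([-1.0, 1.0], size=(1, n_neurons))
gains = rng.uniform(0.5, 2.0, size=(1, n_neurons))
biases = rng.uniform(-1.0, 1.0, size=(1, n_neurons))
A = np.maximum(0.0, gains * (x @ encoders) + biases)  # (n_samples, n_neurons)

target = x ** 2                   # f(x), chosen only for illustration
D = np.zeros((n_neurons, 1))      # start with no decoding at all

# Conservative learning rate (an assumption): 1 / (largest singular value)^2.
kappa = 1.0 / np.linalg.norm(A, 2) ** 2


def rmse(D):
    """Root-mean-square decoding error over all sample points."""
    return np.sqrt(np.mean((A @ D - target) ** 2))


initial_rmse = rmse(D)
for _ in range(5000):
    # Delta D = kappa * A^T * (error), with error = f(x) - A D
    D += kappa * A.T @ (target - A @ D)
final_rmse = rmse(D)

print(initial_rmse, final_rmse)
```

Note that each decoder's update uses only the activity of its own neuron and the shared error signal, which is what makes the rule plausible as a local learning rule.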
