Under the conditional probabilistic perspective of learning, the model now computes different distribution parameters $\boldsymbol{\theta}_i = \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]$ for each training input $\mathbf{x}_i$.
Each observed training output $\mathbf{y}_i$ should have high probability under its corresponding distribution $Pr(\mathbf{y}_i \mid \boldsymbol{\theta}_i)$. Hence, we choose the model parameters $\boldsymbol{\phi}$ so that they maximize the combined probability across all $I$ training samples:

\[
\hat{\boldsymbol{\phi}} \;=\; \underset{\boldsymbol{\phi}}{\operatorname{argmax}}\left[\,\prod_{i=1}^{I} Pr\bigl(\mathbf{y}_i \mid \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\bigr)\right].
\]
The combined probability term is the likelihood of the parameters, and hence this is known as the maximum likelihood criterion.
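To make the criterion concrete, here is a minimal sketch (not from the text, with hypothetical data and helper names): a toy model whose output is the mean of a normal distribution with fixed standard deviation. It evaluates the product of per-sample probabilities for a few candidate parameter settings and keeps the one with the largest likelihood.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1D training data: inputs x_i and observed outputs y_i.
x = np.array([0.1, 0.4, 0.7, 0.9])
y = np.array([0.3, 0.8, 1.4, 1.7])

def model(x, phi):
    """Toy model f[x, phi]: predicts the mean of a normal distribution.
    Here phi = (slope, intercept)."""
    slope, intercept = phi
    return slope * x + intercept

def likelihood(phi, sigma=0.2):
    """Product over training pairs of Pr(y_i | f[x_i, phi])."""
    mu = model(x, phi)
    return np.prod(norm.pdf(y, loc=mu, scale=sigma))

# The maximum likelihood criterion picks the phi with the largest product.
candidates = [(1.0, 0.0), (1.8, 0.1), (2.0, 0.0)]
best = max(candidates, key=likelihood)
print(best, likelihood(best))
```

In practice the parameters would be found by an optimizer rather than by comparing a handful of candidates; the sketch only illustrates which quantity the criterion maximizes.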
In practice, multiplying many probabilities, each of which may be small, quickly produces values too small to represent accurately. A more practical version of the maximum likelihood criterion is therefore the log-likelihood criterion: because the logarithm is monotonically increasing, maximizing the log of the likelihood selects the same parameters while replacing the product with a better-behaved sum.
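Assuming the same notation as above (model $\mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]$ predicting the distribution parameters for input $\mathbf{x}_i$), the log-likelihood criterion can be written as

\[
\hat{\boldsymbol{\phi}} \;=\; \underset{\boldsymbol{\phi}}{\operatorname{argmax}}\left[\,\sum_{i=1}^{I} \log\Bigl[Pr\bigl(\mathbf{y}_i \mid \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\bigr)\Bigr]\right],
\]

which attains its maximum at the same $\hat{\boldsymbol{\phi}}$ as the product form.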