Discrete

The maximum entropy configuration can be found by maximizing the entropy $H = -\sum_i p(x_i) \ln p(x_i)$ using a Lagrange multiplier to enforce the normalization constraint on the probabilities, such that $\sum_i p(x_i) = 1$.

Thus, we maximize

$$\widetilde{H} = -\sum_i p(x_i) \ln p(x_i) + \lambda \left( \sum_i p(x_i) - 1 \right).$$

Setting the derivative with respect to $p(x_i)$ to zero gives $-\ln p(x_i) - 1 + \lambda = 0$, from which we find that all of the $p(x_i)$ are equal and are given by $p(x_i) = 1/M$, where $M$ is the total number of states $x_i$. The corresponding value of the entropy is then $H = \ln M$. This result can also be derived from Jensen's Inequality. To verify that the stationary point is indeed a maximum, we can evaluate the second derivative of the entropy, which gives

$$\frac{\partial \widetilde{H}}{\partial p(x_i)\, \partial p(x_j)} = -I_{ij} \frac{1}{p(x_i)}$$

where $I_{ij}$ are the elements of the identity matrix. We see that these values are all negative and, hence, the stationary point is indeed a maximum.
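
As a quick numerical sanity check (a sketch of my own, not part of the original derivation), we can compare the entropy of the uniform distribution over $M$ states against randomly perturbed distributions; the uniform case should attain the maximum value $\ln M$:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Discrete entropy H = -sum_i p_i ln p_i (natural log)."""
    p = p[p > 0]  # by convention, 0 ln 0 is taken to be 0
    return -np.sum(p * np.log(p))

M = 5
uniform = np.full(M, 1.0 / M)
print(f"H(uniform) = {entropy(uniform):.6f}, ln M = {np.log(M):.6f}")

# Any other normalized distribution over M states should have lower entropy.
for _ in range(5):
    p = rng.random(M)
    p /= p.sum()  # enforce the normalization constraint
    assert entropy(p) <= np.log(M) + 1e-12
    print(f"H(random)  = {entropy(p):.6f}  <=  ln M")
```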

Continuous

We saw that the maximum entropy configuration for discrete distributions corresponds to a uniform distribution of probabilities across the possible states of the variable. For continuous distributions, if we want the maximum to be well-defined, we need to constrain the first and second moments of $x$ as well as preserving the normalization constraint.
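
To see why a constraint beyond normalization is needed, note that the differential entropy of a uniform density on an interval of width $w$ is $\ln w$, which grows without bound as the interval widens. A minimal numeric sketch (illustrative, not from the original text):

```python
import numpy as np

# Differential entropy of Uniform(0, w): H = ln w (natural log).
for w in [1.0, 10.0, 1e6]:
    print(f"width = {w:>9}: H = ln w = {np.log(w):.3f}")
# H grows without bound as w grows, so without moment constraints
# the maximization problem has no finite solution.
```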

Thus, we maximize the differential entropy with three constraints:

$$\int_{-\infty}^{\infty} p(x)\, \mathrm{d}x = 1, \qquad \int_{-\infty}^{\infty} x\, p(x)\, \mathrm{d}x = \mu, \qquad \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, \mathrm{d}x = \sigma^2.$$

We can then perform constrained optimization using Lagrange multipliers, maximizing the following functional with respect to $p(x)$:

$$-\int_{-\infty}^{\infty} p(x) \ln p(x)\, \mathrm{d}x + \lambda_1 \left( \int_{-\infty}^{\infty} p(x)\, \mathrm{d}x - 1 \right) + \lambda_2 \left( \int_{-\infty}^{\infty} x\, p(x)\, \mathrm{d}x - \mu \right) + \lambda_3 \left( \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, \mathrm{d}x - \sigma^2 \right).$$

Using the calculus of variations, we set the derivative of this functional to zero, giving:

$$p(x) = \exp\left( -1 + \lambda_1 + \lambda_2 x + \lambda_3 (x - \mu)^2 \right).$$

The Lagrange multipliers can be found by back-substitution of this result into the three constraint equations, leading to the result:

$$p(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$

and so the distribution that maximizes the differential entropy is the Gaussian.
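
As an illustrative check (my own sketch, not part of the original derivation), we can compare closed-form differential entropies of a few familiar distributions all matched to the same variance $\sigma^2$; the Gaussian should come out on top:

```python
import numpy as np

sigma2 = 1.5  # common variance for all candidates (arbitrary choice)

# Closed-form differential entropies (natural log), each with variance sigma2:
#   Gaussian:          H = 0.5 * ln(2 * pi * e * sigma2)
#   Uniform, width w:  H = ln w,        variance w^2 / 12  =>  w = sqrt(12 * sigma2)
#   Laplace, scale b:  H = 1 + ln(2 b), variance 2 b^2     =>  b = sqrt(sigma2 / 2)
H_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)
H_laplace = 1 + np.log(2 * np.sqrt(sigma2 / 2))
H_unif = np.log(np.sqrt(12 * sigma2))

print(f"Gaussian: {H_gauss:.4f}")
print(f"Laplace : {H_laplace:.4f}")
print(f"Uniform : {H_unif:.4f}")
assert H_gauss > H_laplace > H_unif  # the Gaussian attains the maximum
```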

If we evaluate the differential entropy of the Gaussian, we obtain

$$H[x] = \frac{1}{2} \left( 1 + \ln(2\pi\sigma^2) \right).$$

Thus, we see again that the entropy increases as the distribution becomes broader, i.e., as $\sigma^2$ increases.

This result also shows that the differential entropy, unlike the discrete entropy, can be negative, because $H[x] < 0$ for $\sigma^2 < 1/(2\pi e)$.
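
A small numeric check of this threshold (a sketch based on the entropy formula above): $\frac{1}{2}\left(1 + \ln(2\pi\sigma^2)\right)$ changes sign exactly at $\sigma^2 = 1/(2\pi e)$.

```python
import numpy as np

def gaussian_entropy(sigma2):
    """Differential entropy of N(mu, sigma2) in nats."""
    return 0.5 * (1 + np.log(2 * np.pi * sigma2))

threshold = 1 / (2 * np.pi * np.e)  # ~0.0585
print(f"threshold sigma^2 = {threshold:.4f}")
print(f"H at threshold    = {gaussian_entropy(threshold):+.6f}")      # exactly 0
print(f"H below threshold = {gaussian_entropy(threshold / 2):+.4f}")  # negative
print(f"H above threshold = {gaussian_entropy(threshold * 2):+.4f}")  # positive
```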