Discrete

The maximum entropy configuration can be found by maximizing the entropy $H = -\sum_i p(x_i) \ln p(x_i)$ using a Lagrange multiplier to enforce the normalization constraint on the probabilities, such that $\sum_i p(x_i) = 1$.

Thus, we maximize

$$\widetilde{H} = -\sum_i p(x_i) \ln p(x_i) + \lambda \left( \sum_i p(x_i) - 1 \right).$$

Setting the derivative with respect to $p(x_i)$ to zero gives $-\ln p(x_i) - 1 + \lambda = 0$, from which we find that all of the $p(x_i)$ are equal and are given by $p(x_i) = 1/M$, where $M$ is the total number of states $x_i$. The corresponding value of the entropy is then $H = \ln M$. This result can also be derived from Jensen's Inequality. To verify that the stationary point is indeed a maximum, we can evaluate the second derivative of the entropy, which gives

$$\frac{\partial \widetilde{H}}{\partial p(x_i)\, \partial p(x_j)} = -I_{ij} \frac{1}{p(x_i)}$$

where $I_{ij}$ are the elements of the identity matrix. We see that these values are all negative and, hence, the stationary point is indeed a maximum.
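
As a quick numerical sanity check (a sketch of my own, not part of the original derivation), we can compare the entropy of the uniform distribution over $M$ states against randomly perturbed distributions; the uniform case should attain the maximum value $\ln M$:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Discrete entropy H = -sum_i p_i ln p_i (natural log)."""
    p = p[p > 0]  # by convention, 0 ln 0 is taken to be 0
    return -np.sum(p * np.log(p))

M = 5
uniform = np.full(M, 1.0 / M)
print(f"H(uniform) = {entropy(uniform):.6f}, ln M = {np.log(M):.6f}")

# Any other normalized distribution over M states should have lower entropy.
for _ in range(5):
    p = rng.random(M)
    p /= p.sum()  # enforce the normalization constraint
    assert entropy(p) <= np.log(M) + 1e-12
    print(f"H(random)  = {entropy(p):.6f}  <=  ln M")
```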

Continuous

We saw that the maximum entropy configuration for discrete distributions corresponds to a uniform distribution of probabilities across the possible states of the variable. For continuous distributions, if we want the maximum to be well-defined, we need to constrain the first and second moments of $x$ as well as preserving the normalization constraint.
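
To see why a constraint beyond normalization is needed, note that the differential entropy of a uniform density on an interval of width $w$ is $\ln w$, which grows without bound as the interval widens. A minimal numeric sketch (illustrative, not from the original text):

```python
import numpy as np

# Differential entropy of Uniform(0, w): H = ln w (natural log).
for w in [1.0, 10.0, 1e6]:
    print(f"width = {w:>9}: H = ln w = {np.log(w):.3f}")
# H grows without bound as w grows, so without moment constraints
# the maximization problem has no finite solution.
```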

Thus, we maximize the differential entropy with three constraints:

$$\int_{-\infty}^{\infty} p(x)\, \mathrm{d}x = 1, \qquad \int_{-\infty}^{\infty} x\, p(x)\, \mathrm{d}x = \mu, \qquad \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, \mathrm{d}x = \sigma^2.$$

We can then perform constrained optimization using Lagrange multipliers, maximizing the following functional with respect to $p(x)$:

$$-\int_{-\infty}^{\infty} p(x) \ln p(x)\, \mathrm{d}x + \lambda_1 \left( \int_{-\infty}^{\infty} p(x)\, \mathrm{d}x - 1 \right) + \lambda_2 \left( \int_{-\infty}^{\infty} x\, p(x)\, \mathrm{d}x - \mu \right) + \lambda_3 \left( \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, \mathrm{d}x - \sigma^2 \right).$$

Using the calculus of variations, we set the derivative of this functional to zero, giving:

$$p(x) = \exp\left( -1 + \lambda_1 + \lambda_2 x + \lambda_3 (x - \mu)^2 \right).$$

The Lagrange multipliers can be found by back-substitution of this result into the three constraint equations, leading to the result:

$$p(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$

and so the distribution that maximizes the differential entropy is the Gaussian.
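
As an illustrative check (my own sketch, not part of the original derivation), we can compare closed-form differential entropies of a few familiar distributions all matched to the same variance $\sigma^2$; the Gaussian should come out on top:

```python
import numpy as np

sigma2 = 1.5  # common variance for all candidates (arbitrary choice)

# Closed-form differential entropies (natural log), each with variance sigma2:
#   Gaussian:          H = 0.5 * ln(2 * pi * e * sigma2)
#   Uniform, width w:  H = ln w,        variance w^2 / 12  =>  w = sqrt(12 * sigma2)
#   Laplace, scale b:  H = 1 + ln(2 b), variance 2 b^2     =>  b = sqrt(sigma2 / 2)
H_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)
H_laplace = 1 + np.log(2 * np.sqrt(sigma2 / 2))
H_unif = np.log(np.sqrt(12 * sigma2))

print(f"Gaussian: {H_gauss:.4f}")
print(f"Laplace : {H_laplace:.4f}")
print(f"Uniform : {H_unif:.4f}")
assert H_gauss > H_laplace > H_unif  # the Gaussian attains the maximum
```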

If we evaluate the differential entropy of the Gaussian, we obtain

$$H[x] = \frac{1}{2} \left( 1 + \ln(2\pi\sigma^2) \right).$$

Thus, we see again that the entropy increases as the distribution becomes broader, i.e., as $\sigma^2$ increases.

This result also shows that the differential entropy, unlike the discrete entropy, can be negative, because $H[x] < 0$ for $\sigma^2 < 1/(2\pi e)$.
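
A small numeric check of this threshold (a sketch based on the entropy formula above): $\frac{1}{2}\left(1 + \ln(2\pi\sigma^2)\right)$ changes sign exactly at $\sigma^2 = 1/(2\pi e)$.

```python
import numpy as np

def gaussian_entropy(sigma2):
    """Differential entropy of N(mu, sigma2) in nats."""
    return 0.5 * (1 + np.log(2 * np.pi * sigma2))

threshold = 1 / (2 * np.pi * np.e)  # ~0.0585
print(f"threshold sigma^2 = {threshold:.4f}")
print(f"H at threshold    = {gaussian_entropy(threshold):+.6f}")      # exactly 0
print(f"H below threshold = {gaussian_entropy(threshold / 2):+.4f}")  # negative
print(f"H above threshold = {gaussian_entropy(threshold * 2):+.4f}")  # positive
```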