The Kullback-Leibler (KL) divergence, or relative entropy, measures the dissimilarity between two probability distributions.
Consider some unknown distribution $p(x)$, approximated by a distribution $q(x)$.
If we use $q(x)$ to construct a coding scheme for transmitting values of $x$ to a receiver, then the average additional amount of information (in nats) required to specify the value of $x$ as a result of using $q(x)$ instead of the true distribution $p(x)$ is given by:

$$\mathrm{KL}(p\,\|\,q) = -\int p(x) \ln q(x)\,dx - \left(-\int p(x) \ln p(x)\,dx\right) = -\int p(x) \ln\frac{q(x)}{p(x)}\,dx$$
This is known as the relative entropy or Kullback-Leibler divergence, $\mathrm{KL}(p\,\|\,q)$, between the distributions $p(x)$ and $q(x)$.
The KL divergence is not a symmetrical quantity; that is, $\mathrm{KL}(p\,\|\,q) \not\equiv \mathrm{KL}(q\,\|\,p)$.
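As a concrete illustration (not part of the original derivation), here is a minimal Python sketch of the discrete analogue of the formula above, $\mathrm{KL}(p\,\|\,q) = \sum_{x} p(x)\ln\frac{p(x)}{q(x)}$, together with a check that swapping the arguments gives a different value. The helper name `kl_divergence` and the example distributions are made up for illustration.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete analogue of KL(p || q) = sum_x p(x) ln(p(x) / q(x)), in nats.

    Assumes p and q are probability vectors over the same support,
    each summing to 1, with q(x) > 0 wherever p(x) > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute nothing to the sum
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Example distributions over a three-element support (made-up numbers).
p = [0.80, 0.15, 0.05]
q = [0.60, 0.25, 0.15]

print(kl_divergence(p, q))  # KL(p || q)
print(kl_divergence(q, p))  # KL(q || p) -- a different value, since KL is not symmetric
```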
We can apply the continuous form of Jensen's Inequality to the KL divergence to give

$$\mathrm{KL}(p\,\|\,q) = -\int p(x) \ln\frac{q(x)}{p(x)}\,dx \geq -\ln \int q(x)\,dx = 0$$
- Here, $-\ln x$ is a convex function, so Jensen's inequality applies.
- We’re also using the normalization condition $\int q(x)\,dx = 1$.
Since $-\ln x$ is actually strictly convex, the equality will hold if and only if $q(x) = p(x)$ for all $x$. Thus, we can interpret the KL divergence as a measure of the dissimilarity between $p(x)$ and $q(x)$.
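As a quick numeric sanity check of these two properties, here is an illustrative sketch that reuses the `kl_divergence` helper from the example above (the distributions are the same made-up values, restated for convenience):

```python
import numpy as np

# Same illustrative distributions as in the earlier sketch.
p = np.array([0.80, 0.15, 0.05])
q = np.array([0.60, 0.25, 0.15])

# Jensen's inequality: KL(p || q) >= 0, and strictly positive here since q != p.
assert kl_divergence(p, q) > 0.0

# Equality holds only when q(x) = p(x) for all x, i.e. KL(p || p) = 0.
assert abs(kl_divergence(p, p)) < 1e-12
```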