Kullback-Leibler Divergence, or relative entropy, measures the dissimilarity between two distributions.

Consider some unknown distribution $p(x)$, approximated by a distribution $q(x)$.

If we use $q(x)$ to construct a coding scheme for transmitting values of $x$ to a receiver, then the average additional amount of information (in nats) required to specify the value of $x$ as a result of using $q(x)$ instead of the true distribution $p(x)$ is given by:

$$
\mathrm{KL}(p\|q) = -\int p(x)\ln q(x)\,dx - \left(-\int p(x)\ln p(x)\,dx\right) = -\int p(x)\ln\left\{\frac{q(x)}{p(x)}\right\}dx
$$

This is known as the relative entropy or Kullback-Leibler divergence between the distributions $p(x)$ and $q(x)$.
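
To make the definition concrete, here is a minimal Python sketch (not from the original text) of the discrete analogue, where the integral becomes a sum over outcomes; the function name `kl_divergence` and the example distributions are illustrative choices:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence KL(p || q) in nats.

    Assumes p and q are probability vectors over the same outcomes,
    each summing to 1, with q(x) > 0 wherever p(x) > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# True distribution p and an approximating distribution q over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # average extra nats incurred by coding with q instead of p
```

Note that `np.log` is the natural logarithm, so the result is measured in nats, matching the statement above.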

The KL divergence is not a symmetrical quantity; that is, $\mathrm{KL}(p\|q) \not\equiv \mathrm{KL}(q\|p)$.
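
A quick numerical check of this asymmetry (illustrative values, not from the original):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

kl_pq = np.sum(p * np.log(p / q))  # KL(p || q)
kl_qp = np.sum(q * np.log(q / p))  # KL(q || p)
print(kl_pq, kl_qp)  # the two values differ, so swapping the arguments matters
```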

We can apply the continuous form of Jensen’s inequality to the KL divergence to give

$$
\mathrm{KL}(p\|q) = -\int p(x)\ln\left\{\frac{q(x)}{p(x)}\right\}dx \;\geq\; -\ln\int q(x)\,dx = 0
$$

  • Here, $-\ln x$ serves as the convex function, so Jensen’s inequality applies (its continuous form is written out below).
  • We’re also using the normalization condition $\int q(x)\,dx = 1$.
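
For reference, the continuous form of Jensen’s inequality being invoked is the standard statement (restated here, not quoted from the original): for a convex function $f$ and a density $p(x)$,

$$
f\!\left(\int g(x)\,p(x)\,dx\right) \;\leq\; \int f\big(g(x)\big)\,p(x)\,dx.
$$

Taking $f(u) = -\ln u$ and $g(x) = q(x)/p(x)$ recovers the bound above, since $\int p(x)\,\frac{q(x)}{p(x)}\,dx = \int q(x)\,dx = 1$ and $-\ln 1 = 0$.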

Since $-\ln x$ is in fact strictly convex, equality holds if and only if $q(x) = p(x)$ for all $x$. Thus, we can interpret the KL divergence as a measure of the dissimilarity between the two distributions $p(x)$ and $q(x)$.
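
As a sanity check of these last two properties, the short script below (an illustrative sketch, not part of the original) verifies numerically that the divergence is non-negative and that it vanishes when the two distributions coincide:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Discrete KL(p || q) in nats; assumes p, q > 0 and each sums to 1."""
    return float(np.sum(p * np.log(p / q)))

# Two random, strictly positive distributions over five outcomes.
p = rng.random(5); p /= p.sum()
q = rng.random(5); q /= q.sum()

print(kl(p, q) >= 0)              # True: the KL divergence is never negative
print(np.isclose(kl(p, p), 0.0))  # True: it is zero when q equals p
```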