When two variables $x$ and $y$ are independent, their joint distribution will factorize into the product of their marginals, $p(x, y) = p(x)\,p(y)$.
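As a quick numerical illustration of this factorization (a minimal sketch, assuming NumPy and a pair of arbitrary, hypothetical marginals), an independent joint over two discrete variables is just the outer product of its marginals:

```python
import numpy as np

# Hypothetical marginal distributions for two independent discrete variables x and y.
p_x = np.array([0.2, 0.8])
p_y = np.array([0.5, 0.3, 0.2])

# Under independence the joint factorizes as p(x, y) = p(x) p(y),
# which for discrete variables is the outer product of the marginals.
p_xy = np.outer(p_x, p_y)

# Marginalizing the joint recovers the original marginals.
assert np.allclose(p_xy.sum(axis=1), p_x)
assert np.allclose(p_xy.sum(axis=0), p_y)
```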

If the variables are not independent, we can gain some idea of whether they are ‘close’ to being independent by considering the Kullback-Leibler divergence between the joint distribution and the product of the marginals, given by
$$
I[x, y] \equiv \mathrm{KL}\bigl(p(x, y)\,\|\,p(x)\,p(y)\bigr)
= -\iint p(x, y) \ln\!\left(\frac{p(x)\,p(y)}{p(x, y)}\right) \mathrm{d}x\,\mathrm{d}y,
$$
which is called the mutual information between the variables $x$ and $y$.

From the properties of the KL divergence, we see that $I[x, y] \geq 0$, with equality if and only if $x$ and $y$ are independent.
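To make the definition concrete, the following sketch (assuming NumPy, with two small hypothetical joint distributions) computes the mutual information directly from the KL form above; it yields a strictly positive value for a dependent joint and zero for an independent one, illustrating the property just stated:

```python
import numpy as np

def mutual_information(p_xy):
    """I[x, y] = KL(p(x, y) || p(x) p(y)), in nats, for a discrete joint p_xy."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x), shape (nx, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, ny)
    mask = p_xy > 0                          # convention: 0 ln 0 = 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))

# Hypothetical dependent joint: y tends to take the same value as x.
p_dep = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

# Independent joint: the outer product of its own marginals.
p_ind = np.outer([0.5, 0.5], [0.5, 0.5])

print(mutual_information(p_dep))   # positive, roughly 0.19 nats
print(mutual_information(p_ind))   # 0.0
```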

Using the Sum and Product Rules of Probability, we see that the mutual information is related to the conditional entropy through
$$
I[x, y] = \mathrm{H}[x] - \mathrm{H}[x \mid y] = \mathrm{H}[y] - \mathrm{H}[y \mid x].
$$
Thus, the mutual information represents the reduction in uncertainty about $x$ by virtue of being told the value of $y$, or vice versa.
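To see where this identity comes from, one can expand the definition and apply the product rule $p(x, y) = p(x \mid y)\,p(y)$ inside the logarithm:
\begin{align}
I[x, y] &= -\iint p(x, y) \ln\!\left(\frac{p(x)\,p(y)}{p(x, y)}\right) \mathrm{d}x\,\mathrm{d}y \\
        &= -\iint p(x, y) \ln p(x)\,\mathrm{d}x\,\mathrm{d}y + \iint p(x, y) \ln p(x \mid y)\,\mathrm{d}x\,\mathrm{d}y \\
        &= \mathrm{H}[x] - \mathrm{H}[x \mid y],
\end{align}
and the symmetric decomposition $\mathrm{H}[y] - \mathrm{H}[y \mid x]$ follows by writing $p(x, y) = p(y \mid x)\,p(x)$ instead.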

From a Bayesian perspective, we can view $p(x)$ as the prior distribution for $x$ and $p(x \mid y)$ as the posterior distribution. The mutual information thus represents the reduction in uncertainty about $x$ as a consequence of the new observation $y$.
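This reading can be made precise: regrouping the definition with the product rule expresses the mutual information as the expected KL divergence between posterior and prior,
$$
I[x, y] = \iint p(x \mid y)\,p(y) \ln\!\left(\frac{p(x \mid y)}{p(x)}\right) \mathrm{d}x\,\mathrm{d}y
        = \mathbb{E}_{p(y)}\!\left[\,\mathrm{KL}\bigl(p(x \mid y)\,\|\,p(x)\bigr)\,\right],
$$
so that $I[x, y]$ measures, on average over the observations $y$, how far the posterior $p(x \mid y)$ moves away from the prior $p(x)$.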