We found that we can define a measure of information $h(x)$, gained when observing a particular event $x$ with probability $p(x)$, such that:

$$h(x) = -\log_2 p(x)$$

Suppose a sender wants to transmit the value of a random variable $x$ to a receiver. The average amount of information transmitted is obtained by taking the expectation of the information $h(x)$ with respect to the distribution $p(x)$ and is given by:

$$H[x] = -\sum_x p(x) \log_2 p(x)$$

This is called the entropy of the random variable $x$. Note that $\lim_{p \to 0} p \log_2 p = 0$, and so we will take $p(x) \log_2 p(x) = 0$ whenever we encounter a value for $x$ such that $p(x) = 0$.
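As a minimal sketch of this definition, the entropy of a discrete distribution can be computed directly from its probabilities. The function name `entropy_bits` is illustrative, and base-2 logarithms are assumed so that the result is measured in bits. Terms with zero probability are skipped, which is one way to apply the convention that they contribute nothing.

```python
import math

def entropy_bits(probs):
    """Entropy in bits of a discrete distribution given as a sequence of probabilities."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # p * log2(1/p) equals -p * log2(p); zero-probability states are skipped
    # because their contribution is taken to be 0.
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))  # a fair coin
print(entropy_bits([1.0, 0.0]))  # a certain outcome carries no information
```

Skipping the zero-probability terms also avoids evaluating the logarithm of zero, which would raise an error.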

Example

Consider a random variable $x$ having eight possible states, each of which is equally likely. To communicate the value of $x$ to a receiver, we would need to transmit a message of length 3 bits.

The entropy of this variable is given by:

$$H[x] = -8 \times \frac{1}{8} \log_2 \frac{1}{8} = 3 \text{ bits}$$

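Now consider an example of a variable having 8 possible states $\{a, b, c, d, e, f, g, h\}$ for which the respective probabilities are given by $\left(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{64}, \frac{1}{64}, \frac{1}{64}, \frac{1}{64}\right)$. The entropy in this case is given by:

$$H[x] = -\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{4} \log_2 \frac{1}{4} - \frac{1}{8} \log_2 \frac{1}{8} - \frac{1}{16} \log_2 \frac{1}{16} - \frac{4}{64} \log_2 \frac{1}{64} = 2 \text{ bits}$$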

How would we transmit the identity of the variable’s state to a receiver? We could use a 3-bit number like before. However, we can take advantage of the non-uniform distribution by using shorter codes for more probable events, leading to a shorter average code length. For example, we can represent the states $\{a, b, c, d, e, f, g, h\}$ with the code strings 0, 10, 110, 1110, 111100, 111101, 111110, 111111.

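The average code length would then be:

$$\frac{1}{2} \times 1 + \frac{1}{4} \times 2 + \frac{1}{8} \times 3 + \frac{1}{16} \times 4 + 4 \times \frac{1}{64} \times 6 = 2 \text{ bits}$$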

which again is the same as the entropy of the random variable.
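This equality can be checked numerically. The sketch below assumes the probabilities (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64) and the code strings 0, 10, 110, 1110, 111100, 111101, 111110, 111111 for states a to h, which are the values used in the standard textbook version of this example. The variable names are illustrative.

```python
import math
from fractions import Fraction

# Assumed probabilities and code strings for the eight states.
probs = {
    "a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 16),
    "e": Fraction(1, 64), "f": Fraction(1, 64), "g": Fraction(1, 64), "h": Fraction(1, 64),
}
codes = {
    "a": "0", "b": "10", "c": "110", "d": "1110",
    "e": "111100", "f": "111101", "g": "111110", "h": "111111",
}

# Expected number of bits per transmitted state under this code.
avg_len = sum(probs[s] * len(codes[s]) for s in probs)

# Entropy of the distribution in bits.
entropy = sum(float(p) * math.log2(float(1 / p)) for p in probs.values())

print(avg_len)  # 2
print(entropy)  # 2.0
```

The two values agree here because every probability is a power of 1/2, so each code length can equal $-\log_2 p(x)$ exactly.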

  • Note that shorter code strings cannot be used because it must be possible to disambiguate a concatenation of such strings into its component parts. For instance, 11001110 decodes uniquely into the state sequence $c, a, d$ (110, 0, 1110).
  • This relation between entropy and shortest coding length is a general one. The noiseless coding theorem states that the entropy is a lower bound on the number of bits needed to transmit the state of a random variable.
  • The non-uniform distribution has a smaller entropy than the uniform one.
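The unique decodability mentioned in the first note can be illustrated with a small decoder. This is a sketch that assumes the code strings 0, 10, 110, 1110, 111100, 111101, 111110, 111111 for states a to h. The function name `decode` is hypothetical. Because no code string is a prefix of another, reading bits until the buffer matches a code string is enough to split the stream.

```python
# Assumed code table for the eight states.
CODES = {
    "a": "0", "b": "10", "c": "110", "d": "1110",
    "e": "111100", "f": "111101", "g": "111110", "h": "111111",
}
DECODE = {code: state for state, code in CODES.items()}

def decode(bits):
    """Split a bit string into states by matching one code string at a time."""
    states = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in DECODE:
            # No code string is a prefix of another, so the first match is the only one.
            states.append(DECODE[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a complete code string")
    return states

print(decode("11001110"))  # ['c', 'a', 'd']
```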

Physical Entropy

The concept of entropy has origins in physics where it was introduced in the context of equilibrium thermodynamics and later given a deeper interpretation as a measure of disorder through developments in statistical mechanics.

This alternative view of entropy can be understood by considering a set of $N$ identical objects that are to be divided amongst a set of bins, such that there are $n_i$ objects in the $i$th bin. Consider the number of different ways of allocating the objects to the bins:

  • There are $N$ ways to choose the first object
  • There are $N - 1$ ways to choose the second object, and so on.
  • This leads to a total of $N!$ ways to allocate all $N$ objects to the bins.

We don’t want to distinguish between rearrangements of objects within each bin. In the $i$th bin there are $n_i!$ ways of reordering the objects, and so the total number of ways of allocating the $N$ objects to the bins is given by:

$$W = \frac{N!}{\prod_i n_i!}$$
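The counting argument above can be sketched in a few lines: start from the $N!$ orderings of all objects and divide out the $n_i!$ reorderings within each bin, which gives the multinomial coefficient $N! / (n_1! \, n_2! \cdots)$. The function name `multiplicity` and the example bin counts are illustrative assumptions.

```python
import math

def multiplicity(counts):
    """Number of distinct allocations of sum(counts) objects with counts[i] objects in bin i."""
    n_total = sum(counts)
    ways = math.factorial(n_total)
    for n_i in counts:
        # Remove the orderings within bin i, which we do not want to distinguish.
        # Dividing the running quotient by each factorial in turn is always exact.
        ways //= math.factorial(n_i)
    return ways

print(multiplicity([2, 1, 1]))  # 4! / (2! 1! 1!) = 12
print(multiplicity([4, 0, 0]))  # all objects in one bin: 1
```

Spreading the objects more evenly across the bins increases the count, while concentrating them in a single bin reduces it to 1.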