How does a probability density transform under a non-linear change of variable? Probability densities behave differently from ordinary functions under such transformations.

Consider a single variable $x$ and make a change of variables $x = g(y)$. A function $f(x)$ then becomes a new function $\tilde{f}(y)$ such that

$$\tilde{f}(y) = f(g(y)).$$

Now take a probability density $p_x(x)$, and suppose we want the density of the new variable $y$, where $x = g(y)$. We write this density as $p_y(y)$. To derive it, we consider the probabilities of $x$ and $y$ falling into infinitesimally small ranges.

  • The probability that $x$ lies in the range $(x, x + \delta x)$ is $p_x(x)\,\delta x$ (see probability density).
  • Similarly, the probability that $y$ lies in the range $(y, y + \delta y)$ is $p_y(y)\,\delta y$.
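As a quick numerical sanity check of the first bullet, the Python sketch below assumes $x$ follows a standard normal distribution (an illustrative choice, not from the text) and compares a Monte Carlo estimate of the probability of landing in a small range with $p_x(x)\,\delta x$. The helper names are hypothetical.

```python
import math
import random

def std_normal_pdf(x: float) -> float:
    """Density of the standard normal distribution N(0, 1) (assumed example)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def interval_probability_mc(x0: float, dx: float, n: int = 400_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(x0 < x < x0 + dx) for x ~ N(0, 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if x0 < rng.gauss(0.0, 1.0) < x0 + dx)
    return hits / n

x0, dx = 0.5, 0.01
estimate = interval_probability_mc(x0, dx)
approx = std_normal_pdf(x0) * dx  # p_x(x) * delta_x
print(f"Monte Carlo: {estimate:.5f}, p_x(x0) * dx: {approx:.5f}")
```

Shrinking `dx` makes $p_x(x)\,\delta x$ a better approximation of the exact probability, at the cost of needing more samples for the Monte Carlo estimate to resolve it.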

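Since $x$ and $y$ are related by $x = g(y)$, a small change $\delta y$ in $y$ causes a corresponding small change $\delta x$ in $x$. Probability is conserved under the change of variables: the observations that fall in $(x, x + \delta x)$ are exactly those whose $y$ values fall in the corresponding range $(y, y + \delta y)$. Hence, to first order,

$$p_x(x)\,|\delta x| \simeq p_y(y)\,|\delta y|.$$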

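Dividing by $|\delta y|$, the relation becomes exact in the limit $\delta x \to 0$ and $\delta y \to 0$:

$$p_y(y) = \lim_{\delta y \to 0} p_x(x)\left|\frac{\delta x}{\delta y}\right| = p_x(x)\left|\frac{dx}{dy}\right|.$$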

Substituting $x = g(y)$, we can write this as:

$$p_y(y) = p_x(g(y))\,\left|g'(y)\right|.$$

We take the modulus because the derivative $dx/dy = g'(y)$ can be negative (when $g$ is decreasing), whereas the density must be scaled by the ratio of interval lengths $|\delta x| / |\delta y|$, which is always positive.
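To see why the modulus matters, here is a small Python sketch under an assumed example not taken from the text: $x$ is Exponential(1) and $x = g(y) = -\ln y$ for $0 < y < 1$, so $g'(y) = -1/y$ is negative. Applying $p_y(y) = p_x(g(y))\,|g'(y)|$ gives $p_y(y) = y \cdot (1/y) = 1$, a uniform density, whereas dropping the modulus would give $-1$, which is not a valid density.

```python
import math
import random

def p_x(x: float) -> float:
    """Exponential(1) density: exp(-x) for x >= 0 (assumed example)."""
    return math.exp(-x) if x >= 0 else 0.0

def g(y: float) -> float:
    """Hypothetical change of variable x = g(y) = -ln(y), for 0 < y < 1."""
    return -math.log(y)

def g_prime(y: float) -> float:
    """dg/dy = -1/y, which is negative on (0, 1)."""
    return -1.0 / y

def p_y(y: float) -> float:
    """Transformed density p_y(y) = p_x(g(y)) * |g'(y)|."""
    return p_x(g(y)) * abs(g_prime(y))

# Without abs(), p_x(g(y)) * g'(y) would be -1 everywhere.
for y in (0.1, 0.5, 0.9):
    print(y, p_y(y))  # approximately 1.0 at every point: y is uniform on (0, 1)

# Cross-check by sampling: x ~ Exponential(1), mapped to y = exp(-x).
rng = random.Random(0)
samples = [math.exp(-rng.expovariate(1.0)) for _ in range(200_000)]
fraction_in_bin = sum(0.40 < s < 0.45 for s in samples) / len(samples)
print(fraction_in_bin)  # close to p_y(y) * width = 1.0 * 0.05
```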

This procedure is very powerful: any density $p_y(y)$ can be obtained from a fixed density $p_x(x)$ that is non-zero everywhere by making a non-linear change of variable $y = h(x)$ in which $h$ is monotonic, so that $0 \le h'(x) < \infty$. However, it can also complicate things, for example when trying to find the maximum of a transformed density (see Maximum of Transformed Density).
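Both claims in the paragraph above can be illustrated with a rough Python sketch. For the first, it uses the inverse-CDF (probability integral transform) construction as one assumed instance: the monotonic map $y = -\ln(1 - \Phi(x))$, with $\Phi$ the standard normal CDF, turns $x \sim \mathcal{N}(0, 1)$ into an Exponential(1) variable. For the second, it takes $x = g(y) = \ln y$ and compares the maximum of the plain substitution $p_x(g(y))$ with the maximum of the density $p_x(g(y))\,|g'(y)|$. All names are illustrative.

```python
import math
import random

def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# (a) Any density from a fixed one: map x ~ N(0, 1) to y ~ Exponential(1).
def to_exponential(x: float) -> float:
    """Monotonic map y = -ln(1 - Phi(x)); 1 - Phi(x) = erfc(x / sqrt 2) / 2 stays accurate in the tail."""
    return -math.log(0.5 * math.erfc(x / math.sqrt(2.0)))

rng = random.Random(1)
ys = [to_exponential(rng.gauss(0.0, 1.0)) for _ in range(100_000)]
mean_y = sum(ys) / len(ys)                     # Exponential(1) has mean 1
cdf_at_1 = sum(y < 1.0 for y in ys) / len(ys)  # should approach 1 - e^{-1}, about 0.632

# (b) Maxima under x = g(y) = ln(y), i.e. y = e^x.
grid = [i / 10_000 for i in range(1, 50_001)]  # y in (0, 5]

def as_function(y: float) -> float:
    """Plain substitution f(g(y)), treating p_x as an ordinary function."""
    return std_normal_pdf(math.log(y))

def as_density(y: float) -> float:
    """Density p_x(g(y)) * |g'(y)| with g'(y) = 1/y."""
    return std_normal_pdf(math.log(y)) / y

argmax_function = max(grid, key=as_function)  # y = e^0 = 1, the image of the mode x = 0
argmax_density = max(grid, key=as_density)    # y = e^{-1}, about 0.368, not 1
print(mean_y, cdf_at_1, argmax_function, argmax_density)
```

The ordinary function's maximum simply moves to $y = e^{0} = 1$, but the Jacobian factor $1/y$ shifts the density's maximum to $y = e^{-1}$.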

This property is important to Normalizing Flows.
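Normalizing flows, as usually described, build a complex density by composing several invertible maps and summing the log-derivative terms from the change-of-variables formula. The sketch below is a minimal one-dimensional illustration of that idea with two hypothetical layer classes (`Affine` and `Exp`); it is not the API of any particular flow library. It evaluates $\log p_y(y)$ by inverting the layers one at a time, starting from the last.

```python
import math

def std_normal_logpdf(x: float) -> float:
    """log N(x | 0, 1), the assumed base density."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

class Affine:
    """Hypothetical layer y = a * x + b (a != 0)."""
    def __init__(self, a: float, b: float):
        if a == 0.0:
            raise ValueError("scale a must be non-zero so the map is invertible")
        self.a, self.b = a, b

    def inverse(self, y: float) -> float:
        return (y - self.b) / self.a

    def log_abs_inverse_derivative(self, y: float) -> float:
        # d/dy [(y - b) / a] = 1/a
        return -math.log(abs(self.a))

class Exp:
    """Hypothetical layer y = exp(x), mapping the real line to (0, inf)."""
    def inverse(self, y: float) -> float:
        return math.log(y)

    def log_abs_inverse_derivative(self, y: float) -> float:
        # d/dy [ln y] = 1/y, with y > 0
        return -math.log(y)

def flow_log_density(y: float, layers) -> float:
    """log p_y(y) where y = layers[-1](...layers[0](x)...) and x ~ N(0, 1)."""
    log_jacobian = 0.0
    for layer in reversed(layers):  # undo the last layer first
        log_jacobian += layer.log_abs_inverse_derivative(y)
        y = layer.inverse(y)
    return std_normal_logpdf(y) + log_jacobian

def lognormal_logpdf(y: float, mu: float, sigma: float) -> float:
    """Closed-form log-normal log density, used only as a check."""
    z = (math.log(y) - mu) / sigma
    return -math.log(y) - math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

layers = [Affine(0.5, 1.0), Exp()]  # y = exp(0.5 * x + 1)
for y in (0.5, 2.7, 10.0):
    print(y, flow_log_density(y, layers), lognormal_logpdf(y, 1.0, 0.5))
```

With `Affine(0.5, 1.0)` followed by `Exp()`, $y = e^{0.5x + 1}$ is log-normal with $\mu = 1$ and $\sigma = 0.5$, which gives a closed form to compare against; the two printed columns should agree up to floating-point error.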