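How does a probability density transform under a non-linear change of variable? Unlike an ordinary function, a density picks up a Jacobian factor under such a transform, because it is the probability mass (density times interval width) that must be preserved, not the density value itself.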
Consider a single variable $x$ and a change of variables $x = g(y)$. An ordinary function $f(x)$ then becomes a new function $\tilde{f}(y)$ such that

$$\tilde{f}(y) = f(g(y))$$
Now suppose $x$ has a probability density $p_x(x)$ and we want the density of a new variable $y$, where $x = g(y)$. We write this density as $p_y(y)$. To make this transformation, we consider the probabilities of $x$ and $y$ falling into infinitesimally small ranges.
- The probability that $x$ lies in the range $(x, x + \delta x)$ is $p_x(x)\,\delta x$ (see probability density).
- Similarly, the probability that $y$ lies in the range $(y, y + \delta y)$ is $p_y(y)\,\delta y$.
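Now, since $x$ and $y$ are related by $x = g(y)$, a small change $\delta y$ in $y$ causes a corresponding small change $\delta x$ in $x$. Probability is conserved when we change variables, so, treating $\delta x$ and $\delta y$ as positive interval widths:

$$p_x(x)\,\delta x \simeq p_y(y)\,\delta y$$

This becomes exactly equal when we take the limit of $\delta x \to 0$ and $\delta y \to 0$:

$$p_x(x)\,|dx| = p_y(y)\,|dy|$$

We can then turn this into:

$$p_y(y) = p_x(x) \left| \frac{dx}{dy} \right| = p_x(g(y))\,|g'(y)|$$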
Here we’re using the modulus because the derivative $\frac{dx}{dy}$ could be negative, but we want to scale the density by the ratio of interval lengths, which is a positive value.
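As a rough numerical check of the rule $p_y(y) = p_x(g(y))\,|g'(y)|$, the sketch below uses an illustrative choice that is not part of the note itself: $x$ is standard exponential and $x = g(y) = -\log y$, so the derivative is negative everywhere and the modulus matters. The function names (`p_x`, `g`, `g_prime`, `p_y`) are assumptions made for this example. Under these assumptions the transformed density works out to be uniform on $(0, 1)$, which a histogram of transformed samples should roughly reproduce.

```python
import numpy as np

def p_x(x):
    """Density of x: standard exponential on x >= 0 (illustrative choice)."""
    return np.where(x >= 0, np.exp(-x), 0.0)

def g(y):
    """Change of variable x = g(y) = -log(y), defined for 0 < y < 1."""
    return -np.log(y)

def g_prime(y):
    """Derivative dx/dy = -1/y, which is negative on (0, 1)."""
    return -1.0 / y

def p_y(y):
    """Transformed density p_y(y) = p_x(g(y)) * |g'(y)|."""
    return p_x(g(y)) * np.abs(g_prime(y))

# Draw samples of x, then map each one to y by inverting x = -log(y).
rng = np.random.default_rng(0)
x_samples = rng.exponential(scale=1.0, size=200_000)
y_samples = np.exp(-x_samples)

# Compare an empirical histogram of y with the formula at the bin centres.
hist, edges = np.histogram(y_samples, bins=20, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_abs_error = np.max(np.abs(hist - p_y(centers)))
print(f"largest gap between histogram and formula: {max_abs_error:.4f}")
```

Dropping the `np.abs` in `p_y` would give a negative "density" here, which is the failure the modulus guards against.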
This sort of procedure is very powerful, as any density $p(y)$ can be obtained from a fixed density $q(x)$ by making a non-linear change of variable $y = f(x)$ in which $f(x)$ is monotonic so that $0 \le f'(x) < \infty$, and $q(x)$ is non-zero everywhere. However, it can also make things more complicated, such as when trying to find the Maximum of Transformed Density.
This property is important to Normalizing Flows.
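The claim that a target density can be reached from a fixed density through a monotonic map can be illustrated with a small sketch. It assumes, purely for illustration, a standard normal fixed density $q(x)$ and an exponential target with a hypothetical rate `LAM`. The map composes the normal CDF with the inverse CDF of the target, which is one standard way to build such a monotonic $f$ and is not necessarily the construction the note has in mind.

```python
import math
import numpy as np

LAM = 2.0  # hypothetical rate of the target exponential density

# math.erfc works on scalars, so vectorize it for NumPy arrays.
_erfc = np.vectorize(math.erfc)

def f(x):
    """Monotonic map y = f(x): inverse target CDF applied to the normal CDF."""
    upper_tail = 0.5 * _erfc(x / math.sqrt(2.0))  # equals 1 - Phi(x)
    return -np.log(upper_tail) / LAM

def target_cdf(y):
    """CDF of the target exponential density with rate LAM."""
    return 1.0 - np.exp(-LAM * y)

# Push samples from the fixed density q(x) through the map.
rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)
y = f(x)

# Check monotonicity on a grid, then compare simple sample statistics
# with the values implied by the target density.
grid = np.linspace(-4.0, 4.0, 101)
is_increasing = bool(np.all(np.diff(f(grid)) > 0))
sample_mean = float(y.mean())
empirical_cdf_at_1 = float(np.mean(y <= 1.0))
print(is_increasing, sample_mean, empirical_cdf_at_1, target_cdf(1.0))
```

Only the map `f` changes when the target changes. The fixed density stays the same, which is the idea normalizing flows build on with learned invertible maps.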