The convolution of two arrays (or vectors) A and B is a new vector C such that

C[k] = Σ_j A[j] · B[k−j]

If we assume A and B are of length m and n respectively, and indexed starting from 0, the natural range on C is from C[0] to C[n+m−2]. The values of all out-of-range elements of A and B are interpreted as zero, so they do not contribute to any product.

An example of convolution is polynomial multiplication. Recall the problem of multiplying two polynomials, for example:

(x^2 + 2x + 3) · (2x + 1) = 2x^3 + 5x^2 + 8x + 3

Let a_j and b_j denote the coefficients of x^j in each of the polynomials. Then multiplication is a convolution, because the coefficient c_k of the x^k term in the product polynomial is given by the convolution above. This coefficient is the sum of the products of all pairs of terms whose exponents add to k. For example, c_2 = a_0·b_2 + a_1·b_1 + a_2·b_0 = 3·0 + 2·2 + 1·1 = 5.

The obvious way to implement convolution is by computing the m-term dot product C[k] for each of the n+m−1 values of k. This is two nested loops, running in O(nm) time. The inner loop does not always involve m iterations because of boundary conditions. Simpler loop bounds could have been employed if A and B were flanked by ranges of zeros.

for (i = 0; i < n+m-1; i++) { 
	c[i] = 0;	/* initialize before accumulating the dot product */
	for (j = max(0,i-(n-1)); j <= min(m-1,i); j++) { 
		c[i] = c[i] + a[j] * b[i-j]; 
	} 
}

Convolution multiplies every possible pair of elements from A and B, and hence it seems like we should require quadratic time to get these numbers. But like sorting, there exists a clever divide-and-conquer algorithm that runs in O(n log n) time, assuming that m ≤ n. And just like sorting, there are a large number of applications that take advantage of this enormous speedup for large sequences.

Fast Convolution/Polynomial Multiplication

We present convolution through a fast algorithm for multiplying polynomials. It is based on a series of observations:

Polynomials can be represented either as equations or as sets of points. We know that every pair of points defines a line; more generally, any degree-n polynomial is completely defined by n+1 points on the polynomial. For example, the points (0, 0), (1, 1), and (2, 4) define (and are defined by) the quadratic equation y = x^2.

We can find such points on a polynomial P by evaluation, but it looks expensive. Generating a point on a given polynomial is easy – simply pick an arbitrary value x and plug it into P(x). The time it takes for one such evaluation is linear in the degree of P, which means O(n) for the problems we are interested in. But doing this n times for different values of x would take O(n^2) time, which is more than we can afford if we want fast multiplication.

Multiplying polynomials A and B in a points representation is easy, if they have both been evaluated on the same values of x. Suppose we want to compute the product of two degree-2 polynomials, say A(x) = x^2 + 1 and B(x) = x^2 − 1. The result will be a degree-4 polynomial, so we need five points to define it. We can evaluate both factors on the same values x = −2, −1, 0, 1, 2:

A: (−2, 5), (−1, 2), (0, 1), (1, 2), (2, 5)
B: (−2, 3), (−1, 0), (0, −1), (1, 0), (2, 3)

Since C(x) = A(x) · B(x), we can now construct points on C by multiplying the corresponding y-values:

C: (−2, 15), (−1, 0), (0, −1), (1, 0), (2, 15)

Thus, multiplying polynomials in this representation takes only linear time.

We can evaluate a degree-n polynomial as two degree-(n/2) polynomials in x^2. We can partition the terms of P(x) into those of even and odd degree, for example:

x^3 + 4x^2 + 2x + 6 = (4x^2 + 6) + x · (x^2 + 2)

By replacing x^2 with y, the right side gives us two smaller, lower-degree polynomials (here 4y + 6 and y + 2), as promised.

This suggests an efficient divide-and-conquer algorithm. We seek to evaluate a degree-n polynomial at multiple points. We need 2n+1 points, since we will be using them to compute the product of two degree-n polynomials. We can decompose the problem into doing this evaluation on two polynomials of half the degree, plus a linear amount of work stitching the results together. This defines the recurrence T(n) = 2T(n/2) + O(n), which evaluates to T(n) = O(n log n).

Making this work correctly requires picking the right x values to evaluate on. The trick with the squares makes it desirable for our sample points to come in pairs of the form x and −x, since their evaluation requires half as much work because they are identical when squared.

However, this property does not hold recursively unless the x values are carefully chosen complex numbers. The n-th roots of unity are the set of solutions to the equation z^n = 1. Over the reals, we only get z = ±1, but there are n solutions over the complex numbers. The k-th of these roots is given by

ω_k = e^(2πik/n) = cos(2πk/n) + i · sin(2πk/n)

To appreciate the magic properties of these numbers, look at what happens when we raise them to powers. The 8th roots of unity are ±1, ±i, ±(1 + i)/√2, and ±(1 − i)/√2; squaring each of them yields one of the 4th roots of unity, ±1 and ±i.

Observe that these terms come in positive/negative pairs, and that the number of distinct terms gets halved with each squaring. These are the properties we need to make the divide and conquer work.

The best implementations of fast convolution generally compute the fast Fourier transform (FFT), so usually we seek to reduce our problems to FFTs to take advantage of existing libraries.