Introducing the imaginary cyclic convolution, speeding up convolution by a factor of 2

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	156
6	adamant	152
6	djm03178	152
8	Qingyu	151
9	luogu_official	149
10	awoo	147

I recently had a very interesting idea for how to greatly speed up convolution (a.k.a. polynomial multiplication).

def convolution(A, B):
  C = [0] * (len(A) + len(B) - 1)
  for i in range(len(A)):
    for j in range(len(B)):
      C[i + j] += A[i] * B[j]
  return C

The standard technique for how to do convolution fast is to make use of cyclic convolution (polynomial mult mod $$$x^n - 1$$$).

def cyclic_convolution(A, B):
  n = len(A) # A and B needs to have the same size
  C = [0] * n
  for i in range(n):
    for j in range(n):
      C[(i + j) % n] += A[i] * B[j]
  return C

Cyclic convolution can be calculated in $$$O(n \log n)$$$ using FFT, which is really fast. The issue here is that in order to do convolution using cyclic convolution, we need to pad with a lot of 0s to not be affected by the wrap around. All this 0-padding feels very inefficient.

So here is my idea. What if we do polynomial multiplication mod $$$x^n - i$$$ instead of mod $$$x^n - 1$$$? Then when we get wrap around, it will be multiplied by the imaginary unit, so it wont interfere with the real part! I call this the imaginary cyclic convolution.

def imaginary_cyclic_convolution(A, B):
  n = len(A) # A and B needs to have the same size
  C = [0] * n
  for i in range(n):
    for j in range(n):
      C[(i + j) % n] += (1 if i + j < n else 1j) * A[i] * B[j]
  return C

Imaginary cyclic convolution is the perfect algorithm to use for implementing convolution. Using it, we no longer need to do copious amount of 0 padding, since the imaginary unit will take care of the wrap around. In fact, the size (the value of $$$n$$$) required is exactly half of what we would need if we had used cyclic convolution.

One question still remains, how do we implement imaginary cyclic convolution efficiently?

The trick is rather simple. Let $$$\omega = i^{\frac{1}{n}}$$$. Now note that if $$$C(\omega x) = A(\omega x) B(\omega x) \mod x^n - 1$$$ then $$$C(x) = A(x) B(x) \mod x^n - i$$$. So here is the algorithm

def imaginary_cyclic_convolution(A, B):
  n = len(A) # A and B needs to have the same size
  w = (1j)**(1/n) # n-th root of imaginary unit
  
  # Transform the polynomials A(x) -> A(wx) and B(x) -> B(wx)
  A = [A[i] * w**i for i in range(n)]
  B = [B[i] * w**i for i in range(n)]

  C = cyclic_convolution(A, B)
  
  # Transform the polynomial C(wx) -> C(x)
  C = [C[i] / w**i for i in range(n)]
  return C

Comments (3)

Write comment?

pajenegod

2 years ago, # |

Auto comment: topic has been updated by pajenegod (previous revision, new revision, compare).

→ Reply

adamant

+36

Nice trick! Note however that it's not compatible with simultaneous computation of DFT for two real valued arrays, as described in №16 here. For my implementation, they have roughly the same performance (here is my trick and here is yours).

I wonder if there are any ways to further speed up stuff with imaginary convolution...

pajenegod's blog