FFT Problem Optimization

Diguised's blog

By Diguised, history, 10 years ago, In English

Hi —

I recently submitted this solution to POLYMUL, a polynomial multiplication problem that is meant to be solved with FFT.

Even on my own computer this solution runs very slowly, and I cannot identify the reason. Could anyone help me optimize it? Something must be wrong for it to run this slowly.

Thanks in advance, Disguised


Comments (10)


Klein

10 years ago, # |

The main optimization you can do, in my opinion, is to raise the base-case threshold. That is, instead of stopping the recursion only when a single element is left, switch to an O(N²) evaluation as soon as the number of elements drops below 32 (32 is usually a good threshold ^^). In my experience (not only with FFT, but also with Karatsuba and the like) this makes a huge difference.
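A minimal sketch of what such a cutoff could look like in a recursive radix-2 FFT. The names `kThreshold`, `naive_dft` and `fft` are made up for this illustration and are not taken from the original submission. The base case here is a direct O(N²) DFT; whether 32 is actually the best cutoff for an FFT (as opposed to Karatsuba) is an assumption that should be measured on the target machine.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Assumed cutoff from the comment above; tune it by measurement.
const std::size_t kThreshold = 32;

// Direct O(n^2) DFT, used as the base case of the recursion.
std::vector<cd> naive_dft(const std::vector<cd>& a) {
    const std::size_t n = a.size();
    const double PI = std::acos(-1.0);
    // Precompute the n-th roots of unity once instead of inside the double loop.
    std::vector<cd> root(n);
    for (std::size_t j = 0; j < n; ++j)
        root[j] = std::polar(1.0, 2 * PI * double(j) / double(n));
    std::vector<cd> out(n);  // value-initialized to zero
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t j = 0; j < n; ++j)
            out[k] += a[j] * root[(j * k) % n];
    return out;
}

// Recursive radix-2 FFT; a.size() must be a power of two.
std::vector<cd> fft(const std::vector<cd>& a) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n <= kThreshold) return naive_dft(a);  // stop recursing early

    // Split into even-indexed and odd-indexed coefficients.
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    const std::vector<cd> e = fft(even), o = fft(odd);

    // Combine the two half-size transforms with butterflies.
    const double PI = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd w = std::polar(1.0, 2 * PI * double(k) / double(n));
        out[k] = e[k] + w * o[k];
        out[k + n / 2] = e[k] - w * o[k];
    }
    return out;
}
```

The sketch keeps the recursion and `std::complex` on purpose so that only the cutoff differs from a textbook recursive FFT.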


yeputons

10 years ago, # |

+22

Start with getting rid of:

push_back: it is amortized O(1), but it triggers reallocations. Pre-allocate the memory (or just call vector.reserve).
Recursive calls and the memory allocations inside them: do everything in place.
complex<double>: it is slow for some reason; implement an equivalent class yourself.
Two separate FFTs: replace them with one. Transforming a + 0·i is wasteful, so instead of running separate transforms for a + 0·i and b + 0·i, run a single transform of a + bi and then recover both results with a couple of formulas.

I'm not sure that these are the most important ones, but they are the first that came to mind. By the way, here you can find an FFT implementation from SPb SU 4's notebook.
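To make the four suggestions above concrete, here is a rough sketch that combines them. It is not the SPb SU 4 notebook code, and the names `Cplx`, `fft` and `multiply` are invented for this example. It assumes integer coefficients small enough for double precision to round correctly. The recovery step uses the fact that for real sequences a and b with F = FFT(a + bi), the pointwise product of their transforms is (F[k]² − conj(F[n−k])²) / (4i).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hand-written complex type used instead of std::complex<double>.
struct Cplx {
    double re, im;
};
inline Cplx operator+(Cplx a, Cplx b) { return {a.re + b.re, a.im + b.im}; }
inline Cplx operator-(Cplx a, Cplx b) { return {a.re - b.re, a.im - b.im}; }
inline Cplx operator*(Cplx a, Cplx b) {
    return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}
inline Cplx conj(Cplx a) { return {a.re, -a.im}; }

// Iterative in-place radix-2 FFT; a.size() must be a power of two.
void fft(std::vector<Cplx>& a, bool invert) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    // Bit-reversal permutation replaces the recursive even/odd split.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double PI = std::acos(-1.0);
    std::vector<Cplx> w(n / 2);  // twiddle buffer, allocated once
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const std::size_t half = len / 2;
        const double ang = (invert ? -2 : 2) * PI / double(len);
        for (std::size_t k = 0; k < half; ++k)
            w[k] = {std::cos(ang * double(k)), std::sin(ang * double(k))};
        for (std::size_t i = 0; i < n; i += len) {
            for (std::size_t k = 0; k < half; ++k) {
                const Cplx u = a[i + k], v = a[i + k + half] * w[k];
                a[i + k] = u + v;
                a[i + k + half] = u - v;
            }
        }
    }
    if (invert)
        for (Cplx& x : a) { x.re /= double(n); x.im /= double(n); }
}

// Multiplies two integer polynomials with one forward and one inverse FFT.
std::vector<long long> multiply(const std::vector<int>& a, const std::vector<int>& b) {
    if (a.empty() || b.empty()) return {};
    const std::size_t need = a.size() + b.size() - 1;
    std::size_t n = 1;
    while (n < need) n <<= 1;
    // Pack a into the real part and b into the imaginary part.
    std::vector<Cplx> f(n, Cplx{0.0, 0.0});
    for (std::size_t i = 0; i < a.size(); ++i) f[i].re = a[i];
    for (std::size_t i = 0; i < b.size(); ++i) f[i].im = b[i];
    fft(f, false);
    std::vector<Cplx> c(n);
    for (std::size_t k = 0; k < n; ++k) {
        const Cplx g = conj(f[(n - k) % n]);
        const Cplx d = f[k] * f[k] - g * g;
        c[k] = {d.im / 4, -d.re / 4};  // d / (4i)
    }
    fft(c, true);
    std::vector<long long> res(need);
    for (std::size_t i = 0; i < need; ++i) res[i] = std::llround(c[i].re);
    return res;
}
```

For example, `multiply({1, 2, 3}, {4, 5})` corresponds to (1 + 2x + 3x²)(4 + 5x). All buffers are sized up front, so no `push_back` is needed anywhere.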


zakharvoit

10 years ago, # ^ |

+16

Looking at the std::complex implementation, I noticed that there are template specializations for the standard types that use the complex operations from the C language. It seems to be slow because of the additional function calls.

I tested the FFT with std::complex instantiated over a custom double "implementation" (i.e. a wrapper class around double with all operators overloaded), and it shows the same performance as the custom complex class. However, this version is even longer than the one with the custom complex class, so it cannot be used to shorten the code.

P.S. Maybe there is another way to bypass the template specialization besides creating a custom class, but I don't know of one.
