While trying to squeeze my solution of 1336E2 - Chiori and Doll Picking (hard version) into the time limit, I have noticed a huge discrepancy in performance among the C++ compilers offered by codeforces. Since I think it is something worth knowing (and it is rather unexpected for me), I am sharing this discovery.
Consider the following minimal working example (it generates $$$2N=30$$$ random $$$56$$$-bits numbers and computes the sum of the bits of the xors of all the subsets of the $$$30$$$ numbers).
The important lines are the following ones, where the function __builtin_popcountll
is called $$$2^{30}$$$ times.
ULL res = 0;
for (int i = 0; i < (1<<N); i++) for (int j = 0; j < (1<<N); j++) {
res += __builtin_popcountll(c[i]^d[j]);
}
Executing the above program in Codeforces custom invocation, yields these execution times:
Compiler Execution Time
GNU GCC C11 5.1.0 4040 ms
GNU G++11 5.1.0 4102 ms
GNU G++14 6.4.0 1123 ms
GNU G++17 7.3.0 1107 ms
GNU G++17 9.2.0 (64bit, msys 2) 374 ms
Notice that the only 64bit-native compiler that Codeforces offers produces a much faster executable (and notice that also among the other compilers there is quite a difference).
Next time you have to optimize a solution with a lot of bit-operations on 64bit integers (and, in fact, this situation is not so uncommon), consider using the compiler GNU G++17 9.2.0 (64bit, msys 2).
It might be that the differences among the various compilers are due to the way I have written the program (maybe the wrong PRAGMAS? Maybe preventing some compilers from optimizing because of a certain idiom? Maybe something else), if this is the case, please enlighten me!