While trying to squeeze my solution of 1336E2 - Chiori and Doll Picking (hard version) into the time limit, I have encountered an unexpectedly large difference in the execution times among the various C++ compilers offered by Codeforces. Since I think it is something worth knowing, I am sharing this discovery.
Consider the following minimal working example (it generates $$$2N=30$$$ random $$$56$$$-bits numbers and computes the sum of the bits of the xors of all the subsets of the $$$30$$$ numbers).
The important lines are the following ones, where the functions __builtin_popcountll
and xor
are called $$$2^{30}$$$ times.
ULL res = 0;
for (int i = 0; i < (1<<N); i++) for (int j = 0; j < (1<<N); j++) {
res += __builtin_popcountll(c[i]^d[j]);
}
Executing the above program in Codeforces custom invocation yields these execution times:
Compiler Execution Time
GNU GCC C11 5.1.0 4040 ms
GNU G++11 5.1.0 4102 ms
GNU G++14 6.4.0 1123 ms
GNU G++17 7.3.0 1107 ms
GNU G++17 9.2.0 (64bit, msys 2) 374 ms
Notice that the 64bit-native compiler produces a much faster executable (and notice also that among the other compilers there is quite a difference). Hence, next time you have to optimize a solution with a lot of bit-operations on 64bit integers (and, in fact, this situation is not so uncommon), consider using the compiler GNU G++17 9.2.0 (64bit, msys 2).
It might be that the differences among the execution times are due to the way I have written the program (maybe the wrong PRAGMAS? Maybe preventing some compilers from optimizing because of a certain idiom? Maybe something else?), if this is the case, please enlighten me!