Блог пользователя maomao90

Автор maomao90, история, 23 месяца назад, По-английски

I was attempting to implement the solution to 1770G - Koxia and Bracket after reading the editorial recently but I kept getting TLE on test case 12. However, after changing the compiler from GNU C++17 to GNU C++17 (64), I got AC in 1185ms, which is very far off from the time limit of 5 seconds.

GNU C++17: 188211296

GNU C++17 (64): 188211331

I tried testing it on errorgorn solution in the editorial as well and there was the same problem.

GNU C++17: 188211673

GNU C++17 (64): 188211705

I thought that it might be because of the NTT implementation that we used (KACTL), however, I even tried on Radewoosh submission which uses a different NTT implementation, but still faces the same issue.

GNU C++17: 188203821

GNU C++20 (64): 187349301

I have seen some cases where 64-bit compiler runs faster than 32-bit before, but never to such a large extent. Does anyone know the reason why? Does NTT run a lot faster on 64-bit compiler? Or is it something about our implementation?

If anyone know the reason why, I will appreciate it very much if you could explain it to me down in the comments. I guess it is about time that I shift from GNU C++17 to GNU C++17 (64) 😢

  • Проголосовать: нравится
  • +35
  • Проголосовать: не нравится

»
23 месяца назад, # |
  Проголосовать: нравится -8 Проголосовать: не нравится

I wrote editorial's code with my library and have

GNU C++17 (64): OK 1419 ms 188256118

GNU C++17: OK 3884 ms 188256187

I guess modular arithmetic is bottleneck

»
23 месяца назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

From my experience of solving problems on a 32-bit-only online judge, I can say that 64-bit arithmetic on 32-bit machines is indeed very costly. On such machines, 64-bit integers are split into two 32-bit ones and arithmetic on them is emulated using multiple 32-bit operations, which is not always the easiest task, especially when the operations intertwine the low and high parts of a number as it is with multiplication and division/modulo. Here's MSVC's implementation of 64-bit division using 32 bits and as you can see, it uses many different processor instructions, including branching which has its own performance problems, compared to just one "idiv" instruction you'd have on a 64-bit processor. In short, just use 64-bit compilers whenever you can to get the most out of your computer's capabilities.