In the latest contest I solved 2028C - Alice's Adventures in Cutting Cake in ~ O(nlog(n)). My first submission got TLE, which I found odd so I optimized it — tighter bound for BS and static array used for DP, which passed in 200ms.
Then I started adjusting my original solution and found out that if I use C++23 (GCC 14-64, msys2) instead of C++20 (GCC 13-64), the solution passes. If I create a single static vector, the solution is much faster.
C++23 (GCC 14-64, msys2): 291080726 — ~ 1400ms
C++20 (GCC 13-64): 291080768 — TLE
Single static vector: 291080651 — ~ 300ms
I have two questions:
1) Does anyone know why the compiler version changes the runtime so dramatically and how to prevent it?
2) Since creation of a vector is linear in its size, why is the runtime increased so much?
Seems like the small vector is the big slowdown here, after I replaced the vll(2) into an array<ll, 2>, it sped up by a large magnitude. I think it's well known that small vectors can cause this, probably why you should use array<type, size> for small, fixed dimensions
I see, I suppose that could be since vectors are reserving more size than they need to?
It's pretty likely that's what happened, as when I tested, the solution using array<> uses only 4000 kb of memory, while the vector solution used 12000kb
so I suppose the other compiler is just better at creating small vectors
vector
performs dynamic allocation, and it has to maintain the pointer to the allocated memory, the size of the vector, and the capacity of the vector. This is a huge overhead compared to anarray
where all it requires is a fixed size memory directly in place.Right, I expected the overhead to make it like 3x times slower but it is almost 10x slower
When you allocate 2d (n * 2) vector inside function it probably doesn't allocate a continuous block of memory, but runs n allocations on heap which is expensive + the resulting memory layout is not cache-friendly. For global vector you at least do these n allocations only once.
Also there's no reason to use vector here over a vanilla array, since you already know the size beforehand.
For C++20 option, Codeforces uses g++ v.13.2 (winlibs), which uses MSVCRT. It is pretty bad, especially at the memory allocation. Your solution even passes, if we submit it on g++ v.7.3.0 (32 bit): 291101988.
For C++23 option, Codeforces uses g++ v.14.2 (msys2), which uses UCRT64, which is better than MSVCRT.
For additional reading: 1, 2, 3, 4, 5, 6