I experienced something weird while solving 2050F - Maximum modulo equality. Basically, I tried solving it by decomposing the array into $$$\mathcal{O}(\sqrt{n})$$$ blocks. I submitted several of them each using difference size of blocks, and they passed in 2s during the contest. Surprisingly, in system tests, all of my submissions got TLE on my hack test that was used to hack other similar solutions that worked much slower in the original tests.
It felt really weird, because the most basic gcd
implementation we learn is something like this:
int G(int a, int b)
{
return b ? G(b, a % b) : a;
}
Although a single call for this function takes $$$\mathcal{O(\log{x}})$$$ time where $$$x$$$ is the value of the input, when we repeatedly apply this fuction $$$k$$$ times where either $$$a$$$ or $$$b$$$ is the result of a previous call on this function, it should only take $$$\mathcal{O}(k+\log{x})$$$ because the depth of the recursion gets deeper only when the result gets smaller. Thus, my solution should only take $$$\mathcal{O}((q+\sqrt{n})(\sqrt{n}+\log{x}))$$$ time, which should be fast enough to run in 5s.
However, the result shows that it's not the case. Check out these three versions (note that I used using ll = int;
so the divisions aren't that slow):
The first one uses std::gcd
, the second one uses __gcd
which is a gcc extension, and the third one uses the G
function I wrote above. The first one got TLE, while the other two took only around 2 seconds. So, why did this happen?
I asked about this to some people, and the conclusion was that std::gcd
is maybe using binary gcd algorithm, which still takes time proportional to the number of significant bits of the input, even if the result remains large. For example, when res = 0
initially, repeatedly calling res = gcd(res, (int)1e9)
will keep taking time proportional to $$$\log{10^9}$$$ each time, while the traditional Euclidian algorithm will only call itself recursively once each time. Conclusively, my solution with std::gcd
was likely to be $$$\mathcal{O}((q+\sqrt{n})(\sqrt{n}\log{x}))$$$, which is too slow indeed.
qwerasdfzxcl suggested me using b = gcd(a % b, b)
instead of b = gcd(a, b)
, and it does work: 295441081, and it's a lot faster than the other versions! I was told that it will bound at $$$\mathcal{O}(k+\log^2{x})$$$, but I don't quite understand this well so if anyone has idea on this, please do comment. and the theory is that assuming gcd
returns $$$b$$$ immediately when $$$a\%b=0$$$, any other case will be that $$$b$$$ is at least halved after this gcd
call and this happens only $$$\mathcal{O}(\log{x})$$$ times.
Lessons learned: Do not naively use std::gcd
when we need to repeatedly perform gcd on its result. Alternatively, using C++17 works, presumbaly because it used Euclidan algorithm back then. C++23 doesn't work.
Oh
Pragmas also help: 295462909.
useful knowledge!
Looks like it is: std:gcd: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/numeric#L137-L171
__gcd: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_algo.h#L1118-L1130
This is insane... It is the first time I see the new C++ version being slower than the old one :) And from the case of
gcd(0, 1e9)
it looks like it's not just a random competitive programming quirk but a real worsening of the time complexity. I actually find it rather odd. One can implement binary gcd to work in $$$O(\log \min(a, b))$$$ time but from your example apparently it doesn't do that. What we use to bound the time complexity of the repeated application ofgcd
is that the time complexity of the Euclid's algorithm is $$$O(\log (\min(a, b) / \gcd(a, b)))$$$. This I agree is an obscure time complexity that only competitive programmers could care about. But the difference between $$$O(\log \min(a, b))$$$ and $$$O(\log \max(a, b))$$$ is actually significant...I would also like to point out that
__gcd(x, 0)
is undefined behaviour, so this problem is indeed annoying. I guess the only good solution is to always just use your owngcd
function. But that's just sad...Oh, I didn't know that
__gcd(x, 0)
is UB. I guess I'll just use my own implementation when the TL could be tight :)yes, it uses division by zero in this case: https://codeforces.me/blog/entry/13410?#comment-182865
Why does it always work as it should then?
guys how to be good at c++ the language ?
are you sure __gcd(x,0) is UB? I just tested it in codeforces custom test and it works, also see this https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_algo.h#L1118-L1130
ok, interesting. so does it mean they changed something in c++20?
qwerasdfzxcl's trick is a co-called hybrid approach: reference. Based on the benchmarks form the cited blog, it is the fastest approach.
Why did u use sqrt decomposition ?
Isn't something like segment tree or sparse table much faster ?
If you ask me why... because of my skill issue? :)
The segment tree solution just didn't come to me for some reason at that time, and later in the contest I found that segment tree would be faster so I re-wrote the solution with segment tree, so I was actually able to keep AK despite that 6 of my submissions FST'ed.
It's so amazing but how can we use it usefully?