Блог пользователя djm03178

Автор djm03178, 6 недель назад, По-английски

I experienced something weird while solving 2050F - Maximum modulo equality. Basically, I tried solving it by decomposing the array into $$$\mathcal{O}(\sqrt{n})$$$ blocks. I submitted several of them each using difference size of blocks, and they passed in 2s during the contest. Surprisingly, in system tests, all of my submissions got TLE on my hack test that was used to hack other similar solutions that worked much slower in the original tests.

It felt really weird, because the most basic gcd implementation we learn is something like this:

int G(int a, int b)
{
    return b ? G(b, a % b) : a;
}

Although a single call for this function takes $$$\mathcal{O(\log{x}})$$$ time where $$$x$$$ is the value of the input, when we repeatedly apply this fuction $$$k$$$ times where either $$$a$$$ or $$$b$$$ is the result of a previous call on this function, it should only take $$$\mathcal{O}(k+\log{x})$$$ because the depth of the recursion gets deeper only when the result gets smaller. Thus, my solution should only take $$$\mathcal{O}((q+\sqrt{n})(\sqrt{n}+\log{x}))$$$ time, which should be fast enough to run in 5s.

However, the result shows that it's not the case. Check out these three versions (note that I used using ll = int; so the divisions aren't that slow):

The first one uses std::gcd, the second one uses __gcd which is a gcc extension, and the third one uses the G function I wrote above. The first one got TLE, while the other two took only around 2 seconds. So, why did this happen?

I asked about this to some people, and the conclusion was that std::gcd is maybe using binary gcd algorithm, which still takes time proportional to the number of significant bits of the input, even if the result remains large. For example, when res = 0 initially, repeatedly calling res = gcd(res, (int)1e9) will keep taking time proportional to $$$\log{10^9}$$$ each time, while the traditional Euclidian algorithm will only call itself recursively once each time. Conclusively, my solution with std::gcd was likely to be $$$\mathcal{O}((q+\sqrt{n})(\sqrt{n}\log{x}))$$$, which is too slow indeed.

qwerasdfzxcl suggested me using b = gcd(a % b, b) instead of b = gcd(a, b), and it does work: 295441081, and it's a lot faster than the other versions! I was told that it will bound at $$$\mathcal{O}(k+\log^2{x})$$$, but I don't quite understand this well so if anyone has idea on this, please do comment. and the theory is that assuming gcd returns $$$b$$$ immediately when $$$a\%b=0$$$, any other case will be that $$$b$$$ is at least halved after this gcd call and this happens only $$$\mathcal{O}(\log{x})$$$ times.

Lessons learned: Do not naively use std::gcd when we need to repeatedly perform gcd on its result. Alternatively, using C++17 works, presumbaly because it used Euclidan algorithm back then. C++23 doesn't work.

  • Проголосовать: нравится
  • +204
  • Проголосовать: не нравится

»
6 недель назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Oh

»
6 недель назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

Pragmas also help: 295462909.

»
6 недель назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

useful knowledge!

»
6 недель назад, # |
Rev. 4   Проголосовать: нравится +36 Проголосовать: не нравится

This is insane... It is the first time I see the new C++ version being slower than the old one :) And from the case of gcd(0, 1e9) it looks like it's not just a random competitive programming quirk but a real worsening of the time complexity. I actually find it rather odd. One can implement binary gcd to work in $$$O(\log \min(a, b))$$$ time but from your example apparently it doesn't do that. What we use to bound the time complexity of the repeated application of gcd is that the time complexity of the Euclid's algorithm is $$$O(\log (\min(a, b) / \gcd(a, b)))$$$. This I agree is an obscure time complexity that only competitive programmers could care about. But the difference between $$$O(\log \min(a, b))$$$ and $$$O(\log \max(a, b))$$$ is actually significant...

I would also like to point out that __gcd(x, 0) is undefined behaviour, so this problem is indeed annoying. I guess the only good solution is to always just use your own gcd function. But that's just sad...

»
6 недель назад, # |
Rev. 2   Проголосовать: нравится +8 Проголосовать: не нравится

qwerasdfzxcl's trick is a co-called hybrid approach: reference. Based on the benchmarks form the cited blog, it is the fastest approach.

»
6 недель назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Why did u use sqrt decomposition ?

Isn't something like segment tree or sparse table much faster ?

  • »
    »
    6 недель назад, # ^ |
    Rev. 2   Проголосовать: нравится +6 Проголосовать: не нравится

    If you ask me why... because of my skill issue? :)

    The segment tree solution just didn't come to me for some reason at that time, and later in the contest I found that segment tree would be faster so I re-wrote the solution with segment tree, so I was actually able to keep AK despite that 6 of my submissions FST'ed.

»
6 недель назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

It's so amazing but how can we use it usefully?