Good Bye 2018 Problem G

№	Пользователь	Рейтинг
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

№	Пользователь	Вклад
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

I realised I have too much contribution anyway, so I write this blog about G of today's contest.

This problem was plagued by technical issues from start to finish. When you read some of them, you will probably ask yourself how the hell did I get red in the first place.

Originally, the problem had n a product of just 2 primes. We then decided to raise it up to 10 primes. I was really confused that I had two types of runs — either they passed with runtime 30ms, or they got TLE or ILE. I added time measurement to the interactor itself, and found out that the 30ms solutions in fact took more than couple seconds to run. The same solutions reported run time of couple seconds in Gym contest that we used for testing. During post contest discussions with MikeMirzayanov I realised that I still don't understand how Polygon and Codeforces measure time for interactive problems.

I am not the fastest duck in the pond, so I investigated my Tonelli Shanks implementation for bugs, found and fixed a couple, but it was still too slow. Only at this point I actually calculated the number of operations needed to find a square root modulo a prime and facepalmed. I realised that only about 10 queries can fit into time limit in the worst case. 10 queries were too few to achieve reasonable success probability even for 2 prime factors.

Luckily, if the prime is of format 4k + 3, one can find the square root with just 1 modular exponentiation instead of 5 - 10 for Tonelli Shanks solving the general case. So the problem was changed to only support these primes, the TL and success probability was suddenly more than enough.

We made a way for the contestants to understand the implications of the runtime of the interactor, and were happy with the problem. Oh, how wrong we were! (Side note: After the contest, Mike suggested a better way of doing that, and I will change the problem for practice using this suggestion.)

Fast forward to contest — there is a crashing solution of Shik getting TLE on interactor. I don't see the interaction log in CF interface, so at first glance it seemed to me that he simply used too many queries and the time accumulated. Mike proved me wrong, as his logs showed that my interactor never returned answer to some of the queries.

I took the test case and the queries and modified the interactor so that it just uses stdin/stdout, and I can test and debug it manually outside of Polygon. For some strange reason the GCD was not returning the answer in a timely manner.

In the ensuing chaos we misjudged the situation and introduced the query limit of 100, but it only made matters worse.

I was looking at my GCD implementation like an idiot. More precisely, it is not my GCD implementation -- I am using someone else's library for big integers! I've made a few changes to it, but I considered it good, as it never failed me in a contest. Everything seemed fine except it wasn't. Then arsijo pointed out this loop:

do {
    b >>= b.count_trailing_zeros();            
    if (cmp_abs(a, b) > 0) a.words.swap(b.words);
    sub_unsigned_overwrite(b, a);
} while (b.size() > 0);

You don't need to know too much about my implementation to understand that this is roughly equivalent of:

do {
   if (a>b) swap(a,b);
   b -= a;
} while (b != 0);

You don't need to be a freaking red to understand that this is really stupid way of writing gcd.

I sighed, changed it to a = Num::mod(a, b) and ran it against Petr's and mine solution in Polygon. You don't need to be freaking red to understand that this is an incorrect fix. You need to have some Polygon experience to understand that this will use your CPU quota in Polygon and lock you out for 5 minutes, which is definitely something you appreciate when you have 10 potentially frustrated/angry contestants getting Denial of judgement.

I sighed again, changed it to b = Num::mod(b, a), committed the problem and asked arsijo to test it. He managed to lock himself out for some reason as well, and then the polygon returned HTTP 502 for some time. Magic MikeMirzayanov fixed it, saw that it runs correctly and pushed the change to Codeforces. It didn't change the verdicts in the slightest, but luckily it was just a case of a stale cache and it was fixed promptly.

We added a few minutes to the contest duration, but I understand that for some people the contest was already quite unfair or broken. I hope that during the post-contest investigation we found the least possible unfair solution. I would like to thank arsijo and MikeMirzayanov for the immense help with this difficult situation.

For anyone who managed to read this far: the downvote button is just below this sentence, and it looks like a triangle standing on one of its vertices.

Update: I realised I forgot to talk about one more important subject. Why the hell did we not find the issue with gcd during contest preparation? I think I know why. All of our solutions apparently did something along the lines of:

find random x in [1, N)
ask for square root of x * x

The crucial thing is that in expectation x is N / 2, hence x² is N / 4. For these values, the subtraction gcd seems to run fast in expectation, and we never noticed the issue. When contestants did something else, suddenly things broke. For me, this is an important lesson about testing my problems.

Комментарии (17)

Показать архивные | Написать комментарий?

kostka

6 лет назад, # |

+19

Can I ask what is "a better way" suggested by Mike?

Also, I don't know why, but when someone asks to be downvoted it doesn't work this way.

→ Ответить

phyzzmat

6 лет назад, # ^ |

+46

Probably because that sincere remark in the end of the article just reminds us that everyone is human and is prone to making mistakes from time to time..

majk

+63

It's so simple once you know it. You have 6000 coins. Asking query of type +, -, ... costs 1 coin, query of type sqrt costs 80 coins etc. When you run out of coins, you get Wrong Answer.

The only reliable way to get downvotes is to call people names and add ROFLLOOOLLMAOOOO!!!! between every two words.

Xellos

+37

relevant

ecnerwala

← Rev. 3 →

+153

Hey majk and all!

I think the loop that arsijo pointed out isn't actually slow. It's running the binary GCD algorithm, and the key line you're missing is b >>= b.count_trailing_zeros().

The way the algorithm works is that we first count and remove all common powers of 2 (trailing zeros) from a and b. WLOG, assume a is now odd. Then, in each iteration, we remove all trailing zeros from b (because a is odd, so the GCD doesn't divide 2), so that a and b are both odd. Then, we go from (a, b) to (a, b - a), so that we're back to having an even number and an odd number. This way, in each round, the number of binary digits decreases by one in the next round, so it takes $\text{[math]}$ many rounds.

This is actually faster than the plain-old Euclidean algorithm: we perform $\text{[math]}$ many subtractions instead of $\text{[math]}$ many modulo operations.

All this being said, I'm not sure what the slowdown is anymore. Do you have any thoughts?

EDIT: On a second read of Wikipedia, it looks like this is known to have comparable performance to the standard Euclidean algorithm. Perhaps the mod optimization indeed made it faster, though that's a bit of an odd hybrid approach.

← Rev. 2 →

Thanks for the link!

The problem with binary GCD is that it should in fact say: a >>= a.count_trailing_zeros(); b >>= b.count_trailing_zeros();

In particular, if the smaller number is even, it is never divided by two, and the much larger integer remained odd. It degenerated to performance of $\text{[math]}$ .

Prior to the loop, was a guaranteed to be odd? If that's the case, we always do b = odd - odd = even, while a stays odd the whole time.

Prior to the loop it is guaranteed that at least one of the numbers is odd by doing

size_t n = std::min(a.count_trailing_zeros(), b.count_trailing_zeros());
a >>= n;
b >>= n;

Since the larger integer is a product of odd primes, the smaller is in fact the only one that can be even, triggering the bug.

Interesting, that's quite a subtle bug then. Replacing those lines with

if (a.count_trailing_zeros() > b.count_trailing_zero()) 
    a.words.swap(b.words);
size_t n = a.count_trailing_zeros();
a >>= n;
b >>= n;

could fix the algorithm then.

Correct!

a61742

+12

If x is N/2, should x² be N²/4 instead of N/4?

+10

Yeah, x² should be N² / 4, I think that's a typo.

I think what's important for the statement is actually $\text{[math]}$ , which is around N / 2 in expectation if x is chosen randomly.

For those curious: The big-integer library can be found on GitHub. The faulty GCD code is here.

Someone should definitely make a pull request to fix the bug.

+11

Done! https://github.com/983/Num/pull/4

bicsi

Wouldn’t it be better to use the binary gcd algorithm for this implementation (performance-wise)? (given that mod is a particularly heavy operation here)

It is using the binary gcd algorithm.

riadwaw

I didn't really get the issue you mentioned first: so you expected that interactor time is counted as part of TL but it wasn't?

Блог пользователя majk