Same code passed after but not before

→ Обратите внимание

До соревнования
Rayan Programming Contest 2024 - Selection (Codeforces Round, Div. 1 + Div. 2)
4 дня
Зарегистрироваться »

*есть доп. регистрация

→ Лидеры (рейтинг)

№	Пользователь	Рейтинг
1	tourist	3993
2	jiangly	3743
3	orzdevinwang	3707
4	Radewoosh	3627
5	jqdai0815	3620
6	Benq	3564
7	Kevin114514	3443
8	ksun48	3434
9	Rewinding	3397
10	Um_nik	3396

Страны | Города | Организации

Всё →

→ Лидеры (вклад)

№	Пользователь	Вклад
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	156
8	TheScrasse	154
9	Dominater069	153
10	nor	152

Всё →

→ Найти пользователя

→ Прямой эфир

Детальнее →

Блог пользователя farmersrice

Same code passed after but not before

Автор farmersrice, история, 5 лет назад, По-английски

https://codeforces.me/contest/1200/submission/58612411

https://codeforces.me/contest/1200/submission/58619785

I'm that unlucky to just get the wrong base on test 19? Are you kidding me? Can I get this restored?

farmersrice
5 лет назад
39

Комментарии (37)

Показать архивные | Написать комментарий?

nantf

5 лет назад, # |

+33

Yes, that's just because of unluckiness.

Remember, when using hash, especially in CF rounds, use at least two hash arrays.

Also, don't write hash when you come up with another solution.

→ Ответить

apostoldaniel854

5 лет назад, # ^ |

+31

I was so scared of being hacked that I used 8 hash arrays. That's real paranoia right here...

→ Ответить

nantf

5 лет назад, # ^ |

Well, no need to be that afraid.

I remember once I participied in a Chinese OI round, I was afraid that my hash solution would fail. So I used three arrays. But after coding that the contest only remained 15 mins. Then I fail to detect the mistakes.

So, the best the way is to use hash as few as possible.

→ Ответить

vamaddur

5 лет назад, # ^ |

← Rev. 3 →

"What lies in this code... is none other than a programmer's boundless curiosity and obsession with getting AC!"

It's super clean though. :)

→ Ответить

apostoldaniel854

5 лет назад, # ^ |

← Rev. 3 →

Thanks =)) For me I find the code a little messy though

→ Ответить

jef

5 лет назад, # ^ |

← Rev. 2 →

+71

8 hashes can still be hacked using a lattice-reduction attack:

2
daaiaabfhaaaanaavahbahlbdoaxeaaaadabadcj aicasbaaakkagakhafaaeaaaaagaafchkaeamaaa

Better use randomization.

→ Ответить

apostoldaniel854

5 лет назад, # ^ |

Expected this...

→ Ответить

farmersrice

5 лет назад, # ^ |

+19

→ Ответить

nantf

5 лет назад, # ^ |

Hmmm... Feel sad about that.

(Wait? Shouldn't it be $$$10^6$$$ instead of $$$10^5$$$?)

In this way it is about 93%. That's unlucky anyway.

→ Ответить

meooow

5 лет назад, # |

+82

ll base = uniform_int_distribution<int>(11, 30000)(rng);
Base should not be less than the max alphabet value.

→ Ответить

farmersrice

5 лет назад, # ^ |

← Rev. 2 →

+22

I want to kill my self, I wait 4 years just to get shafted by this luck

→ Ответить

meooow

5 лет назад, # ^ |

+17

determination++;

→ Ответить

farmersrice

5 лет назад, # ^ |

+27

nah, I quit

→ Ответить

Savior-of-Cross

5 лет назад, # |

Unlucky :(

→ Ответить

farmersrice

5 лет назад, # |

← Rev. 2 →

+18

I can't get it back??????

→ Ответить

jo_ulej

5 лет назад, # |

+11

Why do you have random base and constant modulo?

→ Ответить

farmersrice

5 лет назад, # ^ |

+22

Anti-hack, but turns out it caused my stupidity with "base should not be less than the max alphabet value"

→ Ответить

jo_ulej

5 лет назад, # ^ |

I mean, why didn't you make modulo random as well?

→ Ответить

farmersrice

5 лет назад, # ^ |

← Rev. 2 →

I don't think that's necessary for anti-hack as long as the base is random. Should it be done?

→ Ответить

collapse

5 лет назад, # ^ |

+29

No you are correct, because modulo should be prime or at least coprime to base, unless you want to generate random prime

→ Ответить

farmersrice

5 лет назад, # ^ |

Thank you!

→ Ответить

Xellos

5 лет назад, # ^ |

It makes way more sense intuitively to use different random coprime modulos. Consider that as long as base > |alphabet|, a polynomial hash without mod will always be a unique number. If two strings have these hashes $$$h_1$$$ and $$$h_2$$$, then a collision modulo $$$m$$$ means $$$h_1-h_2$$$ is divisible by $$$m$$$. When you use several coprime modulos $$$m_1, m_2, \ldots$$$, a collision means that $$$m_1 \cdot m_2 \cdot \ldots$$$ divides $$$h_1-h_2$$$. I don't know how it works with random bases, but with random modulos, as long as the polynomial hashes give sufficiently random remainders modulo $$$m_1, m_2, \ldots$$$, the chance of a collision drops extremely quickly with added modulos.

→ Ответить

farmersrice

5 лет назад, # ^ |

Thank you!

→ Ответить

Zadymiarz1234

5 лет назад, # |

+11

btw did you heard about double hash but one without modulo? It's as fast as normal hash couse you dont use more modulo. Variables will "overflowing-same" (end up with same numbers) so it will works. But dont try it without first noraml hash couse technically overflowing is nothing more then modulo some power of 2. I hope you got what i had on mind :D gl in future coding!

→ Ответить

nantf

5 лет назад, # ^ |

-10

Sometimes it will be more faster than normal hash. Moduloing takes lots of time.

→ Ответить

meooow

5 лет назад, # ^ |

Wouldn't recommend a modulo $$$2^{64}$$$ hash by itself: link
But I have seen people use it paired with another prime modulo hash.

→ Ответить

yassin_

5 лет назад, # ^ |

that's exactly what he is saying, isn't it?

→ Ответить

meooow

5 лет назад, # ^ |

Seems so, now that I think about it... ignore my comment in that case.

→ Ответить

nik7

5 лет назад, # |

-33

Oh come on. You got 493 upvote on this blog. You think it really deserves that much?. It's just karma striking back

→ Ответить

farmersrice

5 лет назад, # ^ |

+11

dude i would give all my contribution just to have this one count as passed

→ Ответить

dacin21

5 лет назад, # |

+29

Only randomizing the base and not the modulo is fine, but you have to pick the base from a large range of values. In your code you do ll base = uniform_int_distribution<int>(11, 30000)(rng);, so you're picking the base from less than $$$30'000$$$ values. This is an issue:

you're more likely to pick a small base, which might fail on the regular system tests.
it is possible to hack all $$$30'000$$$ bases in a single test with around 360'000 characters in total. (Due to the arbitrary 256kb input size limit for hacks, you can only hack around $$$18'000$$$ of them at a time, but that still gives the hacker over 50% odds at success.)
there might be strings of length $$$\log_{26}(10^9+7) \cdot 30'000 \approx 191'000$$$ each for which all $$$30'000$$$ bases collide at the same time. I however don't know how to compute such strings.

A better thing to do would be ll base = uniform_int_distribution<int>(0, mod1-1)(rng);. Intuitively, with such a large range of bases, you're unlikely to pick a small base. The exact math behind this can be found here.

→ Ответить

redocyz

5 лет назад, # ^ |

← Rev. 2 →

ll base = uniform_int_distribution<int>(0, mod1-1)(rng);

you mean

ll base = uniform_int_distribution<int>(11, mod1-1)(rng); right?

→ Ответить

dacin21

5 лет назад, # ^ |

← Rev. 2 →

Short answer: It doesn't really matter.

Long answer: From a worst-case point of view (i.e. hacks), the first one is slightly better. Using the fact that a polynomial of degree $$$n$$$ has at most $$$n$$$ roots in any field, you can show that for every fixed modulo and every fixed pair of strings of length $$$n$$$, there are at most $$$n$$$ bases that cause a collision. With bases picked uniformly at random from $$$\{0,1,\dots, mod_1-1\}$$$, the chance of collision is at most $$$\frac{n}{mod_1}$$$, with bases from $$$\{11, 12, \dots, mod_1-1\}$$$, the chance is a most $$$\frac{n}{mod_1 - 10}$$$, which is slightly worse.

I guess that you think that small bases are bad as it is easy to construct tests against them. But in fact, it is easy to construct tests against any fixed base (assuming modulo is also fixed). (If you're just dealing with fixed tests and not with hacks, then tests against very small bases are more likely to be in the testdata. But with hacks, there is little difference between different fixed bases.) This is why you need to pick you're based randomly at runtime, so that the hacker can't know against which base he has to find a test case.

→ Ответить

redocyz

5 лет назад, # ^ |

← Rev. 2 →

I guess that you think that small bases are bad as it is easy to construct tests against them

I am instead thinking that having small base is bad because the size of the alphabet is bigger than the base. I am not concerned about hacks, but if you are unlucky to get 0 or 1 as your base in your systest, then it is quite likely for you to fail those test?

→ Ответить

dacin21

5 лет назад, # ^ |

You have a good point: If you only worry about system tests and not about hacks, then I could see that a base smaller than the alphabet size is more likely to cause collisions. If we model system tests as uniformly random strings of various lengths, then for bases larger than the alphabet size, small strings won't collide and large strings will have approximately uniformly distributed hash values, so chance of collision might be around $$$\frac{1}{mod_1}$$$. Bases smaller than the alphabet size will have many collision on small strings, so they're worse.

Let's compute how unlucky you'd have to get: With $$$mod_1 = 10^9+7$$$, the probability of getting base below $$$256$$$ is only $$$2.5 \cdot 10^{-7}$$$. If you're comparing $$$10^5$$$ pairs of strings, then the probability of getting a random collision on random testdata is already $$$10^{-4}$$$, which is much higher. So I'd worry more about random collisions that about getting a small base.

Ideally you use multiple hashes with large modulus so that you're incredibly unlikely to get a collision even with hacks. Then you'll have an implementation for which you can prove that it will work on any input with high probability (and don't need to hand-wave about how you model system tests). For instance, if you use $$$4$$$ hashes with $$$mod = 10^9+7$$$ and randomized bases, $$$n = 10^5$$$, then the probability of a single comparison colliding is at most $$$\left(\frac{n}{mod}\right)^{4} \approx 10^{-16}$$$. If you're comparing $$$10^5$$$ strings, then the chance of a collision is still only $$$10^{-11}$$$ per test case. (If you use $$$mod=2^{61}-1$$$, then the probabilities are even better, but the implementation becomes tricky.)

→ Ответить

farmersrice

5 лет назад, # ^ |

← Rev. 2 →

I picked from range [300, mod — 1] (300 to avoid alphabet size problem) and it gave me WA test 5:

https://codeforces.me/contest/1200/submission/58714718

I submitted 3 more times and it worked all 3 times, is this just incredibly bad luck? Or I did something wrong elsewhere?

→ Ответить

dacin21

5 лет назад, # ^ |

← Rev. 3 →

Actually, now that I think about it, you want to use $$$[300, mod-300]$$$ to avoid the alphabet size problem. Having a base that is minus a small number is just as bad as having a small base.(Once again, the probability of a random collision is much higher than the probability of getting a base close to $$$0$$$ or $$$mod$$$, so this shouldn't really matter.)

Regarding your submission, I guess it's just bad luck. On a max-test, you're comparing close to $$$10^6$$$ pairs of strings, so the probability of a random collision on random strings is $$$10^{-3}$$$. (We don't know if the string is really random, so the chance might be higher or lower.) (If you'd use two hashes, that would drop to $$$10^{-12}$$$.)

→ Ответить

Соревнования по программированию 2.0

Время на сервере: 26.11.2024 08:54:24 (l3).

Десктопная версия, переключиться на мобильную.

При поддержке