djm03178's blog

By djm03178, 5 months ago, In English

It is already well-known that using unordered series in Codeforces is super dangerous, and it is not recommended to use them at all unless you know how to prevent them properly (well, even if you know, just using a regular map or set is more beneficial in most cases). Thanks to Blowing up unordered_map, and how to stop getting hacked on it, it is easy to access to the basics of hacking these hash-based structures. In this blog, I'm going to get into more details on how it works, and the ideas to hack them with various constraints on the variables.

The basics

As mentioned in the linked blog, gcc implements its hash table simply by modding the key with certain primes, and these primes vary between different versions of gcc. So, basically the easiest way to make a hack case is to put numerous distinct multiples of these primes. However, an important thing to note is that it's a list of primes instead of just one prime. So although the blog mentioned one prime for each of the two versions, its bucket size actually gets through a list of primes as rehashing keeps occurring and the mentioned ones are just on the middle of the list. Anyways, from now I'll call these primes evil primes and the multiples of such primes evil values.

First, let's slightly modify the insert_numbers function in the blog. We'll divide the function into the construction part and the testing part, so that we can easily use only the construction part when we actually hack. Note that unordered_map and unordered_set work exactly the same for the key part, so it doesn't matter which one we use.

#include <bits/stdc++.h>
using namespace std;
using ll = long long;

vector<ll> make_evils(ll prime, ll n)
{
	vector<ll> ret(n);
	for (ll i = 0; i < n; i++)
		ret[i] = (i + 1) * prime;
	return ret;
}

unordered_set<ll> st;
void test_evils(const vector<ll>& evils)
{
	st = unordered_set<ll>();

	clock_t start = clock();

	for (ll x : evils)
		st.insert(x);

	clock_t end = clock();

	cout << fixed;
	cout.precision(3);
	cout << "Time: " << double(end - start) / CLOCKS_PER_SEC << " seconds" << endl;
}

int main()
{
	vector<ll> evils = make_evils(172933, 100000);
	test_evils(evils);
}

You can run this on custom test with G++20, and it will simply exceed the time limit. Changing the prime parameter to anything else, or using other version of G++ will make it run very fast.

The reason why it is slow is that when the modded key values are the same, the unordered structure has to compare the new key with every existing keys to see if there are no real duplicates. So, this process takes $$$\mathcal{O}(n)$$$ time each, and therefore inserting $$$\mathcal{O}(n)$$$ elements takes $$$\mathcal{O}(n^2)$$$ time in total.

What are the evil primes?

So we now know that $$$172933$$$ is one of the evil primes in G++20. But, this isn't enough to hack unordered series in many cases, due to the constraint on the key values. Typically, $$$10^9$$$ is the upper limit of the values in many problems, so there can be only $$$5782$$$ distinct multiples of $$$172933$$$ within this limit (excluding $$$0$$$). Let's say the constraint on the sum of the number of elements is $$$2 \cdot 10^5$$$. Can we simply repeat inserting $$$5782$$$ elements $$$34$$$ times to make it slow?

The answer is, no. It won't get rehashed enough to get to our desired bucket size, so inserting $$$5782$$$ elements takes just linear time. So this means we need to work with smaller evil primes. Here is the list of the notable evil primes for each version of G++:

G++14: 823, 1741, 3739, 7517, 15173, 30727, 62233, 126271, 256279
G++17: 709, 1493, 3209, 6427, 12983, 26267, 53201, 107897, 218971
G++20: 541, 1109, 2357, 5087, 10273, 20753, 42043, 85229, 172933

As we can see, the next bucket size is the smallest prime that is more than double of the current size in this list. With this list, we can finally make a sequence of evil values within the constraint.

For example, we can insert $$$48185$$$ distinct multiples of $$$20753$$$ in G++20, and it takes approximately $$$1.3$$$ seconds. If the limit of the number of elements in all test cases is $$$2 \cdot 10^5$$$, we can have $$$4$$$ such tests. This will make it run for around $$$5.2$$$ seconds, which is enough to exceed the time limit for most of the problems.

    for (ll i = 0; i < 4; i++)
    {
        vector<ll> evils = make_evils(20753, 48185);
        test_evils(evils);
    }

But... is it really the worst?

In fact, we can make it even more evil. When we insert $$$48185$$$ multiples of $$$20753$$$, not all $$$48185$$$ elements are making it slow. Inserting an element is slow only when the current bucket size is our desired bucket size, and it is not slow before and after we step on that bucket size. So, to find the perfect spot, we need to know when exactly rehashing occurs.

Let's try these two cases:

    vector<ll> evils = make_evils(42043, 20753);
    for (ll i = 0; i < 100000; i++)
        evils.push_back(42043);
    test_evils(evils);

Output: Time: 0.002 seconds

    vector<ll> evils = make_evils(42043, 20754);
    for (ll i = 0; i < 100000; i++)
        evils.push_back(42043);
    test_evils(evils);

Output: Time: 5.096 seconds

The only difference between these two is that the latter inserted just one more evil value to it, but it made a huge difference in the execution time. Why did this happen? Notice that $$$42043$$$ is the next evil prime to $$$20753$$$. So we can assume that rehashing happened just when the number of elements inserted exceeded the current bucket size, and therefore $$$42043$$$ became the evil prime.

So, we can see that if we're using $$$20753$$$ as the evil prime inserting more than $$$20753$$$ elements will no longer have the slowdown effect. Therefore, we can reduce the number of elements inserted to $$$20753$$$, and put $$$9$$$ test cases instead to make it run for around $$$11.7$$$ seconds. Further doing the evil thing, we can also fill the rest of the test cases with smaller evil values, or just spam other small cases.

    for (ll i = 0; i < 9; i++)
    {
        vector<ll> evils = make_evils(20753, 20753);
        test_evils(evils);
    }

If duplicated keys are allowed

This is something I've yet to fully discover and I need more research on it. But at least there is something I can say now.

We can make it even worse if duplicates are allowed. In the experiments in the above section, after inserting distinct evil values, I kept inserting more of the evil prime. In this case, it kind of worked. But what if you insert other evil values? Check these two versions out:

    vector<ll> evils = make_evils(42043, 20754);
    ll x = evils[evils.size() - 2];
    for (ll i = 0; i < 100000; i++)
        evils.push_back(x);
    test_evils(evils);

Output: Invocation failed [TIME_LIMIT_EXCEEDED]

    vector<ll> evils = make_evils(42043, 20754);
    ll x = evils.back();
    for (ll i = 0; i < 100000; i++)
        evils.push_back(x);
    test_evils(evils);

Output: Time: 0.003 seconds

Surprisingly, using the second to last element took far longer than using the very last element. So I think we can assume that it has to do with the insertion order, but I'm not too sure about the exact rule. For example, running this in G++17 with its evil primes shows very different behavior, where using the second to last element is as fast as using the very last element while using the first element or something in the middle was far more slower. If someone knows about this in detail, please tell me in the comments. But anyways, this opens a huge possibility in making worse tests in problems that allow duplicates, because this means we don't need to make several tests that initializes the unordered structure every time and instead we can keep inserting an evil value in its worst state possible.

When the constraints are tough

(This part is slightly modified for generalization, thanks pajenegod for pointing this out.)

Sometimes, we have very small range of elements available, such as only up to $$$m \le 10^6$$$. In these cases, it's pretty hard to make a proper hack case since we can't really insert a lot of distinct elements with the same modded key value. But depending on the problem and the code, it can still be slowed down to some extent.

Suppose the number of elements is $$$n$$$ and the problem allows duplicates, and there is an evil prime that is around $$$\sqrt{m}$$$. This means we can insert $$$\mathcal{O}(\sqrt{m})$$$ elements to get the desired bucket size. From now on, we can insert the same worst evil value for the rest of $$$\mathcal{O}(n) - \mathcal{O}(\sqrt{m})$$$ elements. Therefore, it takes $$$\mathcal{O}(n \cdot \sqrt{m})$$$ time in total. In practice, this isn't slow enough to hack normally, but it still depends on case by case. For example, you can try analyzing what my effort was to make https://codeforces.me/contest/1985/hacks/1049559 this hack.

To summarize, the worst time complexity we can achieve with $$$n$$$ elements with duplicates in range $$$[1, m]$$$ is $$$\mathcal{O}(n \cdot \min(n, \sqrt{m}))$$$, as each element can make at most $$$\mathcal{O}(\sqrt{m})$$$ collisions for each insertion. Without duplicates allowed, it is meaningless to insert more than $$$\sqrt{m}$$$ elements. Therefore, when multiple test cases are allowed it is best to insert $$$\mathcal{O}(\sqrt{m})$$$ elements so that it becomes $$$\mathcal{O}(m)$$$ time complexity for each test case, and make $$$\mathcal{O}({n_{sum} \over \sqrt{m}})$$$ such cases to achieve $$$\mathcal{O}(n \cdot \sqrt{m})$$$ in total (but with much smaller constant).

Another attack: the clear() bug

There has been a long-living bug in gcc's unordered series, and it has to do something with the clear() method. By standard, performing clear() should only take time proportional to the number of elements (i.e. size()), but in fact it takes time proportional to its bucket size as well. If this bucket size was also initialized properly, then it would have no issue. But unfortunately, it does NOT initialize its bucket size, and this leads to a huge vulnerability. Let's take a look at an example.

#include <bits/stdc++.h>
using namespace std;
using ll = long long;

unordered_set<ll> st;
int main()
{
    for (ll i = 0; i < 200000; i++)
        st.insert(i);
    for (ll i = 0; i < 100000; i++)
        st.clear();
}

This simple code takes 5952 ms in G++20!!! As you can see, it simply inserts $$$2 \cdot 10^5$$$ elements in the set, and then just calls clear() $$$10^5$$$ times. No hash attack is involved in this. It simply increased its bucket size, and let each clear() to iterate through the whole bucket every time.

So when the number of test cases is enough, any code that initializes the set with clear() can be hacked with a case that inserts a lot of distinct elements, and then spams the smallest case possible. If you're the coder and want to avoid this, then you need to initialize the set simply by constructing a new object, i.e. st = unordered_set<ll>();. I wonder why they haven't fixed this bug till this day...

  • Vote: I like it
  • +245
  • Vote: I do not like it

»
5 months ago, # |
  Vote: I like it +9 Vote: I do not like it

Thanks for the informative blog. My question is, what if you implement your own hash for unordered_map, can you really hack it? For example in this blog link.

  • »
    »
    5 months ago, # ^ |
      Vote: I like it +11 Vote: I do not like it

    It's impossible when the custom hash involves unpredictable random seed. If the random generator is weaker (i.e. using time()) then you can probably precalculate the numbers and find the ones with all colliding mod values.

    • »
      »
      »
      5 months ago, # ^ |
        Vote: I like it +9 Vote: I do not like it

      I asked about this question, it’s just that my friend’s hash function was hacked from his blog, here’s the code link map code goes through.

      • »
        »
        »
        »
        5 months ago, # ^ |
          Vote: I like it +8 Vote: I do not like it

        If you see the hack case, it's certainly related to the last paragraph in my blog. Check it out.

        • »
          »
          »
          »
          »
          5 months ago, # ^ |
            Vote: I like it +9 Vote: I do not like it

          I removed clear() and it went away. Thanks for your help. An interesting case with clear() . I didn't think that clear() would slow down the code so much. I thought that clear() cleans like map. But it turns out it takes time proportional to the size of its container, and I thought that the size works like in map. Thanks for the blog. Very interesting.

      • »
        »
        »
        »
        5 months ago, # ^ |
        Rev. 3   Vote: I like it +14 Vote: I do not like it

        I know where this custom hash it was from https://codeforces.me/blog/entry/62393

        struct custom_hash {
            static uint64_t splitmix64(uint64_t x) {
                // http://xorshift.di.unimi.it/splitmix64.c
                x += 0x9e3779b97f4a7c15;
                x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
                x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
                return x ^ (x >> 31);
            }
        
            size_t operator()(uint64_t x) const {
                static const uint64_t FIXED_RANDOM = chrono::steady_clock::now().time_since_epoch().count();
                return splitmix64(x + FIXED_RANDOM);
            }
        };
        
    • »
      »
      »
      4 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      OK,I'll use random_device to make my hash,and it'll be safe for me to use unordered_map

»
5 months ago, # |
Rev. 2   Vote: I like it +11 Vote: I do not like it

If the maximum element allowed in the map is 200,000 and doesn't allow duplicates, how can you hack it? I've seen some umap hacks on problems where the maximum element is 200,000, but have never been able to pull it off myself.

  • »
    »
    5 months ago, # ^ |
      Vote: I like it +39 Vote: I do not like it

    If the maximum element allowed in the map is 200,000, you can use an array.

    • »
      »
      »
      5 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      I'm aware of this... but there are certain users who use unordered_map all the time and I want to learn how to hack them.

  • »
    »
    5 months ago, # ^ |
      Vote: I like it +7 Vote: I do not like it

    If it's not related to the last paragraph, then I have no idea. I need to see an actual example to figure it out.

»
5 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Thank you for writing quality stuff recently! I hope you get into top contributors soon :D

»
5 months ago, # |
  Vote: I like it +3 Vote: I do not like it

Suppose the problem allows duplicates, and there is an evil prime that is around $$$n$$$. This means we can insert $$$O(\sqrt n)$$$ elements to get the desired bucket size. From now on, we can insert the same worst evil value for the rest of $$$O(n)−O(\sqrt n)$$$ elements. Therefore, it takes $$$O(n^{1.5})$$$ time in total.

More precisely, if $$$m$$$ is the upper bound on the input (typically $$$10^9$$$) and duplicates are allowed, then the optimal choice for evil prime is around $$$\sqrt m$$$. This results in a time complexity of $$$min(n^2, n \cdot \sqrt m)$$$.

  • »
    »
    5 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I only realized this generalization like yesterday, so thanks for pointing it out. I'll add this to the blog soon.

»
3 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Why are these some specific primes that cause trouble and these are the primes I always get stuck into... It really helped me a lot. Thanks man..........<3

»
3 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Hi! Thanks for the blog!

I'm curious about your opinion on if I make own implementation of hash map, how good is it and can it still be hacked? I wrote a code for 1850G - The Morning Star which is using the map. Here are the results:

  • Using std::map (1015ms) — 287908540
  • Using std::unordered_map (671ms) — 287914358
  • Using __gnu_pbds::gp_hash_table (249ms) — 287913968
  • Using own implementation HashMap (218ms) — 287911417