Hey all, I know that this is a very common topic and you are probably quite tired of it, but I still don't quite understand it. Please take the time to read through this post. Thanks a lot!
During the contest yesterday, I was solving 3SUM Closure: https://codeforces.me/contest/1698/problem/C I figured out the logic of the problem and started implementing it. I used an unordered_map, passed the pretests, but then TLE'd on the main tests. This was my submission: https://codeforces.me/contest/1698/submission/162250574
But then, I made some changes to this code and got AC. This was my next submission: https://codeforces.me/contest/1698/submission/162252261
Notice the difference between these two pieces of code? Yes! I used a map instead of an unordered_map. Now, I have read many threads and articles advising the use of map instead of unordered_map, for all sorts of reasons: slow hash functions, frequent collisions that fill up buckets and slow the unordered_map down, and so on. But in this case, the only things I'm doing are inserting and calling unordered_map::count. Aren't these exactly the cases where unordered_map's O(1) time complexity should beat map's O(log N)? Unlike, say, iterating through the container in sorted order, where a map would be the faster choice? I was specifically asking myself this question during the contest, and because inserting and counting were all I was doing, I chose unordered_map, which turned out to stab me in the back and cost me quite a bit of rating. Basically, my question is: during competitive programming, is there EVER a time when I would want to use unordered_map instead of map?
The relevant blog (on how unordered_map gets hacked, and how to prevent it) can be found at https://codeforces.me/blog/entry/62393.
My personal suggestion: never use unordered_map on Codeforces.
Use map instead.
How will you solve problems that need O(1) access to large amounts of data, where an extra O(log N) factor would be too slow, then?
gp_hash_table
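For context, a minimal sketch of what using gp_hash_table can look like; it lives in GCC's policy-based data structures (pb_ds) extension, so it is GCC-only, and the names below are purely illustrative rather than taken from any submission in this thread:

#include <ext/pb_ds/assoc_container.hpp>
#include <iostream>
using namespace __gnu_pbds;

int main() {
    // gp_hash_table behaves much like std::unordered_map (operator[], find, erase)
    // but is usually faster in practice.
    gp_hash_table<long long, int> cnt;
    cnt[42]++;                                          // insert / increment, O(1) on average
    std::cout << (cnt.find(42) != cnt.end()) << "\n";   // lookup, O(1) on average
}

Like unordered_map, it still benefits from a custom hash function against targeted tests, which is discussed further down in this thread.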
True, but that is a special and rare occasion (at least in my short CP career I have never met anything like that).
If map cannot fit in the time limit, I will use unordered_map and add a custom hash to avoid hacks, which means more code.
In that case, you leave the contest and shit on the problemsetter in the comments.
It's sometimes useful for attempting to squeeze a suboptimal solution through TL (hopefully with a custom written hash function), especially for some classic string/hashing problems on old judges. But it's quite rare to find a problem that requires or expects an unordered_map solution.
In your case, it looks like the authors decided to create a test which kills solutions with unordered maps (every operation works in O(n)). The easiest way to prevent this is to use a randomized hash function like splitmix64. You can read about these in other blogs. In other cases, unordered maps are much faster than maps, so you should always use them :)
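For illustration, a sketch of the kind of randomized splitmix64-based hash this comment refers to. It follows the well-known construction from neal's blog (linked elsewhere in this thread), so treat it as a reference template rather than as something taken from the submissions above:

#include <bits/stdc++.h>
using namespace std;

// splitmix64 mixes the bits thoroughly, and the run-time random offset makes the
// key-to-bucket mapping different on every run, so a hacker cannot precompute
// a test that forces all keys into one bucket.
struct custom_hash {
    static uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }
    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

int main() {
    // Same interface as before, only the third template argument changes.
    unordered_map<long long, int, custom_hash> cnt;
    cnt[123456789]++;
    cout << cnt.count(123456789) << "\n";
}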
I disagree with jiangbowen. Basically, preferring std::map over std::unordered_map is unjustified if you don't exploit the ordered nature of std::map (e.g. by taking the minimum element with std::map::begin(), or finding the nearest element to a value via std::map::lower_bound()), because std::unordered_map is simply quicker. The problem is that you may fall victim to an anti-hash test. I would like to believe that anti-hash tests usually come as hacks, and I consider it bad manners for a problemsetter to include anti-hash tests in the problem itself. However, this sometimes happens, and if it does, you should be ready. (This is an actual disadvantage of std::unordered_map, because std::map has guaranteed logarithmic time per query, while std::unordered_map has constant time per query only on average. But choosing std::map simply because of this is somewhat of a conformist's way of thinking.)

Xephon remembered just the right blog. Along with a deep analysis of the issue and its cause, a hotfix is given there. You can simply include it in your C++ template and always, without thinking, attach a custom hash function every time you need an unordered container: https://codeforces.me/contest/1698/submission/162271598
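As an aside, a tiny sketch to make the "ordered nature of std::map" point concrete (the keys and values here are made up): std::map keeps its keys sorted, so begin() yields the minimum and lower_bound() finds the nearest key, while std::unordered_map offers neither.

#include <iostream>
#include <map>
using namespace std;

int main() {
    map<int, int> m{{3, 30}, {7, 70}, {10, 100}};
    cout << m.begin()->first << "\n";         // smallest key: 3
    cout << m.lower_bound(5)->first << "\n";  // first key >= 5: 7
}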
Let's see if this is true.
And your submission runs in 78 ms, which is far from impressive. Everything boils down to having a choice between:

1. a std::unordered_map solution with a low constant factor, which can degrade to O(N) per operation in the worst case if hacked: https://codeforces.me/contest/1698/submission/162388677
2. a std::unordered_map solution with a high constant factor because of the custom hash: https://codeforces.me/contest/1698/submission/162271598
3. a std::map solution with a low constant factor: https://codeforces.me/contest/1698/submission/162387139

So I agree with jiangbowen and I also think that std::map is a reasonable default choice. But std::unordered_map can still be faster on larger data sets. And if you are worried about a possible TLE, it may be a good idea to benchmark the solution on a custom test before submitting it.

Are you sure that solution No. 2 has a high constant factor? Just because of the difference between 78 ms in 162271598 and 46 ms in 162387139? That is way too small a difference! Running times may fluctuate at these scales. Please compare std::map and std::unordered_map on a more computation-heavy problem.

Interestingly, both your solution (std::unordered_map with a custom hash) and my solution (with just std::map) took 78 ms. Kind of weird if you ask me. Why do they take exactly the same amount of time?
All submission execution durations on Codeforces are divisible by $$$\sim15.5$$$ ms. So even if our execution times varied within 10 ms, after rounding they might become exact same numbers.
Oh, I see. I didn't know that.
Yes, the splitmix64 function is pretty expensive because it arranges many arithmetic operations (including somewhat slow 64-bit multiplications) in a dependency chain.
Sorry, but you can't dismiss the difference between 46 ms and 78 ms as too little. That's a bit too much handwaving.
I already mentioned that std::unordered_map will be faster on larger data sets because of its better time complexity. But we are unlikely to see really large input data fed through stdin in competitive programming problems, because console input is a performance bottleneck on its own even with the ios_base::sync_with_stdio(0); cin.tie(0); magic. And on smaller data sets std::unordered_map's performance advantage is not always guaranteed. I'm just saying that your suggestion to use unordered containers "without thinking" isn't always the best choice.

Hey guys, thank you so much for your help. Just a side question though: do you think there could be a case where a std::map would still outperform std::unordered_map with a custom hash function (for example the one orz added to my solution), given that you are only looking up and inserting into the map?
Edit: I believe that anti-hash tests were the reason my code TLEd, since the unordered_map solution took over 1 second on one of the tests, whereas the maximum time the map solution took across all tests was only 78 ms.
I guess there is a chance for std::map to outperform std::unordered_map. When there are few elements in the container, std::map's logarithm is smaller than std::unordered_map's constant factor: https://quick-bench.com/q/89itr4dl3zrR3dVnLcjlmuwAtzA https://quick-bench.com/q/Q-M_AWVMOxufcyYt2iZy0i3WSPM

I submitted your code with the C++20 compiler and got AC. It seems they improved the default hash function; can anyone provide some documentation to corroborate this?
Submission
Hello, so do you think I can use unordered_map with the C++20 compiler without any worries? Is it still possible for me to get hacked?
I believe that you cannot use std::unordered_map with the C++20 compiler without any worries.

Moreover, let's distinguish the C++ standard version from the C++ compiler version. 20 is the number of the standard, and a standard only imposes certain restrictions on a compiler that claims to comply with it. There is no difference in the behavior of std::unordered_map between the 17th and the 20th standard, so the only difference is in the compiler's implementation of std::unordered_map. The C++20 version could equally well have performed worse than the C++17 version.

So ideally we should look into the implementation of hashing in GNU C++20, and probably there is no serious difference between the old version and the new one. The vulnerability is pretty much the same, I guess, and the countertest can be generated with the same generator, but with somewhat tweaked constants.
But one cannot say for sure until one actually looks into the code.
Ok thanks for the info!
Sometimes unordered maps are much faster than maps, so you should choose wisely what to use.
Try all of them on the problems you solve: map, unordered_map (with a custom hash), or gp_hash_table (with a custom hash), and analyze the performance.
Personally, I use map most of the time.
Situation 1: once I used unordered_map (with a custom hash) and it gave me TLE. I tried multiple types of hashes and it still didn't work. With map it got accepted.
TLE with unordered_map only (execution time >= 2000 ms): 162435874
TLE with unordered_map + custom hash (execution time >= 2000 ms): 162435379
Accepted with map (execution time 702 ms): 162435610
Accepted with gp_hash_table (execution time 717 ms): 162435952
Accepted with gp_hash_table + custom hash (execution time 686 ms): 162435482
Situation 2: once I used map and it gave me TLE, but when I used unordered_map it got accepted.
With experience you will know what to use when.
Your custom hash is lightweight and fast, but vulnerable:
Doing XOR with a random key doesn't help to safeguard against Hash DoS hacks, because this only changes the bucket number. All inserted values still go to the same hash table bucket and time complexity still degenerates to O(N). A more heavyweight custom hash (such as the one suggested by orz) is necessary for real Hash DoS resistance, but it will impact performance to some extent and this will show up in benchmarks too.
No examples for Situation 2?
Check out this: 158081683. I took this custom hash from neal's blog.
Actually, Situation 2 was a long time ago, so I kind of forgot which problem it was, but I still remember what I learned from that problem :)
The reason for the TLE in this submission is the mpp.clear(); operation. The std::unordered_map container allocates many buckets, but never reduces their number later (see the unordered_map::rehash documentation). So every clear operation has to go over all of these allocated buckets, and this is very slow. A workaround is to assign an empty container instead of clearing the old one: 162486169 (218 ms). Removing the custom hash makes it somewhat faster, 162486256 (187 ms), but leaves it vulnerable to hacks.
All of this needs to be compared to std::map, 162480822 (202 ms), which is slightly faster than std::unordered_map with a custom hash, at least for this problem.
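A minimal sketch of the clear-vs-reassign workaround described above, assuming the usual multi-test-case input loop; the container and variable names are placeholders, not taken from the linked submissions:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;
    unordered_map<long long, int> cnt;
    while (t--) {
        // ... solve one test case using cnt ...

        // cnt.clear() keeps the grown bucket array and has to walk all of it,
        // which can dominate the running time across many test cases.
        // Assigning a freshly constructed container releases the buckets instead:
        cnt = unordered_map<long long, int>();
    }
}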
BTW, changing endl to "\n" reduces the time to 202 ms: 162480822
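For completeness, the reason this helps is that endl flushes the output stream on every call, while "\n" only writes the newline character:

#include <iostream>

int main() {
    int x = 42;
    std::cout << x << std::endl;  // writes '\n' and flushes the buffer every time
    std::cout << x << "\n";       // writes '\n' only; the buffer is flushed when needed
}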