____**UPD:** Tricks to make unordered_map
faster added.
Hi!
What is unordred_map
?
It is a data structure like map
but it is more than 4 times faster than map
.you can use it in C++11 with including #include<unordered_map>
.for example:
#include<unordered_map>
using namespace std;
int main(){
unordered_map<int,int>mp;
mp[5]=12;
mp[4]=14;
cout<<mp[5]<<' '<<mp[4]<<endl;//prints: 12 14
}
Lets explain it more.
How it works?
Focus on unordered_set
for simplify.You can imagine that it has vector of vector like vector<vector<type> > mp
. Every time you insert value V
in that, it calculate its hash
(I will explain how it works), let hash(V)=K
; it inserts V
into mp[K]
(formally it calls mp[K].push_back(V)
).When you call mp.count(V)
it searchs for V
in mp[K]
.
map
VS unordered_map
(and set
VS unordered_set
)
1-unordered_map
is more than 4 times faster
Focus on problem 527E - Data Center Drama, it seems that it is good to use unordered_map
instead of map
.
My submission with map
: 14269000 Time:484 MS.
My submission with unordered_map
: 14269381 Time:358 MS.
Another good sample to show how fast is unordered_map
is problem 178C3 - Smart Beaver and Resolving Collisions:
My submission with map
: 15781810 Time limit exceeded on test 36 .
My submission with unordered_map
: 15774685 Accepted (Time:966 MS).
It seems that there isn't big gap between them, but in fact we know that unordered map
is more that 4 times faster than map
, I will explain it more later.
2-unordered_set
(and unordered_map
) is not sorted
Please note that set
(and map
) is sorted, for example *(s.begin()) is always smallest number in the set; but in unordered_set
it isn't. similarly there isn't lower_bound
and upper_bound
in unordered_set
(and unordered_map
similarly).
Creating hash function for structs
Now, one other problem remains, you can try to compile this code:
unordered_map<pair<int,int>,int>mp;
You will get Compilation Error! Why? See this page for unordered_map
supported types. For unsupported types, you have to create your own hash function for use. For example lets see how we can create a hash function for pair<int,int>
.
As you know any int
value is between -2^31+1 to 2^31-1.so if we create function that for every pair<int,int>
returns distinct value in type size_t
(alias of unsigned int
), it will be done. It is pretty easy: x.first^(x.second<<32)
is good. but be careful about BUG(OVERFLOW) (It's different from my friend NikaraBika :D );for having good hash function we use hash<long long>
.The code is looks like this:
struct HASH{
size_t operator()(const pair<int,int>&x)const{
return hash<long long>()(((long long)x.first)^(((long long)x.second)<<32));
}
};
unordered_map<pair<int,int>,int,HASH>mp;
Now you have a unordered_map
of pair<int,int>
(it isn't problem what is second member, in example it was int
).Creating hash function for other structs is same.
How to test unordered_map
is faster than map
?
You can test that for N=10^6, unordered_set
(unordered_map
) is more than 4 times faster than set
(map
) ;with this code:(compile code with command: g++ file.cpp -std=c++11 -O2)
#include <bits/stdc++.h>
using namespace std;
unordered_set<int>s;//replace it with set for test.
int main(){
auto T=clock();
for(int i=0;i<10000000;i++)
s.insert(i);
cout<<double(clock()-T)/CLOCKS_PER_SEC<<'\n';
}
Note1: Let your hash function H(V)
, it is better that H(V) returns distinct value for every distinct V
, it makes unordered_map
faster; but if you can't do that it doesn't problem. The only problem is that unordered_map
becomes slower(however it is better than map
).
Note2: Please be careful about your hash function time complexly! it is fun to use this hash function:
struct HASH{
size_t operator()(const pair<int,int>&x)const{
size_t ans=0;
for(int i=0;i<x.first;i++)
ans+=x.second;
return ans;
}
};
UPD I will explain hash<type>
in my next post.
UPD It seems that sometimes unordered_map
becames so slow.but it can improve with this two lines of code:
unordered_map<int,int>mp;
mp.reserve(1024);
mp.max_load_factor(0.25);
With this two lines unordered_map
become about 10 times faster. You can replace 1024
with another suitable power of two.(it depends on number of insert
s you will do).