Arpa's blog

By Arpa, 9 years ago, In English

Hi!

After my previous post about unordered_map now I want to explain hash functions.

std::hash.

C++ STL has one hash function in library <functional>. You can use it for this data types:

template<> struct hash<bool>;
template<> struct hash<char>;
template<> struct hash<signed char>;
template<> struct hash<unsigned char>;
template<> struct hash<char16_t>;
template<> struct hash<char32_t>;
template<> struct hash<wchar_t>;
template<> struct hash<short>;
template<> struct hash<unsigned short>;
template<> struct hash<int>;
template<> struct hash<unsigned int>;
template<> struct hash<long>;
template<> struct hash<long long>;
template<> struct hash<unsigned long>;
template<> struct hash<unsigned long long>;
template<> struct hash<float>;
template<> struct hash<double>;
template<> struct hash<long double>;

For example:

hash<int>hi;
int x=69;
cout<<hf(x)<<endl;
hash<long double>hld;
long double y=69.6969696969;
cout<<hld(y)<<endl;
hash<vector<int> >hv;
vector<int>v({69,69,69,69});//v→ 69,69,69,69
cout<<hv(v)<<endl;
Hand-made hash functions.

There are many ways to create hash function for your struct. For example:

struct reval{
  vector<int>v;
  int n;
  string s;
  size_t hash(){//size_t is alias of unsigned int
    hash<string>hs;
    hash<long long>hll;
    hash<int>hi;
    hash<vector>hv;
    long long ans=hs(s);
    ans<<=32;
    ans|=hi(n);
    ans=hll(ans);
    ans<<=32;
    ans|=hv(v);
    ans=hll(ans);
    return ans;
  }
}

A simple example:(be careful;it is only a sample and it isn't good hash function of course)

struct reval{
  vector<int>v;
  int n;
  string s;
  size_t hash(){//size_t is alias of unsigned int
    hash<string>hs;
    hash<int>hi;
    hash<vector>hv;
    return hs(s)+hv(v)+hi(n);
  }
}

A trick:

struct S{
  string first_name;
  string last_name;
};
namespace std{
    template<>struct hash<S>{
        size_t operator()(S const& s) const{
            size_t h1=hash<string>()(s.first_name);
            size_t h1=hash<string>()(s.last_name);
            return hash<long long>()( (h1<<32)^h2 );
        }
    };
}
int main(){
    S s;
    s.first_name="MohammadSina";
    s.last_name="PakSeresht";
    cout <<"hash(s) = "<<hash<S>()(s)<<endl;
}

Note1: You can use hash<type>()(variable), like this:

int x=69;
cout<<hash<int>()(x)<<endl;

Instead of:

int x=69;
hash<int>hasher;
cout<<hasher(x)<<endl;

Note2: please be careful about combining two hashes. There are two good ways:

1-x*P+y mod M(when P is prime number (like 43) and M is less than 2^32 and not a power of 2 (like 10^9+7)).

2-hash<long long>()((x<<32)^y) or hash<long long>()(x*1000000007LL+y).

  • Vote: I like it
  • +5
  • Vote: I do not like it

»
9 years ago, # |
  Vote: I like it +17 Vote: I do not like it

Both x·232 + y and x + y are bad ways to combine hashes.

If you use x·232 + y, the (32-bit) hashes of (x, y) pairs such as (1, 1), (2, 1), ..., (k, 1) will all be equal. As already discussed in your previous post.

If you use x + y, the hashes of (x, y) pairs such as (1, 109), (2, 109 - 1), ..., (109, 1) will all be equal.

Both are rather common cases which can, and will, make your program slow.

For programming contests, when you combine hashes of x and y, some kind of polynomial hashing like for relatively prime P and Q is usually sufficient. Still, take care when making Q a power of two.

  • »
    »
    9 years ago, # ^ |
    Rev. 2   Vote: I like it -9 Vote: I do not like it

    I wrote them in that way for simplify.

    Another way is to use hash<long long>()( (x<<32)^y ).

    +post has edited.

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Sorry, but some examples are so particularly bad that they are worse than having no example at all.

      The hash<long long>() of some mix of x and y approach looks like it might actually be a good one. Still, it surely behaves differently in 32-bit and 64-bit mode, even when there is no undefined behaviour in the mix itself. So, it might be difficult to debug when you compile to 64-bit and the judges test a 32-bit version.

»
9 years ago, # |
  Vote: I like it 0 Vote: I do not like it

template<> struct hash;

o0 That looks rather dangerous :P.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    Can you explain it more please?

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      Zonk. < and > are some HTML thingies and I wrote

      <double>
      

      but it disappeared xd. From that it is clear, why you wanted me to explain :D.

      By previous post I meant that we should never expect doubles to be equal (even if they were initialized by the same hardcoded formula). Hence, we should never expect hashes of doubles to be equal, so it is probably useless in that case. Am I right?

      • »
        »
        »
        »
        9 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Yes. std::hash<double> is only useful for exact calculations.

»
9 years ago, # |
  Vote: I like it 0 Vote: I do not like it

define 69 49

»
5 years ago, # |
Rev. 5   Vote: I like it +6 Vote: I do not like it
C++ STL has one hash function in library <functional>. You can use it for this data types:

template<> struct hash<bool>;
template<> struct hash<char>;
template<> struct hash<signed char>;
template<> struct hash<unsigned char>;
template<> struct hash<char16_t>;
template<> struct hash<char32_t>;
template<> struct hash<wchar_t>;
template<> struct hash<short>;
template<> struct hash<unsigned short>;
template<> struct hash<int>;
template<> struct hash<unsigned int>;
template<> struct hash<long>;
template<> struct hash<long long>;
template<> struct hash<unsigned long>;
template<> struct hash<unsigned long long>;
template<> struct hash<float>;
template<> struct hash<double>;
template<> struct hash<long double>;

But you yourself showed hashing of vector and string using hash<vector<int>> and hash<string>.

Edit:

I found out that more template specializations of hash class exist in the namespaces <string>, <memory>, <vector>, <bitset>, <system_error>, <type_index> and <thread> as well. So hash<vector<int>> and hash<string> are defined in the <vector> and <string> namespaces respectively, not in <functional> namespace.

»
5 years ago, # |
Rev. 8138   Vote: I like it -26 Vote: I do not like it

It seems like you are watching POR**UB(because you wrote 69 too many times).

»
5 years ago, # |
Rev. 3   Vote: I like it -12 Vote: I do not like it

 Seems like it is not working.

  • »
    »
    5 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Hash doesn't directly support vector of any data type other than bool. If it does then unordered_map would be supporting a vector as a key (which unfortunately doesn't).

    • »
      »
      »
      5 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      But what about this article, how author uses it?

  • »
    »
    5 years ago, # ^ |
    Rev. 3   Vote: I like it +1 Vote: I do not like it

    This is why it is important to read documentation on "something" before jumping in to use that "something". Your code is invalid, not that you can't use hash for vector.

    It should look something like this: cout << hash <vector <bool>>()(vector <bool>(10, false));

    Firstly, hash does not take a vector in constructor. So, you need to declare a temporary object and then pass a to use the hash operator()(vector<bool>) overload.

    Secondly, I get errors for trying to use vector <int> instead. The error:

    Error

    This results into two cases: either the author gave a wrong example or I haven't read more documentation myself. The latter seems to be the issue according to me as I've not read much myself. But, the former may be the case too. Can't tell much unless I read further.