C++ STL: Order of magnitude faster hash tables with Policy Based Data Structures

Правка en4, от Chilli, 2018-07-20 23:34:11

TL;DR

The Policy Hash Table has 20-30% faster linear insertion/deletion, equal performance for random insertion/deletion, and 3-10x increase for writes/reads. Only downside is that clear is slower on a Policy Hash Table with random insertion order.

Background

I've often been irritated by how slow unordered_map is in C++. Too often, I have something that runs fast enough in terms of complexity, but the constant factor from unordered_map slows down the solution too much. Yesterday though, after using the useful order statistics tree from https://codeforces.me/blog/entry/11080, I was curious if there were any other useful data structures hiding in the Policy STL. And lo and behold, I found a hash table.

Benchmarks

Well, enough backstory, let's look at some numbers. All benchmarks below are compiled with -O2.

unordered map linear insertion:                                 0.816795
policy hash table linear insertion:                             0.442094

unordered map linear read/write:                                1.91384
policy hash table linear read/write:                            0.216669

unordered map random insertion:                                 2.98216
policy hash table random insertion:                             3.13651

unordered map random read/write:                                2.12804
policy hash table random read/write:                            0.356518

While for linear insertions, the policy hash table gives modest improvements, the policy hash table blows the unordered_map out of the water when it comes to reads/writes. These are order of magnitude improvements that make hash tables usable when they previously weren't.

Example

At this point, you may still be skeptical. Benchmarks of course, don't always reflect on the real world. So here's an example of it being used to speed up a previously too slow solution.

Example problem: http://codeforces.me/contest/264/problem/C
Solution with unordered_map: http://codeforces.me/contest/264/submission/40542899 (TLE on test case 8)
Solution with policy hash table directly substituted in: http://codeforces.me/contest/264/submission/40573491 (TLE on test case 26)
Solution with policy hash table and rewritten to not use clears: http://codeforces.me/contest/264/submission/40574196 (AC with max time of 3180 ms)

Usage

To use this data structure:

#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
typedef cc_hash_table<int, int, hash<int>> ht;

From there, the API seems almost exactly the same.

Thanks to adamant for his post that revealed to me the existence of policy data structures, and thanks to ed1d1a8d for the discussion.

PS: In other posts for unordered_map, I've seen people claim that reserve and max_load_factor could increase performance drastically. They didn't seem to do much for me. However, if you want to do something similar for these hash tables, use typedef cc_hash_table< int, int, hash<int>, equal_to<int>, direct_mask_range_hashing<int>, hash_standard_resize_policy<hash_exponential_size_policy<>, hash_load_check_resize_trigger<true>, true>> ht;, and you should be able to resize and set load factor manually as well.

Code for the benchmarks can be found here: https://ideone.com/ZkNMbH

Теги hashtable, c++, policy based, unordered_map

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en25 Английский Chilli 2020-04-08 21:39:21 204
en24 Английский Chilli 2018-12-20 19:59:40 19
en23 Английский Chilli 2018-08-26 22:29:49 97
en22 Английский Chilli 2018-07-23 11:48:12 5
en21 Английский Chilli 2018-07-23 11:26:11 64
en20 Английский Chilli 2018-07-23 11:23:29 657
en19 Английский Chilli 2018-07-23 10:23:47 198 Tiny change: ' clears:]( http://cod' -> ' clears:](http://cod'
en18 Английский Chilli 2018-07-23 09:17:15 390
en17 Английский Chilli 2018-07-21 23:50:59 504
en16 Английский Chilli 2018-07-21 05:33:02 10
en15 Английский Chilli 2018-07-21 05:31:38 4
en14 Английский Chilli 2018-07-21 05:20:37 2
en13 Английский Chilli 2018-07-21 05:18:56 5
en12 Английский Chilli 2018-07-21 04:59:02 842
en11 Английский Chilli 2018-07-21 01:50:28 0 (published)
en10 Английский Chilli 2018-07-21 01:08:04 136
en9 Английский Chilli 2018-07-21 01:04:50 45
en8 Английский Chilli 2018-07-21 00:42:45 21
en7 Английский Chilli 2018-07-21 00:41:07 99
en6 Английский Chilli 2018-07-21 00:40:10 24
en5 Английский Chilli 2018-07-21 00:37:56 25
en4 Английский Chilli 2018-07-20 23:34:11 41
en3 Английский Chilli 2018-07-20 23:21:31 2598 Tiny change: 'n\n\n~~~~~\n\nunordered ' -> 'n\n\n~~~~~unordered '
en2 Английский Chilli 2018-07-20 23:01:35 4
en1 Английский Chilli 2018-07-20 23:01:11 1442 Initial revision (saved to drafts)