Do the hash maps use an open addressing strategy? #115

Fr3DBr · 2021-11-03T17:55:30Z

Hi, guys. In contrast to std::unordered_map, do the hash maps provided by this library currently use an open addressing strategy to speed up the .find() lookup?

greg7mdp · 2021-11-03T18:27:22Z

Maintainer

Yes, absolutely. phmap hash maps use open addressing: collisions are resolved by probing, not via a linked list as in std::unordered_map.
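
To make the distinction concrete, here is a minimal sketch of open addressing with linear probing. It is a toy under stated assumptions, not phmap's implementation: the class name `ProbingSet`, the fixed power-of-two capacity, and the multiplicative mixing constant are all hypothetical choices made for illustration. Keys live directly in one contiguous array, and a collision is resolved by stepping to the next slot instead of following a linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy open-addressing set of uint32_t keys (illustration only):
// fixed capacity, no deletion, no resizing.
class ProbingSet {
public:
    // capacity must be a power of two and larger than the number of keys stored.
    explicit ProbingSet(std::size_t capacity)
        : slots_(capacity), mask_(capacity - 1) {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }

    // Returns false if the key was already present.
    bool insert(std::uint32_t key) {
        std::size_t i = index_of(key);
        while (slots_[i].has_value()) {
            if (*slots_[i] == key) return false;
            i = (i + 1) & mask_;  // collision: probe the next slot
        }
        assert(size_ + 1 < slots_.size());  // always keep one slot empty
        slots_[i] = key;
        ++size_;
        return true;
    }

    bool contains(std::uint32_t key) const {
        std::size_t i = index_of(key);
        while (slots_[i].has_value()) {
            if (*slots_[i] == key) return true;
            i = (i + 1) & mask_;
        }
        return false;  // reached an empty slot: the key is absent
    }

private:
    std::size_t index_of(std::uint32_t key) const {
        // Hypothetical mixing step (multiplicative hash), not phmap's.
        const std::uint64_t h = key * 0x9E3779B97F4A7C15ull;
        return static_cast<std::size_t>(h >> 32) & mask_;
    }

    std::vector<std::optional<std::uint32_t>> slots_;
    std::size_t mask_;
    std::size_t size_ = 0;
};
```

A miss ends as soon as the probe sequence reaches an empty slot, so at a low load factor most failed lookups touch only one or two adjacent slots.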

Fr3DBr · 2021-11-03T18:44:45Z

Author

So I can assume that find is many times faster in comparison? I'm primarily using phmap::node_hash_map as a replacement for std::unordered_map.

Generally, we have 24 threads running, each with its own map. I'm storing uint32 keys (basically IPv4 addresses), with each key linking to an object of around 3 KB. (We can expect up to 500K objects at peak and around 10K at the quietest times.) The idea is to avoid time spent on "misses" for keys that don't exist. Sometimes we may hit up to 40,000,000 lookups/second (1.6M/thread), hence my question. :)

greg7mdp · 2021-11-03T18:56:06Z

Maintainer

> So I can assume that find is many times faster in comparison?

Probably not many times faster in your use case, but faster.

> Generally, we have 24 threads running, each with its own map.

Sounds good. Is each thread specialized for an IP subset? How is this subset determined? Do the threads look at all IPs and only process those matching the subset they specialize in?

Fr3DBr Nov 3, 2021
Author

> > So I can assume that find is many times faster in comparison?
>
> Probably not many times faster in your use case, but faster.
>
> > Generally, we have 24 threads running, each with its own map.
>
> Sounds good. Is each thread specialized for an IP subset? How is this subset determined? Do the threads look at all IPs and only process those matching the subset they specialize in?

Any thread can actually look up any IP. Please note that we're dealing with /32 entries only: we're not masking or performing subnet lookups, but always matching specific /32 targets (so the full uint32 range). So basically, we keep 24 hash maps with the same IP entries in each. We just want to avoid locking/syncing when it's not necessary, except for inserts/erases when it's justified/needed, hehe.

Fr3DBr Nov 3, 2021
Author

Anyway, the main reason for trying open addressing is that the map may sometimes contain sequential addresses such as 1.1.1.1 / 1.1.1.2 / 1.1.1.3 / 1.1.1.4 / 1.1.1.5 / 1.1.1.6. In a regular map, this would mean looping linearly to find the correct key, and the lookup time grows considerably when a bucket holds many elements and collisions are excessive.

For me, the ideal would be a way to bring the lookup time as close as possible to that of an array, which is O(1) by nature, but I know that not everything is sweet, hehe... But the regular std::unordered_map performs very slowly with up to 10K elements in this situation...

I could achieve this for "destination addresses" because they're "predictable", so using a regular array as a cache works neatly in that scenario. The problem is the "source addresses", which can't be predicted; in this case I basically depend on the container's lookup speed.

greg7mdp · 2021-11-03T20:05:34Z

Maintainer

> So basically, we keep 24 hash maps with the same IP entries in each. We just want to avoid locking/syncing when it's not necessary, except for inserts/erases when it's justified/needed, hehe.

I believe you can do much better and avoid duplicating entries. Even if memory usage is not an issue, this can improve memory locality and reduce cache misses, which also improves performance. See the paragraph "Using the intrinsic parallelism of the parallel_flat_hash_map to insert values from multiple threads, lock free" there.
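
As a rough illustration of sharing one table instead of keeping 24 copies, here is a minimal sketch of a map split into independently locked submaps. This is a swapped-in, simpler technique (one mutex per shard over std::unordered_map), not the lock-free scheme the referenced paragraph describes and not phmap's code; `ShardedMap`, its member names, and the mixing constant are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <unordered_map>
#include <utility>

// One logical map split into N submaps, each guarded by its own mutex,
// so threads touching different shards do not contend with each other.
template <class V, std::size_t N = 16>
class ShardedMap {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    void insert_or_assign(std::uint32_t key, V value) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        s.map.insert_or_assign(key, std::move(value));
    }

    // Returns true if an entry was removed.
    bool erase(std::uint32_t key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        return s.map.erase(key) != 0;
    }

    // Returns a copy of the value so the lock is released before use.
    std::optional<V> find(std::uint32_t key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        auto it = s.map.find(key);
        if (it == s.map.end()) return std::nullopt;
        return it->second;
    }

private:
    struct Shard {
        std::mutex mutex;
        std::unordered_map<std::uint32_t, V> map;
    };

    Shard& shard_for(std::uint32_t key) {
        // Hypothetical mixing constant; the high bits select the shard.
        const std::uint64_t h = key * 0x9E3779B97F4A7C15ull;
        return shards_[static_cast<std::size_t>(h >> 32) & (N - 1)];
    }

    std::array<Shard, N> shards_;
};
```

Copying a 3 KB value out of `find` is wasteful on a hot path; a real design would more likely store pointers or hand a callback to the lookup while the shard lock is held.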

Fr3DBr Nov 3, 2021
Author

OK, will do. Did you see my second comment regarding lookup performance?

greg7mdp Nov 3, 2021
Maintainer

> OK, will do. Did you see my second comment regarding lookup performance?

Yes. I don't think open addressing will make a difference for the lookup. Even if the keys are sequential, it will not help, because the hash value is taken modulo the array size, and even if that were not the case, phmap internally multiplies the hash by a large prime number. However, I believe phmap will still perform better than std::unordered_map.
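
The effect of such a multiplication can be sketched in a few lines. The constant below is an assumption chosen for illustration (the 64-bit golden-ratio constant commonly used for Fibonacci hashing), not the value phmap uses, and both function names are hypothetical. The first function picks a bucket straight from the key, the second mixes the key first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bucket taken directly from the low bits of the key (identity hash):
// sequential keys land in adjacent buckets.
inline std::size_t bucket_identity(std::uint32_t key, unsigned log2_buckets) {
    assert(log2_buckets >= 1 && log2_buckets <= 32);
    const std::uint64_t mask = (std::uint64_t{1} << log2_buckets) - 1;
    return static_cast<std::size_t>(key & mask);
}

// Bucket taken from the high bits of key * constant (multiplicative hash):
// sequential keys are scattered across the table.
inline std::size_t bucket_mixed(std::uint32_t key, unsigned log2_buckets) {
    assert(log2_buckets >= 1 && log2_buckets <= 32);
    // Illustrative odd 64-bit constant; an assumption, not phmap's multiplier.
    const std::uint64_t kMultiplier = 0x9E3779B97F4A7C15ull;
    const std::uint64_t h = key * kMultiplier;  // wraps modulo 2^64
    return static_cast<std::size_t>(h >> (64 - log2_buckets));
}
```

With 256 buckets (`log2_buckets = 8`), 1.1.1.1 through 1.1.1.6 (0x01010101 to 0x01010106) occupy buckets 1 to 6 under `bucket_identity`, while `bucket_mixed` places them in six different, non-adjacent buckets.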

Fr3DBr Nov 3, 2021
Author

> > OK, will do. Did you see my second comment regarding lookup performance?
>
> Yes. I don't think open addressing will make a difference for the lookup. Even if the keys are sequential, it will not help, because the hash value is taken modulo the array size, and even if that were not the case, phmap internally multiplies the hash by a large prime number. However, I believe phmap will still perform better than std::unordered_map.

I think I'd better make a bit array covering the whole uint32 range and check true/false very quickly before falling back to the container lookup, then. Since I may sometimes have up to 40M requests per second and 99% of them are for entries that don't exist, this should make things much faster in my case...
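
A minimal sketch of that bit-array pre-check, assuming a single writer and no synchronization (with 24 threads, concurrent updates would need atomics or a lock). The class name `KeyBitmap` is hypothetical. Covering the full uint32 range takes 2^32 bits, which is 512 MiB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per possible key. The default covers the whole uint32 range
// (2^32 bits = 512 MiB); a smaller range can be passed for experiments.
class KeyBitmap {
public:
    explicit KeyBitmap(std::uint64_t bits = std::uint64_t{1} << 32)
        : bits_(bits),
          words_(static_cast<std::size_t>((bits + 63) / 64), 0) {}

    void set(std::uint32_t key) {
        assert(key < bits_);
        words_[key >> 6] |= std::uint64_t{1} << (key & 63);
    }

    void clear(std::uint32_t key) {
        assert(key < bits_);
        words_[key >> 6] &= ~(std::uint64_t{1} << (key & 63));
    }

    // True only if the key was set and not cleared afterwards.
    bool test(std::uint32_t key) const {
        assert(key < bits_);
        return (words_[key >> 6] >> (key & 63)) & 1u;
    }

private:
    std::uint64_t bits_;
    std::vector<std::uint64_t> words_;
};
```

The idea is to call `test(ip)` first and fall through to the hash map only on a hit. The answer is exact, but a miss on a cold part of the 512 MiB array can still cost a cache miss.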

greg7mdp · 2021-11-03T20:38:01Z

Maintainer

If you store only 500K objects max, you can use a Bloom filter instead of a full 4 Gbit array. It will be even faster and will let you skip most queries for non-existing entries.
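
A minimal Bloom filter sketch for uint32 keys, under stated assumptions: the class name `BloomFilter`, the splitmix64-style mixer, and the double-hashing scheme for deriving the probe positions are illustrative choices, not something prescribed in this thread. A filter of 2^23 bits occupies 1 MiB, far smaller than the 512 MiB full bit array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BloomFilter {
public:
    // The filter holds 2^log2_bits bits and sets num_hashes bits per key.
    BloomFilter(unsigned log2_bits, unsigned num_hashes)
        : mask_((std::uint64_t{1} << log2_bits) - 1),
          num_hashes_(num_hashes),
          words_(static_cast<std::size_t>((mask_ + 64) / 64), 0) {
        assert(log2_bits >= 6 && log2_bits <= 40 && num_hashes >= 1);
    }

    void add(std::uint32_t key) {
        const std::uint64_t h = mix(key);
        const std::uint64_t h1 = h & 0xFFFFFFFFull;
        const std::uint64_t h2 = (h >> 32) | 1;  // odd step for double hashing
        for (unsigned i = 0; i < num_hashes_; ++i) {
            const std::uint64_t bit = (h1 + i * h2) & mask_;
            words_[bit >> 6] |= std::uint64_t{1} << (bit & 63);
        }
    }

    // false: the key was definitely never added.
    // true: the key was probably added (false positives are possible).
    bool maybe_contains(std::uint32_t key) const {
        const std::uint64_t h = mix(key);
        const std::uint64_t h1 = h & 0xFFFFFFFFull;
        const std::uint64_t h2 = (h >> 32) | 1;
        for (unsigned i = 0; i < num_hashes_; ++i) {
            const std::uint64_t bit = (h1 + i * h2) & mask_;
            if (((words_[bit >> 6] >> (bit & 63)) & 1u) == 0) return false;
        }
        return true;
    }

private:
    // splitmix64-style finalizer, used here as a generic 64-bit mixer.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9E3779B97F4A7C15ull;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
        return x ^ (x >> 31);
    }

    std::uint64_t mask_;
    unsigned num_hashes_;
    std::vector<std::uint64_t> words_;
};
```

Only a `true` result needs to be confirmed with the real map lookup. One caveat for a workload with erases: a plain Bloom filter cannot remove keys, so it would have to be rebuilt periodically or replaced with a counting variant.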
