Added a hash combiner for performance improvement. #250

laurence-atomic · 2024-11-22T20:46:31Z

In our testing at Atomic Industries, we discovered that the performance of graaf suffers on graphs with >50 million edges.

We use a hash-combiner at Atomic to improve performance on hashing tuples. Implementing this in graaf resulted in substantial speed-ups in multiple test cases. Unfortunately I can not share internal performance data, but it should be possible to replicate the speed-up on large (>50 million edge) public datasets.

github-actions

Hi there! Thank you for creating your first pull-request on the Graaf library :)

codecov · 2024-11-22T22:10:20Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.59%. Comparing base (4dbde3d) to head (301664a).
Report is 8 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #250   +/-   ##
=======================================
  Coverage   99.59%   99.59%           
=======================================
  Files          56       56           
  Lines        2738     2748   +10     
  Branches      135      135           
=======================================
+ Hits         2727     2737   +10     
  Misses         11       11

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

joweich

Benchmarks do indeed indicate a significant leap in performance:

joweich · 2024-11-23T22:09:13Z

include/graaflib/types.h

+template <class T>
+inline void hash_combine(std::size_t& seed, const T& v) {
+  std::hash<T> hasher;
+  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);


If I understand correctly, this is hash_combine of boost v1.33.
A reference to the source as a comment would be good here. The magic number and the shifts otherwise seem a little arbitrary.

Ah, it appears that is the source of the hash combine function. I was only familiar with the function based on seeing it used in multiple contexts in production. I hadn't tracked down the source.

It appears that like many magic constants, the value of 0x9e3779b9 was chosen arbitrarily as the fractional component of the golden ratio multiplied by 2^32. Perhaps on modern hardware, using the golden ratio multiplied by 2^64 would show improved performance.

Sources:

http://burtleburtle.net/bob/hash/doobs.html
https://softwareengineering.stackexchange.com/questions/63595/tea-algorithm-constant-0x9e3779b9-said-to-be-derived-from-golden-ratio-but-the/63599#63599

Perhaps it's better in that case to directly use the Boost hash combiner?

The bit shifts are probably chosen arbitrarily too.

To be clear, hash functions are outside of my field of expertise, my goal here is to improve performance on very large graphs.

I'll speak with my team at Atomic Industries about making a novel hash combiner function, so we don't accidentally end up plagiarizing another project. We have a few mathematicians on the team with a background in combinatorics.

+1 I agree that referencing Boost would be helpful here. Looking at the Boost Software License under which the ContainerHash library is licensed this may also be a requirement. No need to push changes to the PR, I can add this before merging :)

The alternative would be to add Boost as a dependency to Graaf, but I would like to avoid this as I would like Graaf to be a standalone Boost::Graph alternative.

bobluppes

Thanks for opening this PR @laurence-atomic! And wow, that is some graph with 50M edges 😅 Glad you were able to find such a great performance win (and thanks @joweich for providing insights into our benchmarks)!

If I understand correctly, Boost switched to a slightly different implementation as of Boost 1.81 which improves the distribution over the output domain. Source code here. I am wondering whether we could gain anything by using this newer implementation, but I would leave this as a potential follow up.

LGTM! I just added the Boost license and am more than happy to include this optimization in the next release of Graaf 🚀

include/graaflib/types.h

Co-authored-by: Bob Luppes <bobluppes@gmail.com>

bobluppes · 2024-11-27T20:55:17Z

Glad to have this performance improvement merged, thanks again @laurence-atomic 🎉

I just wanted to take the opportunity to ask if you have any feedback regarding the library as we would love to hear from you and your team! In particular, we are currently trying to streamline the interface of the graph class and corresponding algorithms. So if there is anything in particular that does or doesn't work well for you then you are more than welcome to hop in our discord and we can discuss 😄

Added a hash combiner for performance improvement.

463b728

github-actions bot reviewed Nov 22, 2024

View reviewed changes

Fixed clang formatting

78e1c8a

joweich reviewed Nov 23, 2024

View reviewed changes

bobluppes approved these changes Nov 25, 2024

View reviewed changes

include/graaflib/types.h Outdated Show resolved Hide resolved

Update include/graaflib/types.h

301664a

Co-authored-by: Bob Luppes <bobluppes@gmail.com>

bobluppes added the performance Benchmarks or performance improvements label Nov 26, 2024

bobluppes merged commit ec9e0b0 into bobluppes:main Nov 26, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added a hash combiner for performance improvement. #250

Added a hash combiner for performance improvement. #250

laurence-atomic commented Nov 22, 2024

github-actions bot left a comment

codecov bot commented Nov 22, 2024 •

edited

Loading

joweich left a comment

joweich Nov 23, 2024

laurence-atomic Nov 25, 2024

laurence-atomic Nov 25, 2024

laurence-atomic Nov 25, 2024

laurence-atomic Nov 25, 2024

bobluppes Nov 25, 2024

bobluppes left a comment

bobluppes commented Nov 27, 2024

Added a hash combiner for performance improvement. #250

Added a hash combiner for performance improvement. #250

Conversation

laurence-atomic commented Nov 22, 2024

github-actions bot left a comment

Choose a reason for hiding this comment

codecov bot commented Nov 22, 2024 • edited Loading

Codecov Report

joweich left a comment

Choose a reason for hiding this comment

joweich Nov 23, 2024

Choose a reason for hiding this comment

laurence-atomic Nov 25, 2024

Choose a reason for hiding this comment

laurence-atomic Nov 25, 2024

Choose a reason for hiding this comment

laurence-atomic Nov 25, 2024

Choose a reason for hiding this comment

laurence-atomic Nov 25, 2024

Choose a reason for hiding this comment

bobluppes Nov 25, 2024

Choose a reason for hiding this comment

bobluppes left a comment

Choose a reason for hiding this comment

bobluppes commented Nov 27, 2024

codecov bot commented Nov 22, 2024 •

edited

Loading