Skip to content
This repository has been archived by the owner on Dec 26, 2023. It is now read-only.

1. opt. calloc memory for big hashtable #141

Merged
merged 2 commits into from
Jan 12, 2022
Merged

Conversation

zhanglistar
Copy link
Contributor

If the hash table is big, say 1million or more, calloc big memory can be bottle neck, using malloc and memset metadata to optimize this.

@martinus
Copy link
Owner

martinus commented Jan 11, 2022

Do you have numbers how much faster malloc + memset is? On what system?

Usually calloc should be faster. See https://stackoverflow.com/a/2688522/48181

@zhanglistar
Copy link
Contributor Author

zhanglistar commented Jan 12, 2022

Do you have numbers how much faster malloc + memset is? On what system?

Usually calloc should be faster. See https://stackoverflow.com/a/2688522/48181

Notice that I did not memset all the memory, instead only the mInfo part which is really small, and I have tested in my program that will increase performance a lot.

The picture is after optimization, which is really fast than original calloc all the memory.
image

The last line means that the malloc and memset mInfo memory use 18 ms to memset the mInfo data structure of 67109127 Byte with total memory 1677727983 Byte and max capacity of 67108864 elements.

And here is the calloc and other memory allocation calls benchmark:
image
Code is here:https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2020/01/14

BTW, I have used this map and optimization as a replacement of std::unordered_map/set in clickhouse code, details on pr:ClickHouse/ClickHouse#33180

@martinus
Copy link
Owner

martinus commented Jan 12, 2022

Ah, it make sense to just initialize mInfo. That's certainly better. Can you update your PR a bit to use indentation with spaces and std::memset? like so:

diff --git a/src/include/robin_hood.h b/src/include/robin_hood.h
index 32e5641..a1b6d2d 100644
--- a/src/include/robin_hood.h
+++ b/src/include/robin_hood.h
@@ -2323,13 +2323,14 @@ private:
 
         auto const numElementsWithBuffer = calcNumElementsWithBuffer(max_elements);
 
-        // calloc also zeroes everything
+        // malloc & zero mInfo. Faster than calloc everything.
         auto const numBytesTotal = calcNumBytesTotal(numElementsWithBuffer);
         ROBIN_HOOD_LOG("std::calloc " << numBytesTotal << " = calcNumBytesTotal("
                                       << numElementsWithBuffer << ")")
         mKeyVals = reinterpret_cast<Node*>(
-            detail::assertNotNull<std::bad_alloc>(std::calloc(1, numBytesTotal)));
+            detail::assertNotNull<std::bad_alloc>(std::malloc(numBytesTotal)));
         mInfo = reinterpret_cast<uint8_t*>(mKeyVals + numElementsWithBuffer);
+        std::memset(mInfo, 0, numBytesTotal - numElementsWithBuffer * sizeof(Node));
 
         // set sentinel
         mInfo[numElementsWithBuffer] = 1;

@martinus martinus added the performance performance improvements label Jan 12, 2022
@zhanglistar
Copy link
Contributor Author

Ah, it make sense to just initialize mInfo. That's certainly better. Can you update your PR a bit to use indentation with spaces and std::memset? like so:

diff --git a/src/include/robin_hood.h b/src/include/robin_hood.h
index 32e5641..a1b6d2d 100644
--- a/src/include/robin_hood.h
+++ b/src/include/robin_hood.h
@@ -2323,13 +2323,14 @@ private:
 
         auto const numElementsWithBuffer = calcNumElementsWithBuffer(max_elements);
 
-        // calloc also zeroes everything
+        // malloc & zero mInfo. Faster than calloc everything.
         auto const numBytesTotal = calcNumBytesTotal(numElementsWithBuffer);
         ROBIN_HOOD_LOG("std::calloc " << numBytesTotal << " = calcNumBytesTotal("
                                       << numElementsWithBuffer << ")")
         mKeyVals = reinterpret_cast<Node*>(
-            detail::assertNotNull<std::bad_alloc>(std::calloc(1, numBytesTotal)));
+            detail::assertNotNull<std::bad_alloc>(std::malloc(numBytesTotal)));
         mInfo = reinterpret_cast<uint8_t*>(mKeyVals + numElementsWithBuffer);
+        std::memset(mInfo, 0, numBytesTotal - numElementsWithBuffer * sizeof(Node));
 
         // set sentinel
         mInfo[numElementsWithBuffer] = 1;

Done.

@martinus martinus merged commit 91f612c into martinus:master Jan 12, 2022
@martinus
Copy link
Owner

Merged, thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants