-
-
Notifications
You must be signed in to change notification settings - Fork 146
1. opt. calloc memory for big hashtable #141
Conversation
Do you have numbers how much faster malloc + memset is? On what system? Usually calloc should be faster. See https://stackoverflow.com/a/2688522/48181 |
Notice that I did not memset all the memory, instead only the mInfo part which is really small, and I have tested in my program that will increase performance a lot. The picture is after optimization, which is really fast than original The last line means that the malloc and memset mInfo memory use 18 ms to memset the mInfo data structure of 67109127 Byte with total memory 1677727983 Byte and max capacity of 67108864 elements. And here is the calloc and other memory allocation calls benchmark: BTW, I have used this map and optimization as a replacement of std::unordered_map/set in clickhouse code, details on pr:ClickHouse/ClickHouse#33180 |
Ah, it make sense to just initialize mInfo. That's certainly better. Can you update your PR a bit to use indentation with spaces and diff --git a/src/include/robin_hood.h b/src/include/robin_hood.h
index 32e5641..a1b6d2d 100644
--- a/src/include/robin_hood.h
+++ b/src/include/robin_hood.h
@@ -2323,13 +2323,14 @@ private:
auto const numElementsWithBuffer = calcNumElementsWithBuffer(max_elements);
- // calloc also zeroes everything
+ // malloc & zero mInfo. Faster than calloc everything.
auto const numBytesTotal = calcNumBytesTotal(numElementsWithBuffer);
ROBIN_HOOD_LOG("std::calloc " << numBytesTotal << " = calcNumBytesTotal("
<< numElementsWithBuffer << ")")
mKeyVals = reinterpret_cast<Node*>(
- detail::assertNotNull<std::bad_alloc>(std::calloc(1, numBytesTotal)));
+ detail::assertNotNull<std::bad_alloc>(std::malloc(numBytesTotal)));
mInfo = reinterpret_cast<uint8_t*>(mKeyVals + numElementsWithBuffer);
+ std::memset(mInfo, 0, numBytesTotal - numElementsWithBuffer * sizeof(Node));
// set sentinel
mInfo[numElementsWithBuffer] = 1; |
Done. |
Merged, thanks! |
If the hash table is big, say 1million or more, calloc big memory can be bottle neck, using malloc and memset metadata to optimize this.