Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved hashing algorithm in luaS_newlstr #1168

Merged
merged 6 commits into from
Oct 15, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions deps/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,7 @@ and our version:
1. Makefile is modified to allow a different compiler than GCC.
2. We have the implementation source code, and directly link to the following external libraries: `lua_cjson.o`, `lua_struct.o`, `lua_cmsgpack.o` and `lua_bit.o`.
3. There is a security fix in `ldo.c`, line 498: The check for `LUA_SIGNATURE[0]` is removed in order to avoid direct bytecode execution.
4. In lstring.c, the luaS_newlstr function's hash calculation has been upgraded from a simple hash function to MurmurHash3, implemented within the same file, to enhance performance, particularly for operations involving large strings.
madolson marked this conversation as resolved.
Show resolved Hide resolved

Hdr_Histogram
---
Expand Down
52 changes: 47 additions & 5 deletions deps/lua/src/lstring.c
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@


#include <string.h>
#include <stdint.h>

#define lstring_c
#define LUA_CORE
Expand Down Expand Up @@ -71,14 +72,55 @@ static TString *newlstr (lua_State *L, const char *str, size_t l,
return ts;
}

uint32_t murmur32(const uint8_t* key, size_t len, uint32_t seed) {
static const uint32_t c1 = 0xcc9e2d51;
static const uint32_t c2 = 0x1b873593;
static const uint32_t r1 = 15;
static const uint32_t r2 = 13;
static const uint32_t m = 5;
static const uint32_t n = 0xe6546b64;
uint32_t hash = seed; // static seed
madolson marked this conversation as resolved.
Show resolved Hide resolved

const int nblocks = len / 4;
const uint32_t* blocks = (const uint32_t*) key;
for (int i = 0; i < nblocks; i++) {
uint32_t k = blocks[i];
k *= c1;
k = (k << r1) | (k >> (32 - r1));
k *= c2;

hash ^= k;
hash = ((hash << r2) | (hash >> (32 - r2))) * m + n;
}

const uint8_t* tail = (const uint8_t*) (key + nblocks * 4);
uint32_t k1 = 0;
switch (len & 3) {
case 3:
k1 ^= tail[2] << 16;
case 2:
k1 ^= tail[1] << 8;
case 1:
k1 ^= tail[0];
k1 *= c1;
k1 = (k1 << r1) | (k1 >> (32 - r1));
k1 *= c2;
hash ^= k1;
}
madolson marked this conversation as resolved.
Show resolved Hide resolved

hash ^= len;
hash ^= (hash >> 16);
hash *= 0x85ebca6b;
hash ^= (hash >> 13);
hash *= 0xc2b2ae35;
hash ^= (hash >> 16);

return hash;
}

TString *luaS_newlstr (lua_State *L, const char *str, size_t l) {
GCObject *o;
unsigned int h = cast(unsigned int, l); /* seed */
size_t step = 1;
size_t l1;
for (l1=l; l1>=step; l1-=step) /* compute hash */
h = h ^ ((h<<5)+(h>>2)+cast(unsigned char, str[l1-1]));
unsigned int h = murmur32((uint8_t *)str, l, (uint32_t)l); /* seed */
madolson marked this conversation as resolved.
Show resolved Hide resolved
for (o = G(L)->strt.hash[lmod(h, G(L)->strt.size)];
o != NULL;
o = o->gch.next) {
Expand Down