Improved hashing algorithm in luaS_newlstr #1168

Merged: 6 commits, merged on Oct 15, 2024

deps/README.md: 1 change (1 addition, 0 deletions)

```diff
@@ -94,6 +94,7 @@ and our version:
 1. Makefile is modified to allow a different compiler than GCC.
 2. We have the implementation source code, and directly link to the following external libraries: `lua_cjson.o`, `lua_struct.o`, `lua_cmsgpack.o` and `lua_bit.o`.
 3. There is a security fix in `ldo.c`, line 498: The check for `LUA_SIGNATURE[0]` is removed in order to avoid direct bytecode execution.
+4. In `lstring.c`, the luaS_newlstr function's hash calculation has been upgraded from a simple hash function to MurmurHash3, implemented within the same file, to enhance performance, particularly for operations involving large strings.
 
 Hdr_Histogram
 ---
```

deps/lua/src/lstring.c: 52 changes (47 additions, 5 deletions)

```diff
@@ -6,6 +6,7 @@
 
 
 #include <string.h>
+#include <stdint.h>
 
 #define lstring_c
 #define LUA_CORE
@@ -71,14 +72,55 @@ static TString *newlstr (lua_State *L, const char *str, size_t l,
   return ts;
 }
 
+uint32_t murmur32(const uint8_t* key, size_t len, uint32_t seed) {
+    static const uint32_t c1 = 0xcc9e2d51;
+    static const uint32_t c2 = 0x1b873593;
+    static const uint32_t r1 = 15;
+    static const uint32_t r2 = 13;
+    static const uint32_t m = 5;
+    static const uint32_t n = 0xe6546b64;
+    uint32_t hash = seed;
+
+    const int nblocks = len / 4;
+    const uint32_t* blocks = (const uint32_t*) key;
+    for (int i = 0; i < nblocks; i++) {
+        uint32_t k = blocks[i];
+        k *= c1;
+        k = (k << r1) | (k >> (32 - r1));
+        k *= c2;
+
+        hash ^= k;
+        hash = ((hash << r2) | (hash >> (32 - r2))) * m + n;
+    }
+
+    const uint8_t* tail = (const uint8_t*) (key + nblocks * 4);
+    uint32_t k1 = 0;
+    switch (len & 3) {
+    case 3:
+        k1 ^= tail[2] << 16;
+    case 2:
+        k1 ^= tail[1] << 8;
+    case 1:
+        k1 ^= tail[0];
+        k1 *= c1;
+        k1 = (k1 << r1) | (k1 >> (32 - r1));
+        k1 *= c2;
+        hash ^= k1;
+    }
+
+    hash ^= len;
+    hash ^= (hash >> 16);
+    hash *= 0x85ebca6b;
+    hash ^= (hash >> 13);
+    hash *= 0xc2b2ae35;
+    hash ^= (hash >> 16);
+
+    return hash;
+}
 
 TString *luaS_newlstr (lua_State *L, const char *str, size_t l) {
   GCObject *o;
-  unsigned int h = cast(unsigned int, l); /* seed */
-  size_t step = 1;
-  size_t l1;
-  for (l1=l; l1>=step; l1-=step) /* compute hash */
-    h = h ^ ((h<<5)+(h>>2)+cast(unsigned char, str[l1-1]));
+  unsigned int h = murmur32((uint8_t *)str, l, (uint32_t)l);
   for (o = G(L)->strt.hash[lmod(h, G(L)->strt.size)];
        o != NULL;
        o = o->gch.next) {
```
