Dynamic detection of SSE42 #75
FYI, The test failure is due to the fix for LuaJIT/LuaJIT#494 and not due to this patch. |
Added some more patches to fix building with make amalg. I think we need to drop make amalg in favour of LTO though, since LTO has come quite a long way over the last few years. |
Sorry if this sounds stupid, but would this patch change #60 in any way? Would it make that problem worse/better/same? |
It would remain the same. |
Reviving this at the request of @agentzh . The approach is slightly different keeping in mind that luajit2 wants to remain compatible with LuaJIT/LuaJIT. That is, I've duplicated the relevant portion of the cpu flag check instead of moving the It still won't protect luajit2 from the possibility of Mike changing all of string hashing from under them ;) |
Updated to credit @fsfod in the commit log for Windows support. |
The test failure should be fixed with openresty/luajit2-test-suite#9 . |
@siddhesh I'm seeing warnings when cross-compiling for aarch64 and armv7:
|
@siddhesh maybe we should add this extra patch?
diff --git a/src/lj_init.c b/src/lj_init.c
index 5c20f15c5d..e963763013 100644
--- a/src/lj_init.c
+++ b/src/lj_init.c
@@ -46,7 +46,10 @@ static void str_hash_init(uint32_t flags)
convenient. */
LJ_INITIALIZER(lj_init_cpuflags)
{
+#ifdef LJ_HAS_OPTIMISED_HASH
uint32_t flags = 0;
+#endif
+
#if LJ_TARGET_X86ORX64
uint32_t vendor[4];
@@ -64,6 +67,8 @@ LJ_INITIALIZER(lj_init_cpuflags)
#endif
+#ifdef LJ_HAS_OPTIMISED_HASH
/* The reason why we initialized early: select our string hash functions. */
str_hash_init (flags);
+#endif
} |
Oops, the |
@siddhesh I just tried this on a modern Intel CPU (Core i9-9900K) and built this branch with the following command:
But gdb shows that it never invokes the SSE 4.2 primitives:
I do see that only two .c files are compiled with the
Am I missing anything here? The current v2.1-agentzh branch works fine when specifying |
For comparison, when I specify
BTW, I'm using this simple Lua script for the tests above:
local a = string.rep("a", 20)
local b = a .. "c"
print("ok") |
My CPU info:
|
@siddhesh maybe the CPU ID detection is buggy? |
@siddhesh Seems like the constructor routine |
@siddhesh The following (hacky) patch seems to fix it. Do you know a better way?
diff --git a/src/lj_init.c b/src/lj_init.c
index e963763013..4774a6e2c2 100644
--- a/src/lj_init.c
+++ b/src/lj_init.c
@@ -4,6 +4,10 @@
#include "lj_vm.h"
#include "lj_str.h"
+/* meant to be referenced by the lj_state.c CU so that the linker won't exclude
+ * this CU. */
+int lj_init_used = 0;
+
#if LJ_TARGET_ARM && LJ_TARGET_LINUX
#include <sys/utsname.h>
#endif
diff --git a/src/lj_state.c b/src/lj_state.c
index a2e5fdcb5a..83e1d738d1 100644
--- a/src/lj_state.c
+++ b/src/lj_state.c
@@ -30,6 +30,8 @@
#include "lj_alloc.h"
#include "luajit.h"
+extern int lj_init_used; /* defined in lj_init.c */
+
/* -- Stack handling ------------------------------------------------------ */
/* Stack sizes. */
@@ -201,6 +203,9 @@ LUA_API lua_State *lua_newstate(lua_Alloc allocf, void *allocd)
GG_State *GG;
lua_State *L;
global_State *g;
+
+ lj_init_used = 1; /* enforce the linker to include the lj_init.c CU */
+
/* We need the PRNG for the memory allocator, so initialize this first. */
if (!lj_prng_seed_secure(&prng)) {
lj_assertX(0, "secure PRNG seeding failed"); |
Sorry, I think I messed up the build when I shuffled the functions around from moonjit. I'll take a closer look at it and submit an update. |
@siddhesh OK, thanks |
This is a port of the dynamic SSE4.2 detection feature from moonjit. This makes luajit2 builds portable since SSE4.2 string hash functions are now built separately and chosen at runtime based on whether the CPU supports it. This patch also includes work by Thomas Fransham in moonjit to support Windows builds.
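For readers unfamiliar with the CPU check this description relies on, here is a minimal sketch of how SSE4.2 support can be read from CPUID on x86 with GCC or Clang. It is not the code from this PR: `demo_detect_cpu_flags`, `demo_have_sse42` and `DEMO_CPU_SSE4_2` are hypothetical names made up for illustration. The only hard facts used are that CPUID leaf 1 reports SSE4.2 in ECX bit 20 and that `<cpuid.h>` provides `__get_cpuid`.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header providing __get_cpuid() */
#endif

/* Hypothetical flag bit for this sketch; not a LuaJIT constant. */
#define DEMO_CPU_SSE4_2 (1u << 0)

/* Return a bitmask of the CPU features this sketch knows about. */
static uint32_t demo_detect_cpu_flags(void)
{
  uint32_t flags = 0;
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  /* CPUID leaf 1: ECX bit 20 is set when SSE4.2 is available. */
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 20)))
    flags |= DEMO_CPU_SSE4_2;
#endif
  /* On non-x86 targets no flag is set and callers keep the portable path. */
  return flags;
}

/* Convenience wrapper: 1 if the running CPU advertises SSE4.2, else 0. */
static int demo_have_sse42(void)
{
  return (demo_detect_cpu_flags() & DEMO_CPU_SSE4_2) != 0;
}
```

A caller would run the detection once at startup and pass the resulting flags to whatever selects the hash function, which is the role `str_hash_init(flags)` plays in the diff quoted above.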
@siddhesh I wonder if there's any easy way to test it. I found my AMD CPUs already support SSE4.2 since they use the latest Zen 2 architecture...Alas. |
OK, never mind, seems like qemu is the best way to test an x64 CPU without SSE 4.2 :) |
I'm afraid my machines are all new too, so you'll need a volunteer to test the hash_sparse_def paths; maybe someone who complained about the sse4.2 requirement in the past could help :) That last push should hopefully address all issues. There was an implicit dependency (the LJ_CPU_FLAGS variable) that kept everything in place in moonjit and it got lost when I got rid of it in luajit2. That is also why your hacky patch worked. |
Just for the record, I'm using the following command to test on an "old" x86_64 CPU without SSE 4.1/4.2:
And I can indeed get the following error when trying to run an old LuaJIT built with
And now the new version in this PR works fine. |
Merged. Thanks! |
Looking forward to #21 :) |
This series of patches moves the CRC32 based lj_str_hash implementation into the main sources and builds it with -msse42. We also move the CPU feature detection into a DSO constructor and use it to patch in the optimised lj_str_hash if we are running on a CPU with SSE4.2, making the binaries more portable across x86 platforms.
This should obsolete #20 and also give a good framework to implement #21, making it a simple matter of adding the conditional compilation (and feature check) for crc32 and defining the right crc32 macros.
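The general pattern described here (build the SSE4.2 variant separately, then pick it in a constructor) can be sketched as follows. This is a simplified illustration that assumes GCC or Clang; it is not the patch itself. All `demo_*` names are hypothetical, `__attribute__((target("sse4.2")))` stands in for compiling a separate file with SSE4.2 enabled, and `__builtin_cpu_supports` stands in for the hand-written CPUID check. The portable path is a bitwise CRC32C chosen so that both paths return the same value in this sketch, which is not a claim about luajit2's own default hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <nmmintrin.h>  /* _mm_crc32_u8 */
#define DEMO_X86 1
#else
#define DEMO_X86 0
#endif

typedef uint32_t (*demo_hash_fn)(const char *s, size_t len);

/* Portable path: bitwise CRC32C (reflected polynomial 0x82F63B78). */
static uint32_t demo_hash_portable(const char *s, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= (unsigned char)s[i];
    for (int k = 0; k < 8; k++)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

#if DEMO_X86
/* SSE4.2 path: only this function is compiled with SSE4.2 enabled. */
__attribute__((target("sse4.2")))
static uint32_t demo_hash_sse42(const char *s, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = _mm_crc32_u8(crc, (unsigned char)s[i]);
  return ~crc;
}
#endif

/* Defaults to the portable version, so it is valid before the constructor. */
static demo_hash_fn demo_str_hash = demo_hash_portable;

/* Runs at program or DSO load time and patches in the fast path if possible. */
__attribute__((constructor))
static void demo_init_cpuflags(void)
{
#if DEMO_X86
  __builtin_cpu_init();  /* required before __builtin_cpu_supports in constructors */
  if (__builtin_cpu_supports("sse4.2"))
    demo_str_hash = demo_hash_sse42;
#endif
  assert(demo_str_hash != NULL);
}
```

Callers always go through `demo_str_hash`, so the same binary runs on CPUs with and without SSE4.2. As the discussion above shows, a constructor in a translation unit that nothing else references may not be linked in, so a real build needs some symbol that ties that unit to the rest of the library.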