luajit Infinite loop and cpu 100% #42
Comments
@zx827882285 |
yes |
@zx827882285 Will you show us your |
Okay, never mind, I saw it is in the zx827882285/luajit-bug-report github repo. |
@zx827882285 I just tried your luajit-bug-report repo and the steps on my side with OpenResty 1.15.8.1 RC0 on Fedora 28 x86_64, and I failed to reproduce any infinite looping no matter how hard I tried: https://gist.github.com/agentzh/85bd2c83d1c7a8badf5851268e695743
The CPU usage of the nginx or openresty worker processes dropped to 0 immediately after the wrk runs finished. I tried both the GC64 and non-GC64 modes of LuaJIT. I only applied the following patch to your
diff --git a/nginx.conf b/nginx.conf
index 60214ef..cda2f53 100644
--- a/nginx.conf
+++ b/nginx.conf
@@ -8,8 +8,8 @@ events {
http {
- lua_package_path "/usr/local/nginx/clua/?.lua;/usr/local/nginx/lua/?.lua;/usr/local/nginx/api/?.lua;;";
- lua_package_cpath "/usr/local/nginx/lua/?.so;;";
+ lua_package_path "$prefix/clua/?.lua;$prefix/lua/?.lua;$prefix/api/?.lua;;";
+ lua_package_cpath "$prefix/lua/?.so;;";
lua_shared_dict a_shm 1M;
lua_shared_dict b_shm 1M;
@@ -21,7 +21,7 @@ http {
server {
server_name localhost;
- listen 80;
+ listen 8083;
location /set {
content_by_lua_block {
You should do the same when preparing a minimal example for others to run. BTW, you know about the
BTW, will you try the following OpenResty 1.15.8.1 RC0 version to see if you can still reproduce it? It's much easier for us to reproduce things when using OpenResty, and it would also save you from having to explain a lot of setup commands as in this issue. https://openresty.org/download/openresty-1.15.8.1rc0.tar.gz Thanks! |
@zx827882285 BTW, you should try fetching both the C and Lua backtraces at different times for the same nginx worker process that spins at 100% CPU usage forever. So do your nginx worker processes spin at 100% CPU usage forever, even after all wrk processes and any other traffic generators quit? The Lua and C backtraces do not look like they are inside any C or Lua loops at all, which is weird. |
@agentzh This morning I installed Fedora 28 on VMware Workstation and reproduced it. To make it easier to reproduce, I committed install.sh
Thanks! |
OK, I can reproduce it using your
You cannot reproduce it with OpenResty 1.15.8.1 RC0 only because it enables the GC64 mode in LuaJIT on x86_64 by default. If you enable that in your
If you enable internal assertions in your LuaJIT build, you will see that it hits an internal assertion inside LuaJIT first, before entering the infinite loop:
Running with openresty-valgrind (with the no-pool patch for nginx and the system allocator for LuaJIT) under valgrind fails to catch any memory problems. Disabling the JIT compiler in LuaJIT does not make the problem go away, so it should not be related to the JIT compiler at all. Forcing a full GC cycle on every line of Lua code also makes the issue disappear. Replacing
These are my initial experiments and findings. I need to dig deeper to find out why the GC state of the |
Sorry for providing binary executable files in my minimal example, although I think they have no influence.
This problem may be related to another problem, a segfault in LuaJIT. Thank you for your experiments and findings; they were helpful! |
@zx827882285 I think I've already fixed it. It's a bug in LuaJIT's FFI library when handling GC. Please try the following LuaJIT patch on your side:
diff --git a/src/lj_clib.c b/src/lj_clib.c
index f016b06b96..a867205247 100644
--- a/src/lj_clib.c
+++ b/src/lj_clib.c
@@ -384,6 +384,7 @@ TValue *lj_clib_index(lua_State *L, CLibrary *cl, GCstr *name)
       cd = lj_cdata_new(cts, id, CTSIZE_PTR);
       *(void **)cdataptr(cd) = p;
       setcdataV(L, tv, cd);
+      lj_gc_anybarriert(L, cl->cache);
     }
   }
   return tv; |
@agentzh |
@zx827882285 BTW, this is a use-after-free kind of bug, so it may lead to any weird things, from infinite loops to core dumps. That's why we hit the internal LuaJIT assertion failure before entering the infinite loop. The infinite loop happens because the ctype ID becomes zero, which is not possible for normal ctype objects in the first place.
Thanks a lot for that minimal and standalone example; otherwise debugging this would have been much more difficult (if possible at all).
This bug also affects GC64 builds and any other builds as far as I can see. The issue was not reproducible with your test case in GC64 mode only because the GC progresses quite differently there (for one thing, Lua tables in GC64 builds use a table-level free list for recycling nodes while the X64 mode uses bucket-level free lists, so GC64 mode is less demanding on the GC). But it's still possible to hit, at least in theory.
Small changes in your nginx.conf or Lua code may have a big impact on reproducing the issue. This is because we need perfect timing for just the right GC activities to trigger this bug, and incremental GCs are notorious for their inherent nondeterminism.
In a nutshell, what happens here in your test case is like this:
The key to reproducing this is to have a GC cycle complete between step 3 and step 4 above. The time window is very small and we have to be lucky enough to hit it. That's why you need
To fix this problem, we just need to preserve the tri-color GC's invariant in step 3 by adding a write barrier there and turning the cache table under
Just for the record, below is my debugging session using Mozilla rr and gdb python tools generated by our ylang compiler: https://gist.github.com/agentzh/534aabb3a5bc75ff62b8fd25e3d371e0
For more background info on LuaJIT's GC algorithm, we can look at Mike Pall's explanation here: http://wiki.luajit.org/New-Garbage-Collector#gc-algorithms_tri-color-incremental-mark-sweep |
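To make the tri-color invariant mentioned above more concrete, here is a minimal toy model in Python. It is not LuaJIT code and does not reproduce the exact steps of this issue: the `ToyGC` class, its method names, and the single-white color scheme are all assumptions made for illustration. It only sketches the general failure mode, where storing a fresh (white) object into an already-scanned (black) container without a write barrier lets the sweep free an object that is still referenced, and how a backward barrier that turns the container gray again avoids that.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


class Obj:
    """A heap object in the toy model (hypothetical, not a LuaJIT type)."""

    def __init__(self, name):
        self.name = name
        self.color = WHITE   # new objects start out unmarked
        self.refs = []       # outgoing references to other Obj instances
        self.freed = False


class ToyGC:
    """A very small incremental mark-and-sweep collector with three colors."""

    def __init__(self):
        self.heap = []
        self.roots = []
        self.gray = []       # worklist of objects still to be scanned

    def new(self, name):
        obj = Obj(name)
        self.heap.append(obj)
        return obj

    def start_mark(self):
        for obj in self.heap:
            obj.color = WHITE
        self.gray = []
        for root in self.roots:
            self._mark(root)

    def _mark(self, obj):
        if obj.color == WHITE:
            obj.color = GRAY
            self.gray.append(obj)

    def mark_step(self):
        """Scan one gray object and blacken it; return False if none is left."""
        if not self.gray:
            return False
        obj = self.gray.pop()
        for child in obj.refs:
            self._mark(child)
        obj.color = BLACK
        return True

    def barrier_back(self, container):
        """Backward write barrier: a black container becomes gray again."""
        if container.color == BLACK:
            container.color = GRAY
            self.gray.append(container)

    def finish_cycle(self):
        """Drain the gray worklist, then sweep everything that is still white."""
        while self.mark_step():
            pass
        for obj in self.heap:
            if obj.color == WHITE:
                obj.freed = True
        self.heap = [obj for obj in self.heap if not obj.freed]


def simulate(with_barrier):
    """Return True if the object stored into the black table gets freed."""
    gc = ToyGC()
    cache = gc.new("cache")      # plays the role of a cache table
    gc.roots.append(cache)

    gc.start_mark()
    while gc.mark_step():        # the collector has already scanned the table
        pass
    assert cache.color == BLACK

    cdata = gc.new("cdata")      # freshly allocated, still white
    cache.refs.append(cdata)     # store into the black table
    if with_barrier:
        gc.barrier_back(cache)   # table is queued for another scan

    gc.finish_cycle()
    return cdata.freed           # True means the table now holds a dangling reference


if __name__ == "__main__":
    print("freed without barrier:", simulate(with_barrier=False))
    print("freed with barrier:", simulate(with_barrier=True))
```

In this model the run without the barrier frees the new object while the table still points at it, which is the use-after-free shape described above; the run with the barrier keeps it alive because the table is scanned again before the sweep. A real collector such as LuaJIT's differs in many details (two white colors, separate gray lists, an atomic phase), so this should be read only as a sketch of the invariant.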
CentOS 6.5, kernel 2.6.32-754.2.1.el6.x86_64, nginx 1.14.2
luajit2-2.1-20190221, lua-resty-core 0.1.16
behavior:
nginx CPU usage stays at 100% and LuaJIT falls into an infinite loop
gdb result:
LuaJIT:
nginx install:
Steps:
instructions:
Sorry, this is difficult to reproduce; if I delete some code or functions, it may no longer occur.
a.lua and b.lua belong to my company, so I removed some functions, up to the point where deleting any more code made the behavior disappear.