Makie started segfaulting quite regularly :( #2606

Closed
SimonDanisch opened this issue Jan 17, 2023 · 9 comments

@SimonDanisch
Member

Making a tracking issue for segfaults on Julia 1.8.3+ through 1.9beta2, which have been happening quite frequently lately!

CairoMakie benchmark CI: https://github.com/MakieOrg/Makie.jl/actions/runs/3937490334/jobs/6735073575

wsl2 + linux + 1.9beta2 (this one always reproduces):
image

Tyler.jl + Linux segfaults on 1.8 while building docs (likely flaky and maybe HTTP.jl related): https://github.com/MakieOrg/Tyler.jl/actions/runs/3937793009/jobs/6735719538#step:7:87

Then there's #2585
I've also heard from other users about an increase in segfaults!

@SimonDanisch
Member Author

Intuitively I'd say this isn't our fault, since, at least in Makie itself, we shouldn't do anything unsafe.
But there's a chance I'm overlooking something, or some of our dependencies do something unsafe.
There's also a chance that Julia 1.8+ has become less stable, or that what we do in our precompilation is slightly unsafe.
The stacktrace in the CairoMakie benchmark on Julia@1.8.5 does point to serialization:

signal (11): Segmentation fault
in expression starting at none:1
jl_deserialize_struct at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/dump.c:2081 [inlined]
jl_deserialize_value_any at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/dump.c:2186 [inlined]
jl_deserialize_value at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/dump.c:2319

@timholy I don't want to use up your time here, but maybe you can give a few pointers about what could be going on.
I'll try to see today if I can clean up our precompilation, to make sure we don't have any dangling globals that are not serializable (although it still shouldn't segfault 🤷)

@timholy
Contributor

timholy commented Jan 17, 2023

Have you cleaned your .julia/compiled/v1.9 directory and rebuilt? That's fixed all other such segfaults I've heard about. But that suggests we're not checking that something needs rebuilding. It's quite possible that the heroic @KristofferC has already fixed it:

It looks like beta3 is just about to be released, please try that when it comes out. I believe it may have both of these fixes in it.
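
A minimal sketch of the cache reset suggested above, written as a shell helper. The function name `clear_julia_cache` is hypothetical, and the sketch assumes the depot is the first entry of `JULIA_DEPOT_PATH`, falling back to `~/.julia` when that variable is unset or starts with an empty entry.

```shell
# Hypothetical helper: delete the precompile cache for one Julia minor version.
# Assumes the cache lives at <depot>/compiled/v<major.minor>.
clear_julia_cache() {
    version="$1"
    if [ -z "$version" ]; then
        echo "usage: clear_julia_cache <major.minor>" >&2
        return 1
    fi
    # Take the first depot entry; fall back to the default location.
    depot="${JULIA_DEPOT_PATH:-$HOME/.julia}"
    depot="${depot%%:*}"
    [ -n "$depot" ] || depot="$HOME/.julia"
    cache="$depot/compiled/v$version"
    if [ -d "$cache" ]; then
        rm -rf -- "$cache"
        echo "removed $cache"
    else
        echo "nothing to remove at $cache"
    fi
}

# Example call (commented out so sourcing this file deletes nothing):
# clear_julia_cache 1.9
```

After the cache is removed, the next `using` of a package triggers a full precompile, so a segfault that survives this step is probably not caused by a stale cache file.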

@timholy
Contributor

timholy commented Jan 17, 2023

For me GLMakie compiles just fine on WSL.

@timholy
Contributor

timholy commented Jan 17, 2023

But yes this seems serious. It took me a bit to realize you're seeing this on 1.8 also. Not good. Can you try reverting back to SnoopPrecompile 1.0.1? There's no good reason that 1.0.3 should be able to cause segfaults, but there may be bad reasons (i.e., Julia bugs).

@SimonDanisch
Member Author

Thanks, I'll try downgrading!
For the segfaults I run into locally (e.g. wsl2), I did indeed rm -rf ~/.julia/compiled/v1.9 and it's still segfaulting... But I'm less concerned about a segfault with a pre-release Julia version, and there's a chance that I somehow messed up my WSL setup.

@SimonDanisch
Member Author

Uhhh, I do have a pretty good lead for at least the GLMakie precompilation segfaults:
image
Seems like cleaning up open windows in GLFW when exiting julia can segfault - which could interact badly with precompiling GLMakie!

@SimonDanisch
Member Author

Hmpf, I expected this to be easier, with GLFW just doing something unsafe with the open window, but that doesn't seem to be the case: the stacktrace points to the NVIDIA driver & pthreads:

#0  0x00007fffc9f3545a in ?? ()
   from /usr/lib/wsl/drivers/nvlti.inf_amd64_ac791a13a77aa333/libnvwgf2umx.so
#1  0x00007fffc9f341be in ?? ()
   from /usr/lib/wsl/drivers/nvlti.inf_amd64_ac791a13a77aa333/libnvwgf2umx.so
#2  0x00007fffc9f34136 in ?? ()
   from /usr/lib/wsl/drivers/nvlti.inf_amd64_ac791a13a77aa333/libnvwgf2umx.so
#3  0x00007ffff7f9e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#4  0x00007ffff7ec3133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) bt 100
#0  0x00007fffc9f3545a in ?? ()
   from /usr/lib/wsl/drivers/nvlti.inf_amd64_ac791a13a77aa333/libnvwgf2umx.so
#1  0x00007fffc9f341be in ?? ()
   from /usr/lib/wsl/drivers/nvlti.inf_amd64_ac791a13a77aa333/libnvwgf2umx.so
#2  0x00007fffc9f34136 in ?? ()
   from /usr/lib/wsl/drivers/nvlti.inf_amd64_ac791a13a77aa333/libnvwgf2umx.so
#3  0x00007ffff7f9e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#4  0x00007ffff7ec3133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@SimonDanisch
Member Author

Ah wow, so this is completely unrelated to the other segfaults, and it now also segfaults on older Julia versions. Since I only recently tried out GLMakie in WSL2 and it worked perfectly, this must be due to a driver update / Windows update -.- Why is everyone suddenly conspiring to segfault (GL)Makie??

@SimonDanisch
Member Author

Haven't seen segfaults in the last few weeks besides the wsl2 one, so closing this issue for now.
