
[ocaml5-issue] Segfault in Dynlink on Windows #290

Closed
shym opened this issue Jan 25, 2023 · 8 comments
Labels
ocaml5-issue A potential issue in the OCaml5 compiler/runtime

Comments

@shym
Collaborator

shym commented Jan 25, 2023

Seen in that run: https://github.com/shym/multicoretests/actions/runs/4005594444/jobs/6876136859#step:12:1228

random seed: 357183120
generated error fail pass / total     time test name

[ ]    0    0    0    0 /  100     0.0s negative Lin DSL Dynlink test with Domain
File "src/dynlink/dune", line 14, characters 7-20:
14 |  (name lin_tests_dsl)
            ^^^^^^^^^^^^^
(cd _build/default/src/dynlink && ./lin_tests_dsl.exe --verbose)
Command exited with code -1073741819.

This was with the code proposed in #289.
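For reference, the exit code in the log above, -1073741819, is the Windows status 0xC0000005 (STATUS_ACCESS_VIOLATION) printed as a signed 32-bit integer, i.e. the Windows counterpart of a segfault. A small C sketch of the conversion (the helper names are made up for illustration):

```c
#include <stdint.h>

/* The log prints the process exit status as a signed 32-bit integer,
   so Windows status codes with the high bit set (such as 0xC0000005)
   show up as large negative numbers. Converting a negative int32_t to
   uint32_t is well defined in C (it wraps modulo 2^32), which recovers
   the original status code. Hypothetical helper, for illustration. */
uint32_t exit_code_to_status(int32_t code)
{
    return (uint32_t)code;
}

/* 0xC0000005 is STATUS_ACCESS_VIOLATION, the Windows analogue of SIGSEGV. */
int is_access_violation(int32_t code)
{
    return exit_code_to_status(code) == 0xC0000005u;
}
```

With these, exit_code_to_status(-1073741819) yields 0xC0000005, confirming that the test binary died from an invalid memory access rather than from an OCaml exception.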

@jmid
Collaborator

jmid commented Jan 25, 2023

First off: Wow, nice catch! 🎉 😃
Second: since it crashes on the very first input, it seems it should be possible to extract a standalone reproducer from it? 🤔

@polytypic

Sorry, I don't know the context for this, but the combination of "Dynlink" and "Windows" suggests that this might be related to the "shadow store reservation" bug in OCaml 5 on Windows. Mentioning it just in case.

@jmid
Collaborator

jmid commented Jan 25, 2023

Good point - we should definitely try running the Dynlink test on that PR 👍

@shym
Collaborator Author

shym commented Jan 25, 2023

Indeed, our other Windows CI runs (tagged 5.0.0 but that might be misleading at the moment 😄) do use @dra27’s branch, thanks to the trick in his opam repository.

@shym
Collaborator Author

shym commented Jan 25, 2023

And it does segfault locally on that PR...

@shym
Collaborator Author

shym commented Jan 25, 2023

I get the following backtrace for the thread apparently triggering the segfault:

(gdb) bt
#0  0x00007fffb71ed2f1 in strlen () from /cygdrive/c/Windows/System32/msvcrt.dll
#1  0x00007ff6e59e1f6f in caml_copy_string (s=s@entry=0x0) at runtime/alloc.c:213
#2  0x00007ff6e5a08509 in caml_raise_with_string (tag=<optimized out>, msg=0x0) at runtime/fail_nat.c:125
#3  0x00007ff6e5a08533 in caml_failwith (msg=0x4bd93ff590 "\220\366?\331K") at runtime/fail_nat.c:132
#4  0x00007ff6e5a07cd4 in caml_natdynlink_open (filename=<optimized out>, global=<optimized out>) at runtime/dynlink_nat.c:87
#5  0x00007ff6e5a09374 in caml_c_call ()
#6  0x000001a429d9f750 in ?? ()

The argument to caml_failwith is fishy.
If I trace its source correctly, it comes from flexdll_dlerror, which would make it somewhat out of scope.
But I nevertheless don’t understand why caml_raise_with_string gets NULL as its msg instead of 0x4bd93ff590.

@jmid
Collaborator

jmid commented Jan 25, 2023

That's strange indeed!
Looking at the code it seems there are two implementations of caml_dlerror depending on the configuration:

  • return flexdll_dlerror(); and
  • return "dynamic loading not supported on this platform";

Judging from the msg, could it be that we are in the first case, but observing an underlying lack of thread-safety in flexdll?
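To illustrate the kind of parallelism unsafety suggested above: if flexdll kept its error message in a single static buffer shared by all threads (an assumption made for this sketch, not something verified against flexdll's source), a failing load in one domain could overwrite the message another domain is about to copy. A hypothetical C sketch (function names invented) contrasting a shared static buffer with a per-thread one using C11 _Thread_local:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical error reporter: every caller shares ONE static buffer,
   so the pointer handed to one thread can be rewritten by another
   thread's failing call before the first thread has copied it. */
static char shared_err[128];

const char *shared_dlerror(const char *reason)
{
    snprintf(shared_err, sizeof shared_err, "dlopen failed: %s", reason);
    return shared_err;
}

/* Same interface, but each thread gets its own buffer (C11
   _Thread_local), so a message returned to one thread is never
   touched by calls made from other threads. */
static _Thread_local char local_err[128];

const char *local_dlerror(const char *reason)
{
    snprintf(local_err, sizeof local_err, "dlopen failed: %s", reason);
    return local_err;
}
```

Even in a single thread, two calls to shared_dlerror return the same pointer, so the first message is silently replaced by the second. With several domains failing concurrently, the copy done on the caml_failwith path could therefore read another domain's message, or one that is in the middle of being rewritten.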

@shym
Collaborator Author

shym commented Feb 10, 2023

For the record, I also got a deadlock (Windows trunk):
https://github.com/ocaml-multicore/multicoretests/actions/runs/4136184869/jobs/7149683784#step:11:1277

random seed: 3565624
generated error fail pass / total     time test name

[ ]    0    0    0    0 /  100     0.0s negative Lin DSL Dynlink test with Domain
[ ]    0    0    0    0 /  100     0.0s negative Lin DSL Dynlink test with Domain (generating)Terminate batch job (Y/N)? 
^CFatal error: exception User interruption
