JVM fails to load in 1.1 (JavaCall.jl) #31104

Closed
aviks opened this issue Feb 18, 2019 · 24 comments
Labels
regression Regression in behavior compared to a previous version

Comments

@aviks
Member

aviks commented Feb 18, 2019

In 1.1, trying to initialise the java virtual machine in Julia via JavaCall.jl causes a stack overflow error. It works correctly when the code is loaded with -i or -e but fails when used from the REPL.

ERROR: StackOverflowError:
Stacktrace:
 [1] init(::Array{String,1}) at /Users/rene/.julia/packages/JavaCall/toamy/src/jvm.jl:176
 [2] top-level scope at none:0

Ref: JuliaInterop/JavaCall.jl#96

The call in question is the JNI_CreateJavaVM function in the JVM. More information about the JVM invocation API is at https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html

This is a regression from 1.0.x

@aviks aviks added the regression Regression in behavior compared to a previous version label Feb 18, 2019
@aviks
Member Author

aviks commented Feb 27, 2019

@JeffBezanson suggested trying with ALWAYS_COPY_STACKS. However, defining this in options.h and rebuilding Julia did not change the failure.

@aviks
Member Author

aviks commented Feb 27, 2019

Maybe related? JuliaInterop/RCall.jl#293

@JeffBezanson JeffBezanson added this to the 1.2 milestone Mar 30, 2019
@JeffBezanson
Sponsor Member

I get a hang instead of a stack overflow:

julia> using JavaCall

julia> JavaCall.init(["-Xmx128M"])

stack trace:

#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007ff55d7204e2 in os::PlatformEvent::park(long) () from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#2  0x00007ff55d70bd93 in ObjectMonitor::EnterI(Thread*) () from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#3  0x00007ff55d70c143 in ObjectMonitor::enter(Thread*) () from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#4  0x00007ff55d4fbf1a in instanceKlass::initialize_impl(instanceKlassHandle, Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#5  0x00007ff55d4fc571 in instanceKlass::initialize(Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#6  0x00007ff55d4fc30a in instanceKlass::initialize_impl(instanceKlassHandle, Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#7  0x00007ff55d4fc571 in instanceKlass::initialize(Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#8  0x00007ff55d4fc30a in instanceKlass::initialize_impl(instanceKlassHandle, Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#9  0x00007ff55d4fc571 in instanceKlass::initialize(Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#10 0x00007ff55d4fc30a in instanceKlass::initialize_impl(instanceKlassHandle, Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#11 0x00007ff55d4fc571 in instanceKlass::initialize(Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#12 0x00007ff55d4fc30a in instanceKlass::initialize_impl(instanceKlassHandle, Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#13 0x00007ff55d4fc571 in instanceKlass::initialize(Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#14 0x00007ff55d4589de in Exceptions::new_exception(Thread*, Symbol*, Symbol*, JavaCallArguments*, Handle, Handle, Handle)
    () from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#15 0x00007ff55d458fc2 in Exceptions::new_exception(Thread*, Symbol*, char const*, Handle, Handle, Handle, Exceptions::ExceptionMsgToUtf8Mode) () from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#16 0x00007ff55d4594aa in Exceptions::_throw_msg(Thread*, char const*, int, Symbol*, char const*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#17 0x00007ff55d4fabf3 in instanceKlass::set_initialization_state_and_notify(instanceKlass::ClassState, Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#18 0x00007ff55d4fc05c in instanceKlass::initialize_impl(instanceKlassHandle, Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#19 0x00007ff55d4fc571 in instanceKlass::initialize(Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#20 0x00007ff55d4fc30a in instanceKlass::initialize_impl(instanceKlassHandle, Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#21 0x00007ff55d4fc571 in instanceKlass::initialize(Thread*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#22 0x00007ff55d8532e6 in Threads::create_vm(JavaVMInitArgs*, bool*) ()
   from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so
#23 0x00007ff55d55c624 in JNI_CreateJavaVM () from /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so

It looks like the JVM is trying to throw an exception. I'll see if I can find the error message.

@Keno
Member

Keno commented Apr 11, 2019

Jeff and I spent a lot of time looking into this today. The story is basically as follows:

  1. The JVM uses the stack extents to determine whether a held lock is owned by the current thread (if it's on the current thread stack, it is presumed to be owned).
  2. If this determination is wrong, things break in all sorts of ways (for example, the JVM tries to throw an exception before the system is initialized, causing stack overflows or hangs).
  3. In order to find out the stack bounds, the JVM queries pthread_attr_getstack for the current stack extents.
  4. There is no officially sanctioned API to change this value.
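
A rough C illustration of points 1 and 3, assuming Linux with glibc (pthread_getattr_np is a GNU extension). This is not HotSpot's code, which differs by platform and JDK version; the helper names recorded_stack_bounds and on_recorded_stack are hypothetical. It shows the general shape of the check: the bounds come from what the thread library recorded for the thread, so an address on a separately allocated task stack would be reported as "not on this thread's stack".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* pthread_getattr_np is a GNU extension */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch the [lo, hi) range that the thread library
 * has recorded as the calling thread's stack. Returns 0 on success. */
static int recorded_stack_bounds(uintptr_t *lo, uintptr_t *hi)
{
    pthread_attr_t attr;
    void *addr = NULL;
    size_t size = 0;
    int rc;

    assert(lo != NULL && hi != NULL);
    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return -1;
    rc = pthread_attr_getstack(&attr, &addr, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    *lo = (uintptr_t)addr;          /* lowest address of the stack */
    *hi = (uintptr_t)addr + size;   /* one past the highest address */
    return 0;
}

/* Hypothetical helper: 1 if p lies inside the recorded stack range,
 * 0 if it lies outside, -1 if the range could not be queried. */
static int on_recorded_stack(const void *p)
{
    uintptr_t lo, hi, a = (uintptr_t)p;

    if (recorded_stack_bounds(&lo, &hi) != 0)
        return -1;
    return a >= lo && a < hi;
}
```

Called with the address of a local variable on an ordinary thread, on_recorded_stack should return 1. Called from code running on a stack that the thread library does not know about, it would return 0 for that code's own locals, which is the mismatch described above.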

ALWAYS_COPY_STACKS does not help because it doesn't actually run on the main stack; it merely uses a single stack, which is nice but insufficient.

Jeff is trying to get ALWAYS_COPY_STACKS to go back to using the actual process stack, which may make this work temporarily, but certainly not on the pre-built binaries.

A full fix will require an enhancement to the JNI to allow setting the stack extents of the current stack before entering the JVM thread.

@Keno
Member

Keno commented Apr 11, 2019

(Alternatively, we could have a way of always executing a ccall on the current thread's main stack).

@Keno
Member

Keno commented Apr 11, 2019

This is the relevant function in the current jdk, though it looks a bit different in JDK8: https://github.com/unofficial-openjdk/openjdk/blob/3e601613f8c323f1129fa3f8aac389b90456c8c7/src/hotspot/os_cpu/linux_zero/os_linux_zero.cpp#L308

@aviks
Member Author

aviks commented Apr 12, 2019

Alternatively, we could have a way of always executing a ccall on the current thread's main stack

That would be much better. Honestly, not many people will use this if it needs recompiling Julia.

@JeffBezanson
Sponsor Member

We could possibly enable this with a command line option too.

@aviks
Member Author

aviks commented Apr 12, 2019

That'd be great, thanks!

@JeffBezanson
Sponsor Member

We've decided that the right way to address this is to add a new calling convention to ccall that performs calls on the original thread stack. That can be used whenever you need to call native code that depends on the stack pointer value like the JVM does. That puts the information in the right place, since it's specific to which C library/function you're calling, and doesn't require any options or configuration. The main limitation is that the native code won't be able to safely call back into julia.

Some other options considered were (1) provide builds of openjdk (via BinaryBuilder) patched to work around the problem (we could still potentially do this if it's practical), and (2) (long term) try to get the JNI to add a call to inform it that the stack has switched.

For 1.2, the most we can do is fix the ALWAYS_COPY_STACKS build option.

@ViralBShah
Member

I think patched builds of openjdk are not practical for the users of this feature. Most likely the end users of java software are enterprises who are unlikely to want to pick an openjdk we provide.

Providing a patch to openjdk itself may be ok in hopes of future releases fixing this issue.

@StefanKarpinski
Sponsor Member

I think the utility of that option is for people who are using a Julia library that depends on a Java library but the user doesn't know or care that it's implemented in Java—they just want it to work. For that use case, the JVM is no different than any other binary artifact that we load and call into.

JeffBezanson added a commit that referenced this issue Apr 18, 2019
@ExpandingMan
Contributor

So I'm assuming one would need a non-default build of Julia for this to work in 1.2, correct?

@JeffBezanson JeffBezanson modified the milestones: 1.2, 1.3 May 8, 2019
JeffBezanson added a commit that referenced this issue May 8, 2019
KristofferC pushed a commit that referenced this issue May 9, 2019
this makes it possible to work around #31104

(cherry picked from commit 48634f9)
bswrundquist added a commit to bswrundquist/jn-gpu that referenced this issue Aug 9, 2019
JeffBezanson added a commit that referenced this issue Aug 13, 2019
Allows working around #31104 (JVM interop)
JeffBezanson added a commit that referenced this issue Aug 15, 2019
Allows working around #31104 (JVM interop)
@JeffBezanson
Sponsor Member

Added environment variable to work around this for now.

@JeffBezanson JeffBezanson modified the milestones: 1.3, 1.4 Aug 15, 2019
JeffBezanson added a commit that referenced this issue Aug 15, 2019
@KristofferC
Sponsor Member

What needs to be done here? (Thinking about the 1.4 milestone.)

@mkitti
Contributor

mkitti commented May 15, 2020

Thanks to @c42f for bringing the current open state of this issue to my attention.

Outstanding Issues:

  1. JULIA_COPY_STACKS=1 leads to a crash on Windows x64
    https://travis-ci.org/github/JuliaInterop/JavaCall.jl/jobs/687145088
    JavaCall works fine with JULIA_COPY_STACKS=0 on Windows x64:
    https://ci.appveyor.com/project/aviks/javacall-jl-6c24s/build/job/m23nvnverlk6jmpq

  2. JavaCall does not work on x86 Windows:
    https://ci.appveyor.com/project/aviks/javacall-jl-6c24s/build/job/43x4i9c3slcjudbv

The above sound like separate platform-specific issues for Julia.

Given that

  1. JavaCall tests are now generally passing with some environmental configuration set.
    https://travis-ci.org/github/JuliaInterop/JavaCall.jl
    https://ci.appveyor.com/project/aviks/javacall-jl-6c24s
  2. JavaCall can function in the REPL on the root Task (no @async) without JULIA_COPY_STACKS=1
    Allow the REPL backend to run on the root Task #35048
    https://github.com/mkitti/RootTaskREPL.jl

I suggest that we close this issue. Do you agree @aviks ?

@c42f
Member

c42f commented May 15, 2020

  • In order to find out the stack bounds, the JVM queries pthread_attr_getstack for the current stack extents.

  • There is no officially sanctioned API to change this value.

There is pthread_attr_setstack - perhaps that would help?
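
For context, a small C sketch of what pthread_attr_setstack does, assuming Linux with glibc (pthread_getattr_np is a GNU extension). The names probe_thread, run_probe_on_custom_stack, and DEMO_STACK_SIZE are hypothetical. The call configures an attribute object that is consumed by pthread_create, so it chooses the stack of a thread that does not exist yet; as far as I can tell it offers no way to update the recorded extents of a thread that is already running and has switched stacks.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* pthread_getattr_np is a GNU extension */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define DEMO_STACK_SIZE ((size_t)1 << 20)   /* 1 MiB, arbitrary for the demo */

struct probe {
    uintptr_t local;    /* address of a local inside the new thread */
    uintptr_t lo, hi;   /* bounds the thread library reports for it */
    int ok;             /* 1 if the bounds were queried successfully */
};

/* Runs on the new thread: record where its locals live and which
 * stack range the thread library reports. */
static void *probe_thread(void *arg)
{
    struct probe *p = arg;
    int marker = 0;
    pthread_attr_t attr;
    void *addr = NULL;
    size_t size = 0;

    p->local = (uintptr_t)&marker;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        if (pthread_attr_getstack(&attr, &addr, &size) == 0) {
            p->lo = (uintptr_t)addr;
            p->hi = (uintptr_t)addr + size;
            p->ok = 1;
        }
        pthread_attr_destroy(&attr);
    }
    return NULL;
}

/* Create a thread on a caller-supplied stack and probe it.
 * buf_lo/buf_hi receive the range of the supplied buffer. */
static int run_probe_on_custom_stack(struct probe *out,
                                     uintptr_t *buf_lo, uintptr_t *buf_hi)
{
    void *stack = NULL;
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    assert(out != NULL && buf_lo != NULL && buf_hi != NULL);
    if (posix_memalign(&stack, (size_t)sysconf(_SC_PAGESIZE),
                       DEMO_STACK_SIZE) != 0)
        return -1;
    *buf_lo = (uintptr_t)stack;
    *buf_hi = (uintptr_t)stack + DEMO_STACK_SIZE;

    pthread_attr_init(&attr);
    rc = pthread_attr_setstack(&attr, stack, DEMO_STACK_SIZE);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, probe_thread, out);
    pthread_attr_destroy(&attr);
    if (rc == 0)
        pthread_join(tid, NULL);

    free(stack);                    /* safe: the thread has been joined */
    return rc == 0 ? 0 : -1;
}
```

In this sketch the new thread's locals land inside the supplied buffer and the reported bounds cover them, because the stack was declared before the thread started. That is a different situation from a runtime that moves an existing thread onto another stack after creation.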

@c42f
Member

c42f commented May 15, 2020

I suggest that we close this issue.

The JVM may load, but I feel it may be useful to keep this issue open (perhaps renamed) to track JVM integration issues, provided that some of the great analysis further up the thread is still relevant.

@ViralBShah
Member

Wouldn't it be better to close this and have newer issues to track JavaCall stuff, or track it on the JavaCall.jl repo?

@ViralBShah
Member

Please reopen if we should keep it around.

@barche
Contributor

barche commented Oct 9, 2023

We've decided that the right way to address this is to add a new calling convention to ccall that performs calls on the original thread stack.

Sorry to dig up an old issue, but has this new calling convention been implemented, or is it still planned? I'm asking because since Qt version 6.5, QML has a new stack bounds checker, which makes QML.jl fail in the same way as JavaCall.jl. Thanks for the great analysis of this problem, BTW.

@mkitti
Contributor

mkitti commented Oct 11, 2023

My solution is to implement a worker that runs on the root task. Then whenever we need to do something that requires the root task, send the call over a channel to the worker:

https://github.com/mkitti/TaskWorkers.jl

The package probably needs to be revisited, but overall the idea is sound I think.
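
The worker idea can be sketched in C with pthreads as an analogy only: this is not the TaskWorkers.jl implementation, and a dedicated thread stands in here for Julia's root task. All names (struct worker, worker_loop, worker_call, worker_stop, whoami) are hypothetical. One thread services a single-slot mailbox; any other thread posts a function call and blocks until the result comes back, so the function always executes on the worker's own stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void *(*job_fn)(void *);

struct worker {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    job_fn fn;      /* pending or running job, NULL if the slot is free */
    void  *arg;
    void  *result;
    int    done;    /* 1 while a finished result awaits pickup */
    int    stop;    /* 1 once shutdown has been requested */
};

#define WORKER_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
                      NULL, NULL, NULL, 0, 0 }

/* Runs on the privileged thread: execute posted jobs until stopped. */
static void *worker_loop(void *p)
{
    struct worker *w = p;

    pthread_mutex_lock(&w->mu);
    for (;;) {
        while (w->fn == NULL && !w->stop)
            pthread_cond_wait(&w->cv, &w->mu);
        if (w->fn == NULL)
            break;                      /* stop requested, nothing pending */

        job_fn fn = w->fn;
        void *arg = w->arg;
        pthread_mutex_unlock(&w->mu);
        void *r = fn(arg);              /* executes on this thread's stack */
        pthread_mutex_lock(&w->mu);

        w->result = r;
        w->fn = NULL;
        w->done = 1;
        pthread_cond_broadcast(&w->cv);
    }
    pthread_mutex_unlock(&w->mu);
    return NULL;
}

/* Called from any other thread: post fn(arg) and wait for its result. */
static void *worker_call(struct worker *w, job_fn fn, void *arg)
{
    void *r;

    assert(w != NULL && fn != NULL);
    pthread_mutex_lock(&w->mu);
    while (w->fn != NULL || w->done)    /* wait until the slot is free */
        pthread_cond_wait(&w->cv, &w->mu);
    w->fn = fn;
    w->arg = arg;
    pthread_cond_broadcast(&w->cv);
    while (!w->done)                    /* wait for the worker to finish */
        pthread_cond_wait(&w->cv, &w->mu);
    r = w->result;
    w->done = 0;                        /* free the slot for the next caller */
    pthread_cond_broadcast(&w->cv);
    pthread_mutex_unlock(&w->mu);
    return r;
}

/* Ask the worker loop to return once no job is pending. */
static void worker_stop(struct worker *w)
{
    pthread_mutex_lock(&w->mu);
    w->stop = 1;
    pthread_cond_broadcast(&w->cv);
    pthread_mutex_unlock(&w->mu);
}

/* Demo job: report which thread actually executed the call. */
static void *whoami(void *out)
{
    *(pthread_t *)out = pthread_self();
    return out;
}
```

A caller would start worker_loop on the thread whose stack the native library expects, then route sensitive calls through worker_call. The cost is one round trip through the mailbox per call, and the job must not itself call worker_call on the same worker, since that would wait forever for the slot it occupies.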

@barche
Contributor

barche commented Oct 11, 2023

Thanks, this is a golden tip. We actually don't need TaskWorkers.jl: for QML.jl it is sufficient to process the events on the root task and then launch the REPL on a new task. I didn't know that was possible.

@mkitti
Contributor

mkitti commented Oct 11, 2023

I didn't know that was possible.

When I started with Julia, it was not possible. The REPL backend did not run on the root task, so I had to change it: #35048. I had been using Julia for about two months at that point, and seeing this pull request get processed made me realize this was a project to which I could meaningfully contribute.

@barche, you might have an interest in #35726, since checking whether we are on the root task is currently not a publicly documented API.
