java.lang.ClassNotFoundException during object deserialization in OpenJDK 11.0.7 with OpenJ9 #9912
Comments
@theresa-m, can you take a look at this? You resolved the last couple of issues in this code.
Set for the 0.22 milestone given it's late in the cycle for 0.21; however, if there is a quick fix, we can evaluate the risk of putting it into 0.21.
I'll take a look.
Are you using the [...]? The stack trace doesn't quite go back far enough for me to know where the call originated.
If that's the case, this may fix the problem: ibmruntimes/openj9-openjdk-jdk11#317. A build with the changes to try: https://drive.google.com/file/d/1UKnsL6mGu10qKz819Bon7pJo0yTX2GpA/view?usp=sharing
Hello @theresa-m, thank you for the update. I've downloaded and tried the JDK you linked; unfortunately it still fails in the same way. Below I've added all the information, including the full failing stack trace.
Full stack trace:
Hi @theresa-m, we do use
Oh, and I may have misread your question: we also override the default
Okay. Thanks for the additional stack trace and information. I'll continue to investigate.
I've updated my draft with a couple of other places where the cache should be refreshed: ibmruntimes/openj9-openjdk-jdk11#317. Sample build with this change: https://drive.google.com/file/d/15lq9C460QuR-LVTXv5t-5gpLv7suAklW/view?usp=sharing
If that doesn't work, can you also try including the
Hello @theresa-m, I gave your new JDK a try; unfortunately it fails very early during startup of our product (so it's unrelated to the reported issue). As the product does not start, I can't test further. Something else seems to be wrong with deserialization in this JDK:
The -Xint flag has no effect except making things slow (same stack). I double-checked with your previously linked JDK, and that one starts up fine (and has the original error we reported).
Hmm, okay, thanks for trying it out. That option helps me narrow it down: there are no issues with the JIT ludcl optimization either. I'll keep digging into it.
Thank you, I hope you can find the problem. :)
Hi @theresa-m, I see that this didn't make 11.0.8. Is there anything we can do to help with the diagnosis?
Thanks for the offer! I'll dive back into this on Monday and let you know; this is still high on my list for the September release.
I'm still not sure why the latest change caused the StreamCorruptedException. I'm thinking of adding some logging statements to the next build, but I'm still deciding what information would be most useful for diagnosing the problem.
Hello, I think the most likely place the optimization is failing is when going in and out of the HashMap class, and I was hoping to confirm that if you would be willing to try out this build: https://drive.google.com/file/d/1YbIq8mWEwCsiKa0s9F498wLrPPaCUpzQ/view?usp=sharing
It adds some information to the exception message, including the current class loader names, if you are comfortable with that: ibmruntimes/openj9-openjdk-jdk11@openj9...theresa-m:ludcl_refresh_2
Thank you for the build. I hope to have a look at it later this week or next week and will get back to you then.
Hello, I've tested with your build: it still fails, now with a very large message containing your logging output (over 3000 lines). Just to be safe for legal reasons, can I send or upload the result to you privately somewhere? Thanks.
Thanks! Oh no, I had hoped it would be shorter. The OpenJ9 public Slack channel could be a good place to DM me. I probably won't need all 3000 lines either. This should be the link to join our Slack if you are not already part of it: openj9 slack. Again, thanks for your willingness to try things.
The main thing I'm looking for is the start where
Moving this to the October quarterly update, but if a low-risk fix is found before we ship, we can consider porting it to the release branch for JDK 15.
Hi @mreuvers, I've sent a follow-up about this issue on Slack.
@ChengJin01 will be investigating this.
I'm currently looking at optimizations that limit the amount of stack we need to walk: in a recursive situation, we only really need to walk back as far as the most recent call where we already checked for the ludcl. The VM already has functionality for a truncated stack walk as part of the ORB code, and I'm experimenting with it to see if it can be reused here without breaking anything.
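To make the cost of the walk concrete, here is a rough sketch of the kind of lookup involved, assuming "ludcl" refers to the "latest user-defined class loader" that ObjectInputStream uses to resolve classes. It walks the stack from the top with the Java 9+ StackWalker API and returns the loader of the first frame whose class was not loaded by the bootstrap (null) or platform loader. This is an illustration only, not OpenJ9's or the JDK's implementation; the class and method names are made up. The frame counts show that an untruncated walk grows with recursion depth, which is what the optimization above tries to avoid.

```java
import java.util.Optional;

public class LatestUserLoaderWalk {

    // Illustrative approximation of a "latest user-defined loader" lookup:
    // return the loader of the first stack frame whose class was not loaded
    // by the bootstrap (null) or platform class loader.
    static Optional<ClassLoader> latestUserDefinedLoader() {
        ClassLoader platform = ClassLoader.getPlatformClassLoader();
        return StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE)
                .walk(frames -> frames
                        .map(f -> f.getDeclaringClass().getClassLoader())
                        .filter(l -> l != null && l != platform)
                        .findFirst());
    }

    // Counts every frame on the current stack, i.e. the work a full,
    // untruncated walk has to do in the worst case.
    static long frameCount() {
        return StackWalker.getInstance().walk(frames -> frames.count());
    }

    // Simulates a deep, recursive call chain (as in nested deserialization).
    static long recurse(int depth) {
        return depth == 0 ? frameCount() : recurse(depth - 1);
    }

    public static void main(String[] args) {
        System.out.println("latest user loader: " + latestUserDefinedLoader().orElse(null));
        System.out.println("frames at depth 10:  " + recurse(10));
        System.out.println("frames at depth 100: " + recurse(100));
    }
}
```

A truncated walk would stop at the most recent frame where the loader was already determined instead of visiting all frames counted above.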
I have a current prototype of my changes. The idea is based on accelerations used in the ORB code (which can't be reused without breaking the ORB code). In the ORB helper code we provide a native method to internally "mark" the stack at a given point, and a special version of [...]. As an alternative, I'm proposing a new API which broadly follows the same idea but eliminates the global state. The new API consists of 2 functions
Implementation: master...dnakamura:openj9:ludcl_new; for reference, the current ORB helpers: https://github.com/eclipse-openj9/openj9/blob/b9b733900ef25ba25435e8dd557a10f4e39a4fd6/runtime/jcl/common/orbvmhelpers.c#L150-L275
Given that the code can't compile, how was this tested?
I like the idea of explicit stacking; in the eventual solution we should probably get rid of the J9VMThread fields entirely. Another solution would be to stack the fields on call-in (but I greatly prefer the JCL solution).
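A minimal sketch of what "explicit stacking" of a cached loader could look like in Java code, assuming the cached value lives in a thread-local rather than in J9VMThread fields. Each (possibly nested) scope saves the outer value in a local variable, installs its own, and restores the outer value in a finally block, so a nested deserialization call cannot leave a stale loader behind for its caller. LoaderScope, withCachedLoader, and current are hypothetical names; this is not the API from the prototype above.

```java
import java.util.function.Supplier;

// Hypothetical illustration of explicitly stacking a cached class loader.
public final class LoaderScope {
    private static final ThreadLocal<ClassLoader> CACHED = new ThreadLocal<>();

    // Runs body with `loader` as the cached value and restores the previous
    // value afterwards, even if body throws.
    static <T> T withCachedLoader(ClassLoader loader, Supplier<T> body) {
        ClassLoader previous = CACHED.get(); // "push": remember the outer value
        CACHED.set(loader);
        try {
            return body.get();
        } finally {
            CACHED.set(previous);            // "pop": restore the outer value
        }
    }

    // Returns the value installed by the innermost active scope, or null.
    static ClassLoader current() {
        return CACHED.get();
    }

    public static void main(String[] args) {
        ClassLoader outer = ClassLoader.getSystemClassLoader();
        ClassLoader inner = ClassLoader.getPlatformClassLoader();
        withCachedLoader(outer, () -> {
            withCachedLoader(inner, () -> current()); // nested scope sees `inner`
            System.out.println(current() == outer);   // outer value is back
            return null;
        });
        System.out.println(current() == null);        // nothing left behind
    }
}
```

The alternative mentioned above, stacking the VM fields on call-in, would achieve the same save/restore behavior inside the VM instead of in the class library.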
@tajila the plan is to replace the existing optimization with the new approach. See also my prototype implementation of the JCL side.
@gacholio any concerns with the approach?
I think this is a good approach.
Maybe related:
@theresa-m, can you please take this up again now that Devin is gone?
Yes, I'll take a look.
Java -version output
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.20.0, JRE 11 Linux amd64-64-Bit Compressed References 20200416_574 (JIT enabled, AOT enabled)
OpenJ9 - 05fa2d3
OMR - d4365f371
JCL - 838028fc9d based on jdk-11.0.7+10)
Summary of problem
This problem looks similar to #8454 but happens under different circumstances. We can confirm that #8454 was fixed and that our test case for that instance still works.
When our code attempts to deserialise, we get the following stack trace (abbreviated):
Diagnostic files
No crash happens, so no files are produced. It just fails to load the class.
Running this under 11.0.5 works, as does running 11.0.7 with:
-Dcom.ibm.enableClassCaching=false
We have not yet been able to recreate a "simple" test case for this, but will try to spend some time on it next week. Any suggestions on what we might need to do differently from #8454 to trigger this would be appreciated! We can trigger this at will on our (large) code base, so we could also potentially provide logging and/or test a pre-release build.
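As a possible starting point for a smaller test case, here is a hypothetical round trip that nests user-defined objects inside HashMaps, so that deserialization repeatedly passes from user code into HashMap.readObject and back (the pattern suspected further up the thread). This is a sketch under assumptions: run from a plain classpath it will most likely succeed. Reproducing the failure probably also requires the user classes to be loaded by a custom class loader, as in the real application. Comparing runs with and without -Dcom.ibm.enableClassCaching=false would then show whether the caching is involved. RoundTrip and Payload are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class RoundTrip {
    // User-defined class nested inside a JDK collection, so reading it back
    // goes from user code into HashMap.readObject and back out again.
    static final class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final Map<String, Payload> children = new HashMap<>();

        Payload(String name) {
            this.name = name;
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Uses the default ObjectInputStream class resolution, which is the
    // code path affected by the class-caching optimization.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Payload root = new Payload("root");
        Payload child = new Payload("child");
        child.children.put("leaf", new Payload("leaf"));
        root.children.put("child", child);
        Map<String, Payload> top = new HashMap<>();
        top.put("root", root);

        @SuppressWarnings("unchecked")
        Map<String, Payload> copy = (Map<String, Payload>) deserialize(serialize(top));
        System.out.println(copy.get("root").children.get("child").children.get("leaf").name);
    }
}
```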