Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java.lang.ClassNotFoundException: during object deserialization in openjdk11.07 with openj9 #9912

Open
pwagland opened this issue Jun 17, 2020 · 45 comments
Assignees
Milestone

Comments

@pwagland
Copy link

Java -version output

openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.20.0, JRE 11 Linux amd64-64-Bit Compressed References 20200416_574 (JIT enabled, AOT enabled)
OpenJ9 - 05fa2d3
OMR - d4365f371
JCL - 838028fc9d based on jdk-11.0.7+10)

Summary of problem

This problem looks similar to #8454, however happens under different circumstances. We can confirm that the #8454 was fixed, and our test case for that instance still works.

When our code attempts to deserialise, we get the following stack trace (abbreviated:)

Caused by: java.lang.ClassNotFoundException: <our class name>
	at java.base/java.lang.Class.forNameImpl(Native Method)
	at java.base/java.lang.Class.forName(Class.java:412)
	at java.base/java.io.ClassCache$FutureValue.get(ClassCache.java:190)
	at java.base/java.io.ClassCache.get(ClassCache.java:161)
	at java.base/java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:848)
	at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2065)
	at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1951)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2239)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1768)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2541)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2433)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2266)

Diagnostic files

No crash happens, so no files are produced. It just fails to load the class.

Running this under 11.05 works, or running 11.07 with -Dcom.ibm.enableClassCaching=false also works.

We have not yet been able to recreate a "simple" test case for this, but will try to spend some time next week. Ay suggestions on what we might need to do differently to #8454 to trigger this would be appreciated! We can trigger this at will on our (large) code base, so we could also potentially provide logging and/or testing of a pre-release for that.

@DanHeidinga
Copy link
Member

@theresa-m can you take a look at this? You resolved the last couple of issues in this code

@pshipton pshipton added this to the Release 0.22 (Java 15) milestone Jun 17, 2020
@pshipton
Copy link
Member

Set for the 0.22 milestone give it's late in the cycle for 0.21, however if there is a quick fix we can evaluate the risk of putting it into 0.21.

@theresa-m
Copy link
Contributor

I'll take a look.

@theresa-m
Copy link
Contributor

theresa-m commented Jun 17, 2020

Are you using the defaultReadObject method in your custom serialization code? I believe there is a cache refresh missing here.

The stack trace doesn't quite go back far enough for me to know where the call originated.

@theresa-m
Copy link
Contributor

If thats the case this may fix the problem: ibmruntimes/openj9-openjdk-jdk11#317

build with changes to try: https://drive.google.com/file/d/1UKnsL6mGu10qKz819Bon7pJo0yTX2GpA/view?usp=sharing

@mreuvers
Copy link

Hello @theresa-m

Thank you for the update. I've downloaded and tried the JDK you've linked, unfortunately it still fails in the same way. Below I've added all info including the full failing stacktrace.

openjdk version "11.0.8-internal" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc.jenkins.BuildJDK11x86-64linuxPersonal)
Eclipse OpenJ9 VM (build master-d2ab44a37e0, JRE 11 Linux amd64-64-Bit Compressed References 20200617_2175 (JIT enabled, AOT enabled)
OpenJ9   - d2ab44a37e0
OMR      - b884d8266d7
JCL      - 9d820f766f3 based on jdk-11.0.8+7)

Full stack:

Caused by: java.lang.ClassNotFoundException: com.redwood.scheduler.custom.fca.balancesheet.documents.DocumentHandler
	at java.base/java.lang.Class.forNameImpl(Native Method)
	at java.base/java.lang.Class.forName(Class.java:412)
	at java.base/java.io.ClassCache$FutureValue.get(ClassCache.java:190)
	at java.base/java.io.ClassCache.get(ClassCache.java:161)
	at java.base/java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:851)
	at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2068)
	at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1954)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2242)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:548)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:458)
	at java.base/java.util.HashMap.readObject(HashMap.java:1460)
	at jdk.internal.reflect.GeneratedMethodAccessor88.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1175)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2403)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:548)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:458)

@pwagland
Copy link
Author

Hi @theresa-m, We do use defaultReadObject() in our code, I am not sure if that code would be invoked in this case though. At least it isn't in the call stack :-)

@pwagland
Copy link
Author

Oh, and I may have misread your question, we also override the default readObject() in some of our classes, again, I can't say if they are involved in this (de)serialisation, as they might be in the stream, but not in the stack trace.

@theresa-m
Copy link
Contributor

Okay. Thanks for the additional stack trace and information. I'll continue to investigate.

@theresa-m
Copy link
Contributor

I've updated my draft with a couple of other places where the cache should be refreshed ibmruntimes/openj9-openjdk-jdk11#317

Sample build with this change: https://drive.google.com/file/d/15lq9C460QuR-LVTXv5t-5gpLv7suAklW/view?usp=sharing

If that doesn't work, can you also try including the -Xint option on your command line to see if that makes a difference?

@mreuvers
Copy link

Hello @theresa-m

I gave your new jdk a try, unfortunately it fails very early during startup of our product (so not related to the issue reported). As the product does not start I can't test more. Something else seems wrong with deserialization in this jdk here:

OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc.jenkins.BuildJDK11x86-64linuxPersonal)
Eclipse OpenJ9 VM (build master-e006f8232bb, JRE 11 Linux amd64-64-Bit Compressed References 20200622_2197 (JIT enabled, AOT enabled)
OpenJ9   - e006f8232bb
OMR      - 934b0ff765c
JCL      - 899077a4d3c based on jdk-11.0.8+7)

java.io.StreamCorruptedException: invalid type code: 00
        at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:3179) ~[?:?]
        at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:3213) ~[?:?]
        at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:3115) ~[?:?]
        at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:2501) ~[?:?]
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2097) ~[?:?]
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1971) ~[?:?]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2259) ~[?:?]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1788) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:548) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:458) ~[?:?]
        at com.redwood.scheduler.extensionpoint.impl.ExtensionPointHTTPSession.remapMap(ExtensionPointHTTPSession.java:259) ~[extension-point.jar:?]

The -Xint flag has no effect except making things slow (same stack).

I double checked with your previous linked jdk, but that starts up fine (and has the original error we reported).

@theresa-m
Copy link
Contributor

hmm okay thanks for trying it out. That options helps me narrow down there's no issues with the JIT ludcl optimization either. I'll keep digging into it.

@mreuvers
Copy link

Thank you, I hope you can find the problem. :)

@pwagland
Copy link
Author

Hi @theresa-m, I see that this didn't make 11.08, is there anything that we can do to help in the diagnosis?

@theresa-m
Copy link
Contributor

Thanks for the offer! I'll dive back into this on Monday and let you know, this is still high on my list for the September release.

@theresa-m
Copy link
Contributor

I'm still not sure why the latest change caused the StreamCorruptedException. I'm thinking of adding some logging statements into the next build but still thinking about what would be the most useful information to help diagnose the problem.

@theresa-m
Copy link
Contributor

Hello, I think the most likely place the optimization is failing is when going in and out of the HashMap class but I was hoping to confirm if you would be willing to try out this build: https://drive.google.com/file/d/1YbIq8mWEwCsiKa0s9F498wLrPPaCUpzQ/view?usp=sharing

It adds some information into the message of the exception. It prints current classloader names if you are comfortable with that: ibmruntimes/openj9-openjdk-jdk11@openj9...theresa-m:ludcl_refresh_2

@mreuvers
Copy link

mreuvers commented Aug 4, 2020

Thank you for the build, I'll hope to have a look at it later this week/next week and will get back to you then.

@mreuvers
Copy link

Hello, I've tested with your build and it fails and now with a large message containing your long log message (over 3000 lines).

Just to play it safe for legal reasons, can I send/upload the result somewhere privately to you? Thanks.

@theresa-m
Copy link
Contributor

Thanks! Oh no I had hoped it would be shorter. The openj9 public slack channel could be a good place to dm me. Probably not going to need all 3000 lines either.

This should be the link to join our slack if you are not already part of it: openj9 slack

Again thanks for your willingness to try things.

@theresa-m
Copy link
Contributor

theresa-m commented Aug 11, 2020

The main thing I'm looking for is the start where "cached ludcl is: " <ludcl> " expected ludcl is: " <ludcl> print different ludcls to help narrow down when there is a cache error. That should hopefully be near to the last few lines of output.

@DanHeidinga
Copy link
Member

Moving this to the Oct quarterly update but if a low risk fix is found before we ship, we can consider porting to the release branch for jdk15

@theresa-m
Copy link
Contributor

Hi @mreuvers I've sent a follow up re this issue on slack.

@tajila
Copy link
Contributor

tajila commented Apr 19, 2021

@ChengJin01 will be investigating this

@dnakamura
Copy link
Contributor

I'm currently looking at optimizations limiting the ammount of stack we need to walk, since in a recursive situation we only really need to walk back as far as any previous calls where we already checked for ludcl. Currently we do have functionality in the vm for a truncated stack walk as part of the ORB code. I'm currently experimenting with it to see if it can be reused here without breaking anything.

@dnakamura
Copy link
Contributor

Have current prototype of my changes. Idea is based on accelerations used in the ORB code (which can't be reused without breaking the ORB code).

In the orb helper code we provide a native method to internally "mark" the stack at a given point, and a special version of latestUserDefinedLoader which will terminate early if it hits the marked stack frame without finding a user defined loader. Of note is that the "marked" stack frame is not actually marked, rather when a frame is marked, its stack depth is stashed in a special field in the vmthread. To support nesting, when a frame is marked, the helper function returns the old value of this special field, which is then restored by passing it in to a "unmark" helper. Because this design uses global state, its not possible for it to support multiple simultaneous users without tight coordination.

As an alternative I'm proposing a new api which broadly follows the same idea, but eliminates the global state. The new api consists of 2 functions

  • a helper function to get an opaque stack token representing the current stack frame
  • an implementation of latestUserDefinedLoader which terminates early when it hits the provided stack token

Implementation : master...dnakamura:openj9:ludcl_new

also for reference the current ORB helpers: https://github.com/eclipse-openj9/openj9/blob/b9b733900ef25ba25435e8dd557a10f4e39a4fd6/runtime/jcl/common/orbvmhelpers.c#L150-L275

@tajila @gacholio Any comments on the proposed api?

@dnakamura
Copy link
Contributor

@tajila @gacholio Any comments on the proposed api?

@tajila
Copy link
Contributor

tajila commented Nov 21, 2022

@tajila @gacholio Any comments on the proposed api?

I think its a good approach for optimizing latestUserDefinedLoader but does it solve the functional issues with readObject?

Or is the plan to replace the existing optimizations with a new approach?

@gacholio
Copy link
Contributor

Given that the code can't compile, how was this tested?

@gacholio
Copy link
Contributor

I like the idea of explicit stacking - in the eventual solution we should probably get rid of the J9VMThread fields entierly.

Another solution would be to stack the fields on call-in (but I greatly prefer the JCL solution).

@dnakamura
Copy link
Contributor

@tajila the plan is to replace the existing optimization with the new approach. "
@gacholio it did compile at one time, but probably broke with rebasing etc. I was mainly looking to make sure everybody approved of the general idea before putting in the time to finish it off

also my prototype implementation of the JCL side
ibmruntimes/openj9-openjdk-jdk11@openj9...dnakamura:openj9-openjdk-jdk11:ludcl_new

Also my prototype

@tajila
Copy link
Contributor

tajila commented Apr 12, 2023

@gacholio any concerns with the approach

@gacholio
Copy link
Contributor

I think this is a good approach.

@jschmied
Copy link

Maybe related:

gwtproject/gwt#9789

@gacholio
Copy link
Contributor

@theresa-m Can you please take this up again now that Devin is gone?

@theresa-m
Copy link
Contributor

Yes I'll take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment