-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New eclipselink dead lock scenario discovered the logic of the sessions.MergeManager.mergeChangesOfWorkingCopyIntoOriginal is not safe when it calls upon AbstractSession.getCacheKeyFromTargetSessionForMerge #2094
Comments
Note: This bug will be existing on every version of eclipse link 2.7.6 and above. The version of eclipselink being used in this tracker is essential a 2.7.6 with patches. Therefore, the closest eclipselink version to look at in terms of the stack traces presented is the 2.7.9: That being said, I believe the core points of the stack traces of eclipselink for the interpreation of the analysis are all layed our bare in the analysis above. If we need a one liner summary for this tracker is that: We have a new pattern for eclipselnk deadlocks, whereby the org.eclipse.persistence.internal.sessions.MergeManager.mergeChangesOfWorkingCopyIntoOriginal(Object, ObjectChangeSet, ClassDescriptor, AbstractSession, UnitOfWorkImpl) goes into a "all-or-nothing" agressive attempt to acquire cache key with "deferral + followup eternal wait". This all-or-nothing bet that the thread will just eventually stop waiting is a losing bet, there is very real risk that somebody else is holding onto keys of the agressive thread, and if anyone is holding onto keys of the agreesive 137 thread, the it is game over. Our thread 444 did everything correct. Not really a one liner, but it makes the picture of the dead lock super clear. |
Today I have been provided some data of a new event of this sort in a different project. This data will be provided via a private channel to the eclipselink team. |
I intend to share my current approach for minimizing the risk of encountering this deadlock pattern. However, I must emphasize that this solution falls short of my ideal vision. The approach I'm pursuing involves a substantial rewrite of the method:
Here are the key modifications:
In summary, I hope we can surpass the current implementation by addressing this defect effectively. |
Here, with this post, I am uploading my patch. It gives an overall idea of what I changed in the code. 0001-TRK-32421-Implement-an-attempt-to-fix-for-issue.patch I am also uploading all of the core files i have changed. full_classes_related_to_patch_tempFilesToUpload.zip Overview of Uploaded Files: Key Method for Code Analysis: org.eclipse.persistence.internal.sessions.AbstractSession.getCacheKeyFromTargetSessionForMergeScenarioMergeManagerNotNullAndCacheKeyOriginalStillNull(CacheKey, Object, ObjectBuilder, ClassDescriptor, MergeManager) This method is positioned just above the older implementation, which has now been renamed to: org.eclipse.persistence.internal.sessions.AbstractSession.getCacheKeyFromTargetSessionForMergeDeprecated(Object, ObjectBuilder, ClassDescriptor, MergeManager) Begin your code analysis here, as it holds significant importance in the context of your changes. To execute this code, ensure that your <!--
Possibilities are:
"backwards-compatibility" or "wait-loop-with-time-limit"
The option "backwards compatibility" is not recommended since it may lead to a deadlock pattern.
For more details, refer to:
GitHub Issue #2094
-->
<property name="eclipselink.concurrency.manager.allow.abstractsession.getCacheKeyFromTargetSessionForMerge.modeofoperation" value="wait-loop-with-time-limit"/> This property configuration is crucial for managing concurrency, and it's advisable to avoid the "backwards compatibility" option due to the potential deadlock issue. For further context, please review the provided GitHub issue. |
In order to check the new code, I have had to heavily manipulate my debugger to override local variables so that I could finally get into the: org.eclipse.persistence.internal.sessions.AbstractSession.getCacheKeyFromTargetSessionForMergeScenarioMergeManagerNotNullAndCacheKeyOriginalStillNull(CacheKey, Object, ObjectBuilder, ClassDescriptor, MergeManager) Here is what the new massive dump with the new page 08 looks like. massive_dump_generated_by_2.7.6-wls14_1_1_0v003_2024_04_09_attempt_001.txt Refining Code Execution in Manipulated EclipseLink While working with the manipulated JAR file of EclipseLink, I successfully exercised the new code. The specific version we're using is 2.7.6-wls14_1_1_0v003_2024_04_09_attempt_001. This JAR includes not only the bugfix for the issue at hand but also integrates all other relevant bugfixes. You can find details about the issue on GitHub. Unfortunately, I lack the means to reproduce the bug directly. Consequently, my approach involves strategically placing debug breakpoints during the merge phase of EclipseLink. Specifically, I focus on the following methods:
By manipulating these critical areas, I can effectively exercise the new code. I'll keep you informed if we manage to eliminate this new deadlock pattern. However, it's essential to acknowledge that our current approach to patching this bug falls short of the ideal solution. I have already explained that the ideal solution here consists on the merge manager release 100% of its locks, stopping its attempt to update the server session cache, and flagging all cache keys as invalid / stale. That is the proper way to go. My approach my prove to not be sufficiently effective, we shall see. |
How is Entity
|
We have discovered a completely new dead-lock pattern never encountered before.
Until now, we have been conviced that with the persistence.xml settings
And all the development to diagnose and prevent dead-locks that we would be out of the woods.
For the most part, this appears to be the case, but we have today discover a new scenario where dead-locks can indeed happen in the concurrency manager cache.
In this specific dead-lock, we have thread committing a transaction.
Thread Name: [ACTIVE] ExecuteThread: '444' for queue: 'weblogic.kernel.Default (self-tuning)'
This thread is working on commmiting a business transaction and it is in commit step where it politely waits for th release deferred locks algorithm to comlete.
Here is what it looks like:
So far so good.
This thread 444 is doing what ti is supposed to do.
This thread is essentially getting burned/waiting for the thread 137 to finish its job.
So up until this point, we can say nothing to worry about.
The system would not be on a route dead lock.
But now comes the trouble/dead lock.
We now move over to our thread 137.
Out thread 137, has so far in its business logic found itself deferring on a total of 11 cache key owned by thread 444.
Here is an obfuscated listé
And up until here we do not necessarily have a problem.
The thread 444 is waiting for cache keys of thread 137 to be released or otherwise for thread 137 to also evolve to releaseDeferredLocks.
And thread 137 wants eleven cache keys that are being actively owned by the thread 444 but has used teh defer approach on the cach key.
So far, one would say "nothing out of the ordinary".
But then it comes the problem.
What is thread 137 actually doing right now?
It is doing the following:
The above stack trace shows an area of code that we have never manipulated to avoid dead locks.
It is the very first time i encoutner this specific dead lock pattern.
The thread 137 is not behaving properly.
It essentially as part of the merge process went into:
org.eclipse.persistence.internal.sessions.MergeManager.mergeChangesOfWorkingCopyIntoOriginal(Object, ObjectChangeSet, ClassDescriptor, AbstractSession, UnitOfWorkImpl)
Following that, it went into this method:
org.eclipse.persistence.internal.sessions.AbstractSession.getCacheKeyFromTargetSessionForMerge(Object, ObjectBuilder, ClassDescriptor, MergeManager)
I am not sure what is the right patch for this.
The very first obvious thing is that Eternal waits without a timeout are no go.
The second point I would have is that if feasible the logic of this method should somehow memic the logic we can find i the
org.eclipse.persistence.internal.helper.WriteLockManager.acquireRequiredLocksInternal(MergeManager, UnitOfWorkChangeSet)
The acquire required locks logic put the thread wanting its locks for writing behaving nicely.
If the thread is not getting what it wants that code wil then do
The text was updated successfully, but these errors were encountered: