[BUG] onShardResult and onShardFailure are both executed for one shard, causing the OpenSearch JVM to crash #11881
Comments
@reta, it sounds like you have experience with this kind of issue; can you take a look?
@kkewwei, by any chance do you have a reproducer for this issue, or at least some hints on how to reproduce it? Thank you.
@reta, I will try to reproduce this issue soon. If possible, could you assign it to me? I'd be happy to try to fix it.
@reta It can be reproduced by changing the test code in OpenSearch/server/src/test/java/org/opensearch/action/search/AbstractSearchAsyncActionTests.java (Line 194 in a8dd6a0) and then running the unit test.
Thanks @kkewwei, I am wondering whether that is what happens in the wild: the listener is usually called only once.
@reta I think the core code is
Here are the simplified steps, which can also be seen from the exception above:
Phase1:
Phase2:
Phase3:
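A simplified, single-threaded Java model of the failure path described in this issue may help when reading the steps. It assumes that the code sending a shard request catches any exception escaping the success path and reports it as a shard failure. The class Phase, the flag failureHandlerThrows, and every method body are hypothetical; they only borrow the callback names used in this issue and are not OpenSearch's AbstractSearchAsyncAction.

```java
import java.util.ArrayList;
import java.util.List;

public class DoubleCallbackModel {
    // Hypothetical, single-threaded stand-in for a search phase; records the callbacks it runs.
    static final class Phase {
        final List<String> calls = new ArrayList<>();
        private final boolean failureHandlerThrows;

        Phase(boolean failureHandlerThrows) {
            this.failureHandlerThrows = failureHandlerThrows;
        }

        // Caller that runs the shard request and treats any escaping exception as a shard failure.
        void performPhaseOnShard(int shard) {
            try {
                onShardResult(shard); // assume the shard request succeeded synchronously
            } catch (RuntimeException e) {
                onShardFailure(shard, e); // a second callback for the same shard
            }
        }

        void onShardResult(int shard) {
            calls.add("onShardResult:" + shard);
            // Assume this was the last outstanding shard, so the phase completes here.
            try {
                onPhaseDone();
            } catch (RuntimeException e) {
                onPhaseFailure(e); // safety net in the spirit of #3626
            }
        }

        void onPhaseDone() {
            // Stands in for moving to the next phase and hitting a cancelled task.
            throw new IllegalStateException("simulated task cancellation");
        }

        void onPhaseFailure(RuntimeException cause) {
            calls.add("onPhaseFailure");
            if (failureHandlerThrows) {
                // The safety net itself throws, so the exception escapes onShardResult.
                throw new IllegalStateException("onPhaseFailure threw", cause);
            }
        }

        void onShardFailure(int shard, Exception e) {
            calls.add("onShardFailure:" + shard);
        }
    }

    public static List<String> run(boolean failureHandlerThrows) {
        Phase phase = new Phase(failureHandlerThrows);
        phase.performPhaseOnShard(0);
        return phase.calls;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [onShardResult:0, onPhaseFailure]
        System.out.println(run(true));  // prints [onShardResult:0, onPhaseFailure, onShardFailure:0]
    }
}
```

With failureHandlerThrows set to false, the safety net absorbs the error and shard 0 is reported once; with it set to true, shard 0 receives both onShardResult and onShardFailure, which is the double callback reported here.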
@reta, since this is most likely a bug, I have created a PR in advance; please help review it.
Describe the bug
In #3626, we added onPhaseFailure to catch exceptions thrown from onPhaseDone, which avoids a JVM crash in some cases. However, onPhaseFailure may itself throw an exception. We encountered this new case in OpenSearch 2.9.0: the stack trace shows that task cancellation in multi-search throws a TaskCancelledException inside onPhaseFailure, which causes both onShardResult and onShardFailure to be executed for the same shard and then leads to the JVM crash.
Expected behavior
The root cause is that TransportAction.execute unexpectedly threw a TaskCancelledException. Should we handle it as follows?
I'd be happy to fix it.
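One hedged way to contain a synchronous throw from an execute-style call is sketched below. It is not the change proposed in this issue or its PR: Listener, Action, safeExecute, and run are hypothetical names rather than OpenSearch APIs, and java.util.concurrent.CancellationException stands in for TaskCancelledException. The wrapper completes the listener at most once: a throw that happens before any callback becomes the single onFailure, while a throw after the listener has already completed is rethrown instead of triggering a second callback.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

public class SafeExecuteSketch {
    // Hypothetical response/failure callback pair (not OpenSearch's ActionListener).
    public interface Listener {
        void onResponse(String response);
        void onFailure(Exception e);
    }

    // Hypothetical action whose execute() may throw synchronously instead of notifying the listener.
    public interface Action {
        void execute(String request, Listener listener);
    }

    // Calls execute() so the listener is completed at most once; a synchronous throw
    // that happens before any completion is reported through onFailure.
    public static void safeExecute(Action action, String request, Listener listener) {
        AtomicBoolean completed = new AtomicBoolean(false);
        Listener once = new Listener() {
            @Override
            public void onResponse(String response) {
                if (completed.compareAndSet(false, true)) {
                    listener.onResponse(response);
                }
            }

            @Override
            public void onFailure(Exception e) {
                if (completed.compareAndSet(false, true)) {
                    listener.onFailure(e);
                }
            }
        };
        try {
            action.execute(request, once);
        } catch (RuntimeException e) {
            if (completed.compareAndSet(false, true)) {
                // Nothing was delivered yet: report the throw as the single failure.
                listener.onFailure(e);
            } else {
                // Already completed: do not issue a second callback; surface the error instead.
                throw e;
            }
        }
    }

    // Runs the action against a recording listener and returns the event log.
    public static String run(Action action) {
        StringBuilder log = new StringBuilder();
        Listener recording = new Listener() {
            @Override
            public void onResponse(String response) {
                log.append("response:").append(response).append(';');
            }

            @Override
            public void onFailure(Exception e) {
                log.append("failure:").append(e.getClass().getSimpleName()).append(';');
            }
        };
        try {
            safeExecute(action, "msearch", recording);
        } catch (RuntimeException e) {
            log.append("rethrown:").append(e.getClass().getSimpleName()).append(';');
        }
        return log.toString();
    }

    public static void main(String[] args) {
        // execute() throws before notifying, similar to the TaskCancelledException case
        System.out.println(run((req, l) -> { throw new CancellationException("task cancelled"); }));
        // execute() completes the listener and then throws
        System.out.println(run((req, l) -> {
            l.onResponse("ok");
            throw new IllegalStateException("after response");
        }));
    }
}
```

Rethrowing the late exception rather than swallowing it keeps the error visible, but it only avoids a double callback if the caller does not itself translate that exception into another per-shard callback.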
Related component
Search
To Reproduce
If needed or unclear, I will try to reproduce it.
Additional Details
Host/Environment (please complete the following information):