disrupt_decommission_streaming_err - "status in nodetool.status is UL, but status in gossip NORMAL" #7067
Comments
Something should take the node out of UL. Maybe the behavior changed, and the reboot that used to stop the decommission also restored the node from UL to UN. We need to make sure this nemesis isn't broken and won't fail our runs.
For sure, rebuild isn't changing any state. This nemesis has so many options that we can't really tell which flow it's running, or which behavior is correct. I still think it needs to be broken into separate nemeses, each doing a specific, well-defined process; there are way too many degrees of freedom in this one.
The problem is that
We can increase the timeouts for nodetool as a quick patch.
@aleksbykov not sure I understand: what ran for more than 3 minutes? The decommission command, or the nodetool status command?
Started running staging jobs with an increased nodetool timeout.
With the fix, the job passes.
PR: #7144
PR merged.
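To make the timeout idea concrete, here is a minimal sketch of polling a condition with a configurable timeout. It is not the SCT implementation: `wait_for`, `ready_on_third_call`, and the default interval are hypothetical names and values chosen only for illustration.

```python
import time


def wait_for(predicate, timeout, interval=0.01):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper, not part of SCT. Returns True if the predicate
    succeeded in time, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Illustrative predicate: becomes true on the third poll.
calls = {"count": 0}


def ready_on_third_call():
    calls["count"] += 1
    return calls["count"] >= 3
```

With a generous timeout the slow condition is still observed; with a short one the caller gets False and can report a timeout instead of a state mismatch.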
Happens in:
https://argus.scylladb.com/test/a5d1f97b-064a-40ed-a517-70e2092b51c2/runs?additionalRuns[]=38e1b036-3163-4f2a-92f4-5f66f3b0a116,
Discussion from @temichus:
I see that we hit
except NodeStayInClusterAfterDecommission:
    self.log.debug('The decommission of target node is successfully interrupted')
< t:2023-12-31 19:05:59,665 f:nemesis.py l:3777 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: The decommission of target node is successfully interrupted
The nodetool status output shows one node in UL state.
Then the function wait_node_fully_start is called, which for some reason wants "all nodes to be Up Normal",
but I think the UL state will be removed only after
self.target_node.run_nodetool(sub_cmd="rebuild", retry=0)
It looks like an SCT logic issue. @fruch please double-check me.
cc @aleksbykov
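For reference, a rough sketch of the "all nodes Up Normal" check described above, done by parsing status text. This is not the SCT code: the sample output is made up to resemble `nodetool status` (addresses and host IDs are invented), and `parse_node_states` and `nodes_not_up_normal` are hypothetical helper names.

```python
import re

# Made-up sample resembling `nodetool status` output; all values are invented.
SAMPLE_STATUS = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load     Tokens  Owns  Host ID                               Rack
UN  10.0.0.1  1.1 GB   256     ?     11111111-1111-1111-1111-111111111111  r1
UL  10.0.0.2  1.0 GB   256     ?     22222222-2222-2222-2222-222222222222  r1
UN  10.0.0.3  1.2 GB   256     ?     33333333-3333-3333-3333-333333333333  r1
"""

# A node line starts with a two-letter code: Up/Down followed by
# Normal/Leaving/Joining/Moving, then whitespace and the address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_node_states(status_output):
    """Map each node address to its two-letter status code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line)
        if match:
            state, address = match.groups()
            states[address] = state
    return states


def nodes_not_up_normal(status_output):
    """Return only the nodes whose status is anything other than UN."""
    return {
        address: state
        for address, state in parse_node_states(status_output).items()
        if state != "UN"
    }
```

On the sample above, the only node reported is the one still in UL, which is the situation where a strict "everything must be UN" wait fails after an interrupted decommission.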