`disrupt_disable_binary_gossip_execute_major_compaction` nemesis always fails on its `gate closed` assertion, not checking CQL and `UN` state as it should #6819
Comments
This was referenced Nov 17, 2023
Since @ShlomiBalalis isn't available currently, we'll need to handle it ourselves.
vponomaryov added a commit to vponomaryov/scylla-cluster-tests that referenced this issue Jan 23, 2024:
Check the gossip status and CQL workability in the end of the 'disrupt_disable_binary_gossip_execute_major_compaction' nemesis instead of looking for the 'gate closed' message in DB logs. Fixes: scylladb#6819
fruch pushed a commit that referenced this issue Jan 23, 2024:
Check the gossip status and CQL workability in the end of the 'disrupt_disable_binary_gossip_execute_major_compaction' nemesis instead of looking for the 'gate closed' message in DB logs. Fixes: #6819
Running the `disrupt_disable_binary_gossip_execute_major_compaction` nemesis against a DB node with Scylla version `2023.1.2` on K8S, the following error appears every time:
The DB node logs do indeed contain `gate closed` messages for a long time.
But the flood of those messages is the only problem I observe:
According to https://github.com/scylladb/scylla-enterprise/issues/2897, we should check that CQL and the gossiper work.
So, if we look at the `nodetool status` results, we see that the node is considered to be `UN`:
Then, monitoring shows that the load on it is dynamic, which we can take to mean that it serves requests:
So, it turns out that the referenced nemesis checks the wrong thing.
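The `UN` check on `nodetool status` output could be automated along these lines. This is a minimal sketch, not the code used in scylla-cluster-tests: the function names `parse_nodetool_status` and `is_node_up_normal` are hypothetical, and it assumes the usual `nodetool status` layout in which each node row starts with a two-letter status/state code followed by the node address.

```python
from typing import Dict


def parse_nodetool_status(output: str) -> Dict[str, str]:
    """Map each node address to its two-letter status/state code (e.g. 'UN').

    Hypothetical helper: assumes node rows look like
    'UN  <address>  <load> ...' as in typical `nodetool status` output.
    """
    states: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        code = parts[0]
        # First letter: U(p)/D(own); second: N(ormal)/L(eaving)/J(oining)/M(oving).
        if len(code) == 2 and code[0] in "UD" and code[1] in "NLJM":
            states[parts[1]] = code
    return states


def is_node_up_normal(output: str, address: str) -> bool:
    """Return True only if the given address is reported as 'UN'."""
    return parse_nodetool_status(output).get(address) == "UN"
```

A node that is missing from the output is treated as not `UN`, which errs on the side of reporting a problem.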
It should check the `UN` state and CQL availability, which was the original problem.
Also, it must restart the node if the bug is detected, because a restart is the remedy for the bug.
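The proposed end-of-nemesis logic might look roughly like the sketch below. All names here (`is_cql_port_open`, `verify_node_after_nemesis`, the `restart_node` callback) are hypothetical and not taken from scylla-cluster-tests. It also assumes the default CQL native transport port 9042, and uses a plain TCP connect as a weak stand-in for CQL availability; a real check would more likely run a query through a CQL driver.

```python
import socket
from typing import Callable


def is_cql_port_open(host: str, port: int = 9042, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the CQL port succeeds.

    Assumption: an accepted TCP connection is used as a rough proxy for
    'CQL is available'; it does not prove that queries are served.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def verify_node_after_nemesis(
    node_state: str,
    cql_ok: bool,
    restart_node: Callable[[], None],
) -> bool:
    """Return True if the node is 'UN' and CQL works.

    Otherwise call the (hypothetical) restart_node callback, since a
    restart is the remedy for the bug, and return False so that the
    caller can fail the nemesis.
    """
    if node_state == "UN" and cql_ok:
        return True
    restart_node()
    return False
```

Passing the restart action in as a callback keeps the decision logic independent of how the node is actually restarted on K8S.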
Impact
Probably false nemesis failure?
How frequently does it reproduce?
100% using the `2023.1.2` Scylla version
Installation details
Kernel Version: 5.10.198-187.748.amzn2.x86_64
Scylla version (or git commit hash): `2023.1.2-20231001.646df23cc4b3` with build-id `367fcf1672d44f5cbddc88f946cf272e2551b85a`
Operator Image: scylladb/scylla-operator:1.11.0
Operator Helm Version: v1.11.0
Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/stable
Cluster size: 4 nodes (i4i.4xlarge)
Scylla Nodes used in this run (pod IPs are ephemeral):
sct-cluster-eu-north-1-rack-1-0 | 10.0.11.23
sct-cluster-eu-north-1-rack-1-1 | 10.0.8.238
sct-cluster-eu-north-1-rack-1-2 | 10.0.11.5
sct-cluster-eu-west-1-rack-1-0 | 10.4.9.183
sct-cluster-eu-west-1-rack-1-1 | 10.4.9.133
sct-cluster-eu-west-1-rack-1-2 | 10.4.8.142
OS / Image: `` (k8s-eks: multi-dc: `eu-north-1` and `eu-west-1`)
Test: `vp-longevity-scylla-operator-multidc-12h-eks`
Test id: `181576c2-fca5-483c-a520-ce2108b9874a`
Test name: `scylla-staging/valerii/vp-longevity-scylla-operator-multidc-12h-eks`
Test config file(s):
Logs and commands
$ hydra investigate show-monitor 181576c2-fca5-483c-a520-ce2108b9874a
$ hydra investigate show-logs 181576c2-fca5-483c-a520-ce2108b9874a
Logs:
Jenkins job URL
Argus