Solution for a problem: SL does not get resources on the node that was added during disrupt_add_remove_dc #7083

Open
juliayakovlev opened this issue Jan 10, 2024 · 0 comments

The disrupt_sla_decrease_shares_during_load nemesis ran in parallel with disrupt_add_remove_dc.
The disrupt_add_remove_dc nemesis adds a new node in a new DC and then removes it.

disrupt_sla_decrease_shares_during_load failed with error:

(Node 10.4.3.76) - Service level sl:sl500_f87475aa did not get resources unexpectedly. CPU%: 0.2

The node "10.4.3.76" is the newly added node. The service level did not get resources there because the node did not receive any load (a side effect of the nemesis).

The same happened with the disrupt_sla_increase_shares_during_load nemesis, which ran next.

I can add a validation that checks whether the node has no load.
But the question here is: if no load on the node is unexpected and indicates a real problem, we want to report it as a problem.

What is the problem here
The disrupt_add_remove_dc nemesis adds a node in a new DC.
The keyspace for the SLA test case (nemesis) is created with replication(strategy=NetworkTopologyStrategy,replication_factor=3). As a result, the load does not run on the newly added node because it is located in the new DC.
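
To illustrate the reasoning above, here is a minimal sketch of how one could tell whether a node is expected to receive load, assuming the keyspace replication options are available as a plain dict with one entry per replicated DC. The function names, the DC names ("dc1", "dc_added") and the shape of the dict are assumptions made for this example, not existing SCT code.

```python
def replicated_dcs(replication):
    """Return the DC names that hold replicas according to the replication options."""
    # Keys that are strategy settings rather than datacenter names (assumed layout).
    non_dc_keys = {"class", "replication_factor"}
    return {
        dc
        for dc, rf in replication.items()
        if dc not in non_dc_keys and int(rf) > 0
    }


def node_expects_load(node_dc, replication):
    """A node is expected to get load only if its DC holds replicas of the keyspace."""
    return node_dc in replicated_dcs(replication)


# Hypothetical replication options: only the original DC is listed.
replication = {"class": "NetworkTopologyStrategy", "dc1": "3"}

print(node_expects_load("dc1", replication))       # node in the original DC
print(node_expects_load("dc_added", replication))  # node in the DC added by the nemesis
```

With these hypothetical options, the node in the added DC is not expected to serve load for the keyspace, which matches the 0.2% CPU seen in the error.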

Possible solution
Ignore this node during validation.
A way to detect that a node falls into this case still needs to be found.
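
One possible way to do this, sketched below under stated assumptions: skip every node whose DC is not among the DCs the test keyspace is replicated to, and validate only the rest. NodeSample, nodes_without_resources, the 1.0 CPU% threshold, the DC names and the 10.0.0.x addresses are hypothetical and exist only for this example; only 10.4.3.76 and its 0.2 CPU% come from the error above.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """Hypothetical per-node measurement for one service level."""
    ip: str
    dc: str
    sl_cpu_percent: float


def nodes_without_resources(samples, replicated_dcs, min_cpu_percent=1.0):
    """Return IPs of nodes that should serve load but show too little CPU for the SL."""
    failures = []
    for sample in samples:
        # Nodes in a DC without replicas of the keyspace are not expected
        # to get load, so they are ignored instead of being reported.
        if sample.dc not in replicated_dcs:
            continue
        if sample.sl_cpu_percent < min_cpu_percent:
            failures.append(sample.ip)
    return failures


samples = [
    NodeSample(ip="10.0.0.1", dc="dc1", sl_cpu_percent=35.0),
    NodeSample(ip="10.0.0.2", dc="dc1", sl_cpu_percent=0.3),
    NodeSample(ip="10.4.3.76", dc="dc_added", sl_cpu_percent=0.2),
]

print(nodes_without_resources(samples, replicated_dcs={"dc1"}))
```

This keeps the check strict for nodes that are expected to serve the load, so a real "no resources" problem on them is still reported, while the node added by disrupt_add_remove_dc is ignored.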

Argus: https://argus.scylladb.com/test/1aebcb86-a767-4ce5-a88a-c977ab077ddc/runs?additionalRuns%5B%5D=1e57a6c9-e21e-40e4-ba8f-2aa6d677438f
