Test node operations with streaming (NOT with RBNO) #8289
Comments
@kbr-scylla @fruch |
The only question is whether we want to clone more jobs and change the method from RBNO to streaming. I prefer converting SOME of the existing jobs to use streaming instead of RBNO. I understand the concern with randomizing (it is less predictable), but it has value too: it ensures that as we continue to test more features, they are also tested with streaming. I do suggest we just change some existing longevities to use streaming for the time being. |
Adding new jobs or converting some existing ones, both are OK. Someone who owns the feature and its testing can make those calls. Since it wasn't tested with streaming for quite some time, we don't have any information about which parts are ready for it; at some point they did work with streaming. Picking the cases, should be random, but it might. We are not going to randomize it at test run time: every time we did such a thing, it wasted 10x of people's time chasing the wind, because no one remembered it was randomized. |
What exactly was the problem? Yes it can waste people's time, but I think it won't if we do it in a controlled manner: it must be clear which parameters were randomized in this test run, and how to rerun the test with exactly the same param values (most conveniently by passing the seed that was used) |
Note that there's a lot of randomness in longevity already. cassandra-stress loads are generated by random distributions. And this is the less convenient randomization case: you cannot really repeat what cassandra-stress did in a given run, it all depends on timing and the environment etc. The kind of randomization I propose is much more manageable |
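A minimal sketch of the "controlled" randomization proposed above, assuming a hypothetical parameter space with a single `node_ops_method` knob. None of these names come from SCT; the point is only that the seed is returned alongside the chosen values, so a run can be repeated with exactly the same parameters:

```python
import random
from typing import Dict, Mapping, Optional, Sequence, Tuple

# Hypothetical parameter space; the names are illustrative, not SCT options.
PARAM_CHOICES: Dict[str, Sequence[str]] = {
    "node_ops_method": ("rbno", "streaming"),
}


def randomize_params(
    choices: Mapping[str, Sequence[str]],
    seed: Optional[int] = None,
) -> Tuple[int, Dict[str, str]]:
    """Pick one value per parameter, returning the seed that was used."""
    if seed is None:
        # No seed given: draw a fresh one so it can be reported and reused.
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)
    # Iterate in sorted order so the same seed always yields the same picks.
    params = {name: rng.choice(list(options)) for name, options in sorted(choices.items())}
    return seed, params


if __name__ == "__main__":
    seed, params = randomize_params(PARAM_CHOICES)
    print(f"seed={seed} params={params}")
```

Rerunning with `randomize_params(PARAM_CHOICES, seed=<reported seed>)` reproduces the same choices, which is the property the comment asks for.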
I.e. someone would need to manage it, are you volunteering ? :) |
one idea would be enhancing SCT to include randomized params along with disruption name in Nemesis tab in Argus (and a log message in sct.log, close to disruption start if possible). So it's much clearer how/what we tested. |
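As a rough illustration of that idea, the sketch below renders the randomized parameters next to the disruption name and writes one log line at disruption start. The function names, the logger name, and the record format are all assumptions made for this example. Reporting the same string to Argus is not shown, since that API is not described in this thread:

```python
import logging
from typing import Mapping

# Hypothetical logger name; the real SCT logging setup may differ.
LOGGER = logging.getLogger("sct")


def format_disruption_record(disruption: str, params: Mapping[str, str]) -> str:
    """Render a disruption name together with its randomized parameters."""
    rendered = ", ".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"{disruption} [{rendered}]" if rendered else disruption


def log_disruption_start(disruption: str, params: Mapping[str, str], seed: int) -> str:
    """Log the record at disruption start and return it for further reporting."""
    record = format_disruption_record(disruption, params)
    LOGGER.info("Starting disruption %s (seed=%s)", record, seed)
    return record
```

The returned string is what a results view could display in place of the bare disruption name, so a reader can see at a glance whether a given run used streaming or RBNO.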
it might be relevant, if it's a nemesis level call, |
I think that is a good idea regardless; I think SCT needs more transparency in general, but I do not think it solves the randomization problem completely. By adding another layer of randomization we lower the chances that the configuration we want actually runs, add an additional level of review (did we run it with streaming or RBNO? do we need to run again to test the other one as well?), and make it even less transparent than it is today. I also think randomization is not scalable. Truth be told, the currently proposed solution (i.e. switching some tests to streaming) is not scalable either, and after this "hotfix" we need to discuss how to deal with these issues in the future, but at least it does not make SCT even less transparent. |
We apparently do not test node operations with streaming, only with RBNO, which is the default. That's fine for the majority of the tests, but we need some sanity coverage for streaming. Please make sure we have simple add/decommission/replace nemesis tests with streaming (RBNO disabled).
CC @kbr-scylla