-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assert CPU <50% at the end of ducktape tests #10939
Conversation
/ci-repeat |
8fd7ac4
to
d38293d
Compare
/ci-repeat 5 |
/ci-repeat 5 |
/ci-repeat 10 |
/ci-repeat 10 |
@@ -20,6 +21,15 @@ def __init__(self, test_context): | |||
super(SimpleK8sTest, self).__init__(test_context) | |||
self.redpanda = RedpandaServiceK8s(test_context, 1) | |||
|
|||
@property | |||
def debug_mode(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are reading an environment variable, and this property is read only from the cluster decorator, do we need this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting moving the read from the environment check directly into the cluster decorator?
I think there really should probably be some BaseTest
class that adds this to all our tests. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misread the code. I thought I was reading RedpandaTest class here.
Yeah, I'm not sure what would be best, probably it's appropriate to have this method/property in the base classes like you did. Personally, I would have just read the environment variable inside the cluster decorator, but it's not a great place either.
Did you try on a debug build and saw greater CPU utilization even when the test was done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try on a debug build and saw greater CPU utilization even when the test was done?
Yeah I added a comment in cluster
, but a ton of debug build tests where triggering this check, even for tests that seemingly did very little.
looks good, nice idea to use the metric uptime |
actual_utilization = (end_sample.value - | ||
start_sample.value) / actual_period | ||
shard_id = start_sample.labels["shard"] | ||
assert actual_utilization < max_utilization, f"Node: {node.name} shard: {shard_id} cpu utilization too high, actual: {actual_utilization}, expected: {max_utilization}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha i'm surprised the linter was cool with this long line
/ci-repeat |
In ducktape during test teardown, poll metrics to assert that we don't have a bunch of work that isn't being cleaned up properly. We poll over a 1 second interval, then fallback to a 5 second interval in the case of compaction happening in the background. Fixes: redpanda-data#10837 Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Force pushed to disable the check on a config test that disables metrics |
Assert that nodes have <50% CPU usage before teardown
In ducktape during test teardown, poll metrics to assert that we don't have a bunch of work that isn't being cleaned up properly.
We poll over a 1 second interval, then fallback to a 5 second interval in the case of compaction happening in the background.
These checks are disabled for now on a few tests that do not shutdown cleanly nodes, as the metrics requests fail. There should be followup work to enable the checks on those tests.
Fixes: #10837
Backports Required
Release Notes