Assert CPU <50% at the end of ducktape tests #10939

rockwotj · 2023-05-22T18:25:05Z

Assert that nodes have <50% CPU usage before teardown

During ducktape test teardown, poll metrics to assert that no leftover work is still running that should have been cleaned up.

We poll over a 1-second interval, then fall back to a 5-second interval in case compaction is happening in the background.
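The two-stage poll can be sketched roughly as below. This is an illustration only, not the code from this PR: `is_idle` and `measure_utilization` are hypothetical names, and `measure_utilization(interval_s)` is assumed to return the CPU utilization (0.0 to 1.0) observed over a window of that length. The 1 s and 5 s windows and the 50% limit come from the description.

```python
from typing import Callable, Sequence


def is_idle(measure_utilization: Callable[[float], float],
            max_utilization: float = 0.5,
            intervals_s: Sequence[float] = (1.0, 5.0)) -> bool:
    """Return True as soon as one polling window is under the limit.

    The short window is tried first; the longer one is only used as a
    fallback, e.g. when a burst of background work inflated the first reading.
    """
    for interval_s in intervals_s:
        if measure_utilization(interval_s) < max_utilization:
            return True
    return False
```

A caller would assert on the result at teardown, so a node that is busy in both windows fails the test while a brief burst in the first window does not.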

These checks are disabled for now on a few tests that do not shut down nodes cleanly, because the metrics requests fail there. Follow-up work should enable the checks on those tests.

Fixes: #10837

Backports Required

Release Notes

none

rockwotj · 2023-05-24T04:13:57Z

/ci-repeat

rockwotj · 2023-06-02T03:09:00Z

CI Failures:

rockwotj · 2023-06-05T14:04:53Z

/ci-repeat 5
release
skip-unit
dt-repeat=100

rockwotj · 2023-06-05T19:09:22Z

/ci-repeat 5
release
skip-unit
dt-repeat=10
tests/rptest/test_suite_quick.yml

rockwotj · 2023-06-06T14:16:24Z

/ci-repeat 10
release
skip-unit
tests/rptest/test_suite_quick.yml

rockwotj · 2023-06-07T14:56:46Z

/ci-repeat 10
release
skip-unit
tests/rptest/test_suite_quick.yml

andijcr · 2023-06-07T16:35:53Z

tests/rptest/k8s_tests/simple_k8s_test.py

@@ -20,6 +21,15 @@ def __init__(self, test_context):
        super(SimpleK8sTest, self).__init__(test_context)
        self.redpanda = RedpandaServiceK8s(test_context, 1)

+    @property
+    def debug_mode(self):


Since we are reading an environment variable, and this property is read only from the cluster decorator, do we need this here?

Are you suggesting moving the read from the environment check directly into the cluster decorator?

I think there really should probably be some BaseTest class that adds this to all our tests. Thoughts?

Sorry, I misread the code. I thought I was reading RedpandaTest class here.

Yeah, I'm not sure what would be best. It's probably appropriate to have this method/property in the base classes like you did. Personally, I would have just read the environment variable inside the cluster decorator, but that's not a great place either.

Did you try on a debug build and see greater CPU utilization even after the test was done?

Yeah, I added a comment in cluster, but a ton of debug build tests were triggering this check, even for tests that seemingly did very little.

andijcr · 2023-06-07T16:37:00Z

looks good, nice idea to use the metric uptime

dotnwat · 2023-06-09T04:46:11Z

tests/rptest/services/redpanda.py

+        actual_utilization = (end_sample.value -
+                              start_sample.value) / actual_period
+        shard_id = start_sample.labels["shard"]
+        assert actual_utilization < max_utilization, f"Node: {node.name} shard: {shard_id} cpu utilization too high, actual: {actual_utilization}, expected: {max_utilization}"


haha i'm surprised the linter was cool with this long line
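For reference, the calculation in the snippet above divides the growth of a cumulative busy-time counter by the elapsed period. A minimal sketch of that step with the message split across shorter lines follows; `Sample`, `shard_utilization`, and `check_shard` are hypothetical names, and the counter and the period are assumed to be in the same time unit.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Sample:
    # Stand-in for a metrics sample with a value and labels, as in the diff.
    value: float
    labels: Dict[str, str] = field(default_factory=dict)


def shard_utilization(start: Sample, end: Sample, period: float) -> float:
    """Fraction of the period the shard was busy, from two counter samples."""
    if period <= 0:
        raise ValueError("period must be positive")
    return (end.value - start.value) / period


def check_shard(node_name: str, start: Sample, end: Sample, period: float,
                max_utilization: float = 0.5) -> None:
    actual = shard_utilization(start, end, period)
    shard_id = start.labels["shard"]
    # Adjacent f-strings are concatenated, which keeps each line short.
    assert actual < max_utilization, (
        f"Node: {node_name} shard: {shard_id} cpu utilization too high, "
        f"actual: {actual}, expected: {max_utilization}")
```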

rockwotj · 2023-06-09T11:26:34Z

/ci-repeat

In ducktape during test teardown, poll metrics to assert that we don't have a bunch of work that isn't being cleaned up properly.

We poll over a 1 second interval, then fallback to a 5 second interval in the case of compaction happening in the background.

Fixes: redpanda-data#10837

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

rockwotj · 2023-06-12T14:23:42Z

Force pushed to disable the check on a config test that disables metrics

rockwotj force-pushed the rockwood/idle branch from 433a6c1 to eefc1a2 Compare May 22, 2023 18:30

rockwotj force-pushed the rockwood/idle branch 20 times, most recently from 8fd7ac4 to d38293d Compare June 1, 2023 16:51

rockwotj force-pushed the rockwood/idle branch from d38293d to 7adb00d Compare June 2, 2023 03:36

rockwotj mentioned this pull request Jun 6, 2023

CI Failure (SEGV) in rpk:smoke-test-rpk-container #11230

Closed

rockwotj marked this pull request as ready for review June 7, 2023 14:56

rockwotj requested review from andijcr and andrwng June 7, 2023 15:34

andijcr reviewed Jun 7, 2023

andijcr previously approved these changes Jun 7, 2023

dotnwat reviewed Jun 9, 2023

rockwotj dismissed andijcr’s stale review via 1154050 June 12, 2023 14:23

rockwotj force-pushed the rockwood/idle branch from 7adb00d to 1154050 Compare June 12, 2023 14:23

rockwotj requested a review from andijcr June 12, 2023 14:23

andijcr approved these changes Jun 12, 2023

rockwotj merged commit c0f9133 into redpanda-data:dev Jun 13, 2023

rockwotj mentioned this pull request Jun 14, 2023

Revert "Assert CPU <50% at the end of ducktape tests" #11423

Merged

7 tasks

michael-redpanda mentioned this pull request Jun 16, 2023

CI Failure (AssertionError: Expected successful HTTP response: 404) in UpgradeToLicenseChecks.test_basic_upgrade #11460

Closed

rockwotj deleted the rockwood/idle branch October 16, 2023 00:20
