CI Failure (TimeoutError: failed to wait until status condition) in PartitionBalancerTest.test_fuzz_admin_ops #9315

Closed
ztlpn opened this issue Mar 8, 2023 · 10 comments

ztlpn commented Mar 8, 2023

https://buildkite.com/redpanda/redpanda/builds/24565#0186bcc6-8627-41e6-87cd-e8e7fbc3f04b

Module: rptest.tests.partition_balancer_test
Class:  PartitionBalancerTest
Method: test_fuzz_admin_ops
test_id:    rptest.tests.partition_balancer_test.PartitionBalancerTest.test_fuzz_admin_ops
status:     FAIL
run time:   6 minutes 12.732 seconds


    TimeoutError('failed to wait until status condition')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/partition_balancer_test.py", line 607, in test_fuzz_admin_ops
    self.wait_until_ready(expected_unavailable_node=node)
  File "/root/tests/rptest/tests/partition_balancer_test.py", line 199, in wait_until_ready
    return self.wait_until_status(predicate, timeout_sec=timeout_sec)
  File "/root/tests/rptest/tests/partition_balancer_test.py", line 176, in wait_until_status
    return wait_until_result(
  File "/root/tests/rptest/util.py", line 88, in wait_until_result
    wait_until(wrapped_condition, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: failed to wait until status condition
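
For readers unfamiliar with the helpers in the traceback, here is a rough sketch of the polling pattern that raises this error. It is not the rptest or ducktape implementation: `wait_until_status`, `is_ready`, the `backoff_sec` parameter, and the definition of "ready" (status equal to `'ready'` with no reassignments in flight) are assumptions made for illustration.

```python
import time


def wait_until_status(get_status, predicate, timeout_sec=120, backoff_sec=2):
    """Poll get_status() until predicate(status) holds or timeout_sec elapses.

    Hypothetical stand-in for the test helper; the names and defaults are
    assumptions, not the real rptest signature.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        status = get_status()
        if predicate(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("failed to wait until status condition")
        time.sleep(backoff_sec)


def is_ready(status):
    # Assumed readiness condition: the balancer reports 'ready' and has no
    # reassignments in flight.
    return (status.get("status") == "ready"
            and status.get("current_reassignments_count", 0) == 0)


# A status that never changes, shaped like the responses in this report.
stuck = {
    "status": "in_progress",
    "violations": {"unavailable_nodes": [1]},
    "seconds_since_last_tick": 0,
    "current_reassignments_count": 1,
}

try:
    wait_until_status(lambda: stuck, is_ready, timeout_sec=0.05, backoff_sec=0.01)
except TimeoutError as exc:
    print(f"timed out: {exc}")
```

With a status that never leaves `in_progress`, the loop runs out the clock and raises, which matches the shape of the failure above.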

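It looks like one partition movement never finishes. In the polls below, the balancer status stays at 'in_progress' with current_reassignments_count at 1: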

[INFO  - 2023-03-07 16:39:42,034 - partition_balancer_test - check - lineno:166]: partition balancer status: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 0, 'current_reassignments_count': 1}
[DEBUG - 2023-03-07 16:39:44,035 - admin - _request - lineno:332]: Dispatching GET http://docker-rp-10:9644/v1/cluster/partition_balancer/status
[DEBUG - 2023-03-07 16:39:44,116 - admin - _request - lineno:355]: Response OK, JSON: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 2, 'current_reassignments_count': 1}
[INFO  - 2023-03-07 16:39:44,116 - partition_balancer_test - check - lineno:166]: partition balancer status: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 2, 'current_reassignments_count': 1}
[DEBUG - 2023-03-07 16:39:46,118 - admin - _request - lineno:332]: Dispatching GET http://docker-rp-1:9644/v1/cluster/partition_balancer/status
[DEBUG - 2023-03-07 16:39:46,139 - admin - _request - lineno:355]: Response OK, JSON: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 4, 'current_reassignments_count': 1}
[INFO  - 2023-03-07 16:39:46,139 - partition_balancer_test - check - lineno:166]: partition balancer status: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 4, 'current_reassignments_count': 1}
[DEBUG - 2023-03-07 16:39:48,141 - admin - _request - lineno:332]: Dispatching GET http://docker-rp-19:9644/v1/cluster/partition_balancer/status
[DEBUG - 2023-03-07 16:39:48,214 - admin - _request - lineno:355]: Response OK, JSON: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 1, 'current_reassignments_count': 1}
[INFO  - 2023-03-07 16:39:48,214 - partition_balancer_test - check - lineno:166]: partition balancer status: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 1, 'current_reassignments_count': 1}
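
When triaging similar failures, a small script can confirm the pattern from the test log. The sketch below is only an illustration: `parse_statuses`, `looks_stuck`, and the `min_samples` threshold are hypothetical helpers, and the regular expression assumes the log line format shown above.

```python
import ast
import re

# Matches the dict printed after "partition balancer status:" in the test log.
STATUS_RE = re.compile(r"partition balancer status: (\{.*\})\s*$")


def parse_statuses(log_lines):
    """Extract the status dicts from 'partition balancer status' log lines."""
    statuses = []
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            # The test logs a Python dict repr, so literal_eval can read it.
            statuses.append(ast.literal_eval(match.group(1)))
    return statuses


def looks_stuck(statuses, min_samples=3):
    """True if every sample is 'in_progress' with reassignments in flight."""
    if len(statuses) < min_samples:
        return False
    return all(
        s.get("status") == "in_progress"
        and s.get("current_reassignments_count", 0) > 0
        for s in statuses
    )


# Three of the INFO lines from this report.
SAMPLE = [
    "[INFO  - 2023-03-07 16:39:42,034 - partition_balancer_test - check - lineno:166]: partition balancer status: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 0, 'current_reassignments_count': 1}",
    "[INFO  - 2023-03-07 16:39:44,116 - partition_balancer_test - check - lineno:166]: partition balancer status: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 2, 'current_reassignments_count': 1}",
    "[INFO  - 2023-03-07 16:39:46,139 - partition_balancer_test - check - lineno:166]: partition balancer status: {'status': 'in_progress', 'violations': {'unavailable_nodes': [1]}, 'seconds_since_last_tick': 4, 'current_reassignments_count': 1}",
]

statuses = parse_statuses(SAMPLE)
print(len(statuses), looks_stuck(statuses))
```

A `True` result only says that the sampled polls never left `in_progress`; it does not identify which partition movement is stalled.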

jcsp commented Mar 8, 2023


jcsp commented Mar 9, 2023

As seen in #9340, this can also occur in PartitionBalancerTest.test_movement_cancellations (https://buildkite.com/redpanda/redpanda/builds/24655)


jcsp commented Mar 9, 2023

Seen here in PartitionBalancerTest.test_maintenance_mode.kill_same_node=False
(https://buildkite.com/redpanda/redpanda/builds/24695#0186c5d2-5cfc-4092-bf76-a0e67a1fece9)

@VladLazar


bharathv commented May 3, 2023


abhijat commented May 16, 2023

@VladLazar

FAIL test: PartitionBalancerTest.test_fuzz_admin_ops (1/69 runs)
  failure at 2023-05-22T07:29:22.666Z: TimeoutError('failed to wait until status condition')
      on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/29558#01884228-95f7-4df0-9030-8aec9a6c889d

@piyushredpanda

Closing old issues that have not occurred in 2 months.
