
CI Failure (NodeCrash) in ShutdownTest.test_timely_shutdown_with_failures #12659

Closed
andijcr opened this issue Aug 8, 2023 · 9 comments · Fixed by #13739
Labels
ci-failure, kind/bug (Something isn't working), sev/high (loss of availability, pathological performance degradation, recoverable corruption)


@andijcr
Contributor

andijcr commented Aug 8, 2023

https://buildkite.com/redpanda/redpanda/builds/34603

Module: rptest.tests.timely_shutdown_test
Class: ShutdownTest
Method: test_timely_shutdown_with_failures
test_id:    ShutdownTest.test_timely_shutdown_with_failures
status:     FAIL
run time:   215.602 seconds

<NodeCrash docker-rp-2: Redpanda process unexpectedly stopped>
Traceback (most recent call last):
  File "/root/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/timely_shutdown_test.py", line 105, in test_timely_shutdown_with_failures
    self.redpanda.restart_nodes(leader)
  File "/root/tests/rptest/services/redpanda.py", line 891, in restart_nodes
    list(
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/tests/rptest/services/redpanda.py", line 892, in <lambda>
    executor.map(lambda n: self.stop_node(n, timeout=stop_timeout),
  File "/root/tests/rptest/services/redpanda.py", line 3141, in stop_node
    wait_until(
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Redpanda node docker-rp-2 failed to stop in 30 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 103, in wrapped
    redpanda.raise_on_crash(log_allow_list=log_allow_list)
  File "/root/tests/rptest/services/redpanda.py", line 2838, in raise_on_crash
    raise NodeCrash(crashes)
rptest.services.utils.NodeCrash: <NodeCrash docker-rp-2: Redpanda process unexpectedly stopped>

docker-rp-2 shows a normal log with no apparent failure; this might be a timing issue in the test under a debug build.
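The first exception in the traceback comes from a bounded wait: `stop_node` polls until the Redpanda process exits and gives up after `stop_timeout` (30 seconds in this run). A node that is shutting down cleanly, but more slowly than the budget allows, fails such a check even though its log looks healthy. A minimal sketch of that polling pattern, assuming a simplified, hypothetical `wait_until` helper modeled loosely on the ducktape utility of the same name (not its actual code) and a hypothetical `make_slow_stopper` predicate:

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_sec: float,
               backoff_sec: float = 0.1, err_msg: str = "") -> None:
    """Poll `condition` until it returns True or `timeout_sec` elapses.

    Simplified, hypothetical helper; not ducktape's implementation.
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)


def make_slow_stopper(shutdown_sec: float) -> Callable[[], bool]:
    """Hypothetical process whose clean shutdown takes `shutdown_sec` seconds."""
    started = time.monotonic()
    return lambda: time.monotonic() - started >= shutdown_sec


# The same healthy 0.5 s shutdown fails a 0.2 s budget but passes a 1.0 s one.
for budget in (0.2, 1.0):
    is_stopped = make_slow_stopper(shutdown_sec=0.5)
    try:
        wait_until(is_stopped, timeout_sec=budget, backoff_sec=0.05,
                   err_msg=f"node failed to stop in {budget} seconds")
        print(f"budget={budget}: stopped")
    except TimeoutError as e:
        print(f"budget={budget}: {e}")
```

A fixed wall-clock budget makes the check sensitive to how fast the environment runs, and on its own it cannot tell a slow shutdown apart from a hang.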

@rockwotj
Contributor

rystsov changed the title from "CI Failure (TimeoutError: Redpanda node docker-rp-2 failed to stop in 30 seconds) in ShutdownTest.test_timely_shutdown_with_failures" to "CI Failure (NodeCrash) in ShutdownTest.test_timely_shutdown_with_failures" on Aug 16, 2023
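The old and new titles name the two exceptions in the traceback. `stop_node` raised `TimeoutError`, and while the `wrapped` helper in `cluster.py` was handling it, `raise_on_crash` raised `NodeCrash`. Raising inside an `except` block makes Python attach the exception being handled as the new one's `__context__` ("During handling of the above exception, another exception occurred"), so `NodeCrash` is what the test reports. A minimal sketch of just that chaining, where `NodeCrash`, `stop_node`, `raise_on_crash`, `wrapped`, and `run_shutdown_test` are simplified, hypothetical stand-ins rather than the rptest code, and the builtin `TimeoutError` stands in for `ducktape.errors.TimeoutError`:

```python
class NodeCrash(Exception):
    """Hypothetical stand-in for rptest.services.utils.NodeCrash."""


def stop_node(node: str, timeout: int) -> None:
    # Stand-in for a stop whose bounded wait expires.
    raise TimeoutError(f"Redpanda node {node} failed to stop in {timeout} seconds")


def raise_on_crash(node: str) -> None:
    # Stand-in for the post-failure check; here it always finds the process gone.
    raise NodeCrash(f"{node}: Redpanda process unexpectedly stopped")


def wrapped(test):
    """Simplified test wrapper: on failure, check for crashed nodes first."""
    try:
        return test()
    except Exception:
        # Raising inside the handler implicitly chains the new exception
        # to the one being handled (it becomes __context__).
        raise_on_crash("docker-rp-2")
        # Only reached if no crash was detected: re-raise the original error.
        raise


def run_shutdown_test() -> None:
    stop_node("docker-rp-2", timeout=30)


try:
    wrapped(run_shutdown_test)
except NodeCrash as e:
    # Show the reported exception type and the one it was raised over.
    print(type(e).__name__, "raised while handling", type(e.__context__).__name__)
```

Because the chaining is implicit, `__cause__` stays `None`; the original timeout is still available from `__context__` when triaging.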
@abhijat
Contributor

abhijat commented Aug 29, 2023

@rockwotj
Contributor

rockwotj commented Sep 6, 2023

piyushredpanda added the sev/high label (loss of availability, pathological performance degradation, recoverable corruption) on Sep 23, 2023
@dotnwat
Member

dotnwat commented Sep 23, 2023

> docker-rp-2 shows a normal log, with no apparent failure, this might be a test debug timing issue

Yeah, agreed. A timing issue sounds right, or something else on the ducktape side; the logs look fine.

@piyushredpanda
Contributor

This has been happening a lot; requesting @nvartolomei to take a look.

@nvartolomei
Contributor

Investigation and proposed fix: #13739

@bharathv
Contributor

@nvartolomei, can you please paste the crashing stack trace here for future reference (e.g. for pattern-matching against existing or fixed issues)?

@nvartolomei
Contributor

@bharathv, no crash was observed. The only externally observable behavior I see is that the system didn't shut down in a timely manner.


9 participants