CI Failure (Segmentation fault) in RandomNodeOperationsTest.test_node_operations #16023

Closed
vbotbuildovich opened this issue Jan 9, 2024 · 2 comments
Labels: area/storage, auto-triaged, ci-failure, sev/high

Comments

vbotbuildovich (Collaborator) commented Jan 9, 2024

https://buildkite.com/redpanda/vtools/builds/11286

Module: rptest.tests.random_node_operations_test
Class: RandomNodeOperationsTest
Method: test_node_operations
{
  "enable_failures": true,
  "num_to_upgrade": 3,
  "with_tiered_storage": true
}
test_id:    rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.num_to_upgrade=3.with_tiered_storage=True
status:     FAIL
run time:   19 minutes 23.632 seconds


    <NodeCrash ip-172-31-3-189: Segmentation fault on shard 0.
>
Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/random_node_operations_test.py", line 428, in test_node_operations
    executor.execute_operation(op)
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 439, in execute_operation
    self.recommission(operation.node)
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 378, in recommission
    wait_until(recommissioned, timeout_sec=self.timeout, backoff_sec=1)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 103, in wrapped
    redpanda.raise_on_crash(log_allow_list=log_allow_list)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2670, in raise_on_crash
    raise NodeCrash(crashes)
rptest.services.utils.NodeCrash: <NodeCrash ip-172-31-3-189: Segmentation fault on shard 0.
>
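
The two chained tracebacks above follow a common pattern: a polling helper times out, and a wrapper around the test then checks for node crashes and raises a more specific error from inside its exception handler. The sketch below is a minimal, self-contained illustration of that pattern, not the actual rptest or ducktape code. `run_with_crash_check` and `find_crashes` are hypothetical names, `NodeCrash` and `wait_until` are simplified stand-ins, and the built-in `TimeoutError` takes the place of `ducktape.errors.TimeoutError`.

```python
import time


class NodeCrash(Exception):
    """Hypothetical stand-in for rptest.services.utils.NodeCrash."""


def wait_until(condition, timeout_sec, backoff_sec):
    """Simplified stand-in: poll `condition` until it is truthy or time runs out."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError("condition not met within timeout")


def run_with_crash_check(test_fn, find_crashes):
    """Run `test_fn`; if it fails and a node crashed, report the crash instead.

    `find_crashes` is a hypothetical callable returning a list of crash
    descriptions (empty if no node crashed).
    """
    try:
        return test_fn()
    except Exception:
        crashes = find_crashes()
        if crashes:
            # Raised while the original error is still being handled, so the
            # original error is attached to the new one as __context__.
            raise NodeCrash(crashes)
        raise
```

Because `NodeCrash` is raised while the `TimeoutError` is still being handled, Python records the timeout as the new exception's `__context__` and prints "During handling of the above exception, another exception occurred" between the two tracebacks. In this report the timeout in `recommission` is therefore a symptom, and the segmentation fault is the failure to investigate.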

JIRA Link: CORE-1708

vbotbuildovich added the auto-triaged and ci-failure labels on Jan 9, 2024
michael-redpanda changed the title from "CI Failure (key symptom) in RandomNodeOperationsTest.test_node_operations" to "CI Failure (cannot double register same ntp) in RandomNodeOperationsTest.test_node_operations" on Jan 10, 2024
ztlpn changed the title from "CI Failure (cannot double register same ntp) in RandomNodeOperationsTest.test_node_operations" to "CI Failure (Segmentation fault) in RandomNodeOperationsTest.test_node_operations" on Jan 11, 2024
ztlpn (Contributor) commented Jan 11, 2024

Pandatriage incorrectly merged this node crash with an old one. The newer one is a segfault.

piyushredpanda added the sev/high label on Mar 23, 2024
mmaslankaprv self-assigned this on Apr 11, 2024
mmaslankaprv (Member) commented

Closing this one as it is a duplicate.
