Wait for the victim node to apply the dirty offset #11350

Merged: 3 commits, Jun 13, 2023


@mmaslankaprv (Member) commented Jun 12, 2023

The controller erasure test is supposed to validate the case where there is a
mismatch between the last entry appended in the kvstore and the controller
max offset. For the test to work correctly, we must wait for all messages
to be committed, because we only delete the last segment, which contains a
single message (the new replicated configuration). To make the test
reliable, change the wait condition so that the applied offset on the node
whose controller log is going to be removed must equal the leader's dirty
offset.
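The new wait condition can be sketched as a polling loop. This is a minimal illustration, not the actual rptest code: `get_victim_applied` and `get_leader_dirty` are hypothetical callables standing in for however the test reads the victim's last applied offset and the leader's dirty offset (for example, via the admin API).

```python
import time
from typing import Callable


def wait_for_applied_offset(
    get_victim_applied: Callable[[], int],
    get_leader_dirty: Callable[[], int],
    timeout_sec: float = 30.0,
    backoff_sec: float = 0.5,
) -> int:
    """Poll until the victim's applied offset equals the leader's dirty offset.

    Both getters are hypothetical stand-ins, not real rptest APIs.
    Returns the offset at which the two values matched.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        # Re-read the leader's dirty offset on every attempt, since it can
        # still advance while controller commands are being appended.
        target = get_leader_dirty()
        applied = get_victim_applied()
        if applied == target:
            return applied
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"victim applied offset {applied} did not reach "
                f"leader dirty offset {target}"
            )
        time.sleep(backoff_sec)
```

Comparing against the dirty offset rather than the committed offset means the victim has applied everything the leader has appended, so the last segment really does hold only the final configuration entry before it is deleted.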

Fixes: #8217

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Sometimes the controller log dirty offset can help in understanding the
gap between what is known to be committed and what is available in the
log.
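To make the gap concrete, here is a small sketch of the arithmetic over a hypothetical controller status snapshot. The field names are illustrative assumptions, not the actual admin API response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerStatus:
    """Hypothetical snapshot of controller offsets (illustrative field names)."""

    last_applied_offset: int
    committed_offset: int
    dirty_offset: int

    def uncommitted_entries(self) -> int:
        # Entries present in the log but not yet known to be committed.
        return self.dirty_offset - self.committed_offset

    def unapplied_committed_entries(self) -> int:
        # Committed entries that have not yet been applied to controller state.
        return self.committed_offset - self.last_applied_offset
```

For example, a snapshot with applied offset 10, committed offset 12, and dirty offset 15 has 3 appended-but-uncommitted entries and 2 committed-but-unapplied entries.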

Signed-off-by: Michal Maslanka <michal@redpanda.com>

Fixes: redpanda-data#8217

Signed-off-by: Michal Maslanka <michal@redpanda.com>
@@ -4050,16 +4050,19 @@ void admin_server::register_debug_routes() {
seastar::httpd::debug_json::get_controller_status,
[this](std::unique_ptr<ss::http::request>)
-> ss::future<ss::json::json_return_type> {
return _controller->get_last_applied_offset().then(

Yikes. Why wasn't this crashing before?

@piyushredpanda piyushredpanda merged commit f3ea988 into redpanda-data:dev Jun 13, 2023
@vbotbuildovich (Collaborator)

/backport v23.1.x

@vbotbuildovich (Collaborator)

/backport v22.3.x

@vbotbuildovich (Collaborator)

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-11350-v23.1.x-16 remotes/upstream/v23.1.x
git cherry-pick -x f42a55e457b9002c9cd861abd3b1bcee95e7a0fd ed473916f961c9453365d882f0dc63c3a0230dc2 57fb4c055ad47ec02a5cc7e5990939aed78fe93a

Workflow run logs.

@vbotbuildovich (Collaborator)

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-11350-v22.3.x-772 remotes/upstream/v22.3.x
git cherry-pick -x f42a55e457b9002c9cd861abd3b1bcee95e7a0fd ed473916f961c9453365d882f0dc63c3a0230dc2 57fb4c055ad47ec02a5cc7e5990939aed78fe93a

Workflow run logs.

@BenPope (Member) commented Jun 14, 2023

Is this failure related or new? https://buildkite.com/redpanda/redpanda/builds/31277#0188b9b7-c41b-4adc-9a48-d93127c3a8dc

test_id:    rptest.tests.controller_erase_test.ControllerEraseTest.test_erase_controller_log.partial=True
status:     FAIL
run time:   54.762 seconds


    AttributeError("'NoneType' object has no attribute 'account'")
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 83, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/controller_erase_test.py", line 118, in test_erase_controller_log
    assert self.redpanda.search_log_node(bystander_node,
  File "/root/tests/rptest/services/redpanda.py", line 3436, in search_log_node
    for line in node.account.ssh_capture(
AttributeError: 'NoneType' object has no attribute 'account'

@@ -57,14 +57,14 @@ def test_erase_controller_log(self, partial):

         # Stop the node we will intentionally damage
         victim_node = self.redpanda.nodes[1]
-        bystander_node = self.redpanda.nodes[0]
+        bystander_node = self.redpanda.controller()

@mmaslankaprv it looks like this sometimes returns None:

    AttributeError("'NoneType' object has no attribute 'account'")
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 79, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/controller_erase_test.py", line 118, in test_erase_controller_log
    assert self.redpanda.search_log_node(bystander_node,
  File "/root/tests/rptest/services/redpanda.py", line 3433, in search_log_node
    for line in node.account.ssh_capture(
AttributeError: 'NoneType' object has no attribute 'account'
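One possible mitigation is to retry until the lookup yields a node instead of using the first result. The sketch below assumes, without confirmation from this thread, that `self.redpanda.controller()` returns None only transiently (presumably while no controller leader is known); `wait_for_non_none` is a hypothetical helper, not an existing rptest function.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_non_none(
    probe: Callable[[], Optional[T]],
    timeout_sec: float = 30.0,
    backoff_sec: float = 0.5,
) -> T:
    """Retry `probe` until it returns something other than None.

    `probe` stands in for a call such as `self.redpanda.controller()`.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("probe still returned None after timeout")
        time.sleep(backoff_sec)
```

In the test this would read roughly `bystander_node = wait_for_non_none(self.redpanda.controller)`, so `search_log_node` is never handed None.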

Successfully merging this pull request may close these issues.

CI Failure (search victim assert) in ControllerEraseTest.test_erase_controller_log
6 participants