CI Failure (Internal Server Error) in EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures #14898

Closed
StephanDollberg opened this issue Nov 10, 2023 · 34 comments · Fixed by #14978
Labels
ci-failure, ci-rca/infra (CI Root Cause Analysis - Infrastructure Issue), kind/bug (Something isn't working), team/devprod (display on zenhub workspace for devprod team), team/replication (helper for jira sync)

Comments

@StephanDollberg
Member

StephanDollberg commented Nov 10, 2023

https://buildkite.com/redpanda/redpanda/builds/40810

Module: rptest.tests.e2e_shadow_indexing_test
Class: EndToEndShadowIndexingTestWithDisruptions
Method: test_write_with_node_failures
Arguments: {
    "cloud_storage_type": 1
}
test_id:    EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures
status:     FAIL
run time:   124.586 seconds

HTTPError('500 Server Error: Internal Server Error for url: http://docker-rp-24:9644/v1/cloud_storage/reset_scrubbing_metadata/kafka/__consumer_offsets/1')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 159, in wrapped
    self.redpanda.maybe_do_internal_scrub()
  File "/root/tests/rptest/services/redpanda.py", line 3880, in maybe_do_internal_scrub
    results = self.wait_for_internal_scrub(cloud_partitions)
  File "/root/tests/rptest/services/redpanda.py", line 3985, in wait_for_internal_scrub
    self._admin.reset_scrubbing_metadata(
  File "/root/tests/rptest/services/admin.py", line 1131, in reset_scrubbing_metadata
    return self._request(
  File "/root/tests/rptest/services/admin.py", line 363, in _request
    r.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://docker-rp-24:9644/v1/cloud_storage/reset_scrubbing_metadata/kafka/__consumer_offsets/1

JIRA Link: CORE-1573
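
To see which partition the failing admin request targeted, the URL from the error can be split into its parts. This is a minimal sketch: `parse_scrub_reset_url` is a hypothetical helper, and reading the last three path segments as namespace, topic and partition is an assumption based only on the shape of the URL in the traceback.

```python
from urllib.parse import urlparse


def parse_scrub_reset_url(url: str) -> dict:
    """Hypothetical helper: split the failing admin URL into its parts."""
    parsed = urlparse(url)
    segments = parsed.path.strip("/").split("/")
    # Assumption: the last three path segments are namespace/topic/partition.
    namespace, topic, partition = segments[-3:]
    return {
        "host": parsed.hostname,
        "namespace": namespace,
        "topic": topic,
        "partition": int(partition),
    }


url = ("http://docker-rp-24:9644/v1/cloud_storage/"
       "reset_scrubbing_metadata/kafka/__consumer_offsets/1")
print(parse_scrub_reset_url(url))
```

Under that assumption, the request in this report hit partition 1 of `__consumer_offsets` on node `docker-rp-24`.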

@StephanDollberg StephanDollberg added the ci-failure and kind/bug labels Nov 10, 2023
@mmaslankaprv mmaslankaprv self-assigned this Nov 15, 2023
mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue Nov 15, 2023
When the `persisted_stm::sync()` method fails, it indicates that the
current node is no longer the leader. The `sync()` executed before the
`replicate` call in the archival stm `command_batch_builder` prevents
`replicate` from being called. The end result of such an error is
deterministic, so we can translate the sync error to the `not_leader`
error code.

Fixes: redpanda-data#14898

Signed-off-by: Michal Maslanka <michal@redpanda.com>
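
The control flow described in the commit message can be illustrated with a short Python sketch. This is not Redpanda's C++ implementation: `ErrorCode`, `FakeStm` and `replicate_command` are hypothetical names made up for illustration. Only the ordering comes from the message: `sync()` runs first, and if it fails, `replicate` is skipped and `not_leader` is reported.

```python
from enum import Enum


class ErrorCode(Enum):
    # Hypothetical error codes, for illustration only.
    SUCCESS = 0
    NOT_LEADER = 1


class FakeStm:
    """Hypothetical stand-in for a state machine with sync() and replicate()."""

    def __init__(self, is_leader: bool):
        self.is_leader = is_leader
        self.replicated = []  # batches that actually reached replicate()

    def sync(self) -> bool:
        # Assumption: sync() succeeds only while this node is still the leader.
        return self.is_leader

    def replicate(self, batch: str) -> ErrorCode:
        self.replicated.append(batch)
        return ErrorCode.SUCCESS


def replicate_command(stm: FakeStm, batch: str) -> ErrorCode:
    # sync() runs before replicate(). If it fails, replicate() is never
    # called, so the outcome is deterministic and can be reported as
    # NOT_LEADER rather than as a generic failure.
    if not stm.sync():
        return ErrorCode.NOT_LEADER
    return stm.replicate(batch)


leader = FakeStm(is_leader=True)
follower = FakeStm(is_leader=False)
print(replicate_command(leader, "cmd"))
print(replicate_command(follower, "cmd"))
```

In this sketch the follower's `replicated` list stays empty, which mirrors the point that nothing is written once `sync()` has failed.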
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Nov 16, 2023
(cherry picked from commit 4dfdc53)
@abhijat
Contributor

abhijat commented Nov 20, 2023

seen again in https://buildkite.com/redpanda/redpanda/builds/41402#018beb54-caf5-405c-b535-767865afcff5

====================================================================================================
test_id:    rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures.cloud_storage_type=CloudStorageType.ABS
status:     FAIL
run time:   2 minutes 7.351 seconds


    HTTPError('500 Server Error: Internal Server Error for url: http://docker-rp-4:9644/v1/cloud_storage/reset_scrubbing_metadata/kafka/__consumer_offsets/0')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 159, in wrapped
    self.redpanda.maybe_do_internal_scrub()
  File "/root/tests/rptest/services/redpanda.py", line 3917, in maybe_do_internal_scrub
    results = self.wait_for_internal_scrub(cloud_partitions)
  File "/root/tests/rptest/services/redpanda.py", line 4022, in wait_for_internal_scrub
    self._admin.reset_scrubbing_metadata(
  File "/root/tests/rptest/services/admin.py", line 1145, in reset_scrubbing_metadata
    return self._request(
  File "/root/tests/rptest/services/admin.py", line 363, in _request
    r.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://docker-rp-4:9644/v1/cloud_storage/reset_scrubbing_metadata/kafka/__consumer_offsets/0

@abhijat abhijat reopened this Nov 20, 2023
@dotnwat dotnwat added the team/replication label Apr 18, 2024
@piyushredpanda piyushredpanda added the team/devprod and ci-rca/infra labels Jun 18, 2024
@piyushredpanda
Contributor

Closing older-bot-filed CI issues as we transition to a more reliable system.
