Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archival/test: Ensure archiver is stopped #15169

Merged

Conversation

abhijat
Copy link
Contributor

@abhijat abhijat commented Nov 28, 2023

If test_manifest_spillover fails, its archiver is destroyed without stopping. This also destroys the scrubber, which may have requests active keeping its gate open.

This triggers an assertion from seastar related to gate destroyed while requests active. Usually this is prominently visible but when running multiple tests the SIGABRT results in the ctest invocation hanging in CI. This causes the build to hang and timeout.

NOTE: The root cause is the test_manifest_spillover sometimes not being able to add segments to the archival meta STM, which can be investigated separately. This change should enable that test to fail in a visible manner with logs seen in CI instead of hanging.

FIXES #14254

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

If the test affected by this change fails, archiver is destroyed without
stopping. This also destroys the scrubber, which may have requests
active keeping its gate open.

This triggers an assertion from seastar related to gate destroyed while
requests active. Usually this is prominently visible but in this test it
results in the ctest invocation hanging in CI.

This causes the build to hang and timeout. The root cause is the
test_manifest_spillover not adding segments to the archival meta STM,
which can be investigated separately. This change should enable the test
to fail in a visible manner with logs seen in CI instead of hanging.
@abhijat abhijat requested a review from Lazin November 28, 2023 06:14
@abhijat abhijat marked this pull request as ready for review November 28, 2023 06:14
@abhijat abhijat requested a review from VladLazar November 28, 2023 06:16
@vbotbuildovich
Copy link
Collaborator

Copy link
Contributor

@nvartolomei nvartolomei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Do you have any thoughts if we can prevent hanging too ("but when running multiple tests the SIGABRT")?

@abhijat
Copy link
Contributor Author

abhijat commented Nov 28, 2023

LGTM. Do you have any thoughts if we can prevent hanging too ("but when running multiple tests the SIGABRT")?

One way to prevent hanging that I was able to test locally is to supply --catch_system_errors=no to the binary but then the program just crashes and exits on assert failure and we don't get the test report etc. Perhaps there is a way to get boost test to handle the signal correctly.

@abhijat
Copy link
Contributor Author

abhijat commented Nov 28, 2023

.. but then the program just crashes and exits on assert failure

We have had assert failures in tests in the past which do not result in a hang of the ctest run, so maybe my assumption that the abort caused hang is mistaken, afaik in assert failures in the past we got a backtrace of the assertion (usually it is the same one as here - gate destroyed with open requests). So something might be different here which causes this hang.

@abhijat abhijat merged commit fc0c2d7 into redpanda-data:dev Nov 28, 2023
21 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v23.2.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI Failure (timeout) in test_archival_service_rpfixture
3 participants