Fix CI failure in test_concurrent_append_flush #15271

Merged

travisdowns
Member

In test_concurrent_append_flush, which is a fuzzer-style test,
we now get() all futures returned by flush calls during the fuzz
portion, instead of only the last flush.

In some cases, prior futures can still be unavailable even after
the last future has resolved, which caused occasional CI failures.
See #13035 for more analysis.

Fixes #13035.
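
As a rough illustration of the idea, here is a minimal sketch that uses std::future and std::promise as stand-ins for the Seastar futures in the real test. The names fake_appender, is_ready and get_all are hypothetical and are not part of the Redpanda code base. It only shows that a later flush future being ready does not imply earlier ones are, and that waiting on every future avoids relying on that assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the appender: every flush() hands back a
// future, and the matching promise can be completed in any order.
struct fake_appender {
    std::vector<std::promise<void>> pending;

    std::future<void> flush() {
        pending.emplace_back();
        return pending.back().get_future();
    }

    // Complete the i-th flush, independently of the others.
    void complete(std::size_t i) { pending[i].set_value(); }
};

// Non-blocking readiness check, loosely analogous to f.available().
inline bool is_ready(const std::future<void>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Wait on every flush future instead of only the last one.
inline std::size_t get_all(std::vector<std::future<void>>& futs) {
    std::size_t n = 0;
    for (auto& f : futs) {
        f.get();
        ++n;
    }
    return n;
}
```

Completing only the second flush leaves the first future not ready, which is the situation the old check could trip over; calling get_all once every flush has completed returns the number of futures it waited on.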

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

Add stable_offset, flushed_offset and merged writes count to the
stream appender.
run_concurrent_append_flush is a fuzzer-like test, and failures there
can be hard to diagnose (e.g., see issue redpanda-data#13035). To help
diagnose them, we want to capture some information from the
segment_appender at each step of the test.

Introduce segment_appender_info to do this.
@travisdowns travisdowns requested a review from andijcr December 2, 2023 01:36
@travisdowns travisdowns force-pushed the td-segment-appender-flush-order branch from 935fd00 to 081b508 Compare December 2, 2023 23:35
Contributor

@andijcr andijcr left a comment

The only real issue is when the first action is a WAIT_APPEND; the rest are nits.

config::mock_binding<size_t>(std::move(fallocate_size)));
auto appender = make_segment_appender(f, resources);
auto seg_file = open_file(filename);
storage::storage_resources resources(config::mock_binding(+fallocate_size));
Contributor

nit: stray + in +fallocate_size

Member Author

@andijcr - actually it's not stray, it's needed to make it an rvalue, because config::mock_binding is declared in a way that requires an rvalue argument if you want to rely on template parameter deduction, unfortunately. I do plan to fix this, but this is one workaround for now.
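
To make the rvalue point concrete, here is a minimal sketch. It assumes a class template whose constructor takes T&&, which is a guess at the shape of config::mock_binding and not its actual declaration; mock_binding_sketch and bind_size are hypothetical names. During class template argument deduction, T&& on the class's own template parameter is a plain rvalue reference rather than a forwarding reference, so an lvalue argument is rejected while the prvalue produced by unary + is accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in; the real config::mock_binding may differ.
template <typename T>
struct mock_binding_sketch {
    // T is the class template parameter, so T&& here is an rvalue
    // reference, not a forwarding reference.
    explicit mock_binding_sketch(T&& v)
      : value(std::move(v)) {}
    T value;
};

inline std::size_t bind_size(std::size_t fallocate_size) {
    // mock_binding_sketch bad(fallocate_size);
    //   ^ does not compile: an lvalue cannot bind to std::size_t&&.

    // Unary + yields a prvalue, so deduction picks T = std::size_t
    // (on typical platforms where std::size_t is not promoted).
    mock_binding_sketch b(+fallocate_size);
    static_assert(
      std::is_same_v<decltype(b), mock_binding_sketch<std::size_t>>);

    assert(b.value == fallocate_size);
    return b.value;
}
```

Spelling out the template argument, as in the removed line's config::mock_binding<size_t>(std::move(fallocate_size)), is the other way around the deduction problem.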

Contributor

🤯

Member Author

> I do plan to fix this

To clarify, I mean mock_binding could be fixed to avoid this problem.

vassert(false, "bad kind");
}();

return fmt::format("{:12}: {}", astr + extra, info.to_string());
Contributor

not related but one day we should bring magic_enum into the codebase or reimplement part of the functionality

Member Author

@andijcr absolutely! I've never used magic_enum specifically but I definitely feel the pain of enum boilerplate every time I create a new enum in C++.

break;
case action::FLUSH:
futs.push_back(appender.flush());
// current_action.flush_future = appender.flush();
Contributor

nit: stray comment?

Member Author

Thanks, fixed!

andijcr
andijcr previously approved these changes Dec 11, 2023
Relates to log_segment_appender_test::test_concurrent_append_flush,
which is a fuzzer-style test, and output it when we fail.

In the storage_single_thread_rpunit concurrent flush test we now log
test context, which will be printed if the test fails. Critically, this
includes the seed used to generate the random series of actions to
be performed on the appender.

In addition, we generate a single seed per invocation and then use that
seed rather than the random helper methods, which use an unspecified
random seed each time.

Finally, we record more information about the operations performed in
the test and output the full action sequence on failure.

Issue redpanda-data#13035.
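
The single-seed approach described above might look roughly like the following sketch. It is not the test's actual code: fresh_seed and make_actions are hypothetical helpers, and the APPEND enumerator is an assumption (only FLUSH and WAIT_APPEND appear in this conversation). The point is that one recorded seed drives the whole action sequence, so a failing run can be replayed from the logged value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// APPEND is assumed for illustration; FLUSH and WAIT_APPEND are the
// kinds mentioned in the review.
enum class action { APPEND, FLUSH, WAIT_APPEND };

// One seed per test invocation; log it as part of the test context.
inline std::uint64_t fresh_seed() {
    return std::random_device{}();
}

// Derive the entire action sequence from that single seed, so the
// same seed reproduces the same sequence on the same toolchain.
inline std::vector<action>
make_actions(std::uint64_t seed, std::size_t count) {
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<int> pick(0, 2);
    std::vector<action> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(static_cast<action>(pick(rng)));
    }
    return out;
}
```

Replaying a failure then amounts to calling make_actions with the seed printed in the failure output instead of a fresh one.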
@travisdowns travisdowns force-pushed the td-segment-appender-flush-order branch from 222b8df to cc82d0d Compare December 11, 2023 18:59
@travisdowns travisdowns requested a review from andijcr December 11, 2023 19:00
@travisdowns
Member Author

Looks like debug unit tests are timing out, I'll have a look.

@travisdowns travisdowns merged commit 3e44a5d into redpanda-data:dev Dec 18, 2023
19 checks passed
@vbotbuildovich
Collaborator

/backport v23.2.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-15271-v23.2.x-789 remotes/upstream/v23.2.x
git cherry-pick -x 664115acdc30e06d0509addbd7e33c99b7162084 4e4a1e32d808fbf4a37cfd6f146fc6f8cbdfb5fb b02c28c35a05121d8a82e495c89e81a4bb36c20e cc82d0d14cfbd811de014d9c5de0200dbe3b6882

Workflow run logs.

@piyushredpanda
Contributor

/backport v23.3.x

Development

Successfully merging this pull request may close these issues.

CI Failure (critical check f.available() has failed) in test_concurrent_append_flush
4 participants