CI Failure (critical check f.available() has failed) in test_concurrent_append_flush
#13035
Comments
Can't reproduce locally, but it's effectively a fuzz test, so it makes sense that the behavior is hard to reproduce. The comment in the test states:
So it might be a symptom of an underlying bug in segment_appender (the source of these futures). I will add further logging to get more context around the failure. Update: a check also failed:
So this error makes a bit more sense.
Do we not have a fixed seed for the fuzzer? (I'm assuming the fuzzing is driven by a seed for reproducibility.)
From https://buildkite.com/redpanda/redpanda/builds/37956#018ae10e-3932-417e-964d-8b4bf1aa5355
@travisdowns, what do you think? Should this failure be sev/high, since it's in the context of writes?
@andijcr, it looks potentially scary since it's on the write path, so high might be a good tag until we've diagnosed it a bit more.
@piyushredpanda wrote:
Currently we are not using a fixed seed, but it's a good idea (or at least outputting the seed). However, a fixed seed wouldn't guarantee much here, since the behavior of the underlying segment appender is timing-dependent anyway.
So the underlying problem is the assumption that when a […] First, […]
An easy way this can happen is if the inactive segment timer fires: this dispatches a write of the current chunk without subsequently flushing it, so the conditions above are met. However, that's not what happens here: there is actually a write+flush in progress when the […] So flush […] Then, the […] This situation probably arises at other times during the fuzz test but goes undetected, since we only check this condition at the end of the test when we inspect the returned futures. To fix this we can simply remove this check, as its assumption about […]
I will push a fix for this today.
run_concurrent_append_flush is a fuzzer-like test, so failures there can be hard to diagnose (e.g., see issue redpanda-data#13035). To help diagnose them, we want to capture some information from the segment_appender at each step of the test. Introduce segment_appender_info to do this.
Relates to log_segment_appender_test::test_concurrent_append_flush, which is a fuzzer-style test: capture test context and output it when the test fails. In the storage_single_thread_rpunit concurrent flush test, we now log test context that is printed if the test fails. Critically, this includes the seed used to generate the random series of actions performed on the appender. In addition, we generate a single seed per invocation and use it for all random choices, rather than the random helper methods, which use an unspecified random seed each time. Finally, we record more information about the operations performed in the test and output the full action sequence on failure. Issue redpanda-data#13035.
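A minimal sketch of the single-seed idea described in this commit, assuming a simplified model where each fuzz step is either an append or a flush. The names action, fuzz_context, make_actions, and make_actions_random_seed are hypothetical and only illustrate the pattern; they are not the test's actual code. One seed drives every random choice and is stored next to the generated action sequence, so a failure message carries enough information to regenerate the run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical fuzz step kinds; the real test issues append/flush calls
// against a segment_appender.
enum class action { append, flush };

// Hypothetical test context: everything needed to replay a failing run.
struct fuzz_context {
    uint64_t seed;
    std::vector<action> actions;

    // Render the seed and the full action sequence for a failure message.
    std::string describe() const {
        std::ostringstream os;
        os << "seed=" << seed << " actions=[";
        for (std::size_t i = 0; i < actions.size(); ++i) {
            os << (i ? "," : "")
               << (actions[i] == action::append ? "append" : "flush");
        }
        os << "]";
        return os.str();
    }
};

// Derive every random choice from a single seed. With the same seed and the
// same standard library, the generated sequence is identical across runs
// (distribution algorithms are implementation-defined, so sequences may
// differ between standard library implementations).
fuzz_context make_actions(uint64_t seed, std::size_t n) {
    std::mt19937_64 rng(seed);
    std::bernoulli_distribution pick_flush(0.3);
    fuzz_context ctx{seed, {}};
    ctx.actions.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        ctx.actions.push_back(pick_flush(rng) ? action::flush : action::append);
    }
    return ctx;
}

// Pick a fresh seed per invocation; it stays in the context so it can be
// printed if the test fails.
fuzz_context make_actions_random_seed(std::size_t n) {
    std::random_device rd;
    return make_actions(rd(), n);
}
```

On failure the test would print ctx.describe(), and calling make_actions with the recorded seed regenerates the same action sequence. As noted earlier in the thread, the appender's timing-dependent behavior means replaying the sequence is not guaranteed to reproduce the failure, but it narrows the search considerably.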
In test_concurrent_append_flush, which is a fuzzer-style test, we now get() all futures returned by flush calls during the fuzz portion, instead of only the future from the last flush. In some cases, earlier futures can still be unavailable even after the last future has resolved, which caused occasional CI failures. See #13035 for more analysis. Fixes redpanda-data#13035.
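To illustrate why checking only the last flush's future is insufficient, here is a sketch that uses std::promise and std::future as a stand-in for Seastar futures. fake_appender, is_ready, and wait_all are hypothetical names, and the manual complete() call simulates flushes finishing out of order. is_ready plays the role of the f.available() check from the failure. One difference to keep in mind: std::future::get() blocks the calling thread, whereas get() inside a Seastar thread waits cooperatively, so in this toy model every flush must be completed before waiting.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <future>
#include <vector>

// Hypothetical stand-in for segment_appender: each flush() hands out a
// future that resolves only when that particular flush is completed.
class fake_appender {
public:
    std::future<void> flush() {
        pending_.emplace_back();
        return pending_.back().get_future();
    }

    // Simulate the i-th flush finishing; completions may arrive in any order.
    void complete(std::size_t i) { pending_.at(i).set_value(); }

private:
    // deque keeps existing promises in place as new ones are added.
    std::deque<std::promise<void>> pending_;
};

// Non-blocking readiness check, analogous to the f.available() check that
// failed in CI.
bool is_ready(const std::future<void>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Wait on every collected flush future rather than only the last one.
// get() also rethrows any exception a flush stored in its future.
void wait_all(std::vector<std::future<void>>& futures) {
    for (auto& f : futures) {
        f.get();
    }
}
```

In this model, completing the second flush before the first leaves the last future ready while an earlier one is not, which is exactly the state the removed "last ready implies all ready" check tripped over. Waiting on each future individually makes no assumption about completion order.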
The same three commits also appear as cherry-picks (from commits 4e4a1e3, b02c28c, and cc82d0d respectively).
The test_concurrent_append_flush test from storage_single_thread_rpunit has failed: https://buildkite.com/redpanda/redpanda/builds/35783#018a3818-bc30-40c6-8b1e-d45bc58308b6/6-5451
The error message: