[v24.1.x] cst: manual backport of chunk download changes PR 18278 #18854
Merged: abhijat merged 8 commits into redpanda-data:v24.1.x from abhijat:backport-pr-18278-v24.1.x on Jun 10, 2024
Chunk waiters are stored in a list which is swapped out before processing. Because chunk downloads take time, new waiters may be registered while a batch of downloads is being processed. If the remote segment receives a stop request during this interval, the hydration loop shuts down without processing the remaining waiters, and those waiters prevent the remote segment's gate from closing, causing a hang. The workaround cancels pending downloads while exiting the background loop. (cherry picked from commit b06281e)
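The hang and the workaround described above can be sketched as a toy model. This is not the code from the PR: it uses Python's asyncio as a stand-in for the cooperative scheduling of the real loop, and `HydrationLoop`, `request`, `stop` and `run` are hypothetical names chosen for illustration.

```python
import asyncio


class HydrationLoop:
    """Toy model of a hydration loop; all names here are hypothetical."""

    def __init__(self, download):
        self._download = download      # coroutine: chunk_start -> data
        self._waiters = []             # list of (chunk_start, Future)
        self._wakeup = asyncio.Event()
        self._stopping = False

    def request(self, chunk_start):
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append((chunk_start, fut))
        self._wakeup.set()
        return fut

    def stop(self):
        self._stopping = True
        self._wakeup.set()

    async def run(self):
        try:
            while not self._stopping:
                await self._wakeup.wait()
                self._wakeup.clear()
                # Swap the list so that new requests land in a fresh one.
                batch, self._waiters = self._waiters, []
                for chunk_start, fut in batch:
                    data = await self._download(chunk_start)
                    if not fut.done():
                        fut.set_result(data)
        finally:
            # Waiters registered while the last batch was in flight would
            # otherwise never complete; cancel them so callers can unwind.
            for _, fut in self._waiters:
                if not fut.done():
                    fut.cancel()
            self._waiters.clear()


async def demo():
    release = asyncio.Event()

    async def download(chunk_start):
        await release.wait()           # simulate a slow download
        return chunk_start

    hydrator = HydrationLoop(download)
    task = asyncio.create_task(hydrator.run())
    first = hydrator.request(0)
    await asyncio.sleep(0)             # let the loop swap the list and start
    late = hydrator.request(1)         # registered while a download is in flight
    hydrator.stop()
    release.set()
    await task
    return await first, late.cancelled()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model, removing the `finally` block leaves the `late` future pending forever, which mirrors the waiter that keeps the gate open.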
The chunk API is now structured around a background loop similar to the one in the remote segment. When a download request for a chunk arrives, it is added to the wait list for that chunk. A condvar is then signalled, which wakes up the bg loop, and a request is issued to the remote segment to download the chunk. Once the chunk is downloaded, all waiters are notified; they are also notified if there is an error during download. (cherry picked from commit e0d4fd7)
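A minimal sketch of the wait list plus condvar structure, assuming an asyncio model rather than the actual implementation. `ChunkDownloader`, `hydrate` and the `fetch` callback are hypothetical names; the real chunk API and its error types are not shown in this PR description.

```python
import asyncio
from collections import defaultdict


class ChunkDownloader:
    """Hypothetical stand-in for the chunk API described above."""

    def __init__(self, fetch):
        self._fetch = fetch                     # coroutine: chunk_start -> bytes
        self._waiters = defaultdict(list)       # chunk_start -> [Future]
        self._cond = asyncio.Condition()
        self._task = None

    def start(self):
        self._task = asyncio.create_task(self._run())

    async def stop(self):
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass

    async def hydrate(self, chunk_start):
        # Register on the chunk's wait list, then wake the background loop.
        fut = asyncio.get_running_loop().create_future()
        async with self._cond:
            self._waiters[chunk_start].append(fut)
            self._cond.notify()
        return await fut

    async def _run(self):
        while True:
            async with self._cond:
                await self._cond.wait_for(lambda: bool(self._waiters))
                pending, self._waiters = self._waiters, defaultdict(list)
            for chunk_start, futs in pending.items():
                try:
                    data = await self._fetch(chunk_start)
                except Exception as err:
                    # Every waiter for this chunk sees the same failure.
                    for fut in futs:
                        if not fut.done():
                            fut.set_exception(err)
                else:
                    for fut in futs:
                        if not fut.done():
                            fut.set_result(data)


async def demo():
    calls = []

    async def fetch(chunk_start):
        calls.append(chunk_start)
        await asyncio.sleep(0)
        if chunk_start == 4096:
            raise IOError("download failed")
        return f"chunk@{chunk_start}".encode()

    downloader = ChunkDownloader(fetch)
    downloader.start()
    results = await asyncio.gather(
        downloader.hydrate(0),
        downloader.hydrate(0),
        downloader.hydrate(4096),
        return_exceptions=True,
    )
    await downloader.stop()
    return results, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both callers asking for the same chunk share one result, and the caller of the failing chunk receives the error instead of waiting indefinitely.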
The prefetch logic is changed. Instead of downloading a single byte range for chunk + prefetch and splitting it while writing to disk, we now schedule separate API calls per chunk. This unblocks the consumer as soon as the first chunk is downloaded. (cherry picked from commit 7bfb6b5)
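The per-chunk scheduling can be illustrated roughly as follows. This is a sketch under stated assumptions: `CHUNK_SIZE`, `chunks_to_fetch` and `read_with_prefetch` are hypothetical, the chunk size is arbitrary, and clamping to the end of the segment is omitted.

```python
import asyncio

CHUNK_SIZE = 16  # hypothetical chunk size, chosen only for the example


def chunks_to_fetch(offset, prefetch, chunk_size=CHUNK_SIZE):
    """Start offsets of the chunk containing `offset` plus `prefetch` more."""
    first = (offset // chunk_size) * chunk_size
    return [first + i * chunk_size for i in range(prefetch + 1)]


async def read_with_prefetch(offset, prefetch, download):
    # One download call per chunk instead of one large byte range.
    starts = chunks_to_fetch(offset, prefetch)
    tasks = [asyncio.create_task(download(start)) for start in starts]
    first = await tasks[0]   # the consumer only waits for the first chunk
    return first, tasks[1:]


async def demo():
    gate = asyncio.Event()

    async def download(start):
        if start != 16:
            await gate.wait()          # prefetched chunks are still in flight
        return start

    first, rest = await read_with_prefetch(20, 2, download)
    done_when_unblocked = [task.done() for task in rest]
    gate.set()
    others = await asyncio.gather(*rest)
    return first, done_when_unblocked, others


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The consumer gets the first chunk back while the prefetched chunks are still downloading, which is the behaviour the commit message describes.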
A new test is added in which some chunk downloads fail, and the assertion checks that the failures are correctly reflected in the results. The supporting tooling to fail requests is also added to the s3 imposter. (cherry picked from commit 121e4e4)
Three new tests are added which test the chunk hydration process while under disruptive actions. Because the bg loop structure does not lend itself to unit testing easily, we use disruptive actions to test code paths within the download loop. The following scenarios are added:
* Abruptly stop remote segment during pending downloads to ensure there are no hangs and exceptions are reported correctly.
* Delete random chunk files while a large number of random chunk hydrations are in progress. This emulates cache eviction and forces re-hydration.
* Randomly fail HTTP requests with either retryable or non-retryable errors.
(cherry picked from commit 1f8efb8)
(cherry picked from commit 0b8cf80)
Even if the test code fails, the background thread must be stopped to avoid a hung test. (cherry picked from commit dc87016)
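The general pattern behind this commit can be sketched as a scoped guard that always stops the background thread, whether or not the test body raises. The sketch below is an analogy in Python, not the test code from the PR; `BackgroundDisruptor` and `run_with_disruptor` are hypothetical names.

```python
import threading


class BackgroundDisruptor:
    """Hypothetical helper that runs an action repeatedly on a thread."""

    def __init__(self, action, interval=0.001):
        self._action = action
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self._action()
            self._stop.wait(self._interval)

    def is_running(self):
        return self._thread.is_alive()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on both success and failure of the test body.
        self._stop.set()
        self._thread.join()
        return False  # never swallow the test's exception


def run_with_disruptor(test_body):
    disruptor = BackgroundDisruptor(action=lambda: None)
    failed = False
    try:
        with disruptor:
            test_body()
    except AssertionError:
        failed = True
    return failed, disruptor.is_running()


def failing_body():
    assert False, "simulated test failure"


if __name__ == "__main__":
    print(run_with_disruptor(failing_body))
```

Because the stop and join happen in `__exit__`, a failing assertion in the test body still leaves no thread running.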
(cherry picked from commit fd66184)
andrwng approved these changes on Jun 7, 2024
This is a manual backport of PR #18278. The conflicts were isolated to remote_segment_test.cc, which was refactored in the original PR and also had a randomized test bucket name added.
Backports Required
Release Notes