[v24.1.x] cst: manual backport of chunk download changes PR 18278 #18854

abhijat · 2024-06-07T11:31:01Z

This is a manual backport of PR #18278. The conflicts were isolated to remote_segment_test.cc which was refactored in both the original PR and had a randomized test bucket name added.

Backports Required

Release Notes

none

Chunk waiters are stored in a list which is swapped before processing. Because chunk downloads take time, it is possible that new waiters may be registered while some downloads are being processed. The remote segment may receive a stop request during this interval. If this happens, the hydration loop will shut down, not processing the remaining waiters, and the waiters will prevent gate closure of remote segment, causing a hang. The workaround cancels pending downloads while exiting the background loop. (cherry picked from commit b06281e)

The chunk API is now structured around a background loop similar to the remote segment. When a download request for a chunk arrives, it is added to the wait list for the chunk. Then a condvar is triggered, which wakes up the bg loop and request is issued to remote segment to download the chunk. Once the chunk is downloaded, all waiters are notified, which are also notified when there is an error during download. (cherry picked from commit e0d4fd7)

The prefetch logic is changed. Instead of downloading a single byte range for chunk + prefetch and splitting them while writing to disk, we now schedule separate API calls per chunk. This enables the consumer to get unblocked faster as soon as the first chunk is downloaded. (cherry picked from commit 7bfb6b5)

A new test is added where some chunk downloads fail and the assertion checks if the results are correctly reflected. The supporting tooling to fail requests is also added to s3 imposter. (cherry picked from commit 121e4e4)

Three new tests are added which test the chunk hydration process while under disruptive actions. Because the bg loop structure does not yield itself to unit testing easily, we use disruptive actions to test code paths within the download loop. The following scenarios are added: * Abruptly stop remote segment during pending downloads to ensure there are no hangs and exceptions are reported correctly. * Delete random chunk files while a large number of random chunk hydrations are in progress. This emulates cache eviction and forces re-hydration. * Randomly fail HTTP requests either with retryable or non retryable errors. (cherry picked from commit 1f8efb8)

(cherry picked from commit 0b8cf80)

Even if the test code fails, the background thread must be stopped to avoid a hung test. (cherry picked from commit dc87016)

(cherry picked from commit fd66184)

abhijat added 8 commits June 7, 2024 10:36

cst/tests: Add test where chunk downloads fail

9e02620

A new test is added where some chunk downloads fail and the assertion checks if the results are correctly reflected. The supporting tooling to fail requests is also added to s3 imposter. (cherry picked from commit 121e4e4)

cst: Remove unused code

7cfcec8

(cherry picked from commit 0b8cf80)

cst/ducktape: Make sure background thread stops

9276143

Even if the test code fails, the background thread must be stopped to avoid a hung test. (cherry picked from commit dc87016)

cst: Add clarifying comment about chunk map

64acab8

(cherry picked from commit fd66184)

github-actions bot added the area/redpanda label Jun 7, 2024

abhijat requested review from andrwng and dotnwat June 7, 2024 14:58

andrwng approved these changes Jun 7, 2024

View reviewed changes

abhijat merged commit 5cfe6b8 into redpanda-data:v24.1.x Jun 10, 2024
19 checks passed

piyushredpanda mentioned this pull request Jun 14, 2024

[v24.1.x] [CORE-2794]: cst: decouple chunk download from consumers #18713

Closed

BenPope added this to the v24.1.8 milestone Jun 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[v24.1.x] cst: manual backport of chunk download changes PR 18278 #18854

[v24.1.x] cst: manual backport of chunk download changes PR 18278 #18854

abhijat commented Jun 7, 2024

[v24.1.x] cst: manual backport of chunk download changes PR 18278 #18854

[v24.1.x] cst: manual backport of chunk download changes PR 18278 #18854

Conversation

abhijat commented Jun 7, 2024

Backports Required

Release Notes