[v23.2.x] cloud_storage: use remote index in cloud timequery #13105

VladLazar · 2023-08-30T10:05:24Z

Backport of #13011

Release Notes

Improvements

Timequeries (i.e. ListOffsets requests) that land in the cloud log now use an index to speed up the search
and reduce the number of hydrated bytes required to serve the query. On average, a timequery has to download 4 times less data (when using the default segment and chunk sizes).

A lookup by timestamp is added to the remote index. It has the same semantics as the other lookup methods. If the index does not include the time index (i.e. it was created with serde version 1), the search comes up empty. (cherry picked from commit c1d0314)
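As a rough illustration of the lookup described above, here is a minimal Python sketch. The real index lives in Redpanda's C++ code; `IndexEntry`, `RemoteIndex`, `has_time_index` and `find_timestamp` are hypothetical names chosen for this example. The sketch assumes that "same semantics as the other lookup methods" means returning the last sampled entry strictly below the searched value, and that sampled timestamps are non-decreasing.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class IndexEntry:
    """One sampled point in a segment (hypothetical layout)."""
    kafka_offset: int
    file_pos: int      # byte position of the sampled batch in the segment
    timestamp_ms: int  # timestamp recorded for the sampled batch


@dataclass
class RemoteIndex:
    """Toy stand-in for the remote segment index, not the real class."""
    entries: List[IndexEntry] = field(default_factory=list)
    has_time_index: bool = True  # False models an index written with serde v1

    def find_timestamp(self, target_ms: int) -> Optional[IndexEntry]:
        """Return the last entry with timestamp < target_ms, or None."""
        if not self.has_time_index or not self.entries:
            return None  # no time index: the search comes up empty
        # Assumption: timestamps are non-decreasing across sampled entries.
        timestamps = [e.timestamp_ms for e in self.entries]
        i = bisect_left(timestamps, target_ms)  # first entry with ts >= target
        return self.entries[i - 1] if i > 0 else None
```

A `None` result covers both the "no entry below the target" case and the serde v1 case, so a caller can fall back to scanning from the start of the segment in either situation.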

This commit updates the timequery read path to skip up to the first index entry with a timestamp smaller than the searched one. The result is that timequeries will hydrate/materialize a maximum of two chunks (two because the chunk boundaries don't always line up with the entries in the index). If the index is not present, or was originally created in v1, the search starts from the first chunk as it did previously. (cherry picked from commit da8533b)
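The effect of skipping ahead can be sketched with a toy model. This is not the Redpanda read path: `chunks_hydrated`, its parameters, and the representation of a record as a `(timestamp_ms, file_pos)` pair are assumptions made for illustration, and each record is attributed to the chunk that contains its start position.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, int]  # (timestamp_ms, file_pos), sorted by file_pos


def chunks_hydrated(records: List[Record], chunk_size: int,
                    target_ms: int, start_pos: Optional[int]) -> List[int]:
    """Return the chunk indices read while scanning for target_ms.

    start_pos is the byte position taken from the index lookup, or None
    when the index has no usable entry and the scan starts at chunk 0.
    """
    pos = 0 if start_pos is None else start_pos
    touched: List[int] = []
    for ts, file_pos in records:
        if file_pos < pos:
            continue  # skipped thanks to the index entry
        chunk = file_pos // chunk_size
        if chunk not in touched:
            touched.append(chunk)  # models one chunk download
        if ts >= target_ms:
            break  # first record at or after the searched timestamp
    return touched
```

For example, with 80 records spread evenly over 8 chunks and a target in chunk 5, a scan from the start touches chunks 0 through 5, while a scan that starts from an index entry located in chunk 4 touches only chunks 4 and 5. The second chunk is needed because the index entry and the chunk boundary do not line up.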

A shard-level metric is added to track the total number of chunks that were hydrated (i.e. downloaded). (cherry picked from commit b78d711)

This commit makes the following changes to the timequery tests:
1. Run timequery on more offsets (10 for each of the 12 segments).
2. Check that a maximum of two chunks are downloaded by any given timequery.
3. Use the admin API to get the precise boundary between the cloud and local log. Previously, it was estimated based on record size.

(cherry picked from commit 6d507cd)
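The "maximum of two chunks" check can be pictured as a before/after comparison of the chunk hydration metric around each timequery. The sketch below is not the actual test code: `check_chunk_hydrations`, `run_timequery` and `read_hydrated_chunks` are hypothetical names, and the two callables stand in for whatever the test harness uses to issue a ListOffsets request and to read the metric.

```python
from typing import Callable, Iterable, List

MAX_CHUNKS_PER_TIMEQUERY = 2


def check_chunk_hydrations(
    timestamps: Iterable[int],
    run_timequery: Callable[[int], int],
    read_hydrated_chunks: Callable[[], int],
) -> List[int]:
    """Run one timequery per timestamp and return per-query hydration deltas.

    Raises AssertionError if any single query hydrates more than two chunks.
    """
    deltas: List[int] = []
    for ts in timestamps:
        before = read_hydrated_chunks()  # counter value before the query
        run_timequery(ts)
        delta = read_hydrated_chunks() - before
        assert delta <= MAX_CHUNKS_PER_TIMEQUERY, (
            f"timequery for {ts} hydrated {delta} chunks")
        deltas.append(delta)
    return deltas
```

Comparing deltas rather than absolute values keeps the check independent of whatever was hydrated by earlier queries in the same run.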

This commit refactors the handling of the index search result. If a new result type is introduced, its author will be reminded to handle it by the assertions. (cherry picked from commit de5aa53)
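The idea of being "reminded by the assertions" can be shown with a small Python analogue. The real change is in C++, and the result types here (`Found`, `NotFound`) and the helper names are hypothetical. The sketch only demonstrates the pattern: every known result type gets an explicit branch, and anything else trips an assertion instead of being silently ignored.

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class Found:
    """Hypothetical result: the index produced a position to start from."""
    file_pos: int


@dataclass(frozen=True)
class NotFound:
    """Hypothetical result: the index had nothing usable."""


SearchResult = Union[Found, NotFound]


def unhandled(result: object) -> NoReturn:
    # Reached only if a new result type was added without a matching branch.
    raise AssertionError(f"unhandled index search result: {result!r}")


def start_position(result: SearchResult) -> int:
    """Map an index search result to the byte position where the scan starts."""
    if isinstance(result, Found):
        return result.file_pos
    if isinstance(result, NotFound):
        return 0  # fall back to the start of the segment
    unhandled(result)
```

Adding a third result type without extending `start_position` makes the first call with that type fail loudly, which is the reminder the commit message refers to.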

VladLazar · 2023-08-30T14:44:53Z

Will merge after #13103 merges as that one is big and would be annoying to rebase.

VladLazar · 2023-08-30T16:14:37Z

/ci-repeat

VladLazar · 2023-08-30T16:14:55Z

Triggered ci-repeat for a rebase

Vlad Lazar added 5 commits August 30, 2023 10:58

cloud_storage: add a metric for chunk hydrations

8b29096

A shard-level metric is added to track the total number of chunks that were hydrated (i.e. downloaded). (cherry picked from commit b78d711)

cloud_storage: visit index search result

9bdf80f

This commit refactors the handling of the index search result. If a new result type is introduced, its author will be reminded to handle it by the assertions. (cherry picked from commit de5aa53)

github-actions bot added the area/redpanda label Aug 30, 2023

andijcr approved these changes Aug 30, 2023

VladLazar merged commit 0f28eb9 into redpanda-data:v23.2.x Aug 31, 2023

BenPope added this to the v23.2.8 milestone Sep 1, 2023
