cloud_storage: more efficient timequery #11801

jcsp · 2023-06-30T12:27:40Z

Time queries currently scan from the start of a segment, even when doing chunked reads. In some cases this makes them an order of magnitude more expensive than reading a single chunk.

This can be fixed by using the timestamps in the remote index that were added for 23.2.

However, the cost of doing a timequery will still be one chunk promotion per partition, which is very expensive if you've got e.g. 1000 partitions in a topic: 16GiB of downloads + disk writes to serve one ListOffsets.
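To make the arithmetic above concrete, here is a minimal sketch of the cost model. It assumes a 16 MiB chunk size, which is what the 16GiB figure implies for 1000 partitions; the function name is made up for illustration and is not part of Redpanda.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def timequery_download_bytes(partitions: int, chunk_size_bytes: int) -> int:
    """Bytes hydrated if every partition promotes exactly one chunk."""
    return partitions * chunk_size_bytes


# Assumed values: 1000 partitions, 16 MiB chunks (an assumption, see above).
total = timequery_download_bytes(partitions=1000, chunk_size_bytes=16 * MIB)
print(f"{total / GIB:.3f} GiB")  # 15.625 GiB, i.e. roughly the 16GiB quoted
```

The cost scales linearly with both the partition count and the chunk size, so it stays large for wide topics even after the reader stops scanning from the start of the segment.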

It may also make sense to override the chunk size to something smaller when the reader is created for a time query, and/or to implement a "fuzzy mode" for time queries to tiered storage that gives the caller the nearest indexed offset rather than a precise answer. Many applications will not care if they see a few messages before the timestamp they requested, as long as they see all the messages after it. This is already the behavior for compressed batches, where we return the offset of the start of the batch rather than the offset of the exact record.
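A rough sketch of what the "fuzzy mode" could look like, answering purely from index entries without hydrating any chunk. Everything here is hypothetical: the `(offset, max_timestamp)` entry shape, the function name, and the assumption that timestamps are non-decreasing across the segment (which does not hold when timestamps are skewed).

```python
from bisect import bisect_left
from typing import List, Tuple


def fuzzy_timequery(
    entries: List[Tuple[int, int]],  # hypothetical (offset, max_timestamp) pairs
    target_ts: int,
    segment_base_offset: int,
) -> int:
    """Return an indexed offset at or before the first record >= target_ts.

    Assumes entries are sorted by offset and timestamps never go backwards.
    """
    timestamps = [ts for _, ts in entries]
    # First entry whose max timestamp is >= target; the entry before it is
    # the last one guaranteed to hold nothing at or after the target.
    i = bisect_left(timestamps, target_ts)
    if i == 0:
        # No indexed entry is safely before the target: start of segment.
        return segment_base_offset
    return entries[i - 1][0]


entries = [(1000, 10_000), (1500, 20_000), (2100, 30_000)]
print(fuzzy_timequery(entries, target_ts=25_000, segment_base_offset=900))
```

The caller may see a few records older than `target_ts`, but under the stated assumptions it cannot miss any record at or after it.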

This is a prerequisite to make timequeries seek to the proper chunk, instead of reading from the start of segments. The timestamp we care about is the batch max timestamp, because timequeries will want to find a batch that is definitely before the target timestamp, to start their scan from there. Related: redpanda-data#11801
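As an illustration of why the batch max timestamp is the useful one, the sketch below picks the chunk to hydrate for a precise timequery. The `IndexEntry` layout, field names, and `chunk_start_for_timequery` are assumptions made for this example and are not the actual remote segment index API.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IndexEntry:
    # Hypothetical index entry; the real index layout may differ.
    base_offset: int
    max_timestamp: int  # max timestamp of the batch at this entry
    file_pos: int       # byte position of the batch within the segment


def chunk_start_for_timequery(
    index: List[IndexEntry], target_ts: int, chunk_size: int
) -> int:
    """Byte offset of the chunk the scan should start in."""
    max_ts = [e.max_timestamp for e in index]
    # A batch whose max timestamp is < target holds no record at or after
    # the target, so it is a safe place to begin the scan.
    i = bisect_left(max_ts, target_ts)
    if i == 0:
        return 0  # nothing is definitely before the target: first chunk
    start = index[i - 1].file_pos
    return (start // chunk_size) * chunk_size  # round down to chunk boundary


index = [
    IndexEntry(0, 1000, 0),
    IndexEntry(100, 2000, 40),
    IndexEntry(200, 3000, 90),
    IndexEntry(300, 4000, 130),
]
print(chunk_start_for_timequery(index, target_ts=3500, chunk_size=64))
```

The scan starts in the returned chunk and moves forward, so it may still cross into the following chunk before it finds the first matching batch.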

dotnwat · 2023-10-02T01:52:08Z

@VladLazar @Lazin is this different than #13125? In that issue one of the observed behaviors is "Observe high delays or missing timestamps" which sounds pretty similar to "more efficient timequery".

VladLazar · 2023-10-02T07:55:32Z

> @VladLazar @Lazin is this different than #13125? In that issue one of the observed behaviors is "Observe high delays or missing timestamps" which sounds pretty similar to "more efficient timequery".

This issue (#11801) was more generic than the one Evgeny created. The goal here was to hydrate less data in the general case. #13125 is more about what happens when the timestamps are bad (either due to config batches or skew in user-provided timestamps), so it's a more specific case. We had the exact same issues when improving the local storage timequery a while back.

jcsp added the kind/enhance (New feature or request) and area/cloud-storage (Shadow indexing subsystem) labels Jun 30, 2023

jcsp mentioned this issue Jun 30, 2023

cloud_storage: add timestamps to remote segment index #11802

Merged

7 tasks

VladLazar mentioned this issue Aug 25, 2023

cloud_storage: use remote index in cloud timequery #13011

Merged

7 tasks

piyushredpanda closed this as completed in #13011 Aug 29, 2023
