Store gateway overfetches chunks and series data #6421
Comments
Another thing to note is that series size usually varies with the block range. Thanos can have large blocks with a 14d time range, while in Cortex we usually use a 1d block range.
Though we have #6426, it is still not perfect because we need to set a static value. What I am thinking now is to include the chunk size and series size stats as part of the The value we need is
I think we can close this one with the latest changes to the meta file.
I think the issue of data overfetching is still unresolved. The gap size might be too big in some situations, especially when the requested data is relatively sparse, in which case we overfetch a lot. Let's say we want to fetch [10, 100] and [500KB, 501KB] for the first chunk. In this case the partitioner will have us fetch the whole [10, 501KB] range, but about 500KB of it will be discarded because that part doesn't contain data we need. A way to reduce overfetching is to reduce the partitioner gap size. If the gap size were 256KB, for example, the two ranges wouldn't be merged. The downside of a smaller gap size is more requests going to objstore. Things are worse with the caching bucket, which caches subranges with a subrange size of 16KiB. I think the 500KB of data in the middle will also be cached in memcached somehow, even though we may never need to read it.
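To make the trade-off in the comment above concrete, here is a minimal sketch of gap-based range merging. It is not the Thanos partitioner itself: the `byteRange`, `part`, `partition`, and `overfetched` names are made up for illustration, ranges are treated as half-open, 1KB is taken as 1024 bytes, and the 512KB "current" gap size is an assumption chosen only because it is larger than the 256KB alternative mentioned above.

```go
package main

import "fmt"

// byteRange is a half-open [Start, End) byte range within one object.
type byteRange struct {
	Start, End uint64
}

// part is a single object-storage request that covers one or more
// requested ranges. Wanted counts the bytes that were actually requested.
type part struct {
	byteRange
	Wanted uint64
}

// partition merges adjacent ranges whose gap is at most maxGap.
// It assumes the input is sorted by Start and does not overlap.
func partition(ranges []byteRange, maxGap uint64) []part {
	var parts []part
	for _, r := range ranges {
		n := len(parts)
		if n > 0 && r.Start-parts[n-1].End <= maxGap {
			// Close enough to the previous part: extend it over the gap.
			parts[n-1].End = r.End
			parts[n-1].Wanted += r.End - r.Start
			continue
		}
		parts = append(parts, part{byteRange: r, Wanted: r.End - r.Start})
	}
	return parts
}

// overfetched returns the bytes downloaded but never requested.
func overfetched(parts []part) uint64 {
	var total uint64
	for _, p := range parts {
		total += (p.End - p.Start) - p.Wanted
	}
	return total
}

func main() {
	// The two ranges from the comment: [10, 100] and [500KB, 501KB].
	ranges := []byteRange{{10, 100}, {500 * 1024, 501 * 1024}}

	// Hypothetical gap sizes: an assumed larger one and the 256KB alternative.
	for _, gap := range []uint64{512 * 1024, 256 * 1024} {
		parts := partition(ranges, gap)
		fmt.Printf("gap=%d requests=%d overfetched=%d bytes\n",
			gap, len(parts), overfetched(parts))
	}
}
```

With the larger gap the two ranges collapse into one request that also downloads everything in between. With the smaller gap nothing is overfetched, but the request count doubles, which is exactly the cost noted above.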
Is your proposal related to a problem?
Cortex uses Thanos Store Gateway for querying data on S3. With #6352, we are able to explore data fetched in store gateway. Here are some examples,
What I found is that Store Gateway usually overfetches chunks and series. In the first example, the total downloaded size is 16748 and SG fetched 16KB of chunks. However, the actual chunk data touched is only 164. The overfetched data is just discarded afterwards. The same problem exists in the second log line: the actual chunk data touched is only 2.7M, but SG fetched 66M of chunk data.
The issue here is that there is no way to know how big a chunk or series is, so Store Gateway tries to estimate the size: https://github.com/thanos-io/thanos/blob/main/pkg/store/bucket.go#L77 The estimated chunk size is 16K and the estimated series size is 64KB. These values might make sense in some situations, but in our real production blocks the sizes are much lower than these estimates, which means we are wasting a lot of resources fetching unused data.
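As a rough illustration of why a static estimate overfetches, the sketch below compares a fixed per-chunk estimate with the real chunk size. The constant and the `wasted` helper are hypothetical names, not Thanos identifiers. The sketch takes "16K" as 16000 bytes and reads the 164 from the first example as the size of a single touched chunk, both of which are assumptions.

```go
package main

import "fmt"

// estimatedMaxChunkSize mirrors the 16K estimate quoted above.
// Treating it as 16000 bytes is an assumption made for this sketch.
const estimatedMaxChunkSize = 16000

// wasted returns how many fetched bytes lie beyond the item's real length
// when the reader fetches `estimate` bytes for an item of `actual` bytes.
func wasted(estimate, actual uint64) uint64 {
	if actual >= estimate {
		return 0 // the estimate was too small, so nothing was overfetched
	}
	return estimate - actual
}

func main() {
	// Assumed size of one touched chunk, taken from the first example.
	const actualChunkSize = 164

	w := wasted(estimatedMaxChunkSize, actualChunkSize)
	pct := float64(w) / float64(estimatedMaxChunkSize) * 100
	fmt.Printf("fetched=%d used=%d wasted=%d (%.1f%%)\n",
		estimatedMaxChunkSize, actualChunkSize, w, pct)
}
```

The same arithmetic applies to series: the further the real size falls below the static estimate, the larger the share of each fetch that is thrown away.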
Describe the solution you'd like
There are several ways to solve this:
GatherIndexHealthStats will check the index file and collect some stats. Currently it collects the average, min and max chunk size. I think we can collect series size as well. The collected stats can be included in the meta.json file or another file associated with that block in object storage. At query time, the max series and max chunk sizes can be loaded and we can use these sizes accordingly.
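A minimal sketch of how the query path could use such stats, assuming they are stored as a small JSON object per block. The `sizeStats` type, its `chunk_max_size` and `series_max_size` field names, and the `fetchSizes` helper are hypothetical and chosen only for illustration; the fallback constants reuse the static estimates mentioned earlier, with 16K taken as 16000 bytes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sizeStats is a hypothetical per-block stats record. The JSON field
// names are assumptions, not a confirmed meta.json schema.
type sizeStats struct {
	ChunkMaxSize  int64 `json:"chunk_max_size,omitempty"`
	SeriesMaxSize int64 `json:"series_max_size,omitempty"`
}

// Static fallbacks, matching the estimates quoted in this issue.
const (
	fallbackChunkSize  int64 = 16000
	fallbackSeriesSize int64 = 64 * 1024
)

// fetchSizes returns the chunk and series fetch sizes to use for a block:
// the recorded per-block maxima when present, the static estimates otherwise.
func fetchSizes(raw []byte) (int64, int64, error) {
	var s sizeStats
	if err := json.Unmarshal(raw, &s); err != nil {
		return 0, 0, err
	}
	chunk, series := fallbackChunkSize, fallbackSeriesSize
	if s.ChunkMaxSize > 0 {
		chunk = s.ChunkMaxSize
	}
	if s.SeriesMaxSize > 0 {
		series = s.SeriesMaxSize
	}
	return chunk, series, nil
}

func main() {
	// Made-up sample values, for illustration only.
	withStats := []byte(`{"chunk_max_size": 200, "series_max_size": 900}`)
	withoutStats := []byte(`{}`) // e.g. an older block with no stats recorded

	for _, raw := range [][]byte{withStats, withoutStats} {
		chunk, series, err := fetchSizes(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("chunk fetch size=%d series fetch size=%d\n", chunk, series)
	}
}
```

Falling back to the static estimates keeps blocks without recorded stats working as they do today, while blocks with stats get tighter fetch ranges.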