store: add streamed postings reading #6340
Conversation
Force-pushed from fb0895f to 1ee9043 (compare)
`readIndexRange` dominates the profiles here so let's stream reading postings into `index.Postings` instead of allocating everything at once. Work in progress. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Force-pushed from 1ee9043 to 2d10d0f (compare)
Hi @GiedriusS, we are interested in this feature as we recently saw the same high heap usage issue in this part of the code.
@@ -2416,142 +2415,42 @@ func (r *bucketIndexReader) fetchPostings(ctx context.Context, keys []labels.Lab

// Fetch from object storage concurrently and update stats and posting list.
g.Go(func() error {
	begin := time.Now()
	for _, p := range ptrs[i:j] {
		ir, err := r.block.bkt.GetRange(ctx, r.block.indexFilename(), p.ptr.Start, p.ptr.End-p.ptr.Start)
That means we have to send multiple requests to object storage, while the current logic sends one request per part?
Are we able to still send one request per part and create the postings reader from the GetRange reader?
If that is not doable, I feel it would be better to just download the postings to disk.
Maybe more GetRange requests won't impact performance; hopefully we can gather some datapoints to understand the impact.
Maybe we could add this under a feature flag? If it is enabled, we would send multiple requests, which would mean bigger costs when using a SaaS that charges per request, but we would get constant RAM usage.
	statsMtx: statsMtx,
}

postingsCount, err := getInt32(bktReader, r.readBuf[:0])
Should this be `r.readBuf`, not `r.readBuf[:0]`? Tested: `r.readBuf[:0]` cannot read any data because the buffer length is 0.
}

func getInt32(r io.Reader, buf []byte) (uint32, error) {
	read, err := r.Read(buf[:0])
Same here: `buf[:0]` needs to be `buf`.
@GiedriusS IIUC #6442 will supersede this PR? Or will we do both?
What about the roaring bitmap approach? I feel like that one is superior since we can aggregate postings in a streaming manner. I don't know if making one or more requests per posting is sustainable at scale. @yeya24 also had an idea to calculate the intersection by merging one (or a controlled number) of postings at a time, instead of maxing out the fanout.
Yeah, I am thinking about the same. We know the postings lengths, so we can just sort and start from the two smallest postings lists. Needs some benchmarks though.
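A minimal sketch of the smallest-first intersection idea (hypothetical; real code would operate on `index.Postings` iterators rather than in-memory slices). Starting from the smallest lists keeps every intermediate result no larger than the smallest input, and the loop can short-circuit as soon as the intersection becomes empty:

```go
package main

import (
	"fmt"
	"sort"
)

// intersect computes the intersection of sorted postings lists, merging
// the smallest lists first so intermediate results stay as small as possible.
func intersect(lists [][]uint32) []uint32 {
	if len(lists) == 0 {
		return nil
	}
	sort.Slice(lists, func(i, j int) bool { return len(lists[i]) < len(lists[j]) })
	res := lists[0]
	for _, l := range lists[1:] {
		res = intersectTwo(res, l)
		if len(res) == 0 {
			return nil // short-circuit: the intersection can only shrink
		}
	}
	return res
}

// intersectTwo merges two sorted lists in O(len(a)+len(b)).
func intersectTwo(a, b []uint32) []uint32 {
	var out []uint32
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(intersect([][]uint32{
		{1, 3, 5, 7, 9, 11},
		{3, 7, 11},
		{2, 3, 7, 8, 11, 20},
	})) // [3 7 11]
}
```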
#6442 seems to not do this anymore. It is still one request per part.
Changes

`readIndexRange` dominates the profiles here, so let's stream reading postings into `index.Postings` instead of allocating everything at once. Work in progress.

Verification

Existing + ad-hoc tests.