Merging slices from the labels APIs using more than 1 cpu and k-way #5785
Conversation
…r tree
Signed-off-by: Alan Protasio <alanprot@gmail.com>
force-pushed from d0143fc to 1342145
// mergeSlicesParallelism is a constant for how many goroutines we should use to merge slices;
// it was chosen based on empirical observation: see BenchmarkMergeSlicesParallel
mergeSlicesParallelism = 8
I think I am fine with it. Just one question: what if we make the batch size a constant and derive the parallelism from the input size? Then for small inputs we would still use 1 core, and more cores for larger inputs.
We can still cap the concurrency at 8.
This is kind of done inside the function already:
p := min(parallelism, len(a)/2)
So we only use parallelism if the input has more than 4 slices (and in that case we use only 2 cores).
Increasing parallelism beyond 8 does not make much difference, I think, because the last merge runs on a single core anyway; at the end of the day we would just be increasing the number of slices that get merged at the end of the function.
Does that make sense?
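For illustration, here is a minimal self-contained sketch of that cap. The helper name effectiveParallelism is made up for this example (it is not the PR's code), and it assumes Go 1.21's built-in min:

```go
package main

import "fmt"

// mergeSlicesParallelism caps how many goroutines are used to merge slices
// (the constant from the diff above, chosen empirically).
const mergeSlicesParallelism = 8

// effectiveParallelism mirrors the cap discussed above: each goroutine should
// get at least two input slices, so small inputs stay on a single core.
// Hypothetical helper, for illustration only.
func effectiveParallelism(numInputs int) int {
	p := min(mergeSlicesParallelism, numInputs/2)
	if p < 1 {
		p = 1
	}
	return p
}

func main() {
	for _, n := range []int{2, 4, 8, 16, 64} {
		fmt.Printf("input slices=%2d -> goroutines=%d\n", n, effectiveParallelism(n))
	}
}
```

With this shape, the full parallelism of 8 only kicks in once there are at least 16 input slices, which is exactly what the next comment asks about.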
I see, thanks for the explanation. The input length is basically the number of ingesters for the user, so if a user has at least 16 ingesters we will use a parallelism of 8 for them?
16 ingesters sounds like a small number to me, and I am worried about having very small batch sizes per goroutine. What about using a larger x below, like 16 or even 32? Is that better than 2, or do you think it doesn't make much difference?
p := min(parallelism, len(a)/x)
I think in this case both solutions will be pretty fast, but let me create a benchmark.
Description updated with 16 ingesters and test case added.
WDYT?
Signed-off-by: Alan Protasio <alanprot@gmail.com>
Thanks
What this PR does:
This PR changes the strategy we use to merge the sorted slices containing the label values/keys in the GetLabels and GetLabelValues APIs.
Before, we were getting the response from each ingester, adding the results to a map, and sorting at the end to dedup/sort the response.
Now we use a loser tree for the merge (the same approach used in prometheus/prometheus#12878) and also use more than one core to merge those slices.
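To make the overall shape concrete, here is a minimal, self-contained sketch: split the inputs across goroutines, merge each chunk, then merge the partial results on a single goroutine. It is not the PR's code: it uses a plain pairwise two-pointer merge instead of a loser tree, assumes Go 1.21's built-in min, and the function names are made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// mergeTwo merges two sorted, deduplicated string slices into one sorted,
// deduplicated slice (classic two-pointer merge).
func mergeTwo(a, b []string) []string {
	out := make([]string, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out = append(out, a[i])
			i++
		case a[i] > b[j]:
			out = append(out, b[j])
			j++
		default: // equal: keep a single copy
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

// mergeChunk sequentially folds a chunk of slices into one sorted slice.
func mergeChunk(chunk [][]string) []string {
	var acc []string
	for _, s := range chunk {
		acc = mergeTwo(acc, s)
	}
	return acc
}

// mergeSortedParallel merges the per-ingester slices using up to `parallelism`
// goroutines, then does a final single-goroutine merge of the partial results.
// Hypothetical function, sketching the shape described above.
func mergeSortedParallel(inputs [][]string, parallelism int) []string {
	p := min(parallelism, len(inputs)/2)
	if p <= 1 {
		return mergeChunk(inputs)
	}

	partials := make([][]string, p)
	var wg sync.WaitGroup
	for w := 0; w < p; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Assign every p-th input slice to worker w.
			var chunk [][]string
			for i := w; i < len(inputs); i += p {
				chunk = append(chunk, inputs[i])
			}
			partials[w] = mergeChunk(chunk)
		}(w)
	}
	wg.Wait()

	// Final merge of the p partial results on a single goroutine.
	return mergeChunk(partials)
}

func main() {
	inputs := [][]string{
		{"a", "c", "e"},
		{"b", "c", "f"},
		{"a", "d", "f"},
		{"b", "e", "g"},
	}
	fmt.Println(mergeSortedParallel(inputs, 8)) // [a b c d e f g]
}
```

The final single-goroutine merge of the partial results is also why raising the parallelism cap above 8 shows diminishing returns, as discussed in the review thread.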
Benchmarks were run for the GetLabelValues API with different duplication factors (e.g. a 0.67 duplication ratio means the resulting slice contains 33% of the total number of strings across the input slices), and for the merge implementation in isolation.
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]