Distributing sum queries #1878
Conversation
This is really exciting, @owen-d! I've only had time for a quick pass with a few questions. I still have to review the astmapper as a whole, which I'll do in a second pass soon.
LGTM, found some nits, but overall I think this is pretty much ready.
Once you've gone through all the feedback, I think we should get @pracucci to give a review. This is a very big PR to review, even when you already know the whole idea.
We might also want to add some metrics instrumentation to know how many requests have been parallelized by shard sums, or even tag an existing metric to compare sharded vs non-sharded behavior over time, up to you.
Overall this looks good. The AST package contains the bulk of the logic and looks solid. I have a few nits and questions.
Also, was there a reason to move so many files into separate packages? In general, I agree the code base could use some refactoring; however, it makes it difficult to review the PR when there are so many unrelated changes.
@owen-d can you rebase this onto master? The vendored version of Prometheus recently changed, and I think some variable names you use in astmapper need to be updated.
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
This work is amazing @owen-d! I particularly appreciated the very detailed PR description and the extensive test coverage. The implementation is also very clear, despite the intrinsic complexity 👏
I know you've heard this request a bunch of times, but I would be glad if you could rebase onto master. In the last few days we've merged new integration tests and goimports changes, and I would like to make sure CI passes with the recent changes too.
> The query-frontend now requires a schema config to determine how/when to shard queries, either from a file or from flags (i.e. by the config-yaml CLI flag).

What's the impact of this requirement on projects using the query-frontend standalone (i.e. in front of Thanos)?
Hey @pracucci, thanks for the kind words and the review. I've addressed all but one of your comments, which I've pushed back on. Regarding the schema config requirement:
This is only required if one wishes to use the new optimizations. If no config file is specified, the configuration is loaded via flags, which by default adds a v1 schema. However, if a config file is specified but it doesn't have any period configs, loading the frontend will now error. Should this case be handled differently, or is this the right logic? Edit: I was mistaken. All looks good wrt config loading.
This would be a problem for 3rd parties using the query-frontend standalone. However, I can't find a way to reproduce it, and even looking at the code it's not clear to me which specific change has introduced this "breaking change". Could you share more details, please?
Finally managed to put in the time to review this. Sorry for the delay; this PR is really good! I mostly only have stylistic changes. You've been doing a bunch of those, but I hope the ones I have are the final ones.
Regarding the configs breaking change, I too couldn't find it. cfg.Schema.Load() will return nil if there are no configs :)
Given I believe there's no breaking change in the config file and the rest of the changes LGTM, I'm happy to announce it's approved from my side 🎉 Again, good job!
Good job on the doc! I left a few stylistic comments.
Reviewed latest changes and LGTM! 🎉
What
This PR (squashed from #1730 for simplicity) introduces a new query-frontend flag, querier.sum-shards, which allows Cortex to fan out and process queries in parallel where appropriate. For a query to be parallelizable, the query itself must be parallelizable and it must use a sharded schema (v10+).

Sharding
The v10 schema introduced a shard factor which spreads series across n shards. This is a prerequisite for querying data from these shards in parallel.
Problem
Although the v10 schema change introduces a shard factor, all shards must still be processed on a single querier. This is compounded by the fact that query ASTs are not broken down before execution. Therefore, while sharding lets us split up and process data at its source location, we still re-aggregate and process it all in one place. This PR addresses both issues by splitting child ASTs into n merge-able queries for individual execution, where n = shard factor.

Details
Mapping ASTs
Firstly, we introduce a package called astmapper, whose core interface maps one AST into a different, equivalent AST. We use this to turn e.g. sum by (foo) (rate(bar{baz="blip"}[1m])) into, approximately, a sum of per-shard sums. There are more complex optimizations to our sum mapping, but this is a simplified example for the PR description.
This works largely because summing the sub-sums is equivalent to the original sum. The principle should hold for any merge operation that is associative, but sums are the initial focus.
Configuration
querier.sum-shards (bool): enables shard summing behavior.

Schema config on frontend
The query-frontend now requires a schema config to determine how/when to shard queries, either from a file or from flags (i.e. via the config-yaml CLI flag). This is the same schema config the queriers consume. Queries are currently shardable only when they span a single shard-enabled schema definition (v10+), although this could be improved in later work.

Concurrency controls
It's also advised to increase downstream concurrency controls to account for more queries of smaller sizes:
querier.max-outstanding-requests-per-tenant
querier.max-query-parallelism
querier.max-concurrent
server.grpc-max-concurrent-streams
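For illustration, raising these limits might look like the fragment below. The values are made up and the right numbers depend entirely on your workload; note also that some of these flags apply to the querier process rather than the query-frontend:

```
-querier.max-outstanding-requests-per-tenant=800
-querier.max-query-parallelism=32
-querier.max-concurrent=16
-server.grpc-max-concurrent-streams=1000
```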
It is also worth noting that instrumentation (traces) scales with the number of sharded queries. We've had success increasing JAEGER_REPORTER_MAX_QUEUE_SIZE to handle the new volume.

Caveats
Work distribution
With query sharding enabled, the query-frontends do more work than before. This runs afoul of the fact that they aren't yet horizontally scalable (per-tenant queues are per-process), so it's highly suggested to scale them vertically compared to before. In the future, this can be mitigated by the following:
Parallelization vs AST delegation
In addition to parallelizing sum operations, this PR still distributes non-parallelized operations to downstream queriers. In the following example, we parallelize the sum(rate(...)) and distribute the largest possible child AST (the sum(histogram_quantile(...)) in this case) to a querier: we shard sum(rate(selector[1m])) and delegate sum(histogram_quantile(0.5, rate(selector[1m]))) to a querier.

Ideas for further improvements:
Parallelize merge operations
This PR only parallelizes sums, but it should be possible for any associative merge operation. Some examples could be:
avg(selector) -> sum(selector) / count(selector), which can be parallelized in turn by their shard factors
rate(selector[1m]) -> rate(selector{shard=0}[1m]) + rate(selector{shard=1}[1m])
pipe dream: recursive parallelization
Theoretically, queriers could perform a single aggregation and parallelize/recursively delegate to other queriers. It would likely require a planner/scheduler and could be extended to support intermediate caching. However, it is beyond the scope of this PR.
Benchmarks
Here are some benchmark results for different query types, shard factors, and artificial delay introduced (to simulate downstream executions):
Tooling:
I've also added a little tool, query-audit, which can run a series of queries against multiple Prometheus backends and compare the results (I can add make targets if requested). I've used this to empirically test differences in query results across sharded and non-sharded backends. Output is below.