Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sorted queries do not produce sorted results for shardable queries #6125

Merged
merged 9 commits into from
Feb 23, 2023

Conversation

harry671003
Copy link
Contributor

@harry671003 harry671003 commented Feb 14, 2023

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • This is my first PR in Thanos. Any feedback is appreciated.
  • When the query contains sort or sort_desc functions, the final results must be sorted by samples (instead of labels)
  • The MergeResponse() interface now takes the request as an argument in addition to the responses.
  • In MergeResponse() the query string from the request is parsed and analyzed for sort and sort_desc functions. If they're present the final merged response will be sorted by samples.
  • Fixes: query frontend: sorted queries do not produce sorted results for shardable queries #6059

Final Version

  • For topk and bottomk, there is no need to sort the results. The shards are already sorted. Hence, just merging the response will be good.
  • For sort and sort_desc the results will be sorted by values.
  • In all other cases, the results will be sorted by labels.

Verification

  • Added unit tests.

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Some nits only. Thanks for tackling it.

pkg/queryfrontend/queryinstant_codec.go Outdated Show resolved Hide resolved
pkg/queryfrontend/queryinstant_codec.go Outdated Show resolved Hide resolved
pkg/queryfrontend/queryinstant_codec.go Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
yeya24
yeya24 previously approved these changes Feb 14, 2023
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Only one last comment.

if n.Func != nil {
if n.Func.Name == "sort" {
sortAsc = true
return errors.New("done")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a global variable for such kind of error? So that we don't have to allocate the struct every time.

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
yeya24
yeya24 previously approved these changes Feb 15, 2023
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @fpetkovski Wanna take another look?

Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just one last question from my side.

var sortAsc bool = false
var sortDesc bool = false
done := errors.New("done")
promqlparser.Inspect(expr, func(n promqlparser.Node, _ []promqlparser.Node) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to traverse the entire AST here, or can we just check if expr is sort or sort_desc?

Copy link
Contributor

@yeya24 yeya24 Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we need to traverse the AST until we hit first sort or sort_desc.
Tried query sort(prometheus_http_requests_total) + sort_desc(prometheus_http_requests_total) here and it gave me sort asc results.

Copy link
Contributor

@fpetkovski fpetkovski Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just check the root expression and decide how to sort. You example query is a binary expression and I think we need to sort the end result by labels, not by values. Worth double checking through.

Copy link
Contributor Author

@harry671003 harry671003 Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the below cases Prometheus sorts the final result by value:

  • 2 + sort (sum by (instance, code) (prometheus_http_requests_total))
  • sort (sum by (instance, code) (prometheus_http_requests_total)) + sort _desc(sum by (instance, code) (prometheus_http_requests_total))

But topk and bottomk will break the logic:

  • topk (3, sort(sum by (instance, code) (prometheus_http_requests_total))) shouldn't be sorted in ascending order.
  • bottomk (3, sort_desc(sum by (instance, code) (prometheus_http_requests_total))) shouldn't be sorted in descending order.
  • Here, topk and bottomk decides the final sort order.

I think we need to re-sort the final results for topk and bottomk functions also in the query-frontend.
When walking the AST, the first of topk, bottomk, sort, sort_desc should decide the sort order.

Copy link
Contributor

@yeya24 yeya24 Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You example query is a binary expression and I think we need to sort the end result by labels, not by values. Worth double checking through.

Just curious, is this a documented behavior? Like we should sort by labels when it is X.

I think in current promql implementation it doesn't sort results by labels at all in this case. I also doubt if it really sort the final results by the first sort or sort_desc. But looks like it is, seems a binary expression just sum up the two exprs without changing any order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For operations like topk by (label), the sorting is only maintained within a group, and not globally. So values from one group appear together in Prometheus

The current sort by series labels way produces incorrect order for topk. But it sounds very hard to maintain the order within the group in our final results after merge. Shall we just skip sharding in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try to summarize what I understood from the discussion above:

  • For sort and sort_desc we want to continue to shard the query and sort the results at the end.
  • For topk and bottomk we want to disable sharding because sorting at the end will break topk by (label).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, for topk and bottomk we don't need actually do anything because the shards themselves will already be sorted. Since topk/bottomk by does not require a global sort, it should be safe to keep sharding these expressions.

So I think we're good with this implementation :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After more discussions with @yeya24, here's the final proposal we came up with that works mostly:

  • If the root expr is topk or bottomk, the response will be concatenated without a global sort.
  • If the AST contains sort or sort_desc, the response will be sorted by values.
  • Else, the response will be sorted by labels.

Examples:

# Query Merge mechanism Prometheus Link Remarks
1 1 + sort(sum by (code) (http_requests_total) ) Merge and then sort by values link Prometheus sorts by value.
2 sort(topk by (code) (10, http_requests_total)) Merge and then sort by values link Prometheus sorts by value.
3 topk by (code) (10, http_requests_total) Merge responses but don't sort link Prometheus doesn't sort.
4 topk(5, http_requests_total) by (code) + sort(http_requests_total) Merge and then sort by values link Prometheus doesn't sort.
5 sort(http_requests_total) + topk(5, http_requests_total) by (code) Merge and then sort by values link Prometheus sorts by value.

There is discrepancy for case 4 where Prometheus doesn't sort the responses but Thanos will sort by value.

I believe the above approach will give reasonable compatibility with Prometheus without increasing the complexity in the merging logic in the frontend.

@fpetkovski @yeya24 - Please let me know if this approach makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the breakdown, this is what we needed and it makes sense to me 👍

In the AST analysis for case 2. we can also look at the root expression, and if it's a binary one we can look only at the LHS and RHS but not any deeper than that. Not a blocker though, either approach would works close enough.

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
fpetkovski
fpetkovski previously approved these changes Feb 23, 2023
Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks!

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

CHANGELOG.md Outdated
@@ -34,6 +34,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6103](https://github.com/thanos-io/thanos/pull/6103) Mixins(Rule): Fix query for long rule evaluations.
- [#6121](https://github.com/thanos-io/thanos/pull/6121) Receive: Deduplicate metamonitoring queries.
- [#6137](https://github.com/thanos-io/thanos/pull/6137) Downsample: Repair of non-empty XOR chunks during 1h downsampling.
- [#6125](https://github.com/thanos-io/thanos/pull/6125) Query Frontend: Fix sorted queries do not produce sorted results for shardable queries.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mention the function related in the changelog? sort, sort_desc, topk, bottomk

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

query frontend: sorted queries do not produce sorted results for shardable queries
3 participants