Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] sort, topk, bottomk fixes for instant queries #2792

Merged
merged 21 commits into from
Nov 5, 2020
Merged

Conversation

soundvibe
Copy link
Collaborator

What this PR does / why we need it:

This PR follows the work started with #2756.
It implements sort functionality and additionally fixes the issues when NaN values are not returned for instant queries in some cases.
It would be enough to merge only this PR because it contains all the changes from #2756.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

topk and bottomk results for instant queries are the same except that they are now sorted.

Does this PR require updating code package or user-facing documentation?:


@soundvibe soundvibe added the area:query All issues pertaining to query label Oct 22, 2020
case linear.SortType, linear.SortDescType:
return nil, false, err
p, err = linear.NewSortOp(name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

src/query/block/meta.go Outdated Show resolved Hide resolved
src/query/block/meta.go Outdated Show resolved Hide resolved
@@ -33,17 +35,34 @@ type ValueIndexPair struct {
type lessFn func(ValueIndexPair, ValueIndexPair) bool

func maxHeapLess(i, j ValueIndexPair) bool {
if i.Val == j.Val {
if i.Val == j.Val || math.IsNaN(i.Val) && math.IsNaN(j.Val) {
Copy link
Collaborator

@arnikola arnikola Oct 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: These conditionals are very hard to read and understand, can you make them more clear and verbose?

func maxHeapLess(i, j ValueIndexPair) bool {	func maxHeapLess(i, j ValueIndexPair) bool {
	if math.IsNaN(i.Val) {
		if math.IsNaN(j.Val) {
			return i.Index > j.Index
		}
		
		return true
	}
	
	if math.IsNaN(j.Val) {
		return true
	}
	
	if i.Val == j.Val {
		return i.Index > j.Index
	}
	
	return i.Val < j.Val
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that having more code makes it more clear (especially in this case). I believe that logical and and or operators have their place and were added to programming languages for good reasons. I highly doubt that avoiding them makes code more clear.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reviewer, it took a while to parse the logic as is; would recommend changing it to be more straightforward

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified them and added some documentation to make it more clear.

Comment on lines +111 to +117
valuesToSort := make([]valueAndMeta, len(values))
for i, value := range values {
valuesToSort[i] = valueAndMeta{
val: value,
seriesMeta: seriesMetas[i],
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: safer to use append here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator Author

@soundvibe soundvibe Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are relying on matching index when setting from seriesMetas[i] so append would just hide this and make it less clear.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's just we've seen issues before with forward compat where future changes to slice size caused out of index errors, while append will deal with this case

@arnikola arnikola removed their assignment Oct 26, 2020
equalsPairs now checks individual struct elements instead of their printed values.
Used append in place when indices are not needed.
Documented and extracted some comparison functions for better readability and clearness.
* master:
  [query] Improve precision for variance and stddev of equal values (#2799)
  Support dynamic namespace resolution for embedded coordinators (#2815)
  [dbnode] Add index regexp DFA compilation cache to avoid allocating DFAs for same expressions (#2814)
  [dbnode] Set default cache on retrieve to false, prepare testing with cache none (#2813)
  [tools] Output annotations as base64 (#2743)
  Add Reset Transformation (#2794)
  [large-tiles] Fix for a reverse index when querying downsampled namespace (#2808)
  [dbnode] Evict info files cache before index bootstrap in peers bootstrapper (#2802)
  [dbnode] Bump default filesystem persist rate limit (#2806)
  [coordinator] Add metrics and configurability of downsample and ingest writer worker pool (#2797)
  [dbnode] Fix concurrency granularity of seekerManager.UpdateOpenLease (#2790)
  [dbnode] Use correct import path for atomic library (#2801)
  Regenerate carbon automapping rules on namespaces changes (#2793)
  Enforce matching retention and index block size (#2783)
  [query] Add ClusterNamespacesWatcher and use to generate downsample automapper rules (#2782)
  [query][dbnode] Wire up etcd-backed Clusters implementation in coordinator (#2758)
  [query] Fix quantile() argument not being passed through (#2780)
  [query] Add "median" aggregation to Graphite aggregate() function (#2774)
  [r2] Ensure KeepOriginal is propagated when adding/reviving rules (#2796)
  [dbnode] Update default db read limits  (#2784)
* master:
  [query] count_values label naming fixes (#2805)
  [dbnode] Read only namespaces (#2803)
  Remove old code: m3nsch (#2822)
  Update issue templates
  Update issue templates
  [proto] Allow zero-alloc reuse of AggregatedMetric protobuf payloads (#2823)
  Remove old code: aggregator/handler/trafficcontrol (#2821)
  Remove old code: aggregation/quantile/tdigest (#2820)
  AddToReset config to change Add transforms to Reset transforms (#2817)
  [metrics/rules] Add test to assert that mixed-mode keepOriginal uses keepOriginal=true (#2819)
  [dbnode] Update return unfulfilled for corrupt commit log files default (#2807)
  [dbnode] Move eviction logic up before ns loop (#2812)
  [query] Add additional parser tests (#2811)
  [config] Remove deprecated configuration fields (#2771)
Copy link
Collaborator

@arnikola arnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple of small nits

}

// Checks which value is less if first of them is NaN.
// Basically, we do not want to add NaNs to the heap when it has reached it's cap so this fn should be used to prevent this.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; compareNaNs maybe?

Comment on lines +111 to +117
valuesToSort := make([]valueAndMeta, len(values))
for i, value := range values {
valuesToSort[i] = valueAndMeta{
val: value,
seriesMeta: seriesMetas[i],
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's just we've seen issues before with forward compat where future changes to slice size caused out of index errors, while append will deal with this case

@arnikola arnikola removed their assignment Nov 4, 2020
* master:
  [read_index_segments] Always validate index segment checksums before reading/validating contents (#2835)
  [query] Return additional warnings in /query{,_range} endpoints (#2836)
  Add a check for seriesIter.Err (#2834)
  [tools] Add concurrent read_index_segments validation option (#2833)
  [query] Add non-ready namespaces to Clusters interface and use in /namespace/ready endpoint (#2828)
  [query] Tests for when function argument is an expression (#2809)
  [large-tiles] Additional metrics added (#2720)
  [query] Refactor groupByNodes and implement `nodes` arg in asPercent (#2816)
  [read_index_segments] Add log that outlines which segment being processed (#2825)
  [aggregator] Handle follower canLead() for not yet flushed windows (#2818)
@soundvibe soundvibe merged commit 4129dbb into master Nov 5, 2020
@soundvibe soundvibe deleted the sort branch November 5, 2020 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:query All issues pertaining to query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants