-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] sort, topk, bottomk fixes for instant queries #2792
Conversation
Small optimization in appendValuesInstant.
made valueAndMeta struct package private.
OrderedFlush() now sorts the results instead popping them one by one. FloatHeap won't be created if it's not used.
FloatHeap now supports adding NaN values. Uncommented topk, bottomk and sort compatibility tests (now they pass).
case linear.SortType, linear.SortDescType: | ||
return nil, false, err | ||
p, err = linear.NewSortOp(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
src/query/functions/utils/heap.go
Outdated
@@ -33,17 +35,34 @@ type ValueIndexPair struct { | |||
type lessFn func(ValueIndexPair, ValueIndexPair) bool | |||
|
|||
func maxHeapLess(i, j ValueIndexPair) bool { | |||
if i.Val == j.Val { | |||
if i.Val == j.Val || math.IsNaN(i.Val) && math.IsNaN(j.Val) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: These conditionals are very hard to read and understand, can you make them more clear and verbose?
func maxHeapLess(i, j ValueIndexPair) bool { func maxHeapLess(i, j ValueIndexPair) bool {
if math.IsNaN(i.Val) {
if math.IsNaN(j.Val) {
return i.Index > j.Index
}
return true
}
if math.IsNaN(j.Val) {
return true
}
if i.Val == j.Val {
return i.Index > j.Index
}
return i.Val < j.Val
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that having more code makes it more clear (especially in this case). I believe that logical and
and or
operators have their place and were added to programming languages for good reasons. I highly doubt that avoiding them makes code more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a reviewer, it took a while to parse the logic as is; would recommend changing it to be more straightforward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified them and added some documentation to make it more clear.
valuesToSort := make([]valueAndMeta, len(values)) | ||
for i, value := range values { | ||
valuesToSort[i] = valueAndMeta{ | ||
val: value, | ||
seriesMeta: seriesMetas[i], | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: safer to use append here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are relying on matching index when setting from seriesMetas[i]
so append
would just hide this and make it less clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's just we've seen issues before with forward compat where future changes to slice size caused out of index errors, while append will deal with this case
equalsPairs now checks individual struct elements instead of their printed values.
Used append in place when indices are not needed. Documented and extracted some comparison functions for better readability and clearness.
* master: [query] Improve precision for variance and stddev of equal values (#2799) Support dynamic namespace resolution for embedded coordinators (#2815) [dbnode] Add index regexp DFA compilation cache to avoid allocating DFAs for same expressions (#2814) [dbnode] Set default cache on retrieve to false, prepare testing with cache none (#2813) [tools] Output annotations as base64 (#2743) Add Reset Transformation (#2794) [large-tiles] Fix for a reverse index when querying downsampled namespace (#2808) [dbnode] Evict info files cache before index bootstrap in peers bootstrapper (#2802) [dbnode] Bump default filesystem persist rate limit (#2806) [coordinator] Add metrics and configurability of downsample and ingest writer worker pool (#2797) [dbnode] Fix concurrency granularity of seekerManager.UpdateOpenLease (#2790) [dbnode] Use correct import path for atomic library (#2801) Regenerate carbon automapping rules on namespaces changes (#2793) Enforce matching retention and index block size (#2783) [query] Add ClusterNamespacesWatcher and use to generate downsample automapper rules (#2782) [query][dbnode] Wire up etcd-backed Clusters implementation in coordinator (#2758) [query] Fix quantile() argument not being passed through (#2780) [query] Add "median" aggregation to Graphite aggregate() function (#2774) [r2] Ensure KeepOriginal is propagated when adding/reviving rules (#2796) [dbnode] Update default db read limits (#2784)
* master: [query] count_values label naming fixes (#2805) [dbnode] Read only namespaces (#2803) Remove old code: m3nsch (#2822) Update issue templates Update issue templates [proto] Allow zero-alloc reuse of AggregatedMetric protobuf payloads (#2823) Remove old code: aggregator/handler/trafficcontrol (#2821) Remove old code: aggregation/quantile/tdigest (#2820) AddToReset config to change Add transforms to Reset transforms (#2817) [metrics/rules] Add test to assert that mixed-mode keepOriginal uses keepOriginal=true (#2819) [dbnode] Update return unfulfilled for corrupt commit log files default (#2807) [dbnode] Move eviction logic up before ns loop (#2812) [query] Add additional parser tests (#2811) [config] Remove deprecated configuration fields (#2771)
834e25e
to
99f874f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a couple of small nits
} | ||
|
||
// Checks which value is less if first of them is NaN. | ||
// Basically, we do not want to add NaNs to the heap when it has reached it's cap so this fn should be used to prevent this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; compareNaNs
maybe?
valuesToSort := make([]valueAndMeta, len(values)) | ||
for i, value := range values { | ||
valuesToSort[i] = valueAndMeta{ | ||
val: value, | ||
seriesMeta: seriesMetas[i], | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's just we've seen issues before with forward compat where future changes to slice size caused out of index errors, while append will deal with this case
* master: [read_index_segments] Always validate index segment checksums before reading/validating contents (#2835) [query] Return additional warnings in /query{,_range} endpoints (#2836) Add a check for seriesIter.Err (#2834) [tools] Add concurrent read_index_segments validation option (#2833) [query] Add non-ready namespaces to Clusters interface and use in /namespace/ready endpoint (#2828) [query] Tests for when function argument is an expression (#2809) [large-tiles] Additional metrics added (#2720) [query] Refactor groupByNodes and implement `nodes` arg in asPercent (#2816) [read_index_segments] Add log that outlines which segment being processed (#2825) [aggregator] Handle follower canLead() for not yet flushed windows (#2818)
What this PR does / why we need it:
This PR follows the work started with #2756.
It implements sort functionality and additionally fixes the issues when NaN values are not returned for instant queries in some cases.
It would be enough to merge only this PR because it contains all the changes from #2756.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: