[query] sort, topk, bottomk fixes for instant queries #2792

soundvibe · 2020-10-22T14:53:02Z

What this PR does / why we need it:

This PR follows the work started with #2756.
It implements sort functionality and additionally fixes the issues when NaN values are not returned for instant queries in some cases.
It would be enough to merge only this PR because it contains all the changes from #2756.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

topk and bottomk results for instant queries are the same except that they are now sorted.

Does this PR require updating code package or user-facing documentation?:

Small optimization in appendValuesInstant.

made valueAndMeta struct package private.

OrderedFlush() now sorts the results instead popping them one by one. FloatHeap won't be created if it's not used.

FloatHeap now supports adding NaN values. Uncommented topk, bottomk and sort compatibility tests (now they pass).

arnikola · 2020-10-22T15:17:55Z

src/query/parser/promql/matchers.go

 	case linear.SortType, linear.SortDescType:
-		return nil, false, err
+		p, err = linear.NewSortOp(name)


src/query/models/query_context.go

src/query/block/meta.go

src/query/api/v1/handler/prometheus/native/read.go

src/query/block/meta.go

src/query/functions/aggregation/take_test.go

src/query/functions/linear/sort.go

src/query/functions/linear/sort_test.go

src/query/functions/aggregation/take.go

src/query/functions/utils/heap_test.go

arnikola · 2020-10-26T13:04:43Z

src/query/functions/utils/heap.go

@@ -33,17 +35,34 @@ type ValueIndexPair struct {
 type lessFn func(ValueIndexPair, ValueIndexPair) bool

 func maxHeapLess(i, j ValueIndexPair) bool {
-	if i.Val == j.Val {
+	if i.Val == j.Val || math.IsNaN(i.Val) && math.IsNaN(j.Val) {


nit: These conditionals are very hard to read and understand, can you make them more clear and verbose?

func maxHeapLess(i, j ValueIndexPair) bool { func maxHeapLess(i, j ValueIndexPair) bool { if math.IsNaN(i.Val) { if math.IsNaN(j.Val) { return i.Index > j.Index } return true } if math.IsNaN(j.Val) { return true } if i.Val == j.Val { return i.Index > j.Index } return i.Val < j.Val }

I'm not sure that having more code makes it more clear (especially in this case). I believe that logical and and or operators have their place and were added to programming languages for good reasons. I highly doubt that avoiding them makes code more clear.

As a reviewer, it took a while to parse the logic as is; would recommend changing it to be more straightforward

Simplified them and added some documentation to make it more clear.

src/query/functions/aggregation/take.go

arnikola · 2020-10-26T13:33:56Z

src/query/functions/linear/sort.go

+	valuesToSort := make([]valueAndMeta, len(values))
+	for i, value := range values {
+		valuesToSort[i] = valueAndMeta{
+			val:        value,
+			seriesMeta: seriesMetas[i],
+		}
+	}


nit: safer to use append here

We are relying on matching index when setting from seriesMetas[i] so append would just hide this and make it less clear.

Yeah, it's just we've seen issues before with forward compat where future changes to slice size caused out of index errors, while append will deal with this case

src/query/functions/aggregation/take.go

equalsPairs now checks individual struct elements instead of their printed values.

Used append in place when indices are not needed. Documented and extracted some comparison functions for better readability and clearness.

* master: [query] Improve precision for variance and stddev of equal values (#2799) Support dynamic namespace resolution for embedded coordinators (#2815) [dbnode] Add index regexp DFA compilation cache to avoid allocating DFAs for same expressions (#2814) [dbnode] Set default cache on retrieve to false, prepare testing with cache none (#2813) [tools] Output annotations as base64 (#2743) Add Reset Transformation (#2794) [large-tiles] Fix for a reverse index when querying downsampled namespace (#2808) [dbnode] Evict info files cache before index bootstrap in peers bootstrapper (#2802) [dbnode] Bump default filesystem persist rate limit (#2806) [coordinator] Add metrics and configurability of downsample and ingest writer worker pool (#2797) [dbnode] Fix concurrency granularity of seekerManager.UpdateOpenLease (#2790) [dbnode] Use correct import path for atomic library (#2801) Regenerate carbon automapping rules on namespaces changes (#2793) Enforce matching retention and index block size (#2783) [query] Add ClusterNamespacesWatcher and use to generate downsample automapper rules (#2782) [query][dbnode] Wire up etcd-backed Clusters implementation in coordinator (#2758) [query] Fix quantile() argument not being passed through (#2780) [query] Add "median" aggregation to Graphite aggregate() function (#2774) [r2] Ensure KeepOriginal is propagated when adding/reviving rules (#2796) [dbnode] Update default db read limits (#2784)

* master: [query] count_values label naming fixes (#2805) [dbnode] Read only namespaces (#2803) Remove old code: m3nsch (#2822) Update issue templates Update issue templates [proto] Allow zero-alloc reuse of AggregatedMetric protobuf payloads (#2823) Remove old code: aggregator/handler/trafficcontrol (#2821) Remove old code: aggregation/quantile/tdigest (#2820) AddToReset config to change Add transforms to Reset transforms (#2817) [metrics/rules] Add test to assert that mixed-mode keepOriginal uses keepOriginal=true (#2819) [dbnode] Update return unfulfilled for corrupt commit log files default (#2807) [dbnode] Move eviction logic up before ns loop (#2812) [query] Add additional parser tests (#2811) [config] Remove deprecated configuration fields (#2771)

arnikola

LGTM with a couple of small nits

arnikola · 2020-11-04T20:07:52Z

src/query/functions/utils/heap.go

+}
+
+// Checks which value is less if first of them is NaN.
+// Basically, we do not want to add NaNs to the heap when it has reached it's cap so this fn should be used to prevent this.


nit; compareNaNs maybe?

arnikola · 2020-11-04T20:09:58Z

src/query/functions/linear/sort.go

+	valuesToSort := make([]valueAndMeta, len(values))
+	for i, value := range values {
+		valuesToSort[i] = valueAndMeta{
+			val:        value,
+			seriesMeta: seriesMetas[i],
+		}
+	}


Yeah, it's just we've seen issues before with forward compat where future changes to slice size caused out of index errors, while append will deal with this case

* master: [read_index_segments] Always validate index segment checksums before reading/validating contents (#2835) [query] Return additional warnings in /query{,_range} endpoints (#2836) Add a check for seriesIter.Err (#2834) [tools] Add concurrent read_index_segments validation option (#2833) [query] Add non-ready namespaces to Clusters interface and use in /namespace/ready endpoint (#2828) [query] Tests for when function argument is an expression (#2809) [large-tiles] Additional metrics added (#2720) [query] Refactor groupByNodes and implement `nodes` arg in asPercent (#2816) [read_index_segments] Add log that outlines which segment being processed (#2825) [aggregator] Handle follower canLead() for not yet flushed windows (#2818)

soundvibe added 10 commits October 15, 2020 16:46

fixes most of topk, bottomk ordered tests when instant queries are used

e488a3b

Merge branch 'master' of github.com:m3db/m3 into takeop-fix

054833f

formatted some code and extracted instant logic into separate method

f6ed6e1

Changes after review.

690dcd3

Small optimization in appendValuesInstant.

fixed imports.

5c7cf65

made valueAndMeta struct package private.

Optimized instant queries - now only single block will be created.

66746ac

OrderedFlush() now sorts the results instead popping them one by one. FloatHeap won't be created if it's not used.

improved how max series count is calculated

3d12b53

removed unused method

d8a5cc9

initial implementation

a7496e1

Fixes NaN issues with m3 queries for sort and topk/bottomk.

39eb746

FloatHeap now supports adding NaN values. Uncommented topk, bottomk and sort compatibility tests (now they pass).

soundvibe added the area:query All issues pertaining to query label Oct 22, 2020

soundvibe requested review from gediminasgu, linasm and arnikola October 22, 2020 14:53

soundvibe assigned gediminasgu, linasm and arnikola Oct 22, 2020

Retain old KeepNans flag for non instant queries

4d2b567