[query] Improve precision for variance and stddev of equal values #2799
Conversation
# FAILING issue #10. (it is almost zero)
#eval instant at 50m stddev(http_requests)
#  {} 0.0
eval instant at 50m stddev(http_requests)
  {} 0.0

# FAILING issue #11. (it is almost zero)
#eval instant at 50m stdvar(http_requests)
#  {} 0.0
eval instant at 50m stdvar(http_requests)
  {} 0.0
While the previous implementation was returning values other than 0.0 for these tests, the tests were not failing because the results were within the tolerance range (see code snippet below). Hence, I added a couple of tests in the function_test.go file.
m3/src/query/test/compatibility/test.go
Line 70 in d47dcc3
epsilon = 0.000001 // Relative error allowed for sample values.
This is unexpected. If this was not failing, we would not have noticed the issue.
Please check, maybe the tolerance was added with some later commit (after these tests were commented out)?
If this tolerance is no longer needed, we might want to abandon it.
It was failing in the initial commit (2c4f65c), but stopped after this line change (440171d#diff-25ae4a7d6eea65e4db2787e6a61f5aea2646020856df3c3f5c7a5cb6b368be1cL563-R565).
I guess the tolerance is still needed, though we could try tuning a bit more how 0.0 vs almost-zero values are compared.
An idea: maybe it would be useful to test both successes and failures? E.g. instead of commenting tests out, we could introduce a failing flag (e.g. `eval_fail` or whatever is easier to implement) and check in `TestEvaluations` that these tests are actually failing. That way we could see when some changes fix (or un-fail) not only the intended tests, but some others, too.
After discussing with @linasm, we've decided to update this line: 5db10aa#diff-25ae4a7d6eea65e4db2787e6a61f5aea2646020856df3c3f5c7a5cb6b368be1cL565-R565
// Using Welford's online algorithm for calculating variance
// https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm
//
// This algorithm is used in Prometheus and also should provide better numerical precision than
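For readers unfamiliar with the reference, Welford's online algorithm can be sketched as follows. This is a minimal standalone illustration with made-up identifiers, not M3's actual `varianceFn()` implementation; its key property is that a vector of equal values yields a variance of exactly 0, because every delta from the running mean collapses to zero.

```go
package main

import "fmt"

// welfordVariance computes the population variance of values using
// Welford's online algorithm: a single pass that maintains a running
// mean and the sum of squared distances from that mean.
func welfordVariance(values []float64) float64 {
	count := 0
	mean := 0.0
	m2 := 0.0 // sum of squared distances from the current mean
	for _, v := range values {
		count++
		delta := v - mean
		mean += delta / float64(count)
		m2 += delta * (v - mean)
	}
	if count == 0 {
		return 0
	}
	return m2 / float64(count)
}

func main() {
	// Equal inputs: the first delta moves the mean onto the value
	// itself, so every contribution to m2 is exactly 0.
	fmt.Println(welfordVariance([]float64{1.33e-5, 1.33e-5, 1.33e-5})) // 0
	fmt.Println(welfordVariance([]float64{1, 2, 3, 4}))                // 1.25
}
```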
{
	"five 1.33e-5",
	[]float64{1.33e-5, 1.33e-5, 1.33e-5, 1.33e-5, 1.33e-5},
},
{
	"three 13.3",
	[]float64{13.3, 13.3, 13.3},
},
13.3 is from the aggregators.test file. I've randomly tried a handful of values; 1.33e-5 is one that makes the previous version of `varianceFn()`/`stddevFn()` fail the test.
LGTM, just 1 nit. Also a question about the tolerance in tests.
count := 0
partialMean := 0.0
partialVarTimesCount := 0.0 // for better precision, calculate `variance * count` and divide at the end
Nit: consider putting these into a `var (...)` block. Usually it looks cleaner that way.
@@ -562,7 +562,7 @@ func almostEqual(a, b float64) bool {
 	diff := math.Abs(a - b)

 	if a == 0 || b == 0 || diff < minNormal {
-		return diff < epsilon
+		return diff < epsilon
+		return diff < epsilon*minNormal
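To put the changed line in context, here is a self-contained sketch of the comparison helper after this diff. The relative-error branch and constants are assumptions based on the Prometheus test helper this code mirrors and the `epsilon` snippet quoted earlier; only the near-zero branch shown in the diff is taken from this PR.

```go
package main

import (
	"fmt"
	"math"
)

var (
	epsilon = 0.000001 // relative error allowed for sample values
	// minNormal is the smallest positive *normal* float64 (2^-1022),
	// reconstructed from its IEEE 754 bit pattern.
	minNormal = math.Float64frombits(0x0010000000000000)
)

// almostEqual compares near-zero operands against an absolute bound of
// epsilon*minNormal, and everything else against a relative bound.
func almostEqual(a, b float64) bool {
	if a == b {
		return true
	}
	diff := math.Abs(a - b)
	if a == 0 || b == 0 || diff < minNormal {
		return diff < epsilon*minNormal
	}
	return diff/(math.Abs(a)+math.Abs(b)) < epsilon
}

func main() {
	fmt.Println(almostEqual(0, 1e-12))      // false: no longer inside tolerance
	fmt.Println(almostEqual(0, 0))          // true
	fmt.Println(almostEqual(1.0, 1.0+1e-9)) // true: relative error ~5e-10
}
```

With the old `return diff < epsilon`, `almostEqual(0, 1e-12)` would have been true, which is exactly how the commented-out stddev/stdvar tests were passing despite returning almost-zero instead of 0.0.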
This seems a bit weird; I thought that minNormal is the lowest possible float, although I guess this is what Prom does in their tests too 🤷‍♂️
TIL: min-normal is not the lowest one; there's a nice description here. It seems that when you go below it, you start losing precision (number of digits) and there may be a performance penalty.
LGTM
* master:
  - [query] Improve precision for variance and stddev of equal values (#2799)
  - Support dynamic namespace resolution for embedded coordinators (#2815)
  - [dbnode] Add index regexp DFA compilation cache to avoid allocating DFAs for same expressions (#2814)
  - [dbnode] Set default cache on retrieve to false, prepare testing with cache none (#2813)
  - [tools] Output annotations as base64 (#2743)
  - Add Reset Transformation (#2794)
  - [large-tiles] Fix for a reverse index when querying downsampled namespace (#2808)
  - [dbnode] Evict info files cache before index bootstrap in peers bootstrapper (#2802)
  - [dbnode] Bump default filesystem persist rate limit (#2806)
  - [coordinator] Add metrics and configurability of downsample and ingest writer worker pool (#2797)
  - [dbnode] Fix concurrency granularity of seekerManager.UpdateOpenLease (#2790)
  - [dbnode] Use correct import path for atomic library (#2801)
  - Regenerate carbon automapping rules on namespaces changes (#2793)
  - Enforce matching retention and index block size (#2783)
  - [query] Add ClusterNamespacesWatcher and use to generate downsample automapper rules (#2782)
  - [query][dbnode] Wire up etcd-backed Clusters implementation in coordinator (#2758)
  - [query] Fix quantile() argument not being passed through (#2780)
  - [query] Add "median" aggregation to Graphite aggregate() function (#2774)
  - [r2] Ensure KeepOriginal is propagated when adding/reviving rules (#2796)
  - [dbnode] Update default db read limits (#2784)
What this PR does / why we need it:
Improves precision for variance() and stddev() aggregation operations when the vector contains equal values. In some cases, this caused a mismatch between Prometheus and M3, where the former would return 0.0 and the latter would return a number that is close to zero, but not equal.

Special notes for your reviewer:
Using the same algorithm as in Prometheus

Does this PR introduce a user-facing and/or backwards incompatible change?:

Does this PR require updating code package or user-facing documentation?: