
Feature aggregate refactor #771

Merged: 7 commits merged into grafana:master on Nov 28, 2017

Conversation

shanson7 (Collaborator)

This is a preliminary change that makes follow up features easier.

This change combines the similar basic aggregation functions into a single aggregate function. This can be extended to support the new aggregate function and groupByTags.


func getCrossSeriesAggFunc(c string) crossSeriesAggFunc {
	switch c {
	case "avg", "average":
Contributor

isn't it always average, never avg?

Contributor

actually, can't we do away with this entire function? perhaps we should assign the crossSeriesAggFunc as a member of FuncAggregate.

Collaborator (Author)

I use this function for groupByTags in a follow-up branch, so it will still be needed there.

expr/funcs.go Outdated
"avg": {NewAvgSeries, true},
"averageSeries": {NewAvgSeries, true},
"avg": {NewAggregateConstructor("average"), true},
"averageSeries": {NewAggregateConstructor("average"), true},
Contributor

seems a bit weird to set these strings

  1. at runtime, would rather do it at compile time
  2. in a way that each instance gets a copy instead of just hardcoded in one place

IOW I'm thinking we could have different structs for each function with their own Name method or something.

Collaborator (Author)

That's totally doable. I doubt it will have any real performance impact on this transient object, but it would help catch typos at compile time when new functions are added or changed.

shanson7 (Collaborator, Author) commented Nov 27, 2017

So, I implemented something, but now I'm not as sure about the benefit. I created a struct that holds a function and a name. I cannot use a set of structs, or else we move back to the interfaces. So now the name and the struct can get out of sync (bad copy/paste, etc.). Can you give me more insight into what you would like?

Collaborator (Author)

So, at compile time, a non-existent function can be caught. At test time, typos / a mismatch between name and function can be caught.

At startup (specifically, funcs::init()) the function and name would be assigned (no lookup needed).

Is this acceptable?
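
A hedged sketch of what that could look like (simplified types; names like namedAggFunc and aggFuncs are illustrative, not the actual metrictank registration code): each entry pairs the canonical name with the function at init time, so a non-existent function is a compile error and a test can check that the map key and the Name field agree.

package main

import "fmt"

// crossSeriesAggFunc is a simplified stand-in for the real signature.
type crossSeriesAggFunc func(in [][]float64, out *[]float64)

// namedAggFunc pairs a cross-series function with its canonical name, so the
// name never has to be looked up (or mistyped) at query time.
type namedAggFunc struct {
	Name string
	Fn   crossSeriesAggFunc
}

var aggFuncs map[string]namedAggFunc

func init() {
	// Assigned once at startup: referencing a function that does not exist
	// fails to compile, and a test can assert key == Name for every entry.
	aggFuncs = map[string]namedAggFunc{
		"average": {Name: "average", Fn: crossSeriesAvg},
		"sum":     {Name: "sum", Fn: crossSeriesSum},
	}
}

func crossSeriesAvg(in [][]float64, out *[]float64) { /* ... */ }
func crossSeriesSum(in [][]float64, out *[]float64) { /* ... */ }

func main() {
	for name, f := range aggFuncs {
		fmt.Println(name, name == f.Name)
	}
}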

	point.Val = math.NaN()
} else {
	point.Val = sum
}
Contributor

comparing what was in func_sumseries.go with this, i see 2 differences.

  1. instead of nan bool, you use num int
  2. instead of working with point throughout you create sum first, then put it in point?

why? what's the benefit of this? (in general I would advise keeping the logic the same unless there's a benefit; it makes reviewing easier)

Collaborator (Author)

These were each slightly modified versions of each other, based on averageSeries. I would prefer these functions to all look like each other rather than like the original xxxSeries implementations (since those were independent and these are co-located). For sum I can switch to using a bool.

Contributor

sounds good
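
For reference, a minimal stand-alone sketch (plain float64 slices instead of the real series type; sumAt is an illustrative name) of the sum logic using a nan bool, as agreed above:

package main

import (
	"fmt"
	"math"
)

// sumAt sums the i-th value of every input series, skipping NaN, and returns
// NaN only if every series is NaN at that index (the nan-bool style).
func sumAt(in [][]float64, i int) float64 {
	nan := true
	sum := 0.0
	for j := range in {
		if v := in[j][i]; !math.IsNaN(v) {
			sum += v
			nan = false
		}
	}
	if nan {
		return math.NaN()
	}
	return sum
}

func main() {
	in := [][]float64{{1, math.NaN()}, {2, math.NaN()}}
	fmt.Println(sumAt(in, 0), sumAt(in, 1)) // 3 NaN
}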

max := math.NaN()
for j := 0; j < len(in); j++ {
	p := in[j].Datapoints[i].Val
	if !math.IsNaN(p) && (math.IsNaN(max) || p > max) {
Contributor

i see you replaced math.Max with a custom clause (which compiles to a jump, which could be expensive in case the branch is mispredicted). in master we use math.Max which i'm not sure exactly how it works, but i assume it's just 1 native instruction and hence likely faster.

Collaborator (Author)

I can change that back to math.Max

shanson7 (Collaborator, Author) commented Nov 28, 2017

Shockingly, math.Min is MUCH slower:

bin/benchcmp using_math.txt not_using_math.txt
benchmark                                         old ns/op     new ns/op     delta
BenchmarkSeriesAggregateMin10k_100NoNulls-8       14308915      3435607       -75.99%
BenchmarkSeriesAggregateMin10k_100WithNulls-8     14437599      3477521       -75.91%

benchmark                                         old MB/s     new MB/s     speedup
BenchmarkSeriesAggregateMin10k_100NoNulls-8       838.64       3492.83      4.16x
BenchmarkSeriesAggregateMin10k_100WithNulls-8     831.16       3450.73      4.15x

It was 4x faster to put the logic in the if statement.
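
For context, a self-contained sketch of the two variants being compared (plain float64 slices, not the real types): a NaN-skipping minimum via math.Min with a +Inf start, versus the inline comparison that came out roughly 4x faster in the numbers above.

package main

import (
	"fmt"
	"math"
)

// minUsingMath starts at +Inf so math.Min never sees the accumulator as NaN;
// NaN inputs are skipped explicitly.
func minUsingMath(vals []float64) float64 {
	min := math.Inf(1)
	ok := false
	for _, v := range vals {
		if math.IsNaN(v) {
			continue
		}
		min = math.Min(min, v)
		ok = true
	}
	if !ok {
		return math.NaN()
	}
	return min
}

// minInline keeps the comparison in the if clause, as the PR does.
func minInline(vals []float64) float64 {
	min := math.NaN()
	for _, v := range vals {
		if !math.IsNaN(v) && (math.IsNaN(min) || v < min) {
			min = v
		}
	}
	return min
}

func main() {
	vals := []float64{3, math.NaN(), 1, 2}
	fmt.Println(minUsingMath(vals), minInline(vals)) // 1 1
}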

Dieterbe (Contributor) commented Nov 27, 2017

note that crossSeriesCnt != graphite's countSeries (http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.countSeries). you don't claim it is, so that's fine, just a warning. i suspect you might introduce it via http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.aggregate, which is ok i think (despite graphite not having it).

one that i'm more sceptical of is crossSeriesLst. if a user uses this, they should probably just have filtered down better. note that graphite also doesn't support aggregate with last. it could also be a bit confusing, as "last" in this context has nothing to do with time order, but rather with lexical ordering. but i'm ok with that too, if you want to add it. although for this function, you'd be better off, I think, iterating in reverse order and breaking when you find the first non-NaN point.
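
A small sketch of that reverse-iteration idea (simplified to float64 slices; lastAt is an illustrative name, not a function in this PR): walk the series from last to first and stop at the first non-NaN value.

package main

import (
	"fmt"
	"math"
)

func lastAt(in [][]float64, i int) float64 {
	for j := len(in) - 1; j >= 0; j-- {
		if v := in[j][i]; !math.IsNaN(v) {
			return v // break on the first non-NaN point found from the end
		}
	}
	return math.NaN()
}

func main() {
	in := [][]float64{{1, 1}, {2, math.NaN()}, {math.NaN(), math.NaN()}}
	fmt.Println(lastAt(in, 0), lastAt(in, 1)) // 2 1
}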

func crossSeriesMin(in []models.Series, out *[]schema.Point) {
	for i := 0; i < len(in[0].Datapoints); i++ {
		nan := true
		min := math.NaN()
Contributor

start with inf, since math.Min() returns NaN if one of the inputs is NaN (every time)

shanson7 (Collaborator, Author)

I think I'll remove Cnt/Lst for now. I took them as-is from batch/aggregator.go and just embedded them in a second loop. They would be easy enough to add back in later.

Cnt is actually implementing something I'd done by composing sum + isNonNull, which I find interesting.
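
For illustration, a tiny sketch of that composition (simplified types, not the actual code): isNonNull maps set values to 1 and nulls to 0, and a sum of the transformed series gives the count.

package main

import (
	"fmt"
	"math"
)

// isNonNull maps each value to 1 if it is set, 0 if it is NaN.
func isNonNull(vals []float64) []float64 {
	out := make([]float64, len(vals))
	for i, v := range vals {
		if !math.IsNaN(v) {
			out[i] = 1
		}
	}
	return out
}

// countAt counts non-null values across series at index i by summing the
// isNonNull-transformed series.
func countAt(in [][]float64, i int) float64 {
	sum := 0.0
	for _, s := range in {
		sum += isNonNull(s)[i]
	}
	return sum
}

func main() {
	in := [][]float64{{1, math.NaN()}, {2, 3}}
	fmt.Println(countAt(in, 0), countAt(in, 1)) // 2 1
}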

shanson7 (Collaborator, Author)

I removed Cnt and Lst and fixed Min. I added test cases and benchmarks as well.

Dieterbe (Contributor)

looks like you accidentally reverted to the non-math.Max and non-math.Min approaches.
that stuff doesn't belong in the "Improve performance, fix bug in min" commit anyway; please take it out of that commit.

shanson7 (Collaborator, Author)

That was the "Improve performance" part: math.Max/math.Min was considerably slower.

shanson7 (Collaborator, Author)

See #771 (comment) re: performance difference. I can split the commit if you want.

Dieterbe (Contributor)

aha sweet. missed your comment. ok so that's good then.
we could clarify the difference between the first benchmarks, like BenchmarkAggregate10k_1NoNulls and benchmarkAggregate, versus BenchmarkSeriesAggregateAvg10k_100NoNulls and benchmarkSeriesAggregate.

the difference between them is that the former runs the graphite processing chain (incl reading inputs, generating output structure with proper name and attributes, dealing with the slicepool, etc) through a fake "average", whereas the latter runs the selected cross series aggregation function.

i don't know, maybe it's too obvious to warrant comments. and i'm not sure what we would rename to.

@Dieterbe Dieterbe merged commit e1ab774 into grafana:master Nov 28, 2017
@Aergonus Aergonus deleted the feature_aggregate_refactor branch January 22, 2018 14:49