-
Notifications
You must be signed in to change notification settings - Fork 105
Feature aggregate refactor #771
Feature aggregate refactor #771
Conversation
9c8c94b
to
39a3f5f
Compare
|
||
func getCrossSeriesAggFunc(c string) crossSeriesAggFunc { | ||
switch c { | ||
case "avg", "average": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it always average
never avg
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, can't we do away with this entire function, perhaps we should assign the crossSeriesAggFunc as a member to the FuncAggregate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use this function for groupByTags
in a followup branch, so it will still be needed there.
expr/funcs.go
Outdated
"avg": {NewAvgSeries, true}, | ||
"averageSeries": {NewAvgSeries, true}, | ||
"avg": {NewAggregateConstructor("average"), true}, | ||
"averageSeries": {NewAggregateConstructor("average"), true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems a bit weird to set these strings
- at runtime, would rather do it at compile time
- in a way that each instance gets a copy instead of just hardcoded in one place
IOW I'm thinking have different structs for each function with their own Name method or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's totally doable. I doubt it will have any real performance impact on this transient object, but it would help catch typo's at compile-time when new functions are added/changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I implemented something, but now I'm not as sure about the benefit. I created a struct that holds a function and a name. I cannot use a set of structs, or else we move back to the interfaces. So, now the name and the struct can get out of sync (bad copy/pasta, etc). Can you give me more insight into what you would like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, at compile time, a non-existent function can be caught. At test time, typo's / mismatch between name and function can be caught.
At startup (specifically, funcs::init()
) the function and name would be assigned (no lookup needed).
Is this acceptable?
expr/seriesaggregators.go
Outdated
point.Val = math.NaN() | ||
} else { | ||
point.Val = sum | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comparing what was in func_sumseries.go with this, i see 2 differences.
- instead of nan bool, you use num int
- instead of working with
point
throughout you createsum
first, then put it in point?
why? what's the benefit of this? (in general I would advise keep the logic the same, unless there's a benefit. makes reviewing easier)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were each slightly modified versions of each other based on averageSeries
. I would prefer these functions to all look like each other, than to all look like the original xxxSeries
impl (since those were independent and these are co-located). For sum
I can switch to using a bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
expr/seriesaggregators.go
Outdated
max := math.NaN() | ||
for j := 0; j < len(in); j++ { | ||
p := in[j].Datapoints[i].Val | ||
if !math.IsNaN(p) && (math.IsNaN(max) || p > max) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see you replaced math.Max with a custom clause (which compiles to a jump, which could be expensive in case the branch is mispredicted). in master we use math.Max which i'm not sure exactly how it works, but i assume it's just 1 native instruction and hence likely faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change that back to math.Max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shockingly, math.Min is MUCH slower:
bin/benchcmp using_math.txt not_using_math.txt
benchmark old ns/op new ns/op delta
BenchmarkSeriesAggregateMin10k_100NoNulls-8 14308915 3435607 -75.99%
BenchmarkSeriesAggregateMin10k_100WithNulls-8 14437599 3477521 -75.91%
benchmark old MB/s new MB/s speedup
BenchmarkSeriesAggregateMin10k_100NoNulls-8 838.64 3492.83 4.16x
BenchmarkSeriesAggregateMin10k_100WithNulls-8 831.16 3450.73 4.15x
It was 4x faster to put the logic in the if statement.
note that crossSeriesCnt != graphite's countSeries (http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.countSeries) you don't claim it is, so that's fine, just a warning. i supect you might introduce it via http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.aggregate which is ok i think (despite graphite not having it). one that i'm more sceptical of is crossSeriesLst. if a user uses this, they should probably just have filtered down better. note that graphite also doesn't support aggregate with last. could also be a bit confusing as "last" in this context has nothing to do with time order, rather lexical ordering. but i'm ok with that too, if you want to add it.although for this function, you'd be better off I think iterating in reverse order and breaking when you find the first non-nan point. |
expr/seriesaggregators.go
Outdated
func crossSeriesMin(in []models.Series, out *[]schema.Point) { | ||
for i := 0; i < len(in[0].Datapoints); i++ { | ||
nan := true | ||
min := math.NaN() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start with inf, since math.Min() returns NaN if one of the inputs is NaN (every time)
I think I'll remove Cnt/Lst for now. I took them as-is from batch/aggregator.go and just embedded them in a second loop. They would be easy enough to add back in later. Cnt is actually implementing something I'd done by composing sum + isNonNull, which I find interesting. |
I removed |
looks like you accidentally reverted to the non-math.Max and non-math.Min approaches. |
That was the "Improve performance". math.Max/math.Min was considerably slower. |
See #771 (comment) re: performance difference. I can split the commit if you want. |
aha sweet. missed your comment. ok so that's good then. the difference between them is that the former runs the graphite processing chain (incl reading inputs, generating output structure with proper name and attributes, dealing with the slicepool, etc) through a fake "average", whereas the latter runs the selected cross series aggregation function. i don't know, maybe it's too obvious to warrant comments. and i'm not sure what we would rename to. |
This is a preliminary change that makes follow up features easier.
This change combines the similar basic aggregation functions into a single
aggregate
function. This can be extended to support the new aggregate function and groupByTags.