implement nudging similar to graphite. #647
Conversation
Force-pushed from aa05b72 to 50ce641.
wanna do some more work on this first.
Force-pushed from 50ce641 to 7ddec47.
* move nudging-based consolidation to the consolidation package: this will make it easier to unit test, as opposed to having it in the plan.
* since our aggregation returns points at clean multiples of the new interval, and we don't want to move information into the past (which would be lying about the future: a point at an earlier timestamp would summarize data that had not yet happened), e.g. instead of moving a spike of data to an earlier timestamp, we move it to a later one; since we do this, we need to adjust the nudging logic to treat the point that comes right after a clean multiple of the interval as the first one of any bucket.
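To make the nudging rule above concrete, here is a minimal, hypothetical Go sketch; the function and parameter names (`nudgeStart`, `preAggInterval`, `postAggInterval`) are illustrative, not the PR's exact API:

```go
package main

import "fmt"

// nudgeStart sketches the nudging rule: because aggregated points are
// postmarked, a clean multiple of postAggInterval is the *last* raw point
// of its bucket, so the first point of the next bucket arrives
// preAggInterval later. We advance start until it lands on such a point.
func nudgeStart(start, preAggInterval, postAggInterval uint32) uint32 {
	remainder := (start - preAggInterval) % postAggInterval
	if remainder != 0 {
		start += postAggInterval - remainder
	}
	return start
}

func main() {
	// raw points every 5s aggregated into 20s buckets:
	// the bucket postmarked ts=40 covers raw ts=25,30,35,40.
	fmt.Println(nudgeStart(15, 5, 20)) // 25: first point of the ts=40 bucket
	fmt.Println(nudgeStart(25, 5, 20)) // 25: already aligned, unchanged
}
```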
Force-pushed from 7ddec47 to ffc134c.
at this point, I'm pretty happy with it, I think. it is simple and assures MaxDataPoints is always honored (especially important when mdp=1), unlike the other approach I explored and described in the comments but decided not to pursue because it was too complex. for a demo see https://vimeo.com/220016610 where I start fakemetrics with secondly points and then visualize them with a few MDP settings.
consolidation/consolidate.go (outdated):

```go
// move start until it maps to the first point of an aggregation bucket.
// since clean multiples of the new postAggInterval are the last point to go into an aggregation bucket,
// we want a point that comes preAggInterval after it.
remainder := (start - preAggInterval) % postAggInterval
```
I'm not entirely clear why we subtract preAggInterval here.
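One way to see why the subtraction matters: with postmarked aggregation, a clean multiple of the output interval is the *last* raw point of its bucket, not the first. A small sketch (`bucketOf` is an illustrative helper, not code from the PR):

```go
package main

import "fmt"

// bucketOf returns the postmarked output timestamp of the bucket a raw
// point falls into (illustrative helper, not the PR's code).
func bucketOf(ts, postAggInterval uint32) uint32 {
	return ts + (postAggInterval-ts%postAggInterval)%postAggInterval
}

func main() {
	const preAgg, postAgg = 5, 20
	// ts=20 is a clean multiple: it is the *last* point of the ts=20 bucket.
	fmt.Println(bucketOf(20, postAgg)) // 20
	// the first point of the *next* bucket comes preAgg later, at ts=25.
	fmt.Println(bucketOf(25, postAgg)) // 40
	// so (start - preAggInterval) % postAggInterval == 0 tests exactly
	// "start is the first point of a bucket": 25-5=20, and 20%20=0.
	fmt.Println((25 - preAgg) % postAgg) // 0
}
```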
I have pushed a commit which should explain this stuff (and some of the other stuff) better. let me know what you think / if that clears it up.
After spending some time going over this and the graphite implementation with @replay I think it makes sense, just that one question above. I'm going to dig deeper on the graphite side and figure out what the story is with its approach.
```go
// with our nudging approach, we strip all the points that do not form a full aggregation bucket.
// IOW the desired first point must be the first point of a bucket.
// in the example above, it means we strip the first 2 points and start at ts=25 (which will go into ts=40).
```
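A hypothetical sketch of that stripping step, using the same numbers as the example (5s raw points, 20s buckets; `Point` and `stripIncompleteBucket` are stand-in names, not metrictank's actual types):

```go
package main

import "fmt"

// Point is a minimal stand-in for a raw series point (assumption).
type Point struct {
	Val float64
	Ts  uint32
}

// stripIncompleteBucket drops leading points until the remaining series
// starts at the first point of an aggregation bucket, per the nudging
// rule: the first point of a bucket sits preAgg after a clean multiple
// of postAgg.
func stripIncompleteBucket(pts []Point, preAgg, postAgg uint32) []Point {
	for i, p := range pts {
		if (p.Ts-preAgg)%postAgg == 0 {
			return pts[i:]
		}
	}
	return pts[:0]
}

func main() {
	// points at ts=15,20,25,...,60 with 5s raw interval, 20s buckets.
	var pts []Point
	for ts := uint32(15); ts <= 60; ts += 5 {
		pts = append(pts, Point{Ts: ts})
	}
	out := stripIncompleteBucket(pts, 5, 20)
	fmt.Println(out[0].Ts) // 25: ts=15 and ts=20 are stripped, as in the example
}
```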
Interesting. This actually seems to be different from the way graphite does things (based on my reading of the code), where the timestamp of the aggregated points is the start of the interval rather than the end.
it is. it's also how we've implemented runtime consolidation in MT since forever
it's based on a guiding principle that I came up with, but that afaik is not often discussed or cared about in the wider graphite ecosystem, and as a result it is implemented oppositely across different tools (e.g. statsd also postmarks, like MT). see https://grafana.com/blog/2016/03/03/25-graphite-grafana-and-statsd-gotchas/#timestamps for details
basically my point is that data should never move into the past (which breaks the space-time continuum: observations cannot, afaik, precede the event they observe), but that people can expect data to be delayed (moving into the future instead, which is just a delay and doesn't break the laws of physics). e.g. visually, a spike should never appear on a chart at an earlier time than when it actually happened, and people should expect that with consolidation, data will move a bit to the right. that way, whenever you look at a point in time (whether aggregated or not), you always know it happened at some point since the previous timestamp.
perhaps this is just one of my quixotic quirks? is there a flaw in my reasoning?
especially now that we care more about stricter compatibility with graphite, do we want to undo this and do things the graphite way? though i would rather keep the postmarking, i'm open to discussion and/or change.
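The postmark-vs-premark distinction above can be sketched in two one-liners (the helper names are assumptions, not metrictank's API; a spike at raw ts=23, consolidated into 20s buckets):

```go
package main

import "fmt"

// postmark assigns a raw point to the *end* of its interval: data may be
// delayed (moved to the right) but never moves into the past.
func postmark(ts, interval uint32) uint32 {
	return ts + (interval-ts%interval)%interval
}

// premark assigns a raw point to the *start* of its interval, the
// graphite-style behavior discussed above: the point moves into the past.
func premark(ts, interval uint32) uint32 {
	return ts - ts%interval
}

func main() {
	// a spike observed at ts=23, consolidated into 20s buckets:
	fmt.Println(postmark(23, 20)) // 40: shown later than it happened (a delay)
	fmt.Println(premark(23, 20))  // 20: shown *before* it happened
}
```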
Not at all, I think your reasoning makes sense and it's also how things like movingAverage work in graphite, so I'm seriously considering a PR to change graphite to work according to your logic. Will need to chew on that a little.
I agree that it would make more sense to change Graphite than to change the current MT implementation (although I wouldn't argue based on the theory of relativity).
can I get an approval on this?
LGTM
seems to work fine, but I still have to make unit tests