Incorrect normalization causes panic #1811
Comments
@shanson7 what version was this? Please always mention the version.
Lately it's been a modified version of master at the time of posting (possibly significantly modified, based on the number of outstanding PRs). I believe this one is from this BB release branch, https://github.com/bloomberg/metrictank/tree/release_20200429, which boils down to commit 75884f2.
I was able to briefly reproduce this with v0.13.1-800-g74ba401c. I launched a fresh docker-dev and ran this in Grafana:
Stack dump below. Strangely, after a few minutes it started working.
Yeah, I'll update this soon with more info, but I think the gist is that summarize does not return series in canonical form, whereas basically everywhere else in the code we do, and we assume that we have canonical series everywhere.
If a client requests a cross-series aggregation between a canonical series and one that is not canonical because it had alignToFrom, I think we may want to just return "http 400 bad request". In practice (e.g. inside of sumSeries) we may not know why a series is not canonical, so perhaps we should first log all these cases, then review that indeed they're only caused by queries with
Note that So we could add a "plan validator": after constructing the AST we can detect if a query wants to run The more interesting case is alignToFrom=false.
Switching wholesale to postmarking is a non-trivial departure from how graphite does things, and wouldn't really solve the problem: it would still create an excess datapoint, except this time it would lie at the end (and have a ts > from), also resulting in a non-canonical series. And plotting it in Grafana would create a shifted graph because between graphite and metrictank, all points would be moved over by Because the current implementation returns a first point that lies before
To demonstrate, I ran
Canonical output would have first TS 1592217240 and last TS 1592217540, so 6 points in total.
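The point count above can be checked with a small sketch. It assumes a 60s interval, a half-open request range [from, to), and that canonical timestamps are multiples of the interval; the helper name canonicalBounds and the from/to values are made up for illustration and are not taken from metrictank.

```go
package main

import "fmt"

// canonicalBounds is a hypothetical helper, not part of metrictank.
// Assumptions: interval > 0, the request range is half-open [from, to),
// and a canonical series only has timestamps that are multiples of interval.
// It returns the first and last such timestamps and the number of points.
func canonicalBounds(from, to, interval uint32) (first, last uint32, n int) {
	// Round from up to the next multiple of interval.
	first = ((from + interval - 1) / interval) * interval
	if first >= to {
		return 0, 0, 0 // no multiple of interval falls inside the range
	}
	// Round the last included second (to-1) down to a multiple of interval.
	last = ((to - 1) / interval) * interval
	n = int((last-first)/interval) + 1
	return first, last, n
}

func main() {
	// from and to are assumed values chosen to match the timestamps quoted above.
	first, last, n := canonicalBounds(1592217240, 1592217600, 60)
	fmt.Println(first, last, n) // 1592217240 1592217540 6
}
```

A series that carries one more point before the first timestamp would have 7 points for the same range, which is the kind of mismatch discussed in this thread.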
See #1811 for more details.
Before, it was common for output to contain an extraneous point that lies before 'from' (and wouldn't be rendered by Grafana in graph panels), which made the output invalid. This would result in trouble when
* combining such a series with another series (#1811)
* or when needing normalization (#1845)
This effectively slightly changes the output format, but is more robust.
Describe the bug
Relevant stack:
I was able to narrow the crash down to a query like sum(a, summarize(a, '10min')), where the lcm used to normalize is 600 (e.g. the interval of a is a factor of 600). It seems that summarize adds an extra datapoint at the beginning (pre-marking, as graphite does), so it has an extra datapoint after normalization.
It seems that perhaps summarize should follow almost identical logic to normalizeTo (with some exception for the alignToFrom case). This would produce a postmarked result, but would be compatible with all later normalization.
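To make the extra leading datapoint concrete, here is a minimal sketch. It assumes that pre-marking labels each bucket with its start, computed as ts - ts % interval, and that series a has a 60s interval; the function names and the from value are hypothetical and do not come from the metrictank code.

```go
package main

import "fmt"

// gcd and lcm are plain helpers for finding the interval that two series
// would be normalized to.
func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

func lcm(a, b uint32) uint32 {
	return a / gcd(a, b) * b
}

// preMarkedBucket is a hypothetical helper. Assumption: with pre-marking,
// the bucket containing ts is labelled with the bucket's start timestamp.
func preMarkedBucket(ts, interval uint32) uint32 {
	return ts - ts%interval
}

func main() {
	from := uint32(1592217240)       // assumed request start
	seriesInterval := uint32(60)     // assumed interval of series a
	summarizeInterval := uint32(600) // '10min'

	fmt.Println(lcm(seriesInterval, summarizeInterval)) // 600

	first := preMarkedBucket(from, summarizeInterval)
	fmt.Println(first, first < from) // 1592217000 true
}
```

Under these assumptions the first summarized point lies before from, so the summarized series has one more point than a series that starts at or after from, even though both end up on the same 600s interval.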