-
Notifications
You must be signed in to change notification settings - Fork 105
Fix 323 #333
Conversation
this results in: --- FAIL: TestGetTarget (0.00s) dataprocessor_test.go:489: case "off-by-5" req "quantization moves points 55s later" from 3011 to 3121 - exp: [{3005 3060} {3065 3120}] - got [{NaN 3060} {3065 3120}] FAIL exit status 1 FAIL github.com/raintank/metrictank 0.029s
when we quantize, sometimes we need an earlier point to move to the right (into the requested range), but the point we need fell before the requested range. So we just fetch `interval-1` more when needed to make sure we have the right point in case this happens. We can just extend the use of the aggSpan (now interval) argument of getSeries to always refer to the interval of the requested archive (whether aggregated or not)
// so that if we have quantize later (in fix()) we have a valid first point. | ||
// (at this point we can't tell whether the raw series will be quantized or not) | ||
if consolidator == consolidation.None { | ||
fromUnix = fromUnix - interval + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesnt seem right to me.
assume my interval is 10, and my request is from 15 to 35. Then i would expect quantized values at 20 and 30.
With this change, you are now requesting between 6 and 35.
If the raw values are offset by between 1 and 6, then things will work, as the adjusted from time will only match points between 11 and 16 inclusive, which all get quantized to 20
but if the offset is 7,8,9 or 0 the request will match points at 7,8,9,10 which get quantized to 10
I think what you need to do is adjust both fromUnix and ToUnix to be on the appropriate quantized boundaries.
eg. the fromUnix=10, toUnix=30 so that any points that match are quantized to either 20 or 30
Raw | quantized | expected match | this commit match |
---|---|---|---|
5 | 10 | no | no |
6 | 10 | no | no |
7 | 10 | no | yes |
8 | 10 | no | yes |
9 | 10 | no | yes |
10 | 10 | no | yes |
11 | 20 | yes | yes |
12 | 20 | yes | yes |
. | . | . | . |
19 | 20 | yes | yes |
20 | 20 | yes | yes |
21 | 30 | yes | yes |
. | . | . | . |
29 | 30 | yes | yes |
30 | 30 | yes | yes |
31 | 40 | no | yes |
32 | 40 | no | yes |
So if there are raw values recorded at 9, 19, 29, 39, our query of 15-35 should return the values at 19 and 29. However as can be seen from the table above, values at 9, 19 and 29 are returned when using the logic in this commit.
if the raw values are at 4, 14, 24, 34 then our query should return values at 14 and 24. However this commit would result in the values 14, 24 and 34 being returned.
Both of these outcomes are clearly wrong as a query for a 20second range can't possibly include 3 values that are 10seconds apart.
while we now use half the amount of lines, we actually test all of the possible scenarios for a given interval.
with this approach we make more precise adjustments to from/to which has two advantages: 1) we now query a slightly narrower timerange, which can be faster 2) historically, fix() has always been needed after getSeries(). the previous patch relied on this and also requires it in order to: A provide the correct amount of points to the user (fix removes unneeded points out of the range) B provide the user points with clean, quantized timestamps that actually lie in the requested range With the new patch, we only require calling fix() after getSeries() for B, not for A. So it's still a requirement, and we still have to honor it. Technically, if you only wanted to query for points between a given range and did not care about B, we don't need to call fix to "pretty up" the output, and could bypass fix() and be more efficient. Maybe later we can make it optional.
Instead of having the somewhat awkward requirement that data returned by getSeries() needs an additional pass through fix(), just have getSeries take care of it. An additional benefit is that it simplifies the code.
@woodsaj I implemented the improved solution, added more complete unit tests and also made getSeries itself call fix() which removes any confusion about getSeries() output requiring an additional fix() step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great.
now that getSeries calls fix() itself, and by calling getTarget with archive 0 and aggNum 1, it basically just calls getSeries anyway, which is the thing we wanted to test in the first place, we can just call getSeries()
No description provided.