-
Notifications
You must be signed in to change notification settings - Fork 980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix first/last aggregation functions #5208
Conversation
// We only need to pay attention to the first bucket; if | ||
// there are more buckets, there's nothing to rollup for | ||
// them as the next changes we wrote are for `block_time`, | ||
// and we'll catch that on the next iteration of the loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how in the next iteration we catch the other buckets, wont we miss all the buckets between the first bucket and block_time
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking here is this: assume we get passed in block_times = [b1, b2, b3, .. ]
but b1
and b2
are far apart. We call rollup.interval.buckets(b1, b2)
at some point in the iteration which produces timestamps [t1, t2, ..]
. Since b1
and b2
are far apart, we have something like t1 <= b1 < t2 < t3 < t4 < t5 <= b2
but we know that there are no writes between b1
and b2
- if there were, we'd have some block time in block_times
between b1
and b2
. So we only need to do a rollup for t1 < b1 < t2
. After that, we set last_rollup = b2
and repeat the loop for that, which will roll up the bucket t5 <= b2 < t6
. So there's no need to worry about the buckets starting at t2
, t3
, and t4
.
Maybe there's an error in my logic, but I think that's correct (and tests haven't shown any missed rollups)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you find the above convincing, I should probably add that to the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes its convincing, thanks for the clarification.
For subgraphs with big gaps in between writes, we tried to do rollups for times where we already know that we don't have changes.
The two names are aliases, but we will use them in aggregations; to make notation more consistent, we'll use int4 and int8
a2a0243
to
ccb992e
Compare
The
first
/last
aggregation functions were not working as intended. This PR fixes that; unfortunately, we need to require that theid
of time series data points are in insertion order to achieve that and jump through a bunch of hoops to definearg_min
/arg_max
functions.