
Fix first/last aggregation functions #5208

Merged · 5 commits merged into master on Feb 19, 2024
Conversation

@lutter (Collaborator) commented Feb 14, 2024

The first/last aggregation functions were not working as intended. This PR fixes that; unfortunately, to achieve it we need to require that the ids of time series data points be in insertion order, and we have to jump through a number of hoops to define arg_min/arg_max functions.
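As an illustration of the semantics (this is a hedged sketch, not the actual graph-node code or SQL from this PR; all names and types here are hypothetical), `first`/`last` over insertion-ordered ids amounts to an arg_min/arg_max over the id column:

```rust
// Hypothetical sketch: `first`/`last` as arg_min/arg_max over ids that are
// assumed to be monotonically increasing in insertion order.
#[derive(Debug, Clone, PartialEq)]
struct DataPoint {
    id: u64,    // assumed to reflect insertion order
    value: i64,
}

/// `first`: the value of the point with the smallest id (arg_min over id).
fn first(points: &[DataPoint]) -> Option<i64> {
    points.iter().min_by_key(|p| p.id).map(|p| p.value)
}

/// `last`: the value of the point with the largest id (arg_max over id).
fn last(points: &[DataPoint]) -> Option<i64> {
    points.iter().max_by_key(|p| p.id).map(|p| p.value)
}

fn main() {
    let points = vec![
        DataPoint { id: 3, value: 30 },
        DataPoint { id: 1, value: 10 },
        DataPoint { id: 2, value: 20 },
    ];
    println!("{:?} {:?}", first(&points), last(&points)); // Some(10) Some(30)
}
```

This is exactly why insertion-ordered ids matter: without that ordering guarantee, arg_min/arg_max over the id would not correspond to the first/last written value.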

Comment on lines +1042 to +1045
// We only need to pay attention to the first bucket; if
// there are more buckets, there's nothing to rollup for
// them as the next changes we wrote are for `block_time`,
// and we'll catch that on the next iteration of the loop.
Member

I don't understand how we catch the other buckets in the next iteration; won't we miss all the buckets between the first bucket and `block_time`?

Collaborator Author

My thinking here is this: assume we get passed block_times = [b1, b2, b3, ..] where b1 and b2 are far apart. At some point in the iteration we call rollup.interval.buckets(b1, b2), which produces timestamps [t1, t2, ..]. Since b1 and b2 are far apart, we have something like t1 <= b1 < t2 < t3 < t4 < t5 <= b2, but we know that there are no writes between b1 and b2; if there were, block_times would contain some block time between b1 and b2. So we only need to do a rollup for the bucket t1 <= b1 < t2. After that, we set last_rollup = b2 and repeat the loop, which rolls up the bucket t5 <= b2 < t6. So there's no need to worry about the buckets starting at t2, t3, and t4.

Maybe there's an error in my logic, but I think that's correct (and tests haven't shown any missed rollups).
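The reasoning above can be simulated with a small sketch (again hypothetical names, not the actual graph-node code): with an interval of 10 and block times 12 and 57, the bucket starts in between are 10, 20, 30, 40, 50; since no writes fall between the two block times, only the bucket containing the earlier time holds data to roll up, and the bucket containing the later time is handled on the next iteration.

```rust
// Illustrative sketch of the rollup-skipping argument. Names are
// hypothetical; this is not the actual code from the PR.
fn bucket_start(time: u64, interval: u64) -> u64 {
    time - time % interval
}

/// Roll up the bucket of each block time once a later block time falls in
/// a different bucket. Empty buckets between two block times are skipped,
/// because a write in them would itself appear as a block time.
fn buckets_to_roll_up(block_times: &[u64], interval: u64) -> Vec<u64> {
    let mut rolled = Vec::new();
    for pair in block_times.windows(2) {
        let (prev, next) = (pair[0], pair[1]);
        if bucket_start(prev, interval) != bucket_start(next, interval) {
            rolled.push(bucket_start(prev, interval));
        }
    }
    rolled
}

fn main() {
    // b1 = 12 and b2 = 57 are far apart: the empty buckets 20, 30, 40 are
    // skipped; bucket 50 (containing b2) is rolled up on the next
    // iteration, once block time 63 arrives in a later bucket.
    let rolled = buckets_to_roll_up(&[12, 57, 63], 10);
    println!("{:?}", rolled); // [10, 50]
}
```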

Collaborator Author

If you find the above convincing, I should probably add it to the comment.

Member

Yes, it's convincing; thanks for the clarification.

For subgraphs with big gaps between writes, we tried to do rollups for times where we already know that we don't have changes.

The two names are aliases, but we will use them in aggregations; to make the notation more consistent, we'll use int4 and int8.
@lutter lutter merged commit ccb992e into master Feb 19, 2024
7 checks passed
@lutter lutter deleted the lutter/agg-minmax branch February 19, 2024 19:42
@lutter lutter restored the lutter/agg-minmax branch February 19, 2024 20:16
@lutter lutter deleted the lutter/agg-minmax branch February 19, 2024 20:17