-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple AS-OF JOIN #162
Comments
The simple way to do this join is a lateral join as follows. CREATE TABLE foo ( time timestamptz, id int, val double precision);
CREATE INDEX on foo(id, time DESC);
SELECT t1.time, t1.value as t1_val, t2.value as t2_val
FROM foo AS t1,
LATERAL (
SELECT value
FROM foo t2
WHERE t2.id = 2 AND t2.time <= t1.time
ORDER BY t2.time DESC
LIMIT 1
) t2
WHERE t1.id = 1
ORDER BY t1.time; (That's an example where we define the two series by using a where clause on the same table, separate tables is actually a bit easier and involves less aliasing). The index should make this reasonably efficient, however, the merge in memory that we could do with the Timeseries API will probably be far more efficient. |
It turns out that any SQL-generating function runs afoul of pushdown, specifically that the SQL-generation prevents outer WHERE clauses to be pushed into the LATERAL. This harms performance enough that creating such a function is not worth doing. We'll still implement a timeseries version. |
closing this issue for now. We will likely implement a timeseries version of this, but design on that will need to wait until we have multi-value timeseries. A new issue will be opened then. |
What's the functionality you would like to add
Originally requested in timescale/timescaledb#271 an AS-OF JOIN is like a LEFT JOIN except that it's not restricted to exact matches on time; the row selected from the right table is the latest time that matches the time value of the left row. While a proper version of this functionality requires planner support, and thus would have to be in the core extension, we believe we may be able to create a procedure that generates SQL approximating the join for a 80% solution.
How would the function be used
When analyzing data that is recorded at the same logical time but not the same physical time.
Why should this feature be added?
It simplifies a common class of queries that are difficult to write in plain SQL. It is also one of the most commonly requested features.
What scale is this useful at?
Small scale and up
Drawbacks
If we do add a more principled version, we would likely want to deprecate this version.
Open Questions
We will want to add a version of this feature to the timeseries API. Does that suffice?
Alternatives
Just add a timeseries-API version and wait for the core extension to be ready for a SQL-native version.
The text was updated successfully, but these errors were encountered: