Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple AS-OF JOIN #162

Closed
JLockerman opened this issue Jun 21, 2021 · 3 comments
Closed

Simple AS-OF JOIN #162

JLockerman opened this issue Jun 21, 2021 · 3 comments
Labels
proposed-feature A proposed feature or function to ease a task

Comments

@JLockerman
Copy link
Contributor

What's the functionality you would like to add

Originally requested in timescale/timescaledb#271 an AS-OF JOIN is like a LEFT JOIN except that it's not restricted to exact matches on time; the row selected from the right table is the latest time that matches the time value of the left row. While a proper version of this functionality requires planner support, and thus would have to be in the core extension, we believe we may be able to create a procedure that generates SQL approximating the join for a 80% solution.

How would the function be used

When analyzing data that is recorded at the same logical time but not the same physical time.

Why should this feature be added?

It simplifies a common class of queries that are difficult to write in plain SQL. It is also one of the most commonly requested features.

What scale is this useful at?

Small scale and up

Drawbacks

If we do add a more principled version, we would likely want to deprecate this version.

Open Questions

We will want to add a version of this feature to the timeseries API. Does that suffice?

Alternatives

Just add a timeseries-API version and wait for the core extension to be ready for a SQL-native version.

@JLockerman JLockerman added the proposed-feature A proposed feature or function to ease a task label Jun 21, 2021
@davidkohn88
Copy link
Contributor

davidkohn88 commented Jun 21, 2021

The simple way to do this join is a lateral join as follows.

CREATE TABLE foo ( time timestamptz, id int, val double precision);

CREATE INDEX on foo(id, time DESC);

SELECT t1.time, t1.value as t1_val, t2.value as t2_val
FROM foo AS t1, 
LATERAL (
  SELECT value
  FROM foo t2
  WHERE t2.id = 2 AND t2.time <= t1.time
  ORDER BY t2.time DESC
  LIMIT 1
) t2
WHERE t1.id = 1
ORDER BY t1.time;

(That's an example where we define the two series by using a where clause on the same table, separate tables is actually a bit easier and involves less aliasing).

The index should make this reasonably efficient, however, the merge in memory that we could do with the Timeseries API will probably be far more efficient.

@JLockerman
Copy link
Contributor Author

It turns out that any SQL-generating function runs afoul of pushdown, specifically that the SQL-generation prevents outer WHERE clauses to be pushed into the LATERAL. This harms performance enough that creating such a function is not worth doing. We'll still implement a timeseries version.

@JLockerman
Copy link
Contributor Author

closing this issue for now. We will likely implement a timeseries version of this, but design on that will need to wait until we have multi-value timeseries. A new issue will be opened then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposed-feature A proposed feature or function to ease a task
Projects
None yet
Development

No branches or pull requests

2 participants