-
Notifications
You must be signed in to change notification settings - Fork 898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for JOINing time series via "AS OF" time #271
Comments
Definitely an analytical function on our short list. Thanks for the request. (And if others would similar like this, please upvote!) |
Hi, is there an update on support for an As Of Join. I am looking to test out various Timeseries solutions and AsOf is something we would really like to support. |
Hi, any plan for this feature? Thanks! |
I agree. Having a performant global "AS OF" on a traditional RDMS like postgres would be a holy grail. Maintaining history today with postgres is, hmm, painful. This alone has made me seriously consider using Datomic (https://docs.datomic.com/on-prem/clojure/index.html#datomic.api/as-of). A dream query for me would be writing:
and getting the state of the world from that datetime. I realise this is moving from time series and into another territory which might not be a good fit for timescaledb. But a man can hope :) |
Any updates or best practices? |
This seems to be a possibility? |
This is the best I've managed to get so far:
Still not very performant though, takes >1s on two tables of ~200k rows (pandas can do it >50x faster). |
Are there any plans to add this? Would be a really big deal for financial applications, among many others |
+1 Any update here? Is there an ETA for as-of join support? |
Very under the (my) radar, mariaDB suddenly has very cool looking support for AS OF time machining: |
@runekaagaard thanks, so no ETA? Just to be crystal clear: We are referring to as-of joins and not AS OF for data versioning. Good example for AS OF JOINS is here: https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html --> see bottom of that page which illustrates how to join market trades & quotes The kdb+ equivalent command is "aj", see https://code.kx.com/q/ref/aj/ If I misunderstood the purpose of this issue here, let me know and I'm happy to open another one |
The addition of as-of joins within Timescale would be a huge benefit to us. We have lots of high frequency time series datasets from environmental sensors that have slightly differing timestamps (at the ms scale) and occasional gaps. Our requirements are pretty much met by the functionality described in the pandas.merge_asof documentation but it would be great if we could do the same sort of thing at the database level. A possible addition would be to allow for aligning the sensor time series datasets against a strict timeline (e.g. every exact second, minute etc.). |
The simple way to do this is to write the following sort of join: CREATE TABLE foo ( time timestamptz, id int, val double precision);
CREATE INDEX on foo(id, time DESC);
SELECT t1.time, t1.value as t1_val, t2.value as t2_val
FROM foo AS t1,
LATERAL (
SELECT value
FROM foo t2
WHERE t2.id = 2 AND t2.time <= t1.time
ORDER BY t2.time DESC
LIMIT 1
) t2
WHERE t1.id = 1
ORDER BY t1.time;
You will definitely want the index on In cases where the tables are separate it will be slightly different, simpler in some ways as it involves less aliasing: CREATE TABLE foo1 ( time timestamptz, val double precision);
CREATE TABLE foo2( time timestamptz, val double precision)
CREATE INDEX on foo1(time DESC);
CREATE INDEX ON foo2 (time DESC) INCLUDE (value);
SELECT foo1.time, foo1.value as foo1_val, foo2.value as foo2_val
FROM foo1,
LATERAL (
SELECT foo2.value
FROM foo2
WHERE foo2.time <= foo1.time
ORDER BY foo2.time DESC
LIMIT 1
) foo2
ORDER BY foo1.time;
This may not always be the most efficient, but it should work reasonably well in smallish cases. We'll also work on some ways of doing this with the timeseries API as discussed: timescale/timescaledb-toolkit#162 that may be more efficient in some cases. |
note: the LATERAL query above benefits significantly from being able to perform an index-only scan on CREATE INDEX ON foo2 (time DESC) INCLUDE (value); |
Thank you @davidkohn88 this is a fantastic solution suggestion. 👍 |
I am also very interested in this "as_of" join, what would be the best solution for now? |
The joins above ( #271 (comment)) are reasonable solutions for now and should be reasonably performant, depending on what you're doing, we're also thinking about adding some more functionality around this in the toolkit, but it's probably a bit of a ways off, but you can add comments in this issue for now, and maybe explain more what exactly you're trying to achieve and what you think we should prioritize: timescale/timescaledb-toolkit#162 |
With LATERAL join on timestamp inequality, I guess the right side is going to do index lookups, but it's still O(left_rows * log(right_rows)). A merge-join-like algorithm will be O(left_rows + right_rows). An older attempt of implementing this in vanilla postgres: https://www.postgresql.org/message-id/flat/bc494762-26bd-b100-e1f9-a97901ddad57%40postgrespro.ru For the reference, ClickHouse uses special grammar for this: https://clickhouse.com/docs/en/sql-reference/statements/select/join/#asof-join-usage |
The different grammar is probably required because the semantics is different from the normal join on inequality condition -- we only have to return the closest righthand row that matches. In an extension, we can't introduce new grammar (or can we?), so we can consider using a special dummy function for the join, e.g. I had an old patch that extended the merge join executor to support full join on inequality: https://www.postgresql.org/message-id/flat/b31e1a2d-5ed2-cbca-649e-136f1a7c4c31@postgrespro.ru |
Wrote a design memo for internal use: https://docs.google.com/document/d/1YEX038V-gq-iLfM-KpLV0zpNn3yW88I9qPVa0_bs4aQ/edit |
Is there an agreed way forward on this issue even if addressing it isn't imminent? A related timescaledb-toolkit issue was closed last year pending work on 'multi-value timeseries' but it is not clear what this functionality refers to. The latest posts in this issue suggest the asof-join functionality could be addressed within Postgres itself but it is difficult to know from the discussion whether this is likely to happen in the near-future. |
271: Lambda fixes r=WireBaron a=JLockerman Two minor fixes to lambdas: 1. Allow newlines within lambdas. 2. Allow the parsing of multiple successive `let`s instead of stopping at the first. Altogether this allows the parsing of expressions such as ```SQL let $foo = -2; let $bar = $foo * $foo; $bar * $bar ``` Co-authored-by: Joshua Lockerman <josh@timescale.com>
We're planning to try and prototype this inside the TimescaleDB extension in the Q4 of 2022, using the no. 3 merge-hash algorithm I posted upstream. That's just a research prototype, can't promise when and if we will release something that is actually usable. |
We have found the “as-of” join to be essential in our analyses involving high frequency financial data (market data, trade data etc.). The approach we present in the link below seems to be very fast and execution time appears to scale linearly with row count (n), unlike lateral join type approaches that typically scale as O(n^2). The SQL query we link to could perhaps be further optimized but we think that this is a promising approach in general. https://gist.github.com/RMB-eQuant/758539f8914f2dd4461ec0ce144b048b The table below compares the execution time of our approach (called “UNION ALL ALGO”) to the lateral join approach presented in the post #271 (comment) (“LATERAL JOIN”). Execution time is in seconds and the benchmarks were run on a Timescale-pro-100-16gb-2cpu-compute-optimized (2 CPU, 16 GB RAM, 100 GB storage) instance. In the table below, NaN values correspond to runs that were too slow to complete. |
For time series data, a powerful function is asof pioneered in kx/kdb, also being implemented by Pandas:
http://pandas.pydata.org/pandas-docs/version/0.19.0/whatsnew.html#whatsnew-0190-enhancements-asof-merge
It would be really cool if Timescaledb could support it natively instead of having write 3 tricky queries to join two time series tables.
The text was updated successfully, but these errors were encountered: