Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: join with multiple equality conditions #7107

Open
fordfrog opened this issue Jul 5, 2024 · 6 comments
Open

[Feature]: join with multiple equality conditions #7107

fordfrog opened this issue Jul 5, 2024 · 6 comments
Labels
continuous_aggregate enhancement An enhancement to an existing feature for functionality

Comments

@fordfrog
Copy link

fordfrog commented Jul 5, 2024

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Continuous aggregate

What happened?

i'm trying to create a 2-hour continuous aggregate from 1-hour continuous aggregate. the culprit is that this is a market data and i want each time bucket start at the market session start (rounded down to hours), so for each asset the origin might be different, and it also differs because of dst (my data is timestamp without time zone, but in fact utc).

create materialized view market_bars_h2
with (timescaledb.continuous) as
SELECT t.ticker,
	time_bucket('2 hours', t.bar_start, ms.first_bar_start) as bar_start,
        sum(ticks) AS ticks,
        ...
FROM market_bars_h1 t
JOIN market_session_bar_h1_starts ms ON t.ticker = ms.ticker and t.bar_start = ms.bar_start
GROUP BY t.ticker, 2
ORDER BY t.ticker, 2
with no data;

i tried with market_session_bar_h1_starts being a view and now it is a table, but i still get the same error:

ERROR:  invalid continuous aggregate view
DETAIL:  Unsupported expression in join clause.
HINT:  Only equality conditions are supported in continuous aggregates.

if i run just the select part, the query works fine and returns results. i'm atm on version 2.15.3 and postgresql 16.3.
here is a sample content of the joined table:

# select * from market_session_bar_h1_starts where ticker = 'EUR.USD@IDEALPRO' order by bar_start limit 5;
      ticker      |      bar_start      |   first_bar_start   
------------------+---------------------+---------------------
 EUR.USD@IDEALPRO | 2023-02-19 22:00:00 | 2023-02-16 22:00:00
 EUR.USD@IDEALPRO | 2023-02-19 23:00:00 | 2023-02-19 22:00:00
 EUR.USD@IDEALPRO | 2023-02-20 00:00:00 | 2023-02-19 22:00:00
 EUR.USD@IDEALPRO | 2023-02-20 01:00:00 | 2023-02-19 22:00:00
 EUR.USD@IDEALPRO | 2023-02-20 02:00:00 | 2023-02-19 22:00:00
(5 rows)

i guess this is the related code, but nothing that would be clear to me. the else branch is the one that is triggered imo.

just some more info to clarify it. the joined table contains bar_start for each hour that appears in the market_bars_h1 so this should really be an equality join (and it works when i just run the select, only creating the continous view fails). from market_session_bar_h1_starts it can be seen that there are several different times but for all of them the first_bar_start is the same (because they are in the same trading session).

TimescaleDB version affected

2.15.3

PostgreSQL version used

16.3

What operating system did you use?

gentoo linux

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

it is described above how to trigger the issue.
@fordfrog fordfrog added the bug label Jul 5, 2024
@fordfrog
Copy link
Author

fordfrog commented Jul 7, 2024

ok, now i see that the documentation shows this example as an invalid join:

CREATE MATERIALIZED VIEW my_view WITH (timescaledb.continuous) AS
SELECT ...
FROM table_1 t1
JOIN table_2 t2 ON t1.t2_id = t2.id AND t1.t2_id_2 = t2.id
GROUP BY ...

which is similar to what i'm trying to use. in the code the condition fails on IsA(op, OpExpr):

        /* Only equality conditions are permitted on joins. */
        if (op && IsA(op, OpExpr) &&
            list_length(castNode(OpExpr, op)->args) == CONTINUOUS_AGG_MAX_JOIN_RELATIONS)
        {
            Oid left_type = exprType(linitial(op->args));
            Oid right_type = exprType(lsecond(op->args));
            if (!ts_is_equality_operator(op->opno, left_type, right_type))
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("invalid continuous aggregate view"),
                         errdetail(
                             "Only equality conditions are supported in continuous aggregates.")));
        }
        else
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("invalid continuous aggregate view"),
                     errdetail("Unsupported expression in join clause."),
                     errhint("Only equality conditions are supported in continuous aggregates.")));

because in this case op is *op:{xpr = {type = T_BoolExpr}, opno = 0, opfuncid = 4211802456, opresulttype = 21972, opretset = 7, opcollid = 0, inputcollid = 0, args = 0x14000000033, location = 1}, so T_BoolExpr and not T_OpExpr.

@fordfrog fordfrog changed the title [Bug]: cannot create join with equality conditions [Feature Request]: join with multiple equality conditions Jul 7, 2024
@fordfrog fordfrog changed the title [Feature Request]: join with multiple equality conditions [Feature]: join with multiple equality conditions Jul 7, 2024
@fordfrog
Copy link
Author

fordfrog commented Jul 7, 2024

pr #7032 lets my join expression pass, but another issue arises:

ERROR:  only immutable expressions allowed in time bucket function
HINT:  Use an immutable expression as third argument to the time bucket function.

this prevents me (at least i think so) from setting the bar start based on the market session opening hour.

@fabriziomello fabriziomello added enhancement An enhancement to an existing feature for functionality and removed bug labels Jul 8, 2024
@fabriziomello
Copy link
Contributor

@fordfrog can u please try with this PR: #7111

If it don't work can u please copy and paste here the complete example (all SQL steps to reproduce the error).

@fordfrog
Copy link
Author

fordfrog commented Jul 9, 2024

@fabriziomello the issue with the multiple conditions is not triggered with the patch anymore, but i get stuck on the time_bucket() restriction.

here's my test case:

CREATE TABLE market_bars_m30 (
    ticker text NOT NULL,
    bar_start TIMESTAMP NOT NULL,
    ticks integer NOT NULL
);

SELECT create_hypertable('market_bars_m30', 'bar_start');

INSERT INTO market_bars_m30 VALUES ('EUR.USD@IDEALPRO', '2024-07-01 00:00:00', 10);
INSERT INTO market_bars_m30 VALUES ('EUR.USD@IDEALPRO', '2024-07-01 00:30:00', 15);
INSERT INTO market_bars_m30 VALUES ('EUR.USD@IDEALPRO', '2024-07-01 01:00:00', 20);
INSERT INTO market_bars_m30 VALUES ('EUR.USD@IDEALPRO', '2024-07-01 01:30:00', 25);

CREATE TABLE market_session_bar_h1_starts
    (ticker text NOT NULL, bar_start TIMESTAMP NOT NULL, first_bar_start TIMESTAMP NOT NULL);

INSERT INTO market_session_bar_h1_starts
    VALUES('EUR.USD@IDEALPRO', '2024-07-01 00:00:00', '2024-07-01 00:00:00');
INSERT INTO market_session_bar_h1_starts
    VALUES('EUR.USD@IDEALPRO', '2024-07-01 01:00:00', '2024-07-01 00:00:00');

CREATE MATERIALIZED VIEW market_bars_h1 WITH (timescaledb.continuous) AS
SELECT ticker,
    time_bucket('01:00:00'::interval, bar_start) AS bar_start,
    sum(ticks) AS ticks
FROM market_bars_m30
GROUP BY ticker, (time_bucket('01:00:00'::interval, bar_start))
ORDER BY ticker, (time_bucket('01:00:00'::interval, bar_start));

i hope i didn't forget any command. it simply creates normal table with 30-minute data, turns it into a hypertable, fills in some data, it also creates table with market session opens, and creates simple continuous aggregate. this part works fine. then i try to run this command:

# CREATE MATERIALIZED VIEW market_bars_h2 WITH (timescaledb.continuous) AS
SELECT t.ticker, time_bucket('2 hours', t.bar_start, ms.first_bar_start) AS bar_start, sum(ticks) AS ticks
FROM market_bars_h1 t JOIN market_session_bar_h1_starts ms
    ON t.ticker = ms.ticker AND t.bar_start = ms.bar_start
GROUP BY 1, 2
ORDER BY 1, 2;
ERROR:  only immutable expressions allowed in time bucket function
HINT:  Use an immutable expression as third argument to the time bucket function.

before it failed on the JOIN conditions. now that is not triggered anymore and i get instead this complain about the third parameter to time_bucket(INTERVAL, TIMESTAMP, TIMESTAMP) not being constant.

@fabriziomello
Copy link
Contributor

before it failed on the JOIN conditions. now that is not triggered anymore and i get instead this complain about the third parameter to time_bucket(INTERVAL, TIMESTAMP, TIMESTAMP) not being constant.

This is a know restriction, the time_bucket should be IMMUTABLE and passing the origin as the row from another table make it as STABLE and it is not allowed on continuous aggregate. Only IMMUTABLE buckets are allowed on continuous aggregate.

@fordfrog
Copy link
Author

fordfrog commented Jul 9, 2024

This is a know restriction, the time_bucket should be IMMUTABLE and passing the origin as the row from another table make it as STABLE and it is not allowed on continuous aggregate. Only IMMUTABLE buckets are allowed on continuous aggregate.

that i was afraid of. i just didn't come up with a working solution for that task tbh. i just need each day the first bar start on the hour that market opens... and for different assets it's different, and sometimes the trading hours are also reduced, and the trading hours might change over time. so i doubt it would even work if i just do it per exchange. any suggestions or best practices with regard to this? i've read timescaledb is used by financial companies, so there might be a solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
continuous_aggregate enhancement An enhancement to an existing feature for functionality
Projects
None yet
Development

No branches or pull requests

2 participants