
Continuous aggregation calculation error #7019

Open
nieyuan1980 opened this issue Jun 9, 2024 · 2 comments

Comments

@nieyuan1980

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Continuous aggregate, Query executor

What happened?

There is a transformer, and a collector automatically records its readings every 15 minutes.
The readings are downsampled to hourly, daily, monthly, and yearly maximums, respectively.
However, one of the continuous aggregates returns an implausible value.

Step 1, the original table

select time, epad0cc
from transformer_meter2
where meter_code = 'METXJNG1BYQA4'
order by time desc limit 5;

[Screenshot: query result from the source table]
We see that the current maximum reading is 6339985.166.

Step 2, create the hourly downsampling (here I used the timescaledb.materialized_only = false parameter)

CREATE MATERIALIZED VIEW TRANSFORMER_DOWNSAMPLING_HOUR
    WITH (TIMESCALEDB.CONTINUOUS,
        TIMESCALEDB.MATERIALIZED_ONLY = FALSE) AS
SELECT TIME_BUCKET('1 hour'::INTERVAL, "time") AS RECODE_TIME,
       METER_CODE,
       MAX(EPAD0CC)                            AS MAX_READING
FROM TRANSFORMER_METER2
GROUP BY TIME_BUCKET('1 hour'::INTERVAL, "time"), METER_CODE;

[Screenshot: query result from TRANSFORMER_DOWNSAMPLING_HOUR]
We see that its data is correct: 6339985.166.
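
The screenshot presumably comes from a query along these lines (not part of the original report; the column names are taken from the view definition above):

-- Hypothetical verification query: read the hourly continuous aggregate directly.
SELECT recode_time, meter_code, max_reading
FROM transformer_downsampling_hour
WHERE meter_code = 'METXJNG1BYQA4'
ORDER BY recode_time DESC
LIMIT 5;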

Step 3, create a daily downsampling on top of the hourly one (a continuous aggregate on top of another continuous aggregate)

CREATE MATERIALIZED VIEW TRANSFORMER_DOWNSAMPLING_DAY
    WITH (TIMESCALEDB.CONTINUOUS,
        TIMESCALEDB.MATERIALIZED_ONLY = FALSE) AS
SELECT TIME_BUCKET('1 day'::INTERVAL, "recode_time") AS RECODE_TIME,
       METER_CODE,
       MAX(MAX_READING)                              AS MAX_READING
FROM TRANSFORMER_DOWNSAMPLING_HOUR
GROUP BY TIME_BUCKET('1 day'::INTERVAL, "recode_time"), METER_CODE;

[Screenshot: query result from TRANSFORMER_DOWNSAMPLING_DAY]

Its data is also correct: 6339985.166.

Step 4, create a monthly downsampling on top of the daily one (a continuous aggregate on top of another continuous aggregate)

CREATE MATERIALIZED VIEW TRANSFORMER_DOWNSAMPLING_MTH
    WITH (TIMESCALEDB.CONTINUOUS,
        TIMESCALEDB.MATERIALIZED_ONLY = FALSE) AS
SELECT TIME_BUCKET('1 month'::INTERVAL, "recode_time") AS RECODE_TIME,
       METER_CODE,
       MAX(MAX_READING)                                AS MAX_READING
FROM TRANSFORMER_DOWNSAMPLING_DAY
GROUP BY TIME_BUCKET('1 month'::INTERVAL, "recode_time"), METER_CODE;

[Screenshot: query result from TRANSFORMER_DOWNSAMPLING_MTH]

The data here confuses me: it shows 6329992.613, but I think it should be 6339985.166.
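
To make the discrepancy concrete, a side-by-side comparison along these lines (not part of the original report; the names come from the definitions above) shows the two values at once:

-- Hypothetical comparison: raw maximum vs. the maximum reported by the monthly cagg.
SELECT
    (SELECT max(epad0cc)
     FROM transformer_meter2
     WHERE meter_code = 'METXJNG1BYQA4') AS raw_max,
    (SELECT max(max_reading)
     FROM transformer_downsampling_mth
     WHERE meter_code = 'METXJNG1BYQA4') AS monthly_cagg_max;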

What is the reason for this, and how can it be resolved?

TimescaleDB version affected

2.15.0

PostgreSQL version used

PostgreSQL 16.3

What operating system did you use?

Ubuntu 16.3-1.pgdg22.04+1

What installation method did you use?

Deb/Apt

What platform did you run on?

Other

Relevant log output and stack trace

No response

How can we reproduce the bug?

Here's the script:

-- hour
CREATE MATERIALIZED VIEW TRANSFORMER_DOWNSAMPLING_HOUR
    WITH (TIMESCALEDB.CONTINUOUS,
        TIMESCALEDB.MATERIALIZED_ONLY = FALSE) AS
SELECT TIME_BUCKET('1 hour'::INTERVAL, "time") AS RECODE_TIME,
       METER_CODE,
       MAX(EPAD0CC)                            AS MAX_READING
FROM TRANSFORMER_METER2
GROUP BY TIME_BUCKET('1 hour'::INTERVAL, "time"), METER_CODE;

-- day
CREATE MATERIALIZED VIEW TRANSFORMER_DOWNSAMPLING_DAY
    WITH (TIMESCALEDB.CONTINUOUS,
        TIMESCALEDB.MATERIALIZED_ONLY = FALSE) AS
SELECT TIME_BUCKET('1 day'::INTERVAL, "recode_time") AS RECODE_TIME,
       METER_CODE,
       MAX(MAX_READING)                              AS MAX_READING
FROM TRANSFORMER_DOWNSAMPLING_HOUR
GROUP BY TIME_BUCKET('1 day'::INTERVAL, "recode_time"), METER_CODE;

-- month
CREATE MATERIALIZED VIEW TRANSFORMER_DOWNSAMPLING_MTH
    WITH (TIMESCALEDB.CONTINUOUS,
        TIMESCALEDB.MATERIALIZED_ONLY = FALSE) AS
SELECT TIME_BUCKET('1 month'::INTERVAL, "recode_time") AS RECODE_TIME,
       METER_CODE,
       MAX(MAX_READING)                                AS MAX_READING
FROM TRANSFORMER_DOWNSAMPLING_DAY
GROUP BY TIME_BUCKET('1 month'::INTERVAL, "recode_time"), METER_CODE;
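
Note that the script above only contains the continuous aggregate definitions; it assumes the TRANSFORMER_METER2 hypertable already exists. A minimal sketch of a compatible base table (the real schema was not posted, so the column types are assumptions) would be:

-- Assumed base table; the actual schema was not included in the report.
CREATE TABLE transformer_meter2 (
    "time"     TIMESTAMPTZ NOT NULL,
    meter_code TEXT        NOT NULL,
    epad0cc    DOUBLE PRECISION
);
SELECT create_hypertable('transformer_meter2', 'time');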
@akuzm
Member

akuzm commented Jul 19, 2024

Made a simplified repro for this:

drop materialized view if exists metrics_weekly;
drop materialized view if exists metrics_daily;
drop materialized view if exists metrics_hourly;
drop table if exists metrics;

create table metrics(time timestamptz, value float);

select create_hypertable('metrics', 'time');

create materialized view metrics_hourly with (timescaledb.continuous, timescaledb.materialized_only = false) as select time_bucket('1 hour', time) time, max(value) value from metrics        group by 1;
create materialized view metrics_daily  with (timescaledb.continuous, timescaledb.materialized_only = false) as select time_bucket('1 day',  time) time, max(value) value from metrics_hourly group by 1;
create materialized view metrics_weekly with (timescaledb.continuous, timescaledb.materialized_only = false) as select time_bucket('1 week', time) time, max(value) value from metrics_daily  group by 1;

insert into metrics values ('2021-01-01', 1);
call refresh_continuous_aggregate('metrics_hourly', null, null);
call refresh_continuous_aggregate('metrics_daily', null, null);
call refresh_continuous_aggregate('metrics_weekly', null, null);
select max(value) from metrics_hourly;
select max(value) from metrics_daily;
select max(value) from metrics_weekly;

insert into metrics values ('2021-01-02', 2);
call refresh_continuous_aggregate('metrics_hourly', null, null);
call refresh_continuous_aggregate('metrics_daily', null, null);
select max(value) from metrics_hourly;
select max(value) from metrics_daily;
select max(value) from metrics_weekly;

insert into metrics values ('2021-01-03', 3);
call refresh_continuous_aggregate('metrics_hourly', null, null);
select max(value) from metrics_hourly;
select max(value) from metrics_daily;
select max(value) from metrics_weekly;

insert into metrics values ('2021-01-04', 4);
select max(value) from metrics_hourly;
select max(value) from metrics_daily;
select max(value) from metrics_weekly;

The third-level real-time aggregate (weekly) gives the wrong result when it's not updated explicitly.
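
For completeness (this is not part of the original comment), explicitly refreshing the higher-level aggregates after the last insert brings the weekly value back in line:

-- Added continuation: refreshing the whole chain moves the daily and weekly
-- watermarks forward, after which the weekly max is correct again.
call refresh_continuous_aggregate('metrics_hourly', null, null);
call refresh_continuous_aggregate('metrics_daily', null, null);
call refresh_continuous_aggregate('metrics_weekly', null, null);
select max(value) from metrics_weekly;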

@fabriziomello
Contributor

@nieyuan1980 @akuzm This is actually not a bug but a limitation of the current Hierarchical Continuous Aggregate architecture. You always need to refresh all the caggs in the chain in order to properly move the watermark and get the correct results.
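
Applied to the views in this issue, that means refreshing every level whenever new data arrives, lowest level first, for example (a sketch using the view names from the report):

-- Refresh each cagg in the chain, bottom up, so every watermark moves forward.
CALL refresh_continuous_aggregate('transformer_downsampling_hour', NULL, NULL);
CALL refresh_continuous_aggregate('transformer_downsampling_day', NULL, NULL);
CALL refresh_continuous_aggregate('transformer_downsampling_mth', NULL, NULL);

In practice this is usually scheduled with add_continuous_aggregate_policy() on each view, with the lower-level policies running at least as often as the higher-level ones.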
