Enable now() usage in plan-time chunk exclusion #4340
src/planner/planner.c
FromExpr *from = castNode(FromExpr, node);
if (from->quals)
{
    from->quals = ts_constify_now(context->root, from->quals);
We already have a more generic variant here: https://github.com/timescale/timescaledb/blob/main/tsl/src/fdw/scan_plan.c#L552-L563
We use it to evaluate stable constraints at the access node before sending them to the data nodes. Can we just use it here?
That code evaluates many timestamp-related stable functions, not just now(): https://github.com/timescale/timescaledb/blob/main/tsl/src/fdw/deparse.c#L118-L180
By the way, this is for local hypertables, right? That means we don't need a whitelist and can evaluate all stable functions before performing constraint exclusion. However, I'm not sure if preprocess_queries is the right place to do this, because it runs at planning time, and stable functions should be evaluated at execution time.
This is still a draft, so I'll expand the commit message, but you cannot just constify now() like that at plan time; that would not be safe. What this patch does is transform `column > now() - '5m'` into `column > '2022-05-16 14:49' AND column > now() - '5min'`. This allows the constified value to be used during plan-time chunk exclusion while still returning correct results, even for prepared statements.
Oh, nice idea!
Codecov Report
@@ Coverage Diff @@
## main #4340 +/- ##
==========================================
+ Coverage 90.75% 90.79% +0.03%
==========================================
Files 216 217 +1
Lines 39996 40040 +44
==========================================
+ Hits 36300 36355 +55
+ Misses 3696 3685 -11
-- LICENSE-TIMESCALE for a copy of the license.
SET timescaledb.enable_chunk_append TO false;
SET timescaledb.enable_constraint_aware_append TO false;
SET timescaledb.current_timestamp_mock TO '1990-01-01';
We also have the `tsl_override_current_timestamptz` thing; we need to merge them sometime, I guess...
Tested locally, gives some impressive speedup.
LGTM!
----------------------------------------------------------------------------------------------------------------
 Append
   ->  Index Only Scan using _hyper_X_X_chunk_metrics_time_idx on _hyper_X_X_chunk
         Index Cond: (("time" > now()) AND ("time" > 'Mon Jan 01 00:00:00 1990 PST'::timestamp with time zone))
Do we really need to change the query plan and show the constant expression in EXPLAIN to enable this functionality? I worry that it could be confusing.
I think it would be enough to add the constant expression to the quals we use to scan for chunks during chunk exclusion. That would make the functionality transparent.
Not sure what you are referring to, but at the level where the optimization happens there are no scans yet. All of this needs to happen well before anything scan-related exists, because we want to eliminate the planning overhead of excluded chunks.
Oh, I think I understand what you mean, but that would limit the usefulness of the optimization. This optimization works with both our own hypertable expansion and PostgreSQL inheritance expansion, which means it applies to SELECT, UPDATE and DELETE. If we did this only in our own hypertable expansion, it would no longer work for UPDATE/DELETE.
@@ -0,0 +1,87 @@
-- This file and its contents are licensed under the Timescale License.
I would have expected tests with prepared statements combined with new data that would affect the plan, given that it is important that this optimization works correctly with prepared statements. However, I can't find any such tests.
This implements an optimization that allows now() expressions to be used during plan-time chunk exclusion. Since now() is stable, it would not normally be considered for plan-time chunk exclusion. To enable this behaviour we convert `column > now()` expressions into `column > const AND column > now()`. Assuming that time always moves forward, this is safe even for prepared statements.

This optimization works for SELECT, UPDATE and DELETE. On hypertables with many chunks this can lead to a considerable speedup for certain queries.

The following expressions are supported:
- column > now()
- column >= now()
- column > now() - Interval
- column > now() + Interval
- column >= now() - Interval
- column >= now() + Interval

Interval must not have a day or month component, as those depend on timezone settings.

Some microbenchmarks to show the improvements (best of five runs for each query):

-- hypertable with 1k chunks
-- with optimization
select * from metrics1k where time > now() - '5m'::interval;
Time: 3.090 ms
-- without optimization
select * from metrics1k where time > now() - '5m'::interval;
Time: 145.640 ms

-- hypertable with 5k chunks
-- with optimization
select * from metrics5k where time > now() - '5m'::interval;
Time: 4.317 ms
-- without optimization
select * from metrics5k where time > now() - '5m'::interval;
Time: 775.259 ms

-- hypertable with 10k chunks
-- with optimization
select * from metrics10k where time > now() - '5m'::interval;
Time: 4.853 ms
-- without optimization
select * from metrics10k where time > now() - '5m'::interval;
Time: 1766.319 ms (00:01.766)

-- hypertable with 20k chunks
-- with optimization
select * from metrics20k where time > now() - '5m'::interval;
Time: 6.141 ms
-- without optimization
select * from metrics20k where time > now() - '5m'::interval;
Time: 3321.968 ms (00:03.322)

Speedup with 1k chunks: 47x
Speedup with 5k chunks: 179x
Speedup with 10k chunks: 363x
Speedup with 20k chunks: 540x
This release adds major new features since the 2.6.1 release. We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Optimize continuous aggregate query performance and storage
* The following query clauses and functions can now be used in a continuous aggregate: FILTER, DISTINCT, ORDER BY as well as [Ordered-Set Aggregate](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE) and [Hypothetical-Set Aggregate](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE)
* Optimize now() query planning time
* Improve COPY insert performance
* Improve performance of UPDATE/DELETE on PG14 by excluding chunks

This release also includes several bug fixes. If you are upgrading from a previous version and were using compression with a non-default collation on a segmentby-column you should recompress those hypertables.

**Features**
* timescale#4045 Custom origin's support in CAGGs
* timescale#4120 Add logging for retention policy
* timescale#4158 Allow ANALYZE command on a data node directly
* timescale#4169 Add support for chunk exclusion on DELETE to PG14
* timescale#4209 Add support for chunk exclusion on UPDATE to PG14
* timescale#4269 Continuous Aggregates finals form
* timescale#4301 Add support for bulk inserts in COPY operator
* timescale#4311 Support non-superuser move chunk operations
* timescale#4330 Add GUC "bgw_launcher_poll_time"
* timescale#4340 Enable now() usage in plan-time chunk exclusion

**Bugfixes**
* timescale#3899 Fix segfault in Continuous Aggregates
* timescale#4225 Fix TRUNCATE error as non-owner on hypertable
* timescale#4236 Fix potential wrong order of results for compressed hypertable with a non-default collation
* timescale#4249 Fix option "timescaledb.create_group_indexes"
* timescale#4251 Fix INSERT into compressed chunks with dropped columns
* timescale#4255 Fix option "timescaledb.create_group_indexes"
* timescale#4259 Fix logic bug in extension update script
* timescale#4269 Fix bad Continuous Aggregate view definition reported in timescale#4233
* timescale#4289 Support moving compressed chunks between data nodes
* timescale#4300 Fix refresh window cap for cagg refresh policy
* timescale#4315 Fix memory leak in scheduler
* timescale#4323 Remove printouts from signal handlers
* timescale#4342 Fix move chunk cleanup logic
* timescale#4349 Fix crashes in functions using AlterTableInternal
* timescale#4358 Fix crash and other issues in telemetry reporter

**Thanks**
* @abrownsword for reporting a bug in the telemetry reporter and testing the fix
* @jsoref for fixing various misspellings in code, comments and documentation
* @yalon for reporting an error with ALTER TABLE RENAME on distributed hypertables
* @zhuizhuhaomeng for reporting and fixing a memory leak in our scheduler