Enable now() usage in plan-time chunk exclusion #4340

svenklemm · 2022-05-16T12:22:56Z

This implements an optimization that allows now() expressions to be
used during plan-time chunk exclusion. Since now() is stable, it
would not normally be considered for plan-time chunk exclusion.
To enable this behaviour, we convert column > now() expressions
into column > const AND column > now(), where const is the expression
evaluated at plan time. Assuming that time always moves forward, this
is safe even for prepared statements: at execution time now() can only
be equal to or later than it was at plan time, so column > now() already
implies column > const and the added condition never filters out rows.
This optimization works for SELECT, UPDATE and DELETE.
On hypertables with many chunks, this can lead to a considerable
speedup for certain queries.
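A minimal standalone C sketch of why the extra constant matters for plan-time exclusion, assuming chunks are half-open ranges of integer microsecond timestamps. It does not use TimescaleDB or PostgreSQL APIs: ChunkRange, chunk_may_match_gt and count_surviving_chunks are hypothetical names, and TimestampTz here is only a stand-in typedef.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's TimestampTz: int64 microseconds (assumption). */
typedef int64_t TimestampTz;

/* Hypothetical chunk descriptor: the chunk holds rows with
 * range_start <= time < range_end. */
typedef struct ChunkRange
{
	TimestampTz range_start;
	TimestampTz range_end;
} ChunkRange;

/* Can any row in this chunk satisfy "time > bound"? With integer
 * timestamps the largest value the chunk can hold is range_end - 1. */
static bool
chunk_may_match_gt(const ChunkRange *chunk, TimestampTz bound)
{
	assert(chunk->range_start < chunk->range_end);
	return chunk->range_end - 1 > bound;
}

/* Plan-time exclusion needs a constant: "time > now()" alone cannot be
 * used here, but the added "time > const" can. Returns how many chunks
 * the planner would still have to expand. */
static int
count_surviving_chunks(const ChunkRange *chunks, int nchunks, TimestampTz const_bound)
{
	int kept = 0;

	for (int i = 0; i < nchunks; i++)
		if (chunk_may_match_gt(&chunks[i], const_bound))
			kept++;
	return kept;
}
```

For example, with ten chunks covering [0, 100), [100, 200), ..., [900, 1000) and a plan-time constant of 850, count_surviving_chunks returns 2: only the last two chunks can contain rows newer than the bound, so the other eight need not be expanded.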

The following expressions are supported:

column > now()
column >= now()
column > now() - Interval
column > now() + Interval
column >= now() - Interval
column >= now() + Interval

The interval must not have a day or month component, because the result
of adding days or months to a timestamptz depends on the timezone setting.
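The restrictions in the list above can be sketched as a predicate. This is an illustration under assumptions, not the patch's code: IntervalSketch mirrors the fields of PostgreSQL's Interval struct (time in microseconds, plus day and month counts), while CompareOp, OffsetOp and qual_is_constifiable are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the same fields as PostgreSQL's Interval struct. */
typedef struct IntervalSketch
{
	int64_t time;	/* microseconds */
	int32_t day;
	int32_t month;
} IntervalSketch;

/* Hypothetical classification of "column <op> now() [+|- interval]". */
typedef enum CompareOp { OP_GT, OP_GE, OP_LT, OP_LE } CompareOp;
typedef enum OffsetOp { OFFSET_NONE, OFFSET_MINUS, OFFSET_PLUS } OffsetOp;

static bool
qual_is_constifiable(CompareOp op, OffsetOp offset_op, const IntervalSketch *iv)
{
	assert(offset_op == OFFSET_NONE || iv != NULL);

	/* Only lower bounds qualify: as now() advances, "column > now()" only
	 * gets stricter, so a bound taken at plan time stays valid. An upper
	 * bound such as "column < now()" gets looser over time, and a plan-time
	 * constant could wrongly exclude rows. */
	if (op != OP_GT && op != OP_GE)
		return false;

	if (offset_op == OFFSET_NONE)
		return true;

	/* Day and month arithmetic depends on the session timezone
	 * (e.g. across DST changes), so only the time part is allowed. */
	return iv->day == 0 && iv->month == 0;
}
```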

Some microbenchmarks to show the improvement; each timing is the best of
five runs.

-- hypertable with 1k chunks
-- with optimization
select * from metrics1k where time > now() - '5m'::interval;
Time: 3.090 ms

-- without optimization
select * from metrics1k where time > now() - '5m'::interval;
Time: 145.640 ms

-- hypertable with 5k chunks
-- with optimization
select * from metrics5k where time > now() - '5m'::interval;
Time: 4.317 ms

-- without optimization
select * from metrics5k where time > now() - '5m'::interval;
Time: 775.259 ms

-- hypertable with 10k chunks
-- with optimization
select * from metrics10k where time > now() - '5m'::interval;
Time: 4.853 ms

-- without optimization
select * from metrics10k where time > now() - '5m'::interval;
Time: 1766.319 ms (00:01.766)

-- hypertable with 20k chunks
-- with optimization
select * from metrics20k where time > now() - '5m'::interval;
Time: 6.141 ms

-- without optimization
select * from metrics20k where time > now() - '5m'::interval;
Time: 3321.968 ms (00:03.322)

Speedup with 1k chunks: 47x
Speedup with 5k chunks: 179x
Speedup with 10k chunks: 363x
Speedup with 20k chunks: 540x
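The speedup factors can be recomputed from the timings above; they match the ratios rounded down to whole numbers (for example 775.259 / 4.317 ≈ 179.6 is listed as 179x). The struct and function names in this small C check are illustrative only.

```c
#include <assert.h>

/* Best-of-five timings in milliseconds, copied from the benchmark above. */
typedef struct BenchResult
{
	int nchunks;
	double with_ms;
	double without_ms;
} BenchResult;

static const BenchResult results[] = {
	{1000, 3.090, 145.640},
	{5000, 4.317, 775.259},
	{10000, 4.853, 1766.319},
	{20000, 6.141, 3321.968},
};

/* Speedup factor truncated to an integer (the cast rounds toward zero). */
static int
speedup_truncated(const BenchResult *r)
{
	assert(r->with_ms > 0.0);
	return (int) (r->without_ms / r->with_ms);
}
```

Dividing the unoptimized timings by the chunk count gives roughly 0.15 to 0.18 ms per chunk, while the optimized timings only grow from about 3 ms to 6 ms, which is consistent with the unoptimized cost being dominated by per-chunk planning work.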

akuzm · 2022-05-16T12:45:28Z

src/planner/planner.c

+		FromExpr *from = castNode(FromExpr, node);
+		if (from->quals)
+		{
+			from->quals = ts_constify_now(context->root, from->quals);


We already have a more generic variant here: https://github.com/timescale/timescaledb/blob/main/tsl/src/fdw/scan_plan.c#L552-L563
We use it to evaluate stable constraints at the access node before sending them to the data nodes. Can we just use it here?

That code evaluates many timestamp-related stable functions, not just now(): https://github.com/timescale/timescaledb/blob/main/tsl/src/fdw/deparse.c#L118-L180

By the way, this is for local hypertables, right? This means we don't need a whitelist and can evaluate all stable functions before performing constraint exclusion. However, I'm not sure whether preprocess_queries is the right place to do this, because it runs at planning time, and stable functions should be evaluated at execution time.

This is still a draft, so I'll expand the commit message, but you cannot just constify now() like that at plan time; that would not be safe. What this patch does is transform column > now() - '5m' into column > '2022-05-16 14:49' AND column > now() - '5m'. This allows the constified value to be used during plan-time chunk exclusion while still returning correct results, even for prepared statements.

Oh, nice idea!
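The rewrite discussed here can be checked with a small C model, assuming integer microsecond timestamps and only the column > now() - interval form. RewrittenQual, rewrite_gt_now_minus, original_qual and rewritten_qual are hypothetical names; the sketch only illustrates the argument that the added constant never changes the result as long as now() does not move backwards between planning and execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;	/* stand-in: int64 microseconds */

/* Hypothetical model of the rewritten qual
 *   time > plan_bound AND time > now() - offset_us
 * where plan_bound = (now() at plan time) - offset_us. */
typedef struct RewrittenQual
{
	TimestampTz plan_now;	/* now() as seen while planning */
	TimestampTz plan_bound;	/* the constant added for chunk exclusion */
	int64_t offset_us;	/* time part of the interval, microseconds */
} RewrittenQual;

static RewrittenQual
rewrite_gt_now_minus(TimestampTz plan_now, int64_t offset_us)
{
	RewrittenQual q = { plan_now, plan_now - offset_us, offset_us };

	return q;
}

/* The qual as written by the user, with now() taken at execution time. */
static bool
original_qual(TimestampTz time, TimestampTz exec_now, int64_t offset_us)
{
	return time > exec_now - offset_us;
}

/* The rewritten qual, evaluated each time a (possibly cached) plan runs. */
static bool
rewritten_qual(const RewrittenQual *q, TimestampTz time, TimestampTz exec_now)
{
	/* Safety assumption: time never moves backwards between planning and
	 * execution, so exec_now - offset_us >= plan_bound. */
	assert(exec_now >= q->plan_now);
	return time > q->plan_bound && time > exec_now - q->offset_us;
}
```

With plan_now = 1000 and an offset of 300, plan_bound is 700; for any later exec_now, such as 1500, exec_now - 300 is already at least 700, so both quals accept exactly the same rows.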

codecov · 2022-05-16T19:27:35Z

Codecov Report

Merging #4340 (b8373d3) into main (c6c64c4) will increase coverage by 0.03%.
The diff coverage is 98.19%.

@@            Coverage Diff             @@
##             main    #4340      +/-   ##
==========================================
+ Coverage   90.75%   90.79%   +0.03%     
==========================================
  Files         216      217       +1     
  Lines       39996    40040      +44     
==========================================
+ Hits        36300    36355      +55     
+ Misses       3696     3685      -11

Impacted Files	Coverage Δ
src/compat/compat.h	`94.73% <ø> (ø)`
src/copy.c	`93.79% <ø> (ø)`
src/loader/bgw_launcher.c	`91.82% <ø> (ø)`
src/nodes/chunk_append/chunk_append.c	`97.94% <ø> (ø)`
src/planner/add_hashagg.c	`52.85% <ø> (ø)`
src/planner/agg_bookend.c	`92.30% <ø> (ø)`
src/planner/expand_hypertable.c	`93.94% <ø> (ø)`
src/planner/partialize.c	`97.91% <ø> (ø)`
src/planner/planner.h	`100.00% <ø> (ø)`
tsl/src/fdw/relinfo.c	`96.83% <ø> (-0.02%)`	⬇️
... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 01c6125...b8373d3.

akuzm · 2022-05-17T08:45:51Z

tsl/test/shared/expected/constify_now.out

+-- LICENSE-TIMESCALE for a copy of the license.
+SET timescaledb.enable_chunk_append TO false;
+SET timescaledb.enable_constraint_aware_append TO false;
+SET timescaledb.current_timestamp_mock TO '1990-01-01';


We also have the tsl_override_current_timestamptz thing, need to merge them sometime I guess...

akuzm

Tested locally, gives some impressive speedup.

nikkhils

LGTM!

erimatnor · 2022-05-17T15:15:30Z

tsl/test/shared/expected/constify_now-12.out

+----------------------------------------------------------------------------------------------------------------
+ Append
+   ->  Index Only Scan using _hyper_X_X_chunk_metrics_time_idx on _hyper_X_X_chunk
+         Index Cond: (("time" > now()) AND ("time" > 'Mon Jan 01 00:00:00 1990 PST'::timestamp with time zone))


Do we really need to change the query plan and show the constant expression in the EXPLAIN output to enable this functionality? I worry that it could be confusing.

I am thinking it is enough to add the constant expression to the quals we use to scan for chunks during chunk exclusion. That would make the functionality transparent.

Not sure what you are referring to, but at the level where the optimization happens there are no scans yet. All of this needs to happen well before anything scan-related exists, because we want to get rid of the planning overhead of excluded chunks.

Oh, I think I understand what you mean, but that would limit the usefulness of the optimization. This optimization works with both our own hypertable expansion and PostgreSQL inheritance expansion, which means it applies to SELECT/UPDATE/DELETE. If we did this only in our own hypertable expansion, it would no longer work for UPDATE/DELETE.

erimatnor · 2022-05-17T15:21:20Z

tsl/test/shared/sql/constify_now.sql.in

@@ -0,0 +1,87 @@
+-- This file and its contents are licensed under the Timescale License.


I would have expected tests with prepared statements combined with new data that would affect the plan, given that it is important that this optimization works correctly with prepared statements. However, I can't find any such tests.

@abrownsword

This release adds major new features since the 2.6.1 release. We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Optimize continuous aggregate query performance and storage
* The following query clauses and functions can now be used in a continuous aggregate: FILTER, DISTINCT, ORDER BY as well as [Ordered-Set Aggregate](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE) and [Hypothetical-Set Aggregate](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE)
* Optimize now() query planning time
* Improve COPY insert performance
* Improve performance of UPDATE/DELETE on PG14 by excluding chunks

This release also includes several bug fixes.

If you are upgrading from a previous version and were using compression with a non-default collation on a segmentby-column you should recompress those hypertables.

**Features**
* timescale#4045 Custom origin's support in CAGGs
* timescale#4120 Add logging for retention policy
* timescale#4158 Allow ANALYZE command on a data node directly
* timescale#4169 Add support for chunk exclusion on DELETE to PG14
* timescale#4209 Add support for chunk exclusion on UPDATE to PG14
* timescale#4269 Continuous Aggregates finals form
* timescale#4301 Add support for bulk inserts in COPY operator
* timescale#4311 Support non-superuser move chunk operations
* timescale#4330 Add GUC "bgw_launcher_poll_time"
* timescale#4340 Enable now() usage in plan-time chunk exclusion

**Bugfixes**
* timescale#3899 Fix segfault in Continuous Aggregates
* timescale#4225 Fix TRUNCATE error as non-owner on hypertable
* timescale#4236 Fix potential wrong order of results for compressed hypertable with a non-default collation
* timescale#4249 Fix option "timescaledb.create_group_indexes"
* timescale#4251 Fix INSERT into compressed chunks with dropped columns
* timescale#4255 Fix option "timescaledb.create_group_indexes"
* timescale#4259 Fix logic bug in extension update script
* timescale#4269 Fix bad Continuous Aggregate view definition reported in timescale#4233
* timescale#4289 Support moving compressed chunks between data nodes
* timescale#4300 Fix refresh window cap for cagg refresh policy
* timescale#4315 Fix memory leak in scheduler
* timescale#4323 Remove printouts from signal handlers
* timescale#4342 Fix move chunk cleanup logic
* timescale#4349 Fix crashes in functions using AlterTableInternal
* timescale#4358 Fix crash and other issues in telemetry reporter

**Thanks**
* @abrownsword for reporting a bug in the telemetry reporter and testing the fix
* @jsoref for fixing various misspellings in code, comments and documentation
* @yalon for reporting an error with ALTER TABLE RENAME on distributed hypertables
* @zhuizhuhaomeng for reporting and fixing a memory leak in our scheduler

svenklemm self-assigned this May 16, 2022

akuzm reviewed May 16, 2022

svenklemm force-pushed the now_constify branch 9 times, most recently from 0dc975a to ca34508, May 16, 2022 19:18

svenklemm force-pushed the now_constify branch 3 times, most recently from b04019f to 0aa4b11, May 16, 2022 21:11

svenklemm marked this pull request as ready for review May 16, 2022 21:15

svenklemm requested a review from a team as a code owner May 16, 2022 21:15

svenklemm requested review from konskov and gayyappan and removed request for a team May 16, 2022 21:15

svenklemm force-pushed the now_constify branch from 0aa4b11 to c01d9db, May 16, 2022 21:19

svenklemm requested review from akuzm, nikkhils, erimatnor, fabriziomello, jnidzwetzki, mfundul, mkindahl, pmwkaa and RafiaSabih May 16, 2022 21:24