Enable now() usage in plan-time chunk exclusion #4340

Merged (1 commit) on May 17, 2022

svenklemm (Member) commented May 16, 2022

This implements an optimization that allows now() expressions to be
used during plan-time chunk exclusion. Since now() is stable rather
than immutable, it would not normally be considered for plan-time
chunk exclusion. To enable this behaviour we convert column > now()
expressions into column > const AND column > now(), where const is
the value of now() at plan time. Assuming that time always moves
forward, this is safe even for prepared statements.
This optimization works for SELECT, UPDATE and DELETE.
On hypertables with many chunks this can lead to a considerable
speedup for certain queries.

The following expressions are supported:

  • column > now()
  • column >= now()
  • column > now() - Interval
  • column > now() + Interval
  • column >= now() - Interval
  • column >= now() + Interval

Interval must not have a day or month component as those depend
on timezone settings.
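
As a rough illustration of the rule described above, here is a minimal Python model of the rewrite. It is only a sketch: the actual patch operates on PostgreSQL planner nodes in C, and the names `Interval`, `plan_time_constant` and `rewrite` are hypothetical helpers invented for this example. The model accepts only `>` and `>=`, rejects intervals with a day or month component, and otherwise adds a constant qual computed from the plan-time value of now().

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a PostgreSQL interval (months, days, microseconds)."""
    months: int = 0
    days: int = 0
    microseconds: int = 0


SUPPORTED_OPS = {">", ">="}


def plan_time_constant(op: str, offset: Interval, plan_time: datetime) -> Optional[datetime]:
    """Return the constant for the extra qual, or None if the expression is not eligible."""
    if op not in SUPPORTED_OPS:
        return None
    # Day and month components depend on timezone settings, so they are rejected.
    if offset.months != 0 or offset.days != 0:
        return None
    return plan_time + timedelta(microseconds=offset.microseconds)


def rewrite(column: str, op: str, now_expr: str, offset: Interval, plan_time: datetime) -> str:
    """Render `column OP const AND column OP now_expr`, or the original qual unchanged."""
    original = f"{column} {op} {now_expr}"
    const = plan_time_constant(op, offset, plan_time)
    if const is None:
        return original
    return f"{column} {op} '{const.isoformat()}' AND {original}"


# Assumed plan time, chosen only for the demonstration.
plan_time = datetime(2022, 5, 16, 14, 54, tzinfo=timezone.utc)
five_minutes_ago = Interval(microseconds=-5 * 60 * 1_000_000)
print(rewrite("time", ">", "now() - '5m'::interval", five_minutes_ago, plan_time))
print(rewrite("time", ">", "now() - '1 day'::interval", Interval(days=-1), plan_time))
```

The first call gains a constant qual; the second is returned unchanged because its interval has a day component.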

Some microbenchmarks to show the improvement. Each timing is the
best of five runs.

-- hypertable with 1k chunks
-- with optimization
select * from metrics1k where time > now() - '5m'::interval;
Time: 3.090 ms

-- without optimization
select * from metrics1k where time > now() - '5m'::interval;
Time: 145.640 ms

-- hypertable with 5k chunks
-- with optimization
select * from metrics5k where time > now() - '5m'::interval;
Time: 4.317 ms

-- without optimization
select * from metrics5k where time > now() - '5m'::interval;
Time: 775.259 ms

-- hypertable with 10k chunks
-- with optimization
select * from metrics10k where time > now() - '5m'::interval;
Time: 4.853 ms

-- without optimization
select * from metrics10k where time > now() - '5m'::interval;
Time: 1766.319 ms (00:01.766)

-- hypertable with 20k chunks
-- with optimization
select * from metrics20k where time > now() - '5m'::interval;
Time: 6.141 ms

-- without optimization
select * from metrics20k where time > now() - '5m'::interval;
Time: 3321.968 ms (00:03.322)

Speedup with 1k chunks: 47x
Speedup with 5k chunks: 179x
Speedup with 10k chunks: 363x
Speedup with 20k chunks: 540x
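
For reference, the speedup factors can be recomputed from the timings listed above. The short Python sketch below divides the time without the optimization by the time with it; the listed factors match when the ratio is truncated to an integer rather than rounded. The variable and function names are chosen for this example only.

```python
# psql timings in milliseconds, copied from the benchmark above: (with, without).
timings_ms = {
    "1k": (3.090, 145.640),
    "5k": (4.317, 775.259),
    "10k": (4.853, 1766.319),
    "20k": (6.141, 3321.968),
}


def speedup(with_opt_ms: float, without_opt_ms: float) -> int:
    """Ratio of the two timings, truncated to an integer."""
    return int(without_opt_ms / with_opt_ms)


speedups = {chunks: speedup(w, wo) for chunks, (w, wo) in timings_ms.items()}
for chunks, factor in speedups.items():
    print(f"Speedup with {chunks} chunks: {factor}x")
```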

@svenklemm svenklemm self-assigned this May 16, 2022
FromExpr *from = castNode(FromExpr, node);
if (from->quals)
{
from->quals = ts_constify_now(context->root, from->quals);
Member

We already have a more generic variant here: https://github.com/timescale/timescaledb/blob/main/tsl/src/fdw/scan_plan.c#L552-L563
We use it to evaluate stable constraints at the access node before sending them to the data nodes. Can we just use it here?

Member

That code evaluates many timestamp-related stable functions, not just now(): https://github.com/timescale/timescaledb/blob/main/tsl/src/fdw/deparse.c#L118-L180

Member

By the way, this is for local hypertables, right? This means we don't need a whitelist and can evaluate all stable functions before performing constraint exclusion. Although I'm not sure if preprocess_queries is the right place to do this, because this happens at planning time, and the stable functions should be evaluated at execution time.

Member Author

This is still a draft, so I'll expand the commit message, but you cannot simply constify now() like that at plan time; that would not be safe. What this patch does is transform column > now() - '5m' into column > '2022-05-16 14:49' AND column > now() - '5m'. This allows the constified value to be used during plan-time chunk exclusion while still returning correct results, even for prepared statements.
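
To make the prepared-statement argument concrete, here is a small Python simulation. It is a sketch under simplifying assumptions, not the patch's logic: chunks are modelled as adjacent one-hour ranges with one row per minute, and `make_chunks`, `plan` and `execute` are hypothetical names. Planning excludes chunks using only the constant qual, while execution still applies the original now() qual. As long as the clock at execution is not earlier than at planning, the excluded chunks could not have matched anyway.

```python
from datetime import datetime, timedelta, timezone

CHUNK_WIDTH = timedelta(hours=1)
T0 = datetime(2022, 5, 16, tzinfo=timezone.utc)
OFFSET = timedelta(minutes=-5)  # models now() - '5m'


def make_chunks(n):
    """Build n adjacent chunks [start, end), each holding one row per minute."""
    chunks = []
    for i in range(n):
        start = T0 + i * CHUNK_WIDTH
        rows = [start + timedelta(minutes=m) for m in range(60)]
        chunks.append((start, start + CHUNK_WIDTH, rows))
    return chunks


def plan(chunks, plan_now):
    """Plan-time exclusion: keep chunks that can hold rows with time > const."""
    const = plan_now + OFFSET
    return [chunk for chunk in chunks if chunk[1] > const]


def execute(planned_chunks, exec_now):
    """Execution re-checks the original qual, time > now() - '5m', on the kept chunks."""
    bound = exec_now + OFFSET
    return [t for _, _, rows in planned_chunks for t in rows if t > bound]


chunks = make_chunks(24)
plan_now = T0 + timedelta(hours=12, minutes=30)
planned = plan(chunks, plan_now)

later = plan_now + timedelta(hours=2)    # plan reused after time has advanced
earlier = plan_now - timedelta(hours=2)  # violates the "time moves forward" assumption
print(len(planned), "of", len(chunks), "chunks survive planning")
print("clock forward, same rows:", execute(planned, later) == execute(chunks, later))
print("clock backward, same rows:", execute(planned, earlier) == execute(chunks, earlier))
```

In this model the reused plan returns the same rows as a full scan when time has advanced, and misses rows only when the clock is set backwards, which is the case the description's assumption rules out.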

Member

Oh, nice idea!

@svenklemm svenklemm force-pushed the now_constify branch 9 times, most recently from 0dc975a to ca34508 Compare May 16, 2022 19:18
codecov bot commented May 16, 2022

Codecov Report

Merging #4340 (b8373d3) into main (c6c64c4) will increase coverage by 0.03%.
The diff coverage is 98.19%.

@@            Coverage Diff             @@
##             main    #4340      +/-   ##
==========================================
+ Coverage   90.75%   90.79%   +0.03%     
==========================================
  Files         216      217       +1     
  Lines       39996    40040      +44     
==========================================
+ Hits        36300    36355      +55     
+ Misses       3696     3685      -11     

| Impacted Files | Coverage Δ |
|---|---|
| src/compat/compat.h | 94.73% <ø> (ø) |
| src/copy.c | 93.79% <ø> (ø) |
| src/loader/bgw_launcher.c | 91.82% <ø> (ø) |
| src/nodes/chunk_append/chunk_append.c | 97.94% <ø> (ø) |
| src/planner/add_hashagg.c | 52.85% <ø> (ø) |
| src/planner/agg_bookend.c | 92.30% <ø> (ø) |
| src/planner/expand_hypertable.c | 93.94% <ø> (ø) |
| src/planner/partialize.c | 97.91% <ø> (ø) |
| src/planner/planner.h | 100.00% <ø> (ø) |
| tsl/src/fdw/relinfo.c | 96.83% <ø> (-0.02%) ⬇️ |

... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 01c6125...b8373d3.

@svenklemm svenklemm force-pushed the now_constify branch 3 times, most recently from b04019f to 0aa4b11 Compare May 16, 2022 21:11
@svenklemm svenklemm marked this pull request as ready for review May 16, 2022 21:15
@svenklemm svenklemm requested a review from a team as a code owner May 16, 2022 21:15
@svenklemm svenklemm requested review from konskov and gayyappan and removed request for a team May 16, 2022 21:15
-- LICENSE-TIMESCALE for a copy of the license.
SET timescaledb.enable_chunk_append TO false;
SET timescaledb.enable_constraint_aware_append TO false;
SET timescaledb.current_timestamp_mock TO '1990-01-01';
Member

We also have the tsl_override_current_timestamptz thing, need to merge them sometime I guess...

src/planner/constify_now.c (outdated review thread, resolved)
@svenklemm svenklemm force-pushed the now_constify branch 4 times, most recently from 1ed85f5 to 6ef5869 Compare May 17, 2022 11:13
Member

@akuzm akuzm left a comment

Tested locally, gives some impressive speedup.

Contributor

@nikkhils nikkhils left a comment

LGTM!

----------------------------------------------------------------------------------------------------------------
Append
-> Index Only Scan using _hyper_X_X_chunk_metrics_time_idx on _hyper_X_X_chunk
Index Cond: (("time" > now()) AND ("time" > 'Mon Jan 01 00:00:00 1990 PST'::timestamp with time zone))
Contributor

@erimatnor erimatnor May 17, 2022

Do we really need to change the query plan and show the constant expression in the explain to enable this functionality? I worry that it can be confusing?

I am thinking it is enough to add the constant expression to the quals we use to scan for chunks during chunk exclusion. That would make the functionality transparent.

Member Author

Not sure what you are referring to, but at the level where this optimization happens there are no scans yet. All of this needs to happen well before anything scan-related exists, because we want to avoid the planning overhead of excluded chunks.

Member Author

Oh, I think I understand what you mean, but that would limit the usefulness of the optimization. This optimization works with both our own hypertable expansion and PostgreSQL inheritance expansion, which means it applies to SELECT, UPDATE and DELETE. If we did this only in our own hypertable expansion, it would no longer work for UPDATE/DELETE.

@@ -0,0 +1,87 @@
-- This file and its contents are licensed under the Timescale License.
Contributor

I would have expected tests with prepared statements combined with new data that would affect the plan, given that it is important that this optimization works correctly with prepared statements. However, I can't find any such tests.

@svenklemm svenklemm merged commit 35ea80f into timescale:main May 17, 2022
svenklemm added a commit to svenklemm/timescaledb that referenced this pull request May 23, 2022
This release adds major new features since the 2.6.1 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Optimize continuous aggregate query performance and storage
* The following query clauses and functions can now be used in a continuous
  aggregate: FILTER, DISTINCT, ORDER BY as well as [Ordered-Set Aggregate](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE)
  and [Hypothetical-Set Aggregate](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE)
* Optimize now() query planning time
* Improve COPY insert performance
* Improve performance of UPDATE/DELETE on PG14 by excluding chunks

This release also includes several bug fixes.

If you are upgrading from a previous version and were using compression
with a non-default collation on a segmentby-column you should recompress
those hypertables.

**Features**
* timescale#4045 Custom origin's support in CAGGs
* timescale#4120 Add logging for retention policy
* timescale#4158 Allow ANALYZE command on a data node directly
* timescale#4169 Add support for chunk exclusion on DELETE to PG14
* timescale#4209 Add support for chunk exclusion on UPDATE to PG14
* timescale#4269 Continuous Aggregates finals form
* timescale#4301 Add support for bulk inserts in COPY operator
* timescale#4311 Support non-superuser move chunk operations
* timescale#4330 Add GUC "bgw_launcher_poll_time"
* timescale#4340 Enable now() usage in plan-time chunk exclusion

**Bugfixes**
* timescale#3899 Fix segfault in Continuous Aggregates
* timescale#4225 Fix TRUNCATE error as non-owner on hypertable
* timescale#4236 Fix potential wrong order of results for compressed hypertable with a non-default collation
* timescale#4249 Fix option "timescaledb.create_group_indexes"
* timescale#4251 Fix INSERT into compressed chunks with dropped columns
* timescale#4255 Fix option "timescaledb.create_group_indexes"
* timescale#4259 Fix logic bug in extension update script
* timescale#4269 Fix bad Continuous Aggregate view definition reported in timescale#4233
* timescale#4289 Support moving compressed chunks between data nodes
* timescale#4300 Fix refresh window cap for cagg refresh policy
* timescale#4315 Fix memory leak in scheduler
* timescale#4323 Remove printouts from signal handlers
* timescale#4342 Fix move chunk cleanup logic
* timescale#4349 Fix crashes in functions using AlterTableInternal
* timescale#4358 Fix crash and other issues in telemetry reporter

**Thanks**
* @abrownsword for reporting a bug in the telemetry reporter and testing the fix
* @jsoref for fixing various misspellings in code, comments and documentation
* @yalon for reporting an error with ALTER TABLE RENAME on distributed hypertables
* @zhuizhuhaomeng for reporting and fixing a memory leak in our scheduler
@svenklemm svenklemm mentioned this pull request May 23, 2022
svenklemm added a commit that referenced this pull request May 23, 2022
mfundul pushed a commit to mfundul/timescaledb that referenced this pull request May 24, 2022
@mfundul mfundul mentioned this pull request May 24, 2022
mfundul pushed a commit that referenced this pull request May 24, 2022
5 participants