Fix compress_policy for multi txn handling #3667

gayyappan · 2021-10-07T15:24:13Z

Disable-check: commit-count

Note to reviewers: This PR has multiple commits. I'll be combining them into logical commits. I have not moved them around yet because it is easier to review the smaller changes.

codecov · 2021-10-11T19:48:41Z

Codecov Report

Merging #3667 (4132069) into master (bfe3603) will decrease coverage by 0.15%.
The diff coverage is 85.07%.

❗ Current head 4132069 differs from pull request most recent head 33a584e. Consider uploading reports for the commit 33a584e to get more accurate results

@@            Coverage Diff             @@
##           master    #3667      +/-   ##
==========================================
- Coverage   90.78%   90.63%   -0.16%     
==========================================
  Files         212      212              
  Lines       36551    36466      -85     
==========================================
- Hits        33182    33050     -132     
- Misses       3369     3416      +47

Impacted Files	Coverage Δ
src/cross_module_fn.c	`70.45% <ø> (-0.23%)`	⬇️
src/nodes/chunk_dispatch.c	`94.44% <ø> (-3.77%)`	⬇️
src/utils.h	`80.00% <ø> (ø)`
tsl/src/bgw_policy/compression_api.c	`68.90% <ø> (-11.90%)`	⬇️
tsl/src/bgw_policy/policy_utils.c	`81.08% <ø> (+8.66%)`	⬆️
tsl/src/init.c	`87.50% <ø> (ø)`
src/utils.c	`78.10% <71.42%> (-0.70%)`	⬇️
src/nodes/chunk_insert_state.c	`97.62% <100.00%> (-0.12%)`	⬇️
src/nodes/hypertable_insert.c	`95.67% <100.00%> (-0.03%)`	⬇️
tsl/src/bgw_policy/job.c	`87.50% <100.00%> (-9.37%)`	⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bfe2b08...33a584e. Read the comment docs.

sql/policy_internal.sql

mkindahl · 2021-10-12T12:03:37Z

Disable-check: commit-count

Do you really want to push a commit with the comment "more tests" (df9a4cb)?

afiskon

The PR requires a better commit message. The message should describe what was broken and how the patch addressed the issue.

gayyappan · 2021-10-12T13:10:59Z

Disable-check: commit-count

Do you really want to push a commit with the comment "more tests" (df9a4cb)?

Plan to merge commits into a more logical sequence. This PR has multiple commits now to help with reviews.

sql/policy_internal.sql

mkindahl · 2021-10-12T12:17:59Z

sql/policy_internal.sql

+    numchunks := numchunks + 1;
+    IF maxchunks > 0 AND numchunks >= maxchunks THEN  
+         EXIT; 
+    END IF;  


You can add a LIMIT maxchunks to the SELECT for the loop and avoid this code as well as improve performance.

I would have to do this dynamically based on the maxchunks value, i.e handle NULL or not null.
The default is to not set maxchunks. Policies are created with NULL maxchunks.
So I don't see a point in optimizing for this special case.

@gayyappan IMHO it's simple to deal with it using dynamic SQL:

statement := format(' SELECT show.oid, ch.schema_name, ch.table_name, ch.status FROM show_chunks(%L, older_than => %L) AS show(oid) JOIN pg_class pgc ON pgc.oid = show.oid JOIN pg_namespace pgns ON pgc.relnamespace = pgns.oid JOIN _timescaledb_catalog.chunk ch ON ch.table_name = pgc.relname AND ch.schema_name = pgns.nspname AND ch.hypertable_id = %L WHERE ch.dropped IS FALSE AND ch.status IN (0, 3)', htoid, lag, htid); ... FOR chunk_rec IN EXECUTE format('%s %s', statement, COALESCE('LIMIT '||maxchunks, '')) LOOP ...

The point is that I do not want to use dynamic SQL to optimize for a special case.

mkindahl · 2021-10-12T16:10:56Z

sql/policy_internal.sql

+    numchunks := numchunks + 1;
+    IF maxchunks > 0 AND numchunks >= maxchunks THEN  
+         EXIT; 
+    END IF;  
+  END LOOP;


Same here, use a LIMIT on the select instead.

sql/policy_internal.sql

fabriziomello · 2021-10-13T11:36:48Z

sql/policy_internal.sql

+      OR dimtype = 'TIMESTAMPTZ'::regtype
+      OR dimtype = 'DATE'::regtype) THEN
+      lag_interval := jsonb_object_field_text(config, 'compress_after')::interval ;
+      CALL _timescaledb_internal.policy_compression_interval( 


Maybe we can simplify it a bit and instead of have two policy_compression_?? functions we can have just one policy_compression_execute and pass down the dimtype to deal with interval and integer in one place.

Move subtract_integer_from_now to src directory and create a SQL function for it.

This PR removes the C code that executes the compression policy. Instead we use a PL/pgSQL procedure to execute the policy. PG13.4 and PG12.8 introduced some changes that require PortalContexts while executing transactions. The compression policy procedure compresses chunks in multiple transactions. We have seen some issues with snapshots and portal management in the policy code (due to the PG13.4 code changes). SPI API has transaction-portal management code. However, the compression policy code does not use SPI interfaces. But it is fairly easy to just convert this into a PL/pgSQL procedure (which calls SPI) rather than replicating portal managment code in C to manage multiple txns in the compression policy. This PR also disallows decompress_chunk, compress_chunk and recompress_chunk in txn read only mode. Fixes timescale#3656

@binakot

This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * timescale#3034 Add support for PostgreSQL 14 * timescale#3435 Add continuous aggregates for distributed hypertables * timescale#3505 Add support for timezones in `time_bucket_ng()` * timescale#3598 Improve evaluation of stable functions such as now() on access node * timescale#3717 Support transparent decompression on individual chunks **Bugfixes** * timescale#3580 Fix memory context bug executing TRUNCATE * timescale#3592 Allow alter column type on distributed hypertable * timescale#3618 Fix execution of refresh_caggs from user actions * timescale#3625 Add shared dependencies when creating chunk * timescale#3626 Fix memory context bug executing TRUNCATE * timescale#3627 Schema qualify UDTs in multi-node * timescale#3638 Allow owner change of a data node * timescale#3654 Fix index attnum mapping in reorder_chunk * timescale#3661 Fix SkipScan path generation with constant DISTINCT column * timescale#3667 Fix compress_policy for multi txn handling * timescale#3673 Fix distributed hypertable DROP within a procedure * timescale#3701 Allow anyone to use size utilities on distributed hypertables * timescale#3708 Fix crash in get_aggsplit * timescale#3709 Fix ordered append pathkey check * timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * timescale#3724 Fix inserts into compressed chunks on hypertables with caggs * timescale#3727 Fix DirectFunctionCall crash in distributed_exec * timescale#3728 Fix SkipScan with varchar column * timescale#3733 Fix ANALYZE crash with custom statistics for custom types * timescale#3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode

@binakot

This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * timescale#3034 Add support for PostgreSQL 14 * timescale#3435 Add continuous aggregates for distributed hypertables * timescale#3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * timescale#3580 Fix memory context bug executing TRUNCATE * timescale#3592 Allow alter column type on distributed hypertable * timescale#3598 Improve evaluation of stable functions such as now() on access node * timescale#3618 Fix execution of refresh_caggs from user actions * timescale#3625 Add shared dependencies when creating chunk * timescale#3626 Fix memory context bug executing TRUNCATE * timescale#3627 Schema qualify UDTs in multi-node * timescale#3638 Allow owner change of a data node * timescale#3654 Fix index attnum mapping in reorder_chunk * timescale#3661 Fix SkipScan path generation with constant DISTINCT column * timescale#3667 Fix compress_policy for multi txn handling * timescale#3673 Fix distributed hypertable DROP within a procedure * timescale#3701 Allow anyone to use size utilities on distributed hypertables * timescale#3708 Fix crash in get_aggsplit * timescale#3709 Fix ordered append pathkey check * timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * timescale#3717 Support transparent decompression on individual chunks * timescale#3724 Fix inserts into compressed chunks on hypertables with caggs * timescale#3727 Fix DirectFunctionCall crash in distributed_exec * timescale#3728 Fix SkipScan with varchar column * timescale#3733 Fix ANALYZE crash with custom statistics for custom types * timescale#3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode

@binakot

This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * #3034 Add support for PostgreSQL 14 * #3435 Add continuous aggregates for distributed hypertables * #3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * #3580 Fix memory context bug executing TRUNCATE * #3592 Allow alter column type on distributed hypertable * #3598 Improve evaluation of stable functions such as now() on access node * #3618 Fix execution of refresh_caggs from user actions * #3625 Add shared dependencies when creating chunk * #3626 Fix memory context bug executing TRUNCATE * #3627 Schema qualify UDTs in multi-node * #3638 Allow owner change of a data node * #3654 Fix index attnum mapping in reorder_chunk * #3661 Fix SkipScan path generation with constant DISTINCT column * #3667 Fix compress_policy for multi txn handling * #3673 Fix distributed hypertable DROP within a procedure * #3701 Allow anyone to use size utilities on distributed hypertables * #3708 Fix crash in get_aggsplit * #3709 Fix ordered append pathkey check * #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * #3717 Support transparent decompression on individual chunks * #3724 Fix inserts into compressed chunks on hypertables with caggs * #3727 Fix DirectFunctionCall crash in distributed_exec * #3728 Fix SkipScan with varchar column * #3733 Fix ANALYZE crash with custom statistics for custom types * #3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode

@binakot

This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * #3034 Add support for PostgreSQL 14 * #3435 Add continuous aggregates for distributed hypertables * #3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * #3580 Fix memory context bug executing TRUNCATE * #3592 Allow alter column type on distributed hypertable * #3598 Improve evaluation of stable functions such as now() on access node * #3618 Fix execution of refresh_caggs from user actions * #3625 Add shared dependencies when creating chunk * #3626 Fix memory context bug executing TRUNCATE * #3627 Schema qualify UDTs in multi-node * #3638 Allow owner change of a data node * #3654 Fix index attnum mapping in reorder_chunk * #3661 Fix SkipScan path generation with constant DISTINCT column * #3667 Fix compress_policy for multi txn handling * #3673 Fix distributed hypertable DROP within a procedure * #3701 Allow anyone to use size utilities on distributed hypertables * #3708 Fix crash in get_aggsplit * #3709 Fix ordered append pathkey check * #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * #3717 Support transparent decompression on individual chunks * #3724 Fix inserts into compressed chunks on hypertables with caggs * #3727 Fix DirectFunctionCall crash in distributed_exec * #3728 Fix SkipScan with varchar column * #3733 Fix ANALYZE crash with custom statistics for custom types * #3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode

@binakot

This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * #3034 Add support for PostgreSQL 14 * #3435 Add continuous aggregates for distributed hypertables * #3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * #3580 Fix memory context bug executing TRUNCATE * #3592 Allow alter column type on distributed hypertable * #3598 Improve evaluation of stable functions such as now() on access node * #3618 Fix execution of refresh_caggs from user actions * #3625 Add shared dependencies when creating chunk * #3626 Fix memory context bug executing TRUNCATE * #3627 Schema qualify UDTs in multi-node * #3638 Allow owner change of a data node * #3654 Fix index attnum mapping in reorder_chunk * #3661 Fix SkipScan path generation with constant DISTINCT column * #3667 Fix compress_policy for multi txn handling * #3673 Fix distributed hypertable DROP within a procedure * #3701 Allow anyone to use size utilities on distributed hypertables * #3708 Fix crash in get_aggsplit * #3709 Fix ordered append pathkey check * #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * #3717 Support transparent decompression on individual chunks * #3724 Fix inserts into compressed chunks on hypertables with caggs * #3727 Fix DirectFunctionCall crash in distributed_exec * #3728 Fix SkipScan with varchar column * #3733 Fix ANALYZE crash with custom statistics for custom types * #3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode

@binakot

This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * #3034 Add support for PostgreSQL 14 * #3435 Add continuous aggregates for distributed hypertables * #3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * #3580 Fix memory context bug executing TRUNCATE * #3592 Allow alter column type on distributed hypertable * #3598 Improve evaluation of stable functions such as now() on access node * #3618 Fix execution of refresh_caggs from user actions * #3625 Add shared dependencies when creating chunk * #3626 Fix memory context bug executing TRUNCATE * #3627 Schema qualify UDTs in multi-node * #3638 Allow owner change of a data node * #3654 Fix index attnum mapping in reorder_chunk * #3661 Fix SkipScan path generation with constant DISTINCT column * #3667 Fix compress_policy for multi txn handling * #3673 Fix distributed hypertable DROP within a procedure * #3701 Allow anyone to use size utilities on distributed hypertables * #3708 Fix crash in get_aggsplit * #3709 Fix ordered append pathkey check * #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * #3717 Support transparent decompression on individual chunks * #3724 Fix inserts into compressed chunks on hypertables with caggs * #3727 Fix DirectFunctionCall crash in distributed_exec * #3728 Fix SkipScan with varchar column * #3733 Fix ANALYZE crash with custom statistics for custom types * #3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode

gayyappan requested a review from a team as a code owner October 7, 2021 15:24

gayyappan requested review from berkley, afiskon and svenklemm and removed request for a team October 7, 2021 15:24

gayyappan force-pushed the comp_fix branch from 76ad285 to 03b811b Compare October 11, 2021 18:44

gayyappan self-assigned this Oct 11, 2021

gayyappan requested a review from mkindahl October 11, 2021 20:14