Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compress_policy for multi txn handling #3667

Merged
merged 2 commits into from
Oct 13, 2021

Conversation

gayyappan
Copy link
Contributor

@gayyappan gayyappan commented Oct 7, 2021

Disable-check: commit-count

Note to reviewers: This PR has multiple commits. I'll be combining them into logical commits. I have not moved them around yet because it is easier to review the smaller changes.

@gayyappan gayyappan requested a review from a team as a code owner October 7, 2021 15:24
@gayyappan gayyappan requested review from berkley, afiskon and svenklemm and removed request for a team October 7, 2021 15:24
@codecov
Copy link

codecov bot commented Oct 11, 2021

Codecov Report

Merging #3667 (4132069) into master (bfe3603) will decrease coverage by 0.15%.
The diff coverage is 85.07%.

❗ Current head 4132069 differs from pull request most recent head 33a584e. Consider uploading reports for the commit 33a584e to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3667      +/-   ##
==========================================
- Coverage   90.78%   90.63%   -0.16%     
==========================================
  Files         212      212              
  Lines       36551    36466      -85     
==========================================
- Hits        33182    33050     -132     
- Misses       3369     3416      +47     
Impacted Files Coverage Δ
src/cross_module_fn.c 70.45% <ø> (-0.23%) ⬇️
src/nodes/chunk_dispatch.c 94.44% <ø> (-3.77%) ⬇️
src/utils.h 80.00% <ø> (ø)
tsl/src/bgw_policy/compression_api.c 68.90% <ø> (-11.90%) ⬇️
tsl/src/bgw_policy/policy_utils.c 81.08% <ø> (+8.66%) ⬆️
tsl/src/init.c 87.50% <ø> (ø)
src/utils.c 78.10% <71.42%> (-0.70%) ⬇️
src/nodes/chunk_insert_state.c 97.62% <100.00%> (-0.12%) ⬇️
src/nodes/hypertable_insert.c 95.67% <100.00%> (-0.03%) ⬇️
tsl/src/bgw_policy/job.c 87.50% <100.00%> (-9.37%) ⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bfe2b08...33a584e. Read the comment docs.

@gayyappan gayyappan self-assigned this Oct 11, 2021
@gayyappan gayyappan requested a review from mkindahl October 11, 2021 20:14
sql/policy_internal.sql Outdated Show resolved Hide resolved
@gayyappan gayyappan force-pushed the comp_fix branch 2 times, most recently from 5a60bb3 to df9a4cb Compare October 12, 2021 02:54
@mkindahl mkindahl removed the request for review from berkley October 12, 2021 12:01
@mkindahl
Copy link
Contributor

Disable-check: commit-count

Do you really want to push a commit with the comment "more tests" (df9a4cb)?

Copy link
Contributor

@afiskon afiskon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR requires a better commit message. The message should describe what was broken and how the patch addressed the issue.

@gayyappan
Copy link
Contributor Author

Disable-check: commit-count

Do you really want to push a commit with the comment "more tests" (df9a4cb)?

Plan to merge commits into a more logical sequence. This PR has multiple commits now to help with reviews.

sql/policy_internal.sql Outdated Show resolved Hide resolved
Comment on lines +56 to +59
numchunks := numchunks + 1;
IF maxchunks > 0 AND numchunks >= maxchunks THEN
EXIT;
END IF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a LIMIT maxchunks to the SELECT for the loop and avoid this code as well as improve performance.

Copy link
Contributor Author

@gayyappan gayyappan Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have to do this dynamically based on the maxchunks value, i.e handle NULL or not null.
The default is to not set maxchunks. Policies are created with NULL maxchunks.
So I don't see a point in optimizing for this special case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gayyappan IMHO it's simple to deal with it using dynamic SQL:

statement := format('
    SELECT show.oid, ch.schema_name, ch.table_name, ch.status
    FROM show_chunks(%L, older_than => %L) AS show(oid)
      JOIN pg_class pgc ON pgc.oid = show.oid
      JOIN pg_namespace pgns ON pgc.relnamespace = pgns.oid
      JOIN _timescaledb_catalog.chunk ch ON ch.table_name = pgc.relname AND ch.schema_name = pgns.nspname AND ch.hypertable_id = %L
    WHERE ch.dropped IS FALSE AND ch.status IN (0, 3)', htoid, lag, htid);
...

FOR chunk_rec IN EXECUTE format('%s %s', statement, COALESCE('LIMIT '||maxchunks, ''))
LOOP
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that I do not want to use dynamic SQL to optimize for a special case.

Comment on lines +105 to +109
numchunks := numchunks + 1;
IF maxchunks > 0 AND numchunks >= maxchunks THEN
EXIT;
END IF;
END LOOP;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, use a LIMIT on the select instead.

sql/policy_internal.sql Outdated Show resolved Hide resolved
sql/policy_internal.sql Show resolved Hide resolved
@gayyappan gayyappan requested a review from mkindahl October 12, 2021 18:26
OR dimtype = 'TIMESTAMPTZ'::regtype
OR dimtype = 'DATE'::regtype) THEN
lag_interval := jsonb_object_field_text(config, 'compress_after')::interval ;
CALL _timescaledb_internal.policy_compression_interval(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can simplify it a bit and instead of have two policy_compression_?? functions we can have just one policy_compression_execute and pass down the dimtype to deal with interval and integer in one place.

Move subtract_integer_from_now to src directory
and create a SQL function for it.
This PR removes the C code that executes the compression
policy. Instead we use a PL/pgSQL procedure to execute
the policy.

PG13.4 and PG12.8 introduced some changes
that require PortalContexts while executing transactions.
The compression policy procedure compresses chunks in
multiple transactions. We have seen some issues with snapshots
and portal management in the policy code (due to the
PG13.4 code changes). SPI API has transaction-portal management
code. However, the compression policy code does not use SPI
interfaces. But it is fairly easy to just convert this into
a PL/pgSQL procedure (which calls SPI) rather than replicating
portal managment code in C to manage multiple txns in the
compression policy.

This PR also disallows decompress_chunk, compress_chunk and
recompress_chunk in txn read only mode.

Fixes timescale#3656
@gayyappan gayyappan merged commit fffd6c2 into timescale:master Oct 13, 2021
@NunoFilipeSantos NunoFilipeSantos added this to the TimescaleDB 2.5 milestone Oct 13, 2021
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* timescale#3034 Add support for PostgreSQL 14
* timescale#3435 Add continuous aggregates for distributed hypertables
* timescale#3505 Add support for timezones in `time_bucket_ng()`
* timescale#3598 Improve evaluation of stable functions such as now() on access node
* timescale#3717 Support transparent decompression on individual chunks

**Bugfixes**
* timescale#3580 Fix memory context bug executing TRUNCATE
* timescale#3592 Allow alter column type on distributed hypertable
* timescale#3618 Fix execution of refresh_caggs from user actions
* timescale#3625 Add shared dependencies when creating chunk
* timescale#3626 Fix memory context bug executing TRUNCATE
* timescale#3627 Schema qualify UDTs in multi-node
* timescale#3638 Allow owner change of a data node
* timescale#3654 Fix index attnum mapping in reorder_chunk
* timescale#3661 Fix SkipScan path generation with constant DISTINCT column
* timescale#3667 Fix compress_policy for multi txn handling
* timescale#3673 Fix distributed hypertable DROP within a procedure
* timescale#3701 Allow anyone to use size utilities on distributed hypertables
* timescale#3708 Fix crash in get_aggsplit
* timescale#3709 Fix ordered append pathkey check
* timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* timescale#3724 Fix inserts into compressed chunks on hypertables with caggs
* timescale#3727 Fix DirectFunctionCall crash in distributed_exec
* timescale#3728 Fix SkipScan with varchar column
* timescale#3733 Fix ANALYZE crash with custom statistics for custom types
* timescale#3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on multinode
@fabriziomello fabriziomello mentioned this pull request Oct 27, 2021
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* timescale#3034 Add support for PostgreSQL 14
* timescale#3435 Add continuous aggregates for distributed hypertables
* timescale#3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* timescale#3580 Fix memory context bug executing TRUNCATE
* timescale#3592 Allow alter column type on distributed hypertable
* timescale#3598 Improve evaluation of stable functions such as now() on access
node
* timescale#3618 Fix execution of refresh_caggs from user actions
* timescale#3625 Add shared dependencies when creating chunk
* timescale#3626 Fix memory context bug executing TRUNCATE
* timescale#3627 Schema qualify UDTs in multi-node
* timescale#3638 Allow owner change of a data node
* timescale#3654 Fix index attnum mapping in reorder_chunk
* timescale#3661 Fix SkipScan path generation with constant DISTINCT column
* timescale#3667 Fix compress_policy for multi txn handling
* timescale#3673 Fix distributed hypertable DROP within a procedure
* timescale#3701 Allow anyone to use size utilities on distributed hypertables
* timescale#3708 Fix crash in get_aggsplit
* timescale#3709 Fix ordered append pathkey check
* timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* timescale#3717 Support transparent decompression on individual chunks
* timescale#3724 Fix inserts into compressed chunks on hypertables with caggs
* timescale#3727 Fix DirectFunctionCall crash in distributed_exec
* timescale#3728 Fix SkipScan with varchar column
* timescale#3733 Fix ANALYZE crash with custom statistics for custom types
* timescale#3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries
and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed
hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on
multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries
and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed
hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on
multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries
and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed
hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on
multinode
@gayyappan gayyappan deleted the comp_fix branch May 20, 2022 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants