-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix compress_policy for multi txn handling #3667
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3667 +/- ##
==========================================
- Coverage 90.78% 90.63% -0.16%
==========================================
Files 212 212
Lines 36551 36466 -85
==========================================
- Hits 33182 33050 -132
- Misses 3369 3416 +47
Continue to review full report at Codecov.
|
5a60bb3
to
df9a4cb
Compare
Do you really want to push a commit with the comment "more tests" (df9a4cb)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR requires a better commit message. The message should describe what was broken and how the patch addressed the issue.
Plan to merge commits into a more logical sequence. This PR has multiple commits now to help with reviews. |
numchunks := numchunks + 1; | ||
IF maxchunks > 0 AND numchunks >= maxchunks THEN | ||
EXIT; | ||
END IF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add a LIMIT maxchunks
to the SELECT
for the loop and avoid this code as well as improve performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have to do this dynamically based on the maxchunks value, i.e handle NULL or not null.
The default is to not set maxchunks. Policies are created with NULL maxchunks.
So I don't see a point in optimizing for this special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gayyappan IMHO it's simple to deal with it using dynamic SQL:
statement := format('
SELECT show.oid, ch.schema_name, ch.table_name, ch.status
FROM show_chunks(%L, older_than => %L) AS show(oid)
JOIN pg_class pgc ON pgc.oid = show.oid
JOIN pg_namespace pgns ON pgc.relnamespace = pgns.oid
JOIN _timescaledb_catalog.chunk ch ON ch.table_name = pgc.relname AND ch.schema_name = pgns.nspname AND ch.hypertable_id = %L
WHERE ch.dropped IS FALSE AND ch.status IN (0, 3)', htoid, lag, htid);
...
FOR chunk_rec IN EXECUTE format('%s %s', statement, COALESCE('LIMIT '||maxchunks, ''))
LOOP
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that I do not want to use dynamic SQL to optimize for a special case.
numchunks := numchunks + 1; | ||
IF maxchunks > 0 AND numchunks >= maxchunks THEN | ||
EXIT; | ||
END IF; | ||
END LOOP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, use a LIMIT
on the select instead.
OR dimtype = 'TIMESTAMPTZ'::regtype | ||
OR dimtype = 'DATE'::regtype) THEN | ||
lag_interval := jsonb_object_field_text(config, 'compress_after')::interval ; | ||
CALL _timescaledb_internal.policy_compression_interval( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can simplify it a bit and instead of have two policy_compression_??
functions we can have just one policy_compression_execute
and pass down the dimtype
to deal with interval
and integer
in one place.
Move subtract_integer_from_now to src directory and create a SQL function for it.
This PR removes the C code that executes the compression policy. Instead we use a PL/pgSQL procedure to execute the policy. PG13.4 and PG12.8 introduced some changes that require PortalContexts while executing transactions. The compression policy procedure compresses chunks in multiple transactions. We have seen some issues with snapshots and portal management in the policy code (due to the PG13.4 code changes). SPI API has transaction-portal management code. However, the compression policy code does not use SPI interfaces. But it is fairly easy to just convert this into a PL/pgSQL procedure (which calls SPI) rather than replicating portal managment code in C to manage multiple txns in the compression policy. This PR also disallows decompress_chunk, compress_chunk and recompress_chunk in txn read only mode. Fixes timescale#3656
This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * timescale#3034 Add support for PostgreSQL 14 * timescale#3435 Add continuous aggregates for distributed hypertables * timescale#3505 Add support for timezones in `time_bucket_ng()` * timescale#3598 Improve evaluation of stable functions such as now() on access node * timescale#3717 Support transparent decompression on individual chunks **Bugfixes** * timescale#3580 Fix memory context bug executing TRUNCATE * timescale#3592 Allow alter column type on distributed hypertable * timescale#3618 Fix execution of refresh_caggs from user actions * timescale#3625 Add shared dependencies when creating chunk * timescale#3626 Fix memory context bug executing TRUNCATE * timescale#3627 Schema qualify UDTs in multi-node * timescale#3638 Allow owner change of a data node * timescale#3654 Fix index attnum mapping in reorder_chunk * timescale#3661 Fix SkipScan path generation with constant DISTINCT column * timescale#3667 Fix compress_policy for multi txn handling * timescale#3673 Fix distributed hypertable DROP within a procedure * timescale#3701 Allow anyone to use size utilities on distributed hypertables * timescale#3708 Fix crash in get_aggsplit * timescale#3709 Fix ordered append pathkey check * timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * timescale#3724 Fix inserts into compressed chunks on hypertables with caggs * timescale#3727 Fix DirectFunctionCall crash in distributed_exec * timescale#3728 Fix SkipScan with varchar column * timescale#3733 Fix ANALYZE crash with custom statistics for custom types * timescale#3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode
This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * timescale#3034 Add support for PostgreSQL 14 * timescale#3435 Add continuous aggregates for distributed hypertables * timescale#3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * timescale#3580 Fix memory context bug executing TRUNCATE * timescale#3592 Allow alter column type on distributed hypertable * timescale#3598 Improve evaluation of stable functions such as now() on access node * timescale#3618 Fix execution of refresh_caggs from user actions * timescale#3625 Add shared dependencies when creating chunk * timescale#3626 Fix memory context bug executing TRUNCATE * timescale#3627 Schema qualify UDTs in multi-node * timescale#3638 Allow owner change of a data node * timescale#3654 Fix index attnum mapping in reorder_chunk * timescale#3661 Fix SkipScan path generation with constant DISTINCT column * timescale#3667 Fix compress_policy for multi txn handling * timescale#3673 Fix distributed hypertable DROP within a procedure * timescale#3701 Allow anyone to use size utilities on distributed hypertables * timescale#3708 Fix crash in get_aggsplit * timescale#3709 Fix ordered append pathkey check * timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * timescale#3717 Support transparent decompression on individual chunks * timescale#3724 Fix inserts into compressed chunks on hypertables with caggs * timescale#3727 Fix DirectFunctionCall crash in distributed_exec * timescale#3728 Fix SkipScan with varchar column * timescale#3733 Fix ANALYZE crash with custom statistics for custom types * timescale#3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode
This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * #3034 Add support for PostgreSQL 14 * #3435 Add continuous aggregates for distributed hypertables * #3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * #3580 Fix memory context bug executing TRUNCATE * #3592 Allow alter column type on distributed hypertable * #3598 Improve evaluation of stable functions such as now() on access node * #3618 Fix execution of refresh_caggs from user actions * #3625 Add shared dependencies when creating chunk * #3626 Fix memory context bug executing TRUNCATE * #3627 Schema qualify UDTs in multi-node * #3638 Allow owner change of a data node * #3654 Fix index attnum mapping in reorder_chunk * #3661 Fix SkipScan path generation with constant DISTINCT column * #3667 Fix compress_policy for multi txn handling * #3673 Fix distributed hypertable DROP within a procedure * #3701 Allow anyone to use size utilities on distributed hypertables * #3708 Fix crash in get_aggsplit * #3709 Fix ordered append pathkey check * #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * #3717 Support transparent decompression on individual chunks * #3724 Fix inserts into compressed chunks on hypertables with caggs * #3727 Fix DirectFunctionCall crash in distributed_exec * #3728 Fix SkipScan with varchar column * #3733 Fix ANALYZE crash with custom statistics for custom types * #3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode
This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * #3034 Add support for PostgreSQL 14 * #3435 Add continuous aggregates for distributed hypertables * #3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * #3580 Fix memory context bug executing TRUNCATE * #3592 Allow alter column type on distributed hypertable * #3598 Improve evaluation of stable functions such as now() on access node * #3618 Fix execution of refresh_caggs from user actions * #3625 Add shared dependencies when creating chunk * #3626 Fix memory context bug executing TRUNCATE * #3627 Schema qualify UDTs in multi-node * #3638 Allow owner change of a data node * #3654 Fix index attnum mapping in reorder_chunk * #3661 Fix SkipScan path generation with constant DISTINCT column * #3667 Fix compress_policy for multi txn handling * #3673 Fix distributed hypertable DROP within a procedure * #3701 Allow anyone to use size utilities on distributed hypertables * #3708 Fix crash in get_aggsplit * #3709 Fix ordered append pathkey check * #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * #3717 Support transparent decompression on individual chunks * #3724 Fix inserts into compressed chunks on hypertables with caggs * #3727 Fix DirectFunctionCall crash in distributed_exec * #3728 Fix SkipScan with varchar column * #3733 Fix ANALYZE crash with custom statistics for custom types * #3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode
This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * #3034 Add support for PostgreSQL 14 * #3435 Add continuous aggregates for distributed hypertables * #3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * #3580 Fix memory context bug executing TRUNCATE * #3592 Allow alter column type on distributed hypertable * #3598 Improve evaluation of stable functions such as now() on access node * #3618 Fix execution of refresh_caggs from user actions * #3625 Add shared dependencies when creating chunk * #3626 Fix memory context bug executing TRUNCATE * #3627 Schema qualify UDTs in multi-node * #3638 Allow owner change of a data node * #3654 Fix index attnum mapping in reorder_chunk * #3661 Fix SkipScan path generation with constant DISTINCT column * #3667 Fix compress_policy for multi txn handling * #3673 Fix distributed hypertable DROP within a procedure * #3701 Allow anyone to use size utilities on distributed hypertables * #3708 Fix crash in get_aggsplit * #3709 Fix ordered append pathkey check * #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * #3717 Support transparent decompression on individual chunks * #3724 Fix inserts into compressed chunks on hypertables with caggs * #3727 Fix DirectFunctionCall crash in distributed_exec * #3728 Fix SkipScan with varchar column * #3733 Fix ANALYZE crash with custom statistics for custom types * #3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode
This release adds major new features since the 2.4.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Continuous Aggregates for Distributed Hypertables * Support for PostgreSQL 14 * Experimental: Support for timezones in `time_bucket_ng()`, including the `origin` argument This release also includes several bug fixes. **Features** * #3034 Add support for PostgreSQL 14 * #3435 Add continuous aggregates for distributed hypertables * #3505 Add support for timezones in `time_bucket_ng()` **Bugfixes** * #3580 Fix memory context bug executing TRUNCATE * #3592 Allow alter column type on distributed hypertable * #3598 Improve evaluation of stable functions such as now() on access node * #3618 Fix execution of refresh_caggs from user actions * #3625 Add shared dependencies when creating chunk * #3626 Fix memory context bug executing TRUNCATE * #3627 Schema qualify UDTs in multi-node * #3638 Allow owner change of a data node * #3654 Fix index attnum mapping in reorder_chunk * #3661 Fix SkipScan path generation with constant DISTINCT column * #3667 Fix compress_policy for multi txn handling * #3673 Fix distributed hypertable DROP within a procedure * #3701 Allow anyone to use size utilities on distributed hypertables * #3708 Fix crash in get_aggsplit * #3709 Fix ordered append pathkey check * #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling * #3717 Support transparent decompression on individual chunks * #3724 Fix inserts into compressed chunks on hypertables with caggs * #3727 Fix DirectFunctionCall crash in distributed_exec * #3728 Fix SkipScan with varchar column * #3733 Fix ANALYZE crash with custom statistics for custom types * #3747 Always reset expr context in DecompressChunk **Thanks** * @binakot and @sebvett for reporting an issue with DISTINCT queries * @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE * @mjf for reporting an issue with ordered append and JOINs * @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now() * @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg * @tanglebones for reporting the ANALYZE crash with custom types on multinode
Disable-check: commit-count
Note to reviewers: This PR has multiple commits. I'll be combining them into logical commits. I have not moved them around yet because it is easier to review the smaller changes.