Improve compression job IO performance #4756
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4756 +/- ##
==========================================
+ Coverage 90.85% 90.88% +0.02%
==========================================
Files 221 224 +3
Lines 40753 42427 +1674
==========================================
+ Hits 37028 38561 +1533
- Misses 3725 3866 +141
Continue to review full report at Codecov.
Needs changelog entry
@@ -488,6 +496,25 @@ compress_chunk_populate_sort_info_for_column(Oid table, const ColumnCompressionI
	ReleaseSysCache(tp);
}

static void
Suggested change:
/* Run VACUUM ANALYZE on the given chunk. */
static void
Maybe it should be called run_vacuum_analyze, if that's what it does?
It actually just runs ANALYZE without the vacuum afaik. But I agree with changing the name to make it more apparent.
Updated to run_analyze_on_chunk.
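For readers following along, here is a minimal sketch of what an analyze-only helper for a single chunk could look like if implemented via SPI. This is an illustration with assumed structure, not the actual code added in this PR, which may build and run the ANALYZE command differently.

```c
#include "postgres.h"
#include "executor/spi.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"

/* Sketch: run a plain ANALYZE (no VACUUM) on one chunk, identified by its Oid. */
static void
run_analyze_on_chunk(Oid chunk_relid)
{
	char *relname = quote_qualified_identifier(get_namespace_name(get_rel_namespace(chunk_relid)),
											   get_rel_name(chunk_relid));
	char *cmd = psprintf("ANALYZE %s;", relname);

	if (SPI_connect() != SPI_OK_CONNECT)
		elog(ERROR, "could not connect to SPI");

	/* read_only = false: ANALYZE updates pg_statistic. */
	if (SPI_execute(cmd, false, 0) < 0)
		elog(ERROR, "ANALYZE on chunk \"%s\" failed", relname);

	SPI_finish();
}
```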
tsl/src/compression/compression.c
Outdated
@@ -441,6 +443,12 @@ compress_chunk_sort_relation(Relation in_rel, int n_keys, const ColumnCompressio

	heap_endscan(heapScan);

	/* Perform an analyze on the chunk to get up-to-date stats before compressing.
	 * We do it at this point to reduce the amount of random IO operations
	 * necessary in case the statistic target is larger than the chunk heap size.
We do it here because we just read out the entire chunk into tuplesort, its pages are cached and we can save on disk I/O, right? The wording with random I/O and statistics target is somewhat confusing...
That's true. I'm open to suggestions about better wording 😄
"We do it at this point because we've just read out the entire chunk into tuplesort, so its pages are likely to be cached and we can save on I/O".
Although I wonder to which extent this is true, because full heap scans use a dedicated cache strategy which doesn't cache much.
I have definitely seen a rise in the average read op size. In any case, the heap will be present in the OS/disk cache if not in shared buffers.
The Postgres buffer cache uses the dedicated strategy, but the OS/disk cache doesn't, and that's the caching we really care about.
Updated the comment to the suggested wording.
Looks good to me. The lock that we are taking while compressing is very heavy, but this has nothing to do with this PR and I do not think that we can take a weaker lock.
When compressing larger chunks, the compression sort tends to spill to temporary files because the memory limit (`work_mem`) is usually too small to fit all the data in memory. Using `maintenance_work_mem` instead makes more sense, since it is generally safer to set it to a larger value without impacting general resource usage.
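As a rough illustration of the memory-limit change (assumed helper and parameter names, and the exact `tuplesort_begin_heap()` signature varies slightly between PostgreSQL versions), the sort can simply be budgeted by the `maintenance_work_mem` GUC instead of `work_mem`:

```c
#include "postgres.h"
#include "miscadmin.h"        /* work_mem and maintenance_work_mem GUCs */
#include "utils/tuplesort.h"

/* Hypothetical helper: start the heap tuplesort used for compression, but
 * size it by maintenance_work_mem rather than work_mem, so larger chunks
 * are less likely to spill to temporary files. */
static Tuplesortstate *
begin_compression_sort(TupleDesc tupdesc, int n_keys, AttrNumber *sort_keys,
					   Oid *sort_operators, Oid *sort_collations, bool *nulls_first)
{
	return tuplesort_begin_heap(tupdesc,
								n_keys,
								sort_keys,
								sort_operators,
								sort_collations,
								nulls_first,
								maintenance_work_mem, /* was: work_mem */
								NULL,                 /* no parallel coordination */
								false);               /* no random access needed */
}
```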
Depending on the statistics target, running ANALYZE on a chunk before compression can cause a lot of random IO for chunks that have more pages than ANALYZE needs to read. By moving that operation to after the heap has been read into memory for sorting, we increase the chance of hitting cache and reduce the disk operations necessary to execute compression jobs.
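To make the ordering concrete, here is a simplified sketch of the flow with assumed helper names (not the exact TimescaleDB code): the chunk heap is scanned into the tuplesort first, ANALYZE runs while those pages are still likely cached, and only then is the sort performed.

```c
#include "postgres.h"
#include "access/heapam.h"
#include "access/tableam.h"
#include "utils/snapmgr.h"
#include "utils/tuplesort.h"

/* Sketch of the reordering: feed the chunk heap into the tuplesort, then
 * analyze the chunk while its pages are warm, then sort. */
static void
sort_and_analyze_chunk(Relation in_rel, Tuplesortstate *sort)
{
	TableScanDesc scan = table_beginscan(in_rel, GetActiveSnapshot(), 0, NULL);
	HeapTuple	tuple;

	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
		tuplesort_putheaptuple(sort, tuple);

	table_endscan(scan);

	/* The heap was just read sequentially, so ANALYZE's page sample is more
	 * likely to be served from cache instead of random disk reads. */
	run_analyze_on_chunk(RelationGetRelid(in_rel)); /* hypothetical helper above */

	tuplesort_performsort(sort);
}
```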
This release adds major new features since the 2.8.1 release. We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Hierarchical Continuous Aggregates (aka Continuous Aggregate on top of another Continuous Aggregate)
* Improve `time_bucket_gapfill` function to allow specifying the timezone to bucket
* Introduce fixed schedules for background jobs and the ability to check job errors.
* Use `alter_data_node()` to change the data node configuration. This function introduces the option to configure the availability of the data node.

This release also includes several bug fixes.

**Features**
* #4476 Batch rows on access node for distributed COPY
* #4567 Exponentially backoff when out of background workers
* #4650 Show warnings when not following best practices
* #4664 Introduce fixed schedules for background jobs
* #4668 Hierarchical Continuous Aggregates
* #4670 Add timezone support to time_bucket_gapfill
* #4678 Add interface for troubleshooting job failures
* #4718 Add ability to merge chunks while compressing
* #4786 Extend the now() optimization to also apply to CURRENT_TIMESTAMP
* #4820 Support parameterized data node scans in joins
* #4830 Add function to change configuration of a data nodes
* #4966 Handle DML activity when datanode is not available
* #4971 Add function to drop stale chunks on a datanode

**Bugfixes**
* #4663 Don't error when compression metadata is missing
* #4673 Fix now() constification for VIEWs
* #4681 Fix compression_chunk_size primary key
* #4696 Report warning when enabling compression on hypertable
* #4745 Fix FK constraint violation error while insert into hypertable which references partitioned table
* #4756 Improve compression job IO performance
* #4770 Continue compressing other chunks after an error
* #4794 Fix degraded performance seen on timescaledb_internal.hypertable_local_size() function
* #4807 Fix segmentation fault during INSERT into compressed hypertable
* #4822 Fix missing segmentby compression option in CAGGs
* #4823 Fix a crash that could occur when using nested user-defined functions with hypertables
* #4840 Fix performance regressions in the copy code
* #4860 Block multi-statement DDL command in one query
* #4898 Fix cagg migration failure when trying to resume
* #4904 Remove BitmapScan support in DecompressChunk
* #4906 Fix a performance regression in the query planner by speeding up frozen chunk state checks
* #4910 Fix a typo in process_compressed_data_out
* #4918 Cagg migration orphans cagg policy
* #4941 Restrict usage of the old format (pre 2.7) of continuous aggregates in PostgreSQL 15.
* #4955 Fix cagg migration for hypertables using timestamp without timezone
* #4968 Check for interrupts in gapfill main loop
* #4988 Fix cagg migration crash when refreshing the newly created cagg
* #5054 Fix segfault after second ANALYZE
* #5086 Reset baserel cache on invalid hypertable cache

**Thanks**
* @byazici for reporting a problem with segmentby on compressed caggs
* @jflambert for reporting a crash with nested user-defined functions.
* @jvanns for reporting hypertable FK reference to vanilla PostgreSQL partitioned table doesn't seem to work
* @kou for fixing a typo in process_compressed_data_out
* @kyrias for reporting a crash when ANALYZE is executed on extended query protocol mode with extension loaded.
* @tobiasdirksen for requesting Continuous aggregate on top of another continuous aggregate
* @Xima for reporting a bug in Cagg migration
* @xvaara for helping reproduce a bug with bitmap scans in transparent decompression
I just ran into an issue on the latest Helm chart and was getting a Patroni startup error.
Adding
fixed it.
This PR does a couple of changes to improve IO usage for compression jobs:
* Run ANALYZE on the chunk after its heap has been read for sorting, increasing the chance of hitting cached pages.
* Switch the compression sort memory limit from `work_mem` to `maintenance_work_mem`, reducing the chance of using temp buffers for larger chunks.

Disable-check: commit-count