Add index to compression_chunk_size catalog table #7227

mkindahl · 2024-09-02T11:53:11Z

During upgrade, the function `remove_dropped_chunk_metadata` is used to update the metadata tables and remove data for chunks marked as dropped. The function iterates over the chunks of the provided hypertable and, for each chunk, internally does a sequential scan of the `compression_chunk_size` table to locate the `compressed_chunk_id`, resulting in quadratic execution time. This usually goes unnoticed with a small number of chunks, but it becomes a problem with a large number of chunks.

This commit fixes this by adding an index to the `compression_chunk_size` catalog table, turning the sequential scan into an index scan.
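The effect of such an index on a single lookup can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, so the plan text differs from what `EXPLAIN` prints in PostgreSQL. The two-column table layout and the index name `compression_chunk_size_idx` are assumptions made for illustration, not the actual catalog definition or the name used in the patch.

```python
import sqlite3

# Simplified stand-in for the catalog table; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compression_chunk_size ("
    " chunk_id INTEGER PRIMARY KEY,"
    " compressed_chunk_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO compression_chunk_size VALUES (?, ?)",
    [(i, 100000 + i) for i in range(1000)],
)


def lookup_plan(connection):
    """Return SQLite's query plan for a lookup by compressed_chunk_id."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT chunk_id FROM compression_chunk_size "
        "WHERE compressed_chunk_id = ?",
        (1,),
    ).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


# Without an index, the only way to find the row is to read the whole table.
plan_before = lookup_plan(conn)

# Hypothetical index name; the point is only that the column becomes indexed.
conn.execute(
    "CREATE INDEX compression_chunk_size_idx "
    "ON compression_chunk_size (compressed_chunk_id)"
)
plan_after = lookup_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

When the lookup runs once per chunk, the first plan reads every row on every iteration, which is where the quadratic cost comes from. The second plan replaces each full read with an index search.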

mkindahl · 2024-09-02T11:55:51Z

An alternative would be to rewrite the delete statements to join them with the select statement that drives the for-loop in `remove_dropped_chunk_metadata`. However, that is a lot more work for a small benefit, hence this patch.
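To make the alternative concrete, here is a rough sketch of the difference between a per-chunk delete loop and a single set-based delete. It again uses Python's `sqlite3` as a stand-in for PostgreSQL. The body of `remove_dropped_chunk_metadata` is not shown in this conversation, so the tables, columns, and both helper functions below are simplified assumptions rather than the real implementation.

```python
import sqlite3


def make_db():
    """Build a tiny, simplified stand-in for the two catalog tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chunk (
            id INTEGER PRIMARY KEY,
            hypertable_id INTEGER NOT NULL,
            dropped INTEGER NOT NULL
        );
        CREATE TABLE compression_chunk_size (
            chunk_id INTEGER PRIMARY KEY,
            compressed_chunk_id INTEGER NOT NULL
        );
        """
    )
    # Chunks 1..10 on hypertable 1; the even-numbered ones are marked dropped.
    conn.executemany(
        "INSERT INTO chunk VALUES (?, ?, ?)",
        [(i, 1, 1 if i % 2 == 0 else 0) for i in range(1, 11)],
    )
    conn.executemany(
        "INSERT INTO compression_chunk_size VALUES (?, ?)",
        [(100 + i, i) for i in range(1, 11)],
    )
    return conn


def delete_in_loop(conn, hypertable_id):
    """Loop-style cleanup: one DELETE per dropped chunk."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM chunk WHERE hypertable_id = ? AND dropped = 1",
            (hypertable_id,),
        )
    ]
    for chunk_id in ids:
        conn.execute(
            "DELETE FROM compression_chunk_size WHERE compressed_chunk_id = ?",
            (chunk_id,),
        )


def delete_set_based(conn, hypertable_id):
    """Set-based cleanup: one DELETE driven by the same SELECT."""
    conn.execute(
        "DELETE FROM compression_chunk_size WHERE compressed_chunk_id IN "
        "(SELECT id FROM chunk WHERE hypertable_id = ? AND dropped = 1)",
        (hypertable_id,),
    )


def remaining(conn):
    """Return the compressed_chunk_id values still present, in order."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT compressed_chunk_id FROM compression_chunk_size "
            "ORDER BY compressed_chunk_id"
        )
    ]


loop_db = make_db()
delete_in_loop(loop_db, 1)
loop_result = remaining(loop_db)

set_db = make_db()
delete_set_based(set_db, 1)
set_result = remaining(set_db)

print(loop_result, set_result)
```

Both variants leave the same rows behind. The set-based form lets the planner choose a join strategy for the whole delete, while the loop form depends on each individual lookup being cheap, which is what the added index provides.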

codecov · 2024-09-02T12:07:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.81%. Comparing base (59f50f2) to head (576ddc8).
Report is 305 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7227      +/-   ##
==========================================
+ Coverage   80.06%   81.81%   +1.74%     
==========================================
  Files         190      205      +15     
  Lines       37181    38320    +1139     
  Branches     9450     9936     +486     
==========================================
+ Hits        29770    31350    +1580     
+ Misses       2997     2974      -23     
+ Partials     4414     3996     -418

@MiguelTubio

This release contains performance improvements and bug fixes since the 2.16.1 release. We recommend that you upgrade at the next available opportunity.

**Features**
* timescale#6882: Allow DELETE on the compressed chunks without decompression.
* timescale#7033: Use MERGE statement on CAgg Refresh
* timescale#7126: Add functions to show the compression information.
* timescale#7147: Vectorize partial aggregation for `sum`
* timescale#7204: Track additional extensions in telemetry.
* timescale#7207: Refactor the `decompress_batches_scan` functions for easier maintenance.
* timescale#7209: Add a function to drop the `osm` chunk.

**Bugfixes**
* timescale#7187: Fix the string literal length for the `compressed_data_info` function.
* timescale#7191: Fix creating default indexes on chunks when migrating the data.
* timescale#7195: Fix the `segment by` and `order by` checks when dropping a column from a compressed hypertable.
* timescale#7201: Use the generic extension description when building `apt` and `rpm` loader packages.
* timescale#7227: Add an index to the `compression_chunk_size` catalog table.
* timescale#7229: Fix the foreign key constraints where the index and the constraint column order are different.
* timescale#7230: Do not propagate the foreign key constraints to the `osm` chunk.
* timescale#7234: Release the cache after accessing the cache entry.
* timescale#7258: Force English in the pg_config command executed by cmake to avoid unexpected building errors
* timescale#7270: Fix memory leak in compressed DML batch filtering

**Thanks**
* @MiguelTubio for reporting and fixing a Windows build error
* @posuch for reporting the misleading extension description in the generic loader packages.

@MiguelTubio

This release contains performance improvements and bug fixes since the 2.16.1 release. We recommend that you upgrade at the next available opportunity.

**Features**
* timescale#6882: Allow DELETE on the compressed chunks without decompression.
* timescale#7033: Use MERGE statement on CAgg Refresh
* timescale#7126: Add functions to show the compression information.
* timescale#7147: Vectorize partial aggregation for `sum`
* timescale#7200: Vectorize common aggregate functions like `min`, `max`, `sum`, `avg`, `stddev`, `variance` for compressed columns of arithmetic types, when there is grouping on segmentby columns or no grouping.
* timescale#7204: Track additional extensions in telemetry.
* timescale#7207: Refactor the `decompress_batches_scan` functions for easier maintenance.
* timescale#7209: Add a function to drop the `osm` chunk.
* timescale#7275: Add support for RETURNING clause for MERGE
* timescale#7295: Support ALTER TABLE SET ACCESS METHOD on hypertable

**Bugfixes**
* timescale#7187: Fix the string literal length for the `compressed_data_info` function.
* timescale#7191: Fix creating default indexes on chunks when migrating the data.
* timescale#7195: Fix the `segment by` and `order by` checks when dropping a column from a compressed hypertable.
* timescale#7201: Use the generic extension description when building `apt` and `rpm` loader packages.
* timescale#7227: Add an index to the `compression_chunk_size` catalog table.
* timescale#7229: Fix the foreign key constraints where the index and the constraint column order are different.
* timescale#7230: Do not propagate the foreign key constraints to the `osm` chunk.
* timescale#7234: Release the cache after accessing the cache entry.
* timescale#7258: Force English in the pg_config command executed by cmake to avoid unexpected building errors
* timescale#7270: Fix memory leak in compressed DML batch filtering
* timescale#7286: Fix index column check while searching for index
* timescale#7290: Add check for NULL offset for caggs built on top of caggs
* timescale#7301: Make foreign key behaviour for hypertables consistent
* timescale#7318: Fix chunk skipping range filtering
* timescale#7320: Set license specific extension comment in install script

**Thanks**
* @MiguelTubio for reporting and fixing a Windows build error
* @posuch for reporting the misleading extension description in the generic loader packages.
* @snyrkill for discovering and reporting the issue

@MiguelTubio

This release adds support for PostgreSQL 17, significantly improves the performance of continuous aggregate refreshes, and contains performance improvements for analytical queries and delete operations over compressed hypertables. We recommend that you upgrade at the next available opportunity.

**Highlighted features in TimescaleDB v2.17.0**
* Full PostgreSQL 17 support for all existing features. TimescaleDB v2.17 is available for PostgreSQL 14, 15, 16, and 17.
* Significant performance improvements for continuous aggregate policies: continuous aggregate refresh is now using `merge` instead of deleting old materialized data and re-inserting. This update can decrease dramatically the amount of data that must be written on the continuous aggregate in the presence of a small number of changes, reduce the `i/o` cost of refreshing a continuous aggregate, and generate fewer Write-Ahead Logs (`WAL`). Overall, continuous aggregate policies will be more lightweight, use less system resources, and complete faster.
* Increased performance for real-time analytical queries over compressed hypertables: we are excited to introduce additional Single Instruction, Multiple Data (`SIMD`) vectorization optimization to our engine by supporting vectorized execution for queries that group by using the `segment_by` column(s) and aggregate using the basic aggregate functions (`sum`, `count`, `avg`, `min`, `max`). Stay tuned for more to come in follow-up releases! Support for grouping on additional columns, filtered aggregation, vectorized expressions, and `time_bucket` is coming soon.
* Improved performance of deletes on compressed hypertables when a large amount of data is affected. This improvement speeds up operations that delete whole segments by skipping the decompression step. It is enabled for all deletes that filter by the `segment_by` column(s).

**PostgreSQL 14 deprecation announcement**
We will continue supporting PostgreSQL 14 until April 2025. Closer to that time, we will announce the specific version of TimescaleDB in which PostgreSQL 14 support will not be included going forward.

**Features**
* #6882: Allow delete of full segments on compressed chunks without decompression.
* #7033: Use `merge` statement on continuous aggregates refresh.
* #7126: Add functions to show the compression information.
* #7147: Vectorize partial aggregation for `sum(int4)` with grouping on `segment by` columns.
* #7204: Track additional extensions in telemetry.
* #7207: Refactor the `decompress_batches_scan` functions for easier maintenance.
* #7209: Add a function to drop the `osm` chunk.
* #7275: Add support for the `returning` clause for `merge`.
* #7200: Vectorize common aggregate functions like `min`, `max`, `sum`, `avg`, `stddev`, `variance` for compressed columns of arithmetic types, when there is grouping on `segment by` columns or no grouping.

**Bug fixes**
* #7187: Fix the string literal length for the `compressed_data_info` function.
* #7191: Fix creating default indexes on chunks when migrating the data.
* #7195: Fix the `segment by` and `order by` checks when dropping a column from a compressed hypertable.
* #7201: Use the generic extension description when building `apt` and `rpm` loader packages.
* #7227: Add an index to the `compression_chunk_size` catalog table.
* #7229: Fix the foreign key constraints where the index and the constraint column order are different.
* #7230: Do not propagate the foreign key constraints to the `osm` chunk.
* #7234: Release the cache after accessing the cache entry.
* #7258: Force English in the `pg_config` command executed by `cmake` to avoid the unexpected building errors.
* #7270: Fix the memory leak in compressed DML batch filtering.
* #7286: Fix the index column check while searching for the index.
* #7290: Add check for null offset for continuous aggregates built on top of continuous aggregates.
* #7301: Make foreign key behavior for hypertables consistent.
* #7318: Fix chunk skipping range filtering.
* #7320: Set the license specific extension comment in the install script.

**Thanks**
* @MiguelTubio for reporting and fixing the Windows build error.
* @posuch for reporting the misleading extension description in the generic loader packages.
* @snyrkill for discovering and reporting the issue with continuous aggregates built on top of continuous aggregates.

mkindahl force-pushed the add-compression-chunk-size-index branch from 547050d to 576ddc8 Compare September 2, 2024 11:57

mkindahl self-assigned this Sep 2, 2024

fabriziomello approved these changes Sep 2, 2024

akuzm approved these changes Sep 4, 2024

mkindahl merged commit e1eeedb into timescale:main Sep 4, 2024
37 of 38 checks passed

mkindahl deleted the add-compression-chunk-size-index branch September 4, 2024 08:28

pallavisontakke mentioned this pull request Sep 20, 2024

Release 2.17.0 #7285

Merged

pallavisontakke mentioned this pull request Oct 8, 2024

Release 2.17.0 #7328

Merged
