Optimize segmentwise recompression #7482

Merged
antekresic merged 1 commit into main from antekresic/optimize_recompression on Dec 19, 2024

Conversation

antekresic (Contributor)

@antekresic antekresic commented Nov 22, 2024

With the recent change to the compression schema, we can optimize segmentwise recompression to target only the batches affected by the new data, and to compress tuples which don't belong to any existing batch without any decompression at all. This narrows down the amount of data that we need to decompress and recompress, ultimately speeding up the operation and generating less bloat in the compressed chunk.

Benchmark with improvements (2-3x in certain cases):
https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3959&var-run2=3960&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false
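The batch-targeting idea in the description can be sketched as a simplified model. This is a hypothetical illustration, not TimescaleDB's actual code: compressed batches are keyed by their segmentby value, only batches whose key matches an incoming tuple are marked for decompress-and-recompress, and tuples with no matching batch are counted as candidates for direct compression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for a compressed batch keyed by its segmentby value. */
typedef struct Batch
{
	int segmentby_key;
	bool needs_recompression;
} Batch;

/*
 * Mark only the batches affected by the incoming tuples; return how many
 * tuples have no matching batch (those can be compressed directly, with
 * no decompression of existing batches).
 */
static size_t
target_affected_batches(Batch *batches, size_t nbatches, const int *tuple_keys, size_t ntuples)
{
	size_t unmatched = 0;
	for (size_t i = 0; i < ntuples; i++)
	{
		bool matched = false;
		for (size_t j = 0; j < nbatches; j++)
		{
			if (batches[j].segmentby_key == tuple_keys[i])
			{
				batches[j].needs_recompression = true;
				matched = true;
				break; /* one batch per segmentby key in this toy model */
			}
		}
		if (!matched)
			unmatched++;
	}
	return unmatched;
}
```

The speedup comes from the untouched batches: in the old approach every batch in the affected segment would be decompressed and rewritten, while here only the matched ones are.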

@antekresic antekresic self-assigned this Nov 22, 2024
@erimatnor (Contributor)

Do we have benchmarks to confirm that this is faster?

@antekresic antekresic force-pushed the antekresic/optimize_recompression branch 4 times, most recently from a1b8376 to ae32773 Compare November 28, 2024 09:22
@antekresic antekresic marked this pull request as ready for review November 28, 2024 10:47
@antekresic antekresic added compression enhancement An enhancement to an existing feature for functionality labels Nov 28, 2024
@antekresic antekresic added this to the TimescaleDB 2.18.0 milestone Nov 28, 2024
* 4: frozen
* 8: compressed_partial
*/
if (!ts_chunk_is_compressed(uncompressed_chunk) && ts_chunk_is_partial(uncompressed_chunk))
Member

This condition reads a little weird: how can a chunk be uncompressed and partially compressed at the same time? It might not be a topic for this PR, but we might have to rename ts_chunk_is_compressed to something like ts_chunk_is_fully_compressed.

Contributor (author)

Yeah, that check seems to check for two bitfields which should always accompany each other (compressed and partially compressed). Having partially compressed without the compressed bit should be an internal error.

Maybe Ensure is more suitable for this check?

Contributor (author)

I'll do this in a follow-up PR since it's the existing state, which I've just moved to a separate file.
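The flag semantics discussed in this thread can be modeled in a few lines. This is an illustrative sketch: the bit values for frozen (4) and compressed_partial (8) come from the quoted comment, while the value 1 for the compressed bit is an assumption not confirmed by the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the chunk status bits from the quoted comment. */
#define CHUNK_STATUS_COMPRESSED 1 /* assumed value */
#define CHUNK_STATUS_FROZEN 4
#define CHUNK_STATUS_COMPRESSED_PARTIAL 8

static bool
chunk_is_compressed(int status)
{
	return (status & CHUNK_STATUS_COMPRESSED) != 0;
}

static bool
chunk_is_partial(int status)
{
	return (status & CHUNK_STATUS_COMPRESSED_PARTIAL) != 0;
}

/*
 * The invariant under discussion: a chunk can only be partially compressed
 * if the compressed bit is also set, so "partial but not compressed" is an
 * internal inconsistency rather than a condition users can trigger.
 */
static bool
status_is_internally_inconsistent(int status)
{
	return !chunk_is_compressed(status) && chunk_is_partial(status);
}
```

This is why the reviewer suggests an assertion-style Ensure rather than a user-facing error: the condition should never be reachable from SQL in a healthy catalog.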

* 8: compressed_partial
*/
if (!ts_chunk_is_compressed(uncompressed_chunk) && ts_chunk_is_partial(uncompressed_chunk))
elog(ERROR,
Member

Looks like this can still be triggered by a user calling it on an unsuitable chunk? It needs a better errcode then.

Contributor (author)

No, this shouldn't happen in regular execution. That's why I think Ensure would be better suited here.

@akuzm akuzm left a comment (Member)

Nice, so this is the sorted merge recompression algorithm we discussed a long time ago.

@antekresic antekresic force-pushed the antekresic/optimize_recompression branch 2 times, most recently from a560496 to c0c2c14 Compare December 19, 2024 12:48
tsl_recompress_chunk_segmentwise(PG_FUNCTION_ARGS)
{
Oid uncompressed_chunk_id = PG_ARGISNULL(0) ? InvalidOid : PG_GETARG_OID(0);
bool if_not_compressed = PG_ARGISNULL(1) ? true : PG_GETARG_BOOL(1);
Member

The default is true here while in the function comment it is stated as false

Contributor (author)

Will address this in a follow-up PR since it's the current state and has very little to do with the change here.
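The mismatch being flagged in this thread can be reduced to a minimal model. This is an illustrative sketch assuming the quoted argument handling: a NULL second argument falls back to true in the code, while the function's comment reportedly documents the default as false. The function name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirrors the quoted pattern:
 *   bool if_not_compressed = PG_ARGISNULL(1) ? true : PG_GETARG_BOOL(1);
 * i.e. the effective default when the argument is NULL is true, even if
 * the documentation states false.
 */
static bool
resolve_if_not_compressed(bool arg_is_null, bool arg_value)
{
	return arg_is_null ? true : arg_value;
}
```

Callers who rely on the documented default of false and pass NULL would therefore get the opposite behavior, which is why the comment or the code needs to change, just not in this PR.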

@antekresic antekresic force-pushed the antekresic/optimize_recompression branch from c0c2c14 to 4b4d270 Compare December 19, 2024 13:15
@antekresic antekresic force-pushed the antekresic/optimize_recompression branch from 4b4d270 to 14f528c Compare December 19, 2024 13:17
@antekresic antekresic enabled auto-merge (rebase) December 19, 2024 13:28
@antekresic antekresic merged commit 44dd49e into main Dec 19, 2024
48 checks passed
@antekresic antekresic deleted the antekresic/optimize_recompression branch December 19, 2024 14:44