feat(compaction): avoid count_deltas() call if nothing changed #6868

Closed
wants to merge 7 commits into from

problame commented Feb 21, 2024

Changes

  • keep track of LayerMap rebuilds
  • keep track of repartition() rebuilds
    • introduce some more typing while at it
  • add skipping logic to create_image_layers

The skipping logic is simple: if we've seen the same repartitioning & layermap version before, bail early.

The underlying invariant is that only these two inputs determine whether we would produce any new image layers.
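A minimal sketch of that skipping logic, to make the invariant concrete. All names here (`LayerMapVersion`, `PartitioningVersion`, `ImageCreationCache`, `create_image_layers_sketch`) are hypothetical stand-ins and not the types this PR introduces; the expensive part is modelled as a closure that returns the number of image layers created.

```rust
/// Hypothetical counter bumped on every LayerMap rebuild.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LayerMapVersion(pub u64);

/// Hypothetical counter bumped whenever repartition() produces a new partitioning.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PartitioningVersion(pub u64);

/// Remembers the pair of input versions seen by the last completed run.
#[derive(Debug, Default)]
pub struct ImageCreationCache {
    last_seen: Option<(LayerMapVersion, PartitioningVersion)>,
}

impl ImageCreationCache {
    /// True if both inputs are unchanged since the last recorded run.
    pub fn can_skip(&self, lm: LayerMapVersion, p: PartitioningVersion) -> bool {
        self.last_seen == Some((lm, p))
    }

    /// Record the versions a completed run was based on.
    pub fn record(&mut self, lm: LayerMapVersion, p: PartitioningVersion) {
        self.last_seen = Some((lm, p));
    }
}

/// Stand-in for the image layer creation step: returns `None` when the run
/// was skipped, otherwise `Some(number_of_layers_created)`.
pub fn create_image_layers_sketch(
    cache: &mut ImageCreationCache,
    lm: LayerMapVersion,
    p: PartitioningVersion,
    expensive: impl FnOnce() -> usize,
) -> Option<usize> {
    if cache.can_skip(lm, p) {
        // Same inputs as last time: by the invariant, nothing new would be produced.
        return None;
    }
    let created = expensive();
    // Only record after the computation finished, so a failed or cancelled
    // run is retried next time instead of being skipped.
    cache.record(lm, p);
    Some(created)
}
```

The sketch is only correct if the invariant holds: any input other than these two versions that influences the result would make the early bail-out wrong.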

Effectively, this is a cache invalidation problem.
These are notoriously hard to get right, and even harder to maintain.
Counter-measures that we could implement in a later PR:

  • randomized expiration time
  • probabilistic check, i.e., on every 100th call, we run the computation anyway and warn if it produced new image layers even though our cache said it wouldn't.
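The second counter-measure could look roughly like the sketch below. It is an illustration under assumptions, not part of this PR: `VerifyingSkipper` and `Outcome` are hypothetical names, the check is a deterministic "every Nth skippable call" counter rather than a random draw, and the warning goes to stderr instead of a tracing macro.

```rust
/// Result of one call, as seen by the caller.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The cache said "nothing to do" and we trusted it.
    Skipped,
    /// The computation ran (either required, or a verification run that agreed with the cache).
    Ran { created: usize },
    /// The cache said "skip", but the verification run produced layers anyway.
    CacheMismatch { created: usize },
}

/// Hypothetical wrapper that re-runs the computation on every Nth skippable call.
pub struct VerifyingSkipper {
    verify_every: u64,
    skips_since_verify: u64,
}

impl VerifyingSkipper {
    pub fn new(verify_every: u64) -> Self {
        assert!(verify_every > 0, "verify_every must be at least 1");
        Self { verify_every, skips_since_verify: 0 }
    }

    pub fn run(&mut self, cache_says_skip: bool, compute: impl FnOnce() -> usize) -> Outcome {
        if !cache_says_skip {
            // Inputs changed: the computation is required regardless.
            return Outcome::Ran { created: compute() };
        }
        self.skips_since_verify += 1;
        if self.skips_since_verify < self.verify_every {
            return Outcome::Skipped;
        }
        // Nth skippable call: run anyway and compare with the cache's claim.
        self.skips_since_verify = 0;
        let created = compute();
        if created > 0 {
            eprintln!("warning: cache said skip, but {created} image layers were created");
            Outcome::CacheMismatch { created }
        } else {
            Outcome::Ran { created: 0 }
        }
    }
}
```

With `VerifyingSkipper::new(100)`, 99 out of 100 skippable calls stay cheap, and a broken invariant surfaces as a warning within at most 100 calls.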

part of #6861

…ove backwards

This PR enforces aspects of `Timeline::repartition` that were already
true at runtime:

- it's not called concurrently, so, bail out if it is anyway (see
  comment why it's not called concurrently)
- the `lsn` should never be moving backwards over the lifetime of a
  Timeline object, because last_record_lsn() can only move forwards
  over the lifetime of a Timeline object
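The two checks described above can be sketched as follows. This is an assumption-laden illustration, not the code of `Timeline::repartition`: `RepartitionState`, `RepartitionError`, and the simplified `Lsn` newtype are hypothetical, a `std::sync::Mutex` stands in for whatever lock the real code uses, and the actual repartitioning work is passed in as a closure.

```rust
use std::sync::Mutex;

/// Simplified stand-in for a log sequence number.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

#[derive(Debug, PartialEq, Eq)]
pub enum RepartitionError {
    /// Another call currently holds the lock.
    ConcurrentCall,
    /// The requested LSN is older than the one used by the previous call.
    LsnMovedBackwards { previous: Lsn, requested: Lsn },
}

/// Hypothetical holder of the LSN used by the last successful repartition.
#[derive(Default)]
pub struct RepartitionState {
    last_lsn: Mutex<Option<Lsn>>,
}

impl RepartitionState {
    /// Runs `work` while holding the lock, enforcing both invariants.
    pub fn repartition(&self, lsn: Lsn, work: impl FnOnce()) -> Result<(), RepartitionError> {
        // try_lock never blocks: if the lock is taken, bail out instead of waiting.
        // For brevity, a poisoned lock is also reported as ConcurrentCall here.
        let mut guard = self
            .last_lsn
            .try_lock()
            .map_err(|_| RepartitionError::ConcurrentCall)?;

        if let Some(previous) = *guard {
            if lsn < previous {
                return Err(RepartitionError::LsnMovedBackwards { previous, requested: lsn });
            }
        }

        work(); // the actual repartitioning would happen here, under the lock
        *guard = Some(lsn);
        Ok(())
    }
}
```

Repeating a call with the same LSN is accepted in this sketch; only a strictly smaller LSN is rejected.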

part of #6861

2466 tests run: 2267 passed, 78 failed, 121 skipped (full report)


Failures on Postgres 16

Failures on Postgres 15

Failures on Postgres 14

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_gc_of_remote_layers[release-pg14] or test_gc_of_remote_layers[debug-pg14] or test_issue_5878[release-pg14] or test_issue_5878[debug-pg14] or test_compaction_downloads_on_demand_with_image_creation[release-pg14] or test_compaction_downloads_on_demand_with_image_creation[debug-pg14] or test_deferred_deletion[release-pg14] or test_deferred_deletion[debug-pg14] or test_emergency_mode[release-pg14] or test_emergency_mode[debug-pg14] or test_deletion_queue_recovery[release-pg14-validate-keep] or test_deletion_queue_recovery[debug-pg14-validate-keep] or test_eviction_across_generations[release-pg14] or test_eviction_across_generations[debug-pg14] or test_deletion_queue_recovery[release-pg14-no-validate-lose] or test_deletion_queue_recovery[debug-pg14-no-validate-lose] or test_deletion_queue_recovery[release-pg14-validate-lose] or test_deletion_queue_recovery[debug-pg14-validate-lose] or test_generations_upgrade[release-pg14] or test_generations_upgrade[debug-pg14] or test_deletion_queue_recovery[release-pg14-no-validate-keep] or test_deletion_queue_recovery[debug-pg14-no-validate-keep] or test_remote_timeline_client_calls_started_metric[release-pg14] or test_remote_timeline_client_calls_started_metric[debug-pg14] or test_remote_storage_upload_queue_retries[release-pg14] or test_remote_storage_upload_queue_retries[debug-pg14] or test_gc_of_remote_layers[release-pg15] or test_gc_of_remote_layers[debug-pg15] or test_issue_5878[release-pg15] or test_issue_5878[debug-pg15] or test_compaction_downloads_on_demand_with_image_creation[release-pg15] or test_compaction_downloads_on_demand_with_image_creation[debug-pg15] or test_deferred_deletion[release-pg15] or test_deferred_deletion[debug-pg15] or test_deletion_queue_recovery[release-pg15-validate-lose] or test_deletion_queue_recovery[debug-pg15-validate-lose] or test_eviction_across_generations[release-pg15] or test_eviction_across_generations[debug-pg15] or 
test_deletion_queue_recovery[release-pg15-no-validate-lose] or test_deletion_queue_recovery[debug-pg15-no-validate-lose] or test_generations_upgrade[release-pg15] or test_generations_upgrade[debug-pg15] or test_emergency_mode[release-pg15] or test_emergency_mode[debug-pg15] or test_deletion_queue_recovery[release-pg15-validate-keep] or test_deletion_queue_recovery[debug-pg15-validate-keep] or test_deletion_queue_recovery[release-pg15-no-validate-keep] or test_deletion_queue_recovery[debug-pg15-no-validate-keep] or test_remote_timeline_client_calls_started_metric[release-pg15] or test_remote_timeline_client_calls_started_metric[debug-pg15] or test_remote_storage_upload_queue_retries[release-pg15] or test_remote_storage_upload_queue_retries[debug-pg15] or test_gc_of_remote_layers[release-pg16] or test_gc_of_remote_layers[debug-pg16] or test_issue_5878[release-pg16] or test_issue_5878[debug-pg16] or test_compaction_downloads_on_demand_with_image_creation[release-pg16] or test_compaction_downloads_on_demand_with_image_creation[debug-pg16] or test_deferred_deletion[release-pg16] or test_deferred_deletion[debug-pg16] or test_deletion_queue_recovery[release-pg16-validate-lose] or test_deletion_queue_recovery[debug-pg16-validate-lose] or test_generations_upgrade[release-pg16] or test_generations_upgrade[debug-pg16] or test_deletion_queue_recovery[release-pg16-validate-keep] or test_emergency_mode[release-pg16] or test_emergency_mode[debug-pg16] or test_deletion_queue_recovery[release-pg16-no-validate-lose] or test_deletion_queue_recovery[debug-pg16-no-validate-lose] or test_eviction_across_generations[release-pg16] or test_eviction_across_generations[debug-pg16] or test_deletion_queue_recovery[release-pg16-no-validate-keep] or test_deletion_queue_recovery[debug-pg16-no-validate-keep] or test_secondary_downloads[release-pg16] or test_remote_storage_upload_queue_retries[release-pg16] or test_remote_storage_upload_queue_retries[debug-pg16] or 
test_remote_timeline_client_calls_started_metric[release-pg16] or test_remote_timeline_client_calls_started_metric[debug-pg16]"
Flaky tests (1)

Postgres 15

  • test_delete_timeline_client_hangup: debug

Test coverage report is not available

The comment gets automatically updated with the latest test results
98d0bca at 2024-02-21T19:33:39.656Z :recycle:

Base automatically changed from problame/repartition-bail-on-concurrent-call to main February 26, 2024 10:22

problame commented Apr 3, 2024

superseded by #7230

problame closed this Apr 3, 2024