cloud_storage: perform a full internal scrub at the end of TS tests #14349
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39583#018b5c98-f2ef-446a-a552-5eae066ed7d2: "rptest.tests.retention_policy_test.ShadowIndexingLocalRetentionTest.test_shadow_indexing_default_local_retention.cluster_remote_write=False.topic_remote_write=false.cloud_storage_type=CloudStorageType.S3" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39583#018b5c98-f2ea-4435-ac18-2ff2b29da365: "rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.ABS" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39583#018b5c98-f2f3-4f44-a853-0c9726a70d53: "rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=False.cloud_storage_type=CloudStorageType.ABS" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39583#018b5c98-f2f7-41e3-a545-6d5c565fa74c: "rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=False.cloud_storage_type=CloudStorageType.S3" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39583#018b5ca7-8adc-4cf8-83b4-300a8dd69b27: "rptest.tests.retention_policy_test.ShadowIndexingLocalRetentionTest.test_shadow_indexing_default_local_retention.cluster_remote_write=False.topic_remote_write=false.cloud_storage_type=CloudStorageType.ABS" |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/39583#018b5c98-f2ea-4435-ac18-2ff2b29da365 |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39583#018b5ca7-8ae4-4324-a723-665d80148160: "rptest.tests.retention_policy_test.ShadowIndexingLocalRetentionTest.test_shadow_indexing_default_local_retention.cluster_remote_write=False.topic_remote_write=false.cloud_storage_type=CloudStorageType.S3" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39583#018b5ca7-8ae8-4841-a76d-a26696eac637: "rptest.tests.retention_policy_test.ShadowIndexingLocalRetentionTest.test_shadow_indexing_default_local_retention.cluster_remote_write=False.topic_remote_write=-1.cloud_storage_type=CloudStorageType.ABS" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39583#018b5ca7-8aec-4ec0-8285-ec369d53ff5f: "rptest.tests.retention_policy_test.ShadowIndexingLocalRetentionTest.test_shadow_indexing_default_local_retention.cluster_remote_write=False.topic_remote_write=-1.cloud_storage_type=CloudStorageType.S3" |
Force-pushed from e66dc85 to 4ff237b.
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39614#018b5dba-2659-4e65-a535-1f08e221bdd6: "rptest.tests.retention_policy_test.ShadowIndexingLocalRetentionTest.test_local_time_based_retention_is_overridden.local_retention_ms=3600000.cloud_storage_type=CloudStorageType.ABS" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39614#018b5dc5-0e55-4fd4-9812-1d126c6039b9: "rptest.tests.retention_policy_test.ShadowIndexingLocalRetentionTest.test_local_time_based_retention_is_overridden.local_retention_ms=3600000.cloud_storage_type=CloudStorageType.ABS" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39614#018b5dba-2655-427b-8ba1-37d18eee17c0: "rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_size_based_retention_application.cloud_storage_type=CloudStorageType.S3" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39614#018b5dba-265f-40c5-ae1a-1267838353c5: "rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.bytes" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39614#018b5dc5-0e51-416a-b488-da2ed2722c49: "rptest.tests.read_replica_e2e_test.TestReadReplicaService.test_identical_hwms.partition_count=5.cloud_storage_type=CloudStorageType.S3" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39614#018b5dba-265c-45fa-8ce8-728170feb511: "rptest.tests.retention_policy_test.ShadowIndexingLocalRetentionTest.test_local_time_based_retention_is_overridden.local_retention_ms=3600000.cloud_storage_type=CloudStorageType.S3" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39614#018b5dc5-0e58-4865-be56-1e4a8f2ccdd4: "rptest.tests.e2e_shadow_indexing_test.ShadowIndexingInfiniteRetentionTest.test_segments_not_deleted.cloud_storage_type=CloudStorageType.ABS" |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/39614#018b5dba-2655-427b-8ba1-37d18eee17c0 |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/39614#018b5dba-265f-40c5-ae1a-1267838353c5 |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/39614#018b5dc5-0e5c-48d4-b653-020d8b98fe3f: "rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.bytes" |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/39614#018b5dc5-0e5c-48d4-b653-020d8b98fe3f |
Force-pushed from 4ff237b to 27dc66e.
/cdt
Force-pushed from 766a497 to 9073dff.
Force-pushed from 319ffb9 to 5002759.
new failures detected in https://buildkite.com/redpanda/redpanda/builds/40228#018b8acb-aedb-497a-8cec-ceb38b79c74b: "rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/40228#018b8acb-aede-4c36-b7bc-db4bff41983e: "rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/40228#018b8ad1-4a92-44e1-820e-8dd44c663ad9: "rptest.tests.read_replica_e2e_test.ReadReplicasUpgradeTest.test_upgrades.cloud_storage_type=CloudStorageType.S3" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/40228#018b8ad1-4a98-4be3-aee9-91c98b8f48f9: "rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/40228#018b8ad1-4a95-42a8-b8fa-f7046493f0ea: "rptest.tests.cluster_features_test.FeaturesSingleNodeUpgradeTest.test_upgrade" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/40228#018b8ad1-4a9b-4a88-be6b-f199932175c6: "rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/40228#018b8acb-aed5-49ef-9a84-2478a975a6b1: "rptest.tests.cluster_features_test.FeaturesSingleNodeTest.test_get_features" |
new failures detected in https://buildkite.com/redpanda/redpanda/builds/40228#018b8acb-aed8-454d-8ddc-679881cf1904: "rptest.tests.cluster_features_test.FeaturesSingleNodeUpgradeTest.test_upgrade" |
/ci-repeat
new failures detected in https://buildkite.com/redpanda/redpanda/builds/40294#018b8fd6-e3f6-4650-9102-b0985d6b6525: "rptest.tests.rpk_registry_test.RpkRegistryTest.test_produce_consume_avro" |
If a scrub has not reached the end of the log, we call it a partial scrub; otherwise it is a full scrub. For integrating scrubbing into our ducktape test suite, it is useful to make the full scrub delay larger than the partial one. In production, however, I expect both configs to generally have the same value. This commit teaches the `scrubber_scheduler` how to juggle the two timestamps, introduces the new cluster configuration, and threads it into the rest of the `archival` code.
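The scheduling rule described in this commit can be sketched roughly as follows. This is an illustrative Python model, not the C++ code in `archival`: the `ScrubState` type, its field names, and `next_scrub_at` are hypothetical, jitter is ignored, and scrubbing immediately when no metadata exists is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrubState:
    # Hypothetical model of the per-partition scrubbing timestamps.
    # Time of the last scrub that stopped before the end of the log.
    last_partial_scrub_ms: Optional[int] = None
    # Time of the last scrub that reached the end of the log.
    last_full_scrub_ms: Optional[int] = None


def next_scrub_at(state: ScrubState, partial_interval_ms: int,
                  full_interval_ms: int, now_ms: int) -> int:
    """Return the earliest time (in ms) at which the next scrub may start."""
    partial = state.last_partial_scrub_ms
    full = state.last_full_scrub_ms
    if partial is None and full is None:
        # Assumption: with no metadata (never scrubbed, or metadata reset),
        # the next scrub is due right away.
        return now_ms
    if full is not None and (partial is None or full >= partial):
        # The most recent scrub reached the end of the log:
        # apply the full scrub interval.
        return full + full_interval_ms
    # The most recent scrub was partial: apply the partial interval.
    return partial + partial_interval_ms
```

With a short partial interval and a long full interval, a test can let the scrubber run back to back until the end of the log and then have it sit idle, which is the behaviour the ducktape integration relies on.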
It will prove useful to reset all of the scrubbing metadata when adding a scrubbing run at the end of our ducktape tests. This commit introduces a new command that does just that and exposes it via the archiver.
This will come in handy for forcing Redpanda to scrub aggressively at the end of ducktape tests.
Resetting scrubbing metadata has the purpose of triggering a new full scrub at the end of ducktape tests. Since the scheduler uses said metadata to pick the time for the next scrub, we also need to inform it that the metadata has changed and a rescheduling is needed.
This commit introduces a new admin API endpoint which can be used to reset the scrubbing metadata of one partition. It does this by replicating the `reset_scrubbing_metadata` command, which was introduced in the previous patch.
When a full scrub of the log finishes, via one or more scrubs, persist the timestamp at which this happened. This timestamp is then exposed via the admin API. We will use this in our ducktape tests for end of test scrubbing.
This commit updates the RedpandaService such that it will perform an internal scrub (as opposed to an external one via rp-storage-tool) at the end of all tiered storage tests. This is achieved by configuring the scrubber to be aggressive until it reaches the end of the log. Once that happens, it will wait for the full scrub interval (10 minutes in this commit) before scrubbing again.
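A minimal sketch of the end-of-test wait, assuming the test can read a per-partition "last full scrub" timestamp through the admin API. The helper name `wait_for_full_scrub`, the `get_last_full_scrub_ms` callback, and the timeout and poll parameters are hypothetical and are not taken from RedpandaService.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

Partition = Tuple[str, str, int]  # (namespace, topic, partition)


def wait_for_full_scrub(
    partitions: Iterable[Partition],
    get_last_full_scrub_ms: Callable[[Partition], Optional[int]],
    started_at_ms: int,
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until every partition reports a full scrub newer than started_at_ms."""
    pending = set(partitions)
    deadline = clock() + timeout_s
    while pending:
        # A partition is done once its last full scrub timestamp is at or
        # after the moment the end-of-test scrub was requested.
        done = set()
        for p in pending:
            ts = get_last_full_scrub_ms(p)
            if ts is not None and ts >= started_at_ms:
                done.add(p)
        pending -= done
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"full scrub not finished for: {sorted(pending)}")
        sleep(poll_s)
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real delays; a real test would leave the defaults in place.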
Force-pushed from 5002759 to dd27307.
Changes in force-push:
This patch set has the goal of performing an internal scrub (as opposed to an external one via rp-storage-tool) for each partition at the end of each tiered storage test. This change is made in the final commit, but there are a number of prerequisites:
- Split the `cloud_storage_scrubbing_interval_ms` config into two separate configs: `cloud_storage_{partial|full}_scrub_interval_ms`. The partial interval is applied for scrubs which continue the work of a previous scrub. The full interval is applied after a scrub which has reached the end of the log. In production, I expect these values to always be the same, but they're very useful for testing, as they allow us to pause scrubbing after the end of the log has been reached.
- Add an admin API endpoint that resets the scrubbing metadata of a partition before the final scrub: `v1/cloud_storage/reset_scrubbing_metadata/{namespace}/{topic}/{partition}`
- Persist the timestamp of the last full scrub and expose it via the admin API, so that tests can figure out when to stop waiting.
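
As a rough illustration of how a test might call the reset endpoint, the sketch below builds the URL from the path given in the list above. The helper names, the base URL, and the use of `POST` are assumptions made for the example; the PR description does not state the HTTP method.

```python
import urllib.request
from urllib.parse import quote


def reset_scrubbing_metadata_url(admin_base: str, namespace: str,
                                 topic: str, partition: int) -> str:
    """Build the admin API URL for resetting one partition's scrubbing metadata."""
    path = "v1/cloud_storage/reset_scrubbing_metadata/{}/{}/{}".format(
        quote(namespace, safe=""), quote(topic, safe=""), int(partition))
    return admin_base.rstrip("/") + "/" + path


def reset_scrubbing_metadata(admin_base: str, namespace: str, topic: str,
                             partition: int, method: str = "POST",
                             timeout_s: float = 10.0) -> int:
    """Send the reset request and return the HTTP status code.

    Assumption: the endpoint accepts an empty-body request with `method`.
    """
    url = reset_scrubbing_metadata_url(admin_base, namespace, topic, partition)
    req = urllib.request.Request(url, data=b"", method=method)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status
```

For example, `reset_scrubbing_metadata_url("http://localhost:9644", "kafka", "my-topic", 0)` yields the path above with the three placeholders filled in; the host and port here are placeholders for whichever admin API address the test cluster exposes.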
Fixes #13886
Backports Required
Release Notes