test_utils: add scoped config resetter #13146

Merged
merged 2 commits into from
Sep 1, 2023

Conversation

andrwng
Contributor

@andrwng andrwng commented Aug 31, 2023

Several tests set configs directly with config::shard_local_cfg()
without cleaning them up afterwards. This means that the next test that
runs in the same process may be affected by whatever a previous test had
set.

This introduces a new scoped wrapper around config::shard_local_cfg()
that tracks which properties may have been updated, so that it can reset
them when its destructor runs.
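
A rough sketch of the idea, for illustration only. The `property`, `configuration`, and `local_cfg()` names below are hypothetical stand-ins for the real config classes, not Redpanda's API, and the actual wrapper in this PR may differ in detail. The point is only that `get()` records the property name and the destructor resets every recorded property to its default.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a config property with a default value.
struct property {
    int value;
    int default_value;
    explicit property(int d) : value(d), default_value(d) {}
    void set_value(int v) { value = v; }
    void reset() { value = default_value; }
};

// Hypothetical stand-in for the object returned by shard_local_cfg().
struct configuration {
    std::map<std::string, property> props;
    property& get(const std::string& name) { return props.at(name); }
};

inline configuration& local_cfg() {
    static configuration cfg = [] {
        configuration c;
        c.props.emplace("log_segment_size", property{1024});
        c.props.emplace("retention_bytes", property{-1});
        return c;
    }();
    return cfg;
}

// Tracks every property handed out and resets it on destruction.
class scoped_config {
public:
    scoped_config() = default;
    scoped_config(const scoped_config&) = delete;
    scoped_config& operator=(const scoped_config&) = delete;

    property& get(const std::string& name) {
        // Look up first so an unknown name is never recorded.
        auto& p = local_cfg().get(name);
        _properties_to_reset.push_back(name);
        return p;
    }

    ~scoped_config() {
        for (const auto& name : _properties_to_reset) {
            local_cfg().get(name).reset();
        }
    }

private:
    std::vector<std::string> _properties_to_reset;
};

// Usage: the override is visible inside the scope and gone after it.
inline int demo_scoped_reset() {
    {
        scoped_config cfg;
        cfg.get("log_segment_size").set_value(4096);
        assert(local_cfg().get("log_segment_size").value == 4096);
    }
    return local_cfg().get("log_segment_size").value;
}
```

In a test, the wrapper would typically be a local variable or a fixture member, so the reset happens whether the test passes or fails.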

Fixes #12839

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

@andrwng
Contributor Author

andrwng commented Aug 31, 2023

/ci-repeat

@andijcr
Contributor

andijcr commented Aug 31, 2023

very cool

I ran a search for tests that use shard_local_cfg:

find src/v/ -type d -name tests -print | xargs -I {} rg shard_local_cfg {}

and here's the result. A bunch of them access the property directly.

I wonder if there is an alternative to changing all of them to the .get("...") form.

src/v/archival/tests/ntp_archiver_reupload_test.cc:    config::shard_local_cfg()
src/v/archival/tests/ntp_archiver_reupload_test.cc:    config::shard_local_cfg()
src/v/archival/tests/ntp_archiver_test.cc:    config::shard_local_cfg().delete_retention_ms.set_value(
src/v/archival/tests/ntp_archiver_test.cc:    config::shard_local_cfg().delete_retention_ms.reset();
src/v/archival/tests/ntp_archiver_test.cc:    config::shard_local_cfg()
src/v/archival/tests/ntp_archiver_test.cc:    config::shard_local_cfg().delete_retention_ms.set_value(
src/v/archival/tests/ntp_archiver_test.cc:    config::shard_local_cfg().delete_retention_ms.set_value(
src/v/archival/tests/ntp_archiver_test.cc:    config::shard_local_cfg()
src/v/archival/tests/ntp_archiver_test.cc:    config::shard_local_cfg().cloud_storage_spillover_manifest_size.set_value(
src/v/archival/tests/service_fixture.cc:        auto& cfg = config::shard_local_cfg();
src/v/archival/tests/service_fixture.cc:    config::shard_local_cfg().cloud_storage_enabled.set_value(false);
src/v/cloud_roles/tests/fetch_credentials_tests.cc:    config::shard_local_cfg().cloud_storage_credentials_host.set_value(
src/v/cloud_roles/tests/fetch_credentials_tests.cc:    config::shard_local_cfg()
src/v/cloud_storage/tests/remote_partition_fuzz_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/remote_partition_fuzz_test.cc:        config::shard_local_cfg().cloud_storage_max_segment_readers_per_shard(
src/v/cloud_storage/tests/cloud_storage_fixture.h:        config::shard_local_cfg()
src/v/cloud_storage/tests/cloud_storage_fixture.h:        config::shard_local_cfg()
src/v/cloud_storage/tests/cloud_storage_fixture.h:        config::shard_local_cfg()
src/v/cloud_storage/tests/s3_imposter.h:        config::shard_local_cfg().cloud_storage_enabled.set_value(false);
src/v/cloud_storage/tests/remote_partition_test.cc:    if (config::shard_local_cfg().cloud_storage_disable_chunk_reads()) {
src/v/cloud_storage/tests/remote_partition_test.cc:          == min_segments * config::shard_local_cfg().log_segment_size());
src/v/cloud_storage/tests/remote_partition_test.cc:          == wanted_segments * config::shard_local_cfg().log_segment_size());
src/v/cloud_storage/tests/remote_partition_test.cc:               * config::shard_local_cfg().cloud_storage_cache_chunk_size());
src/v/cloud_storage/tests/remote_partition_test.cc:               * config::shard_local_cfg().cloud_storage_cache_chunk_size());
src/v/cloud_storage/tests/remote_partition_test.cc:               * config::shard_local_cfg().cloud_storage_cache_chunk_size());
src/v/cloud_storage/tests/remote_partition_test.cc:               * config::shard_local_cfg().cloud_storage_cache_chunk_size());
src/v/cloud_storage/tests/remote_partition_test.cc:               * config::shard_local_cfg().cloud_storage_cache_chunk_size());
src/v/cloud_storage/tests/remote_partition_test.cc:               * config::shard_local_cfg().cloud_storage_cache_chunk_size());
src/v/cloud_storage/tests/s3_imposter.cc:        auto& cfg = config::shard_local_cfg();
src/v/cloud_storage/tests/util.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/util.cc:        config::shard_local_cfg().cloud_storage_max_segment_readers_per_shard(
src/v/cloud_storage/tests/util.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/util.cc:        config::shard_local_cfg().cloud_storage_max_segment_readers_per_shard(
src/v/cloud_storage/tests/produce_utils.h:              config::shard_local_cfg()
src/v/cloud_storage/tests/produce_utils.h:          config::shard_local_cfg()
src/v/cloud_storage/tests/read_replica_test.cc:        config::shard_local_cfg().enable_metrics_reporter.set_value(false);
src/v/cloud_storage/tests/read_replica_test.cc:        config::shard_local_cfg().disable_metrics.set_value(true);
src/v/cloud_storage/tests/read_replica_test.cc:        config::shard_local_cfg().disable_public_metrics.set_value(true);
src/v/cloud_storage/tests/read_replica_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/read_replica_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/read_replica_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/manual_fixture.h:        config::shard_local_cfg()
src/v/cloud_storage/tests/manual_fixture.h:        config::shard_local_cfg()
src/v/cloud_storage/tests/remote_test.cc:        config::shard_local_cfg().cloud_storage_backend.value()) {
src/v/cloud_storage/tests/remote_test.cc:        config::shard_local_cfg().cloud_storage_backend.set_value(
src/v/cloud_storage/tests/remote_test.cc:        config::shard_local_cfg().cloud_storage_backend.set_value(
src/v/cloud_storage/tests/delete_records_e2e_test.cc:        config::shard_local_cfg().log_compaction_interval_ms.set_value(
src/v/cloud_storage/tests/delete_records_e2e_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/delete_records_e2e_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/delete_records_e2e_test.cc:        config::shard_local_cfg().enable_metrics_reporter.set_value(false);
src/v/cloud_storage/tests/delete_records_e2e_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/delete_records_e2e_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/delete_records_e2e_test.cc:        config::shard_local_cfg().retention_local_strict.set_value(true);
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:    config::shard_local_cfg()
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:    config::shard_local_cfg()
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:    config::shard_local_cfg().cloud_storage_spillover_manifest_size.set_value(
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:    config::shard_local_cfg().cloud_storage_enable_segment_merging.set_value(
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:    config::shard_local_cfg().enable_metrics_reporter.set_value(false);
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:    config::shard_local_cfg().retention_local_strict.set_value(true);
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:        config::shard_local_cfg().log_compaction_interval_ms.set_value(
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:        config::shard_local_cfg().enable_metrics_reporter.set_value(false);
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:        config::shard_local_cfg()
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:    config::shard_local_cfg().fetch_max_bytes.set_value(size_t{10});
src/v/cloud_storage/tests/cloud_storage_e2e_test.cc:    config::shard_local_cfg().retention_local_trim_interval.set_value(
src/v/cloud_storage/tests/topic_recovery_service_test.cc:      config::shard_local_cfg()
src/v/cloud_storage/tests/async_manifest_view_test.cc:        config::shard_local_cfg().cloud_storage_manifest_cache_size.set_value(
src/v/cloud_storage/tests/async_manifest_view_test.cc:        config::shard_local_cfg().cloud_storage_manifest_cache_ttl_ms.set_value(
src/v/cloud_storage/tests/remote_segment_test.cc:    config::shard_local_cfg().cloud_storage_cache_chunk_size.set_value(
src/v/cloud_storage/tests/remote_segment_test.cc:        config::shard_local_cfg().cloud_storage_cache_chunk_size(), true));
src/v/cloud_storage/tests/remote_segment_test.cc:    config::shard_local_cfg().cloud_storage_cache_chunk_size.set_value(
src/v/cloud_storage/tests/remote_segment_test.cc:    config::shard_local_cfg().cloud_storage_cache_chunk_size.set_value(
src/v/cloud_storage/tests/remote_segment_test.cc:    config::shard_local_cfg().cloud_storage_cache_chunk_size.set_value(
src/v/cloud_storage/tests/remote_segment_test.cc:    config::shard_local_cfg().cloud_storage_cache_chunk_size.set_value(
src/v/cloud_storage/tests/remote_segment_test.cc:    config::shard_local_cfg().cloud_storage_cache_chunk_size.set_value(
src/v/cloud_storage/tests/remote_segment_test.cc:    config::shard_local_cfg().cloud_storage_chunk_prefetch.set_value(
src/v/cluster/tests/local_monitor_test.cc:            return config::shard_local_cfg()
src/v/cluster/tests/local_monitor_test.cc:            return config::shard_local_cfg()
src/v/cluster/tests/local_monitor_test.cc:            return config::shard_local_cfg().storage_min_free_bytes.bind();
src/v/cluster/tests/local_monitor_test.cc:        config::shard_local_cfg()
src/v/cluster/tests/local_monitor_test.cc:        config::shard_local_cfg()
src/v/cluster/tests/local_monitor_test.cc:        config::shard_local_cfg()
src/v/cluster/tests/manual_log_deletion_test.cc:        config::shard_local_cfg().log_segment_size_min.set_value(
src/v/cluster/tests/partition_allocator_fixture.h:            config::shard_local_cfg()
src/v/cluster/tests/cluster_test_fixture.h:        config::shard_local_cfg().get(p_name).set_value(v);
src/v/cluster/cloud_metadata/tests/uploader_test.cc:    config::shard_local_cfg()
src/v/kafka/server/tests/alter_config_test.cc:        "{}", config::shard_local_cfg().delete_retention_ms().value_or(-1ms)),
src/v/kafka/server/tests/alter_config_test.cc:        "{}", config::shard_local_cfg().delete_retention_ms().value_or(-1ms)),
src/v/kafka/server/tests/topic_recreate_test.cc:            auto& config = config::shard_local_cfg();
src/v/kafka/server/tests/produce_consume_test.cc:            config::shard_local_cfg().get(name).set_value(value);
src/v/kafka/server/tests/produce_consume_test.cc:            config::shard_local_cfg()
src/v/kafka/server/tests/produce_consume_test.cc:            config::shard_local_cfg()
src/v/kafka/server/tests/produce_consume_test.cc:            config::shard_local_cfg()
src/v/kafka/server/tests/produce_consume_test.cc:            auto& config = config::shard_local_cfg();
src/v/kafka/server/tests/produce_consume_test.cc:    config::shard_local_cfg().log_message_timestamp_alert_after_ms.set_value(
src/v/kafka/server/tests/produce_consume_test.cc:    config::shard_local_cfg().log_message_timestamp_alert_before_ms.set_value(
src/v/kafka/server/tests/produce_consume_test.cc:    config::shard_local_cfg().log_message_timestamp_alert_before_ms.set_value(
src/v/raft/tests/mux_state_machine_fixture.h:            config::shard_local_cfg().raft_election_timeout_ms.set_value(10ms);
src/v/raft/tests/raft_group_fixture.h:                return config::shard_local_cfg()
src/v/raft/tests/raft_group_fixture.h:                return config::shard_local_cfg()
src/v/raft/tests/raft_group_fixture.h:            config::shard_local_cfg().get("disable_metrics").set_value(true);
src/v/raft/tests/raft_group_fixture.h:            config::shard_local_cfg()
src/v/raft/tests/raft_group_fixture.h:            config::shard_local_cfg()
src/v/redpanda/tests/fixture.h:    config::configuration& lconf() { return config::shard_local_cfg(); }
src/v/redpanda/tests/fixture.h:            auto& config = config::shard_local_cfg();
src/v/storage/tests/storage_test_fixture.h:            config::shard_local_cfg().get("disable_metrics").set_value(true);
src/v/storage/tests/storage_test_fixture.h:            config::shard_local_cfg()
src/v/storage/tests/storage_e2e_fixture_test.cc:    config::shard_local_cfg().log_segment_ms_min.set_value(
src/v/storage/tests/half_page_concurrent_dispatch.cc:    const size_t data_size = (config::shard_local_cfg().append_chunk_size() / 2)
src/v/storage/tests/kvstore_test.cc:        config::shard_local_cfg().get(p_name).set_value(v);
src/v/storage/tests/appender_chunk_manipulations.cc:    const auto chunk_size = config::shard_local_cfg().append_chunk_size();
src/v/storage/tests/storage_e2e_test.cc:    config::shard_local_cfg().log_segment_size_min.set_value(
src/v/storage/tests/storage_e2e_test.cc:    config::shard_local_cfg().log_segment_size_max.set_value(
src/v/storage/tests/storage_e2e_test.cc:    config::shard_local_cfg().log_segment_size_min.reset();
src/v/storage/tests/storage_e2e_test.cc:    config::shard_local_cfg().log_segment_size_max.reset();
src/v/storage/tests/storage_e2e_test.cc:        config::shard_local_cfg().cloud_storage_enabled.set_value(
src/v/storage/tests/storage_e2e_test.cc:        config::shard_local_cfg().retention_bytes.set_value(
src/v/storage/tests/storage_e2e_test.cc:        config::shard_local_cfg()
src/v/storage/tests/log_retention_tests.cc:    config::shard_local_cfg().get("cloud_storage_enabled").set_value(true);
src/v/storage/tests/log_retention_tests.cc:    config::shard_local_cfg().get("cloud_storage_enabled").set_value(true);
src/v/storage/tests/log_retention_tests.cc:    config::shard_local_cfg().get("retention_local_strict").set_value(true);
src/v/cloud_storage_clients/tests/backend_detection_test.cc:    config::shard_local_cfg().cloud_storage_backend.set_value(


andijcr
andijcr previously approved these changes Aug 31, 2023
@andijcr
Contributor

andijcr commented Aug 31, 2023

for example

[[nodiscard]] auto make_local_config_resetter() {
    return ss::defer([] {
        config::shard_local_cfg().for_each(
          [](config::base_property& p) { p.reset(); });
    });
}

public:
    ~scoped_config() {
        for (auto& p : _properties_to_reset) {
            config::shard_local_cfg().get(p).reset();
Contributor

I think it has to do this on all shards. Something like:

        ss::smp::invoke_on_all([&] {
            config::shard_local_cfg().get(p).reset();
        }).get();

The same goes for setting the value. We have tests that use more than one shard and set config parameters (without resetting them), for instance throughput_limits_fixture in produce_consume_test.cc.

Contributor Author

Done

Contributor Author

Actually the setting part of this seems a bit more involved. I'm not sure it's the right thing to do to always do it on every core. I'm leaning towards forcing test authors to manage this per core with a sharded service or somesuch.

@andrwng andrwng force-pushed the scoped_config branch 2 times, most recently from 1c4224a to c07fd44 Compare August 31, 2023 14:48
@andrwng
Contributor Author

andrwng commented Aug 31, 2023

for example

[[nodiscard]] auto make_local_config_resetter() {
    return ss::defer([] {
        config::shard_local_cfg().for_each(
          [](config::base_property& p) { p.reset(); });
    });
}

Yea I'd considered something like this, but opted to track specific configs given the full list might be quite expensive if we start cleaning up every config ubiquitously

    ~scoped_config() {
        for (auto p : _properties_to_reset) {
            ss::smp::invoke_on_all([p] {
                config::shard_local_cfg().get(p).reset();
Member

i must be confused about how reset works--it resets it back to a default? i would have expected that it reset it back to its value at the time get was called so the scoped configs could compose together or deal with cases where a fixture is customizing a config before a test runs.

Contributor Author

That's right. I waffled on this a bit. I considered saving the entire config on construction, but felt like that might be a bit heavyweight; or serializing the original value as yaml or json and reverting, but opted for resetting because it was a simpler implementation.

If you're okay with it, I could rename it to some scoped_config_reset or somesuch if that makes this less surprising.

Member

i think it's probably ok as is. no harm in improving/changing it later if it's solving a need now
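
For comparison, the save-and-restore behaviour asked about in this thread could look roughly like the sketch below. It is not what the PR implements (the PR resets to defaults), and `property`, `configuration`, `local_cfg()`, and `scoped_config_restore` are hypothetical names used only for illustration. Each `get()` snapshots the value seen at that moment, and the destructor restores snapshots in reverse order, so nested scopes compose.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a config property with a default value.
struct property {
    int value;
    int default_value;
    explicit property(int d) : value(d), default_value(d) {}
    void set_value(int v) { value = v; }
};

// Hypothetical stand-in for the object returned by shard_local_cfg().
struct configuration {
    std::map<std::string, property> props;
    property& get(const std::string& name) { return props.at(name); }
};

inline configuration& local_cfg() {
    static configuration cfg = [] {
        configuration c;
        c.props.emplace("log_segment_size", property{1024});
        return c;
    }();
    return cfg;
}

// Restores each property to the value it had when get() was called.
class scoped_config_restore {
public:
    scoped_config_restore() = default;
    scoped_config_restore(const scoped_config_restore&) = delete;
    scoped_config_restore& operator=(const scoped_config_restore&) = delete;

    property& get(const std::string& name) {
        auto& p = local_cfg().get(name);
        _saved.emplace_back(name, p.value); // snapshot before any change
        return p;
    }

    ~scoped_config_restore() {
        // Reverse order: the earliest snapshot of a property is applied last.
        for (auto it = _saved.rbegin(); it != _saved.rend(); ++it) {
            local_cfg().get(it->first).value = it->second;
        }
    }

private:
    std::vector<std::pair<std::string, int>> _saved;
};

// A fixture-level override survives a test-level override nested inside it.
inline bool demo_nested_restore() {
    auto& p = local_cfg().get("log_segment_size");
    {
        scoped_config_restore fixture;
        fixture.get("log_segment_size").set_value(2048);
        {
            scoped_config_restore test;
            test.get("log_segment_size").set_value(4096);
        }
        if (p.value != 2048) {
            return false; // inner scope should restore the fixture's value
        }
    }
    return p.value == p.default_value;
}
```

The cost is one saved value per tracked property, which is lighter than copying the entire config but requires the value type to be copyable.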

    }

private:
    std::list<std::string_view> _properties_to_reset;
Member

should we save a real string here? as a test author, i think i'm not going to expect to need to manage the lifetime of property strings. for example, wouldn't this be a use-after-free error?

scoped.get(std::string("asdf")).something()

Contributor Author

Sure I can do this, though it's also just not something in our tree atm.

Member

ahh i see. yeh, no urgent need then.

Contributor Author

Oh oops, didn't see the response, but just pushed this change
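
To illustrate the lifetime concern in this thread: a stored `std::string_view` only points at the caller's characters, so if the caller passed a temporary `std::string`, the view dangles by the time a destructor reads it later. A minimal sketch of the owning alternative follows; `tracked_names` is a hypothetical helper written for this example, not the class from the PR.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <string_view>

// Accepts a view for convenience but stores an owned copy, so the
// recorded name stays valid after the caller's string is destroyed.
class tracked_names {
public:
    void track(std::string_view name) { _names.emplace_back(name); }
    const std::list<std::string>& names() const { return _names; }

private:
    std::list<std::string> _names; // owning; not std::string_view
};

inline std::string demo_temporary_name() {
    tracked_names t;
    // The temporary std::string is destroyed at the end of this statement.
    t.track(std::string("as") + "df");
    // Safe to read afterwards because track() copied the characters.
    return t.names().front();
}
```

With string literals only, as in the tree at the time of this review, a `std::string_view` member would also be safe, since literals live for the whole program; the owned copy just removes that requirement from test authors.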

Several tests set configs directly with config::shard_local_cfg()
without cleaning them up afterwards. This means that the next test that
runs in the same process may be affected by whatever a previous test had
set.

This introduces a new scoped wrapper around config::shard_local_cfg()
that tracks which properties may have been updated, so that it can reset
them when its destructor runs.
dotnwat
dotnwat previously approved these changes Aug 31, 2023
Uses the new scoped_config to reset updated configs to their
defaults at the end of each test. This ensures no side effects across
tests.
@andrwng
Contributor Author

andrwng commented Sep 1, 2023

Failures were all #13182

@dotnwat dotnwat merged commit fb9302e into redpanda-data:dev Sep 1, 2023
@vbotbuildovich
Collaborator

/backport v23.2.x

@vbotbuildovich
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-13146-v23.2.x-726 remotes/upstream/v23.2.x
git cherry-pick -x ed4d72d102d857e3400a62a86d41d5701dbb7b9d d15a801e4315c40a4ef51eefc05ce386a04503c0

Workflow run logs.

Development

Successfully merging this pull request may close these issues.

CI Failure (never reported reclaimable) in cloud_storage_rpfixture.reclaimable_reported_in_health_report
5 participants