Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: prototype for metamorphic settings during upgrade tests #112226

Conversation

erikgrinaker
Copy link
Contributor

This patch prototypes support for metamorphic cluster settings in upgrade tests.

The motivation is to metamorphically vary cluster settings that are not supported on all version (in particular, the initial version being upgraded from). Instead, the cluster setting is only modified once the cluster has been upgraded to a version that supports it.

Touches #103304.
Epic: none
Release note: None

This patch prototypes support for metamorphic cluster settings in
upgrade tests.

The motivation is to metamorphically vary cluster settings that are not
supported on all version (in particular, the initial version being
upgraded from). Instead, the cluster setting is only modified once the
cluster has been upgraded to a version that supports it.

Epic: none
Release note: None
@erikgrinaker erikgrinaker self-assigned this Oct 12, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Oct 12, 2023

@renatolabs Would like to get your thoughts on this before I go any further.

The primary motivation here is to metamorphically enable expiration-based leases in upgrade tests, when this is only possible to vary in 23.1 and later. In non-upgrade tests, this is already supported via TestSpec.Leases = MetamorphicLeases, but this does not work with the upgrade tests because 22.2 clusters and before do not support the kv.expiration_leases_only.enabled setting.

This overlaps with general support for metamorphic parameters in roachtests (see #105807 and #111066), but I don't intend to tackle that now -- unless you think it would be worthwhile to generalize this to support all roachtests from the outset.

@renatolabs
Copy link
Contributor

Thanks for prototyping!

I'll think about this more. Some initial thoughts: ideally I wouldn't want this type of logic to live in the mixedversion framework (i.e., intimate knowledge of cluster settings and when they were introduced).

How content would you be if this was tested not by setting these specific cluster settings, but by running tests with metamorphic builds of crdb? @DarrylWong is actively working on that (#111949). If we use metamorphic builds of previous release binaries as well (I'm not entirely sure if they are readily available or if we'd need to change some of our build processes), we would get coverage for kv.expiration_leases_only.enabled automatically that way.

@erikgrinaker
Copy link
Contributor Author

How content would you be if this was tested not by setting these specific cluster settings, but by running tests with metamorphic builds of crdb?

That would be better, if we can get it to work reliably. We should probably make sure all nodes of the same binary run with the same metamorphic parameters (it's going to be extremely annoying to debug test failures where different nodes have different views of e.g. default cluster settings). There might also be some issues around binaries for different versions running with different metamorphic parameters in mixed-version clusters, but I guess we'll find out.

I'll close out #103304 in favor of #86678, and add GA-blocker for the latter instead.

@srosenberg
Copy link
Member

We should probably make sure all nodes of the same binary run with the same metamorphic parameters (it's going to be extremely annoying to debug test failures where different nodes have different views of e.g. default cluster settings).

This is effectively already handled via the COCKROACH_RANDOM_SEED environment variable.

There might also be some issues around binaries for different versions running with different metamorphic parameters in mixed-version clusters, but I guess we'll find out.

This may indeed cause an interference. @DarrylWong fyi.

@renatolabs
Copy link
Contributor

extremely annoying to debug test failures where different nodes have different views of e.g. default cluster settings

FWIW, I believe this is already the case anyway, as far as cluster setting defaults go; the defaults are not version gated so nodes should already be able to handle different views of the defaults in mixed-binary scenarios. That said, we also use metamorphic constants for things other than cluster settings and I'm sure we don't code against the possibility of those values being different (nor should we).

binaries for different versions running with different metamorphic parameters in mixed-version clusters

This will indeed be a problem and unless we have an idea of how to deal with this, it will invalidate the whole approach. I'll think about this more, but if we don't have a solution, we might need to always disable metamorphic builds in mixed-version tests.

@erikgrinaker
Copy link
Contributor Author

FWIW, I believe this is already the case anyway, as far as cluster setting defaults go; the defaults are not version gated so nodes should already be able to handle different views of the defaults in mixed-binary scenarios.

Yes, but 2 sets of settings given by major version is a lot easier to reason about than e.g. 5 sets of randomly chosen settings in a 5-node cluster.

I'll think about this more, but if we don't have a solution, we might need to always disable metamorphic builds in mixed-version tests.

I think we should have some solution for metamorphically varying at least cluster settings and envvars in mixed-version tests, since we need all the coverage we can get in these tests.

@renatolabs
Copy link
Contributor

I think we should have some solution for metamorphically varying at least cluster settings

Yes, I agree. My comment was about disabling metamorphic builds in upgrade tests. We will support setting cluster settings probabilistically, managing it at the test level. I plan to have something within the next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants