Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: enable schema changes in acceptance/version-upgrade #98855

Merged
merged 2 commits into from
Mar 20, 2023

Conversation

fqazi
Copy link
Collaborator

@fqazi fqazi commented Mar 17, 2023

Previously, due to flakes we disabled schema changes inside the version update test. This patch re-enables them, since we are confident that the workload itslef is now stable in a mixed version state.

Fixes: #58489
Release note: None

@fqazi fqazi requested a review from a team March 17, 2023 14:39
@fqazi fqazi requested a review from a team as a code owner March 17, 2023 14:39
@fqazi fqazi requested review from herkolategan and renatolabs and removed request for a team March 17, 2023 14:39
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi and @herkolategan)


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 480 at r1 (raw file):

// Run performs `cluster.RunE` on a randomly picked node.
func (h *Helper) Run(rng *rand.Rand, cmd ...string) error {

I'm not convinced this function needs to be part of *Helper (I'd like to only add functions to this struct once there are many call sites that would benefit from it). I know we have Query related functions here already, but one distinction is that when we need to run SQL in a test, we will always connect to one gateway node; for running commands, it's very common that commands will need to run on multiple nodes at the same time, and it's not great to have behavior/logging be different if we are running a command in one vs many nodes.

In the mixed-version closure, you can achieve the same behavior with

node := h.RandomNode(rng, c.All())
c.RunE(...)

pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 487 at r1 (raw file):

// InitLegacyWorkload initializes a legacy workload on all nodes.
func (h *Helper) InitLegacyWorkload(workloadPath string, workload string) {

I also believe this function is doing something too specific to warrant a function in *Helper. In addition, it sounds like a function that the caller would always need to make sure to call just once. This is not quite the type of function that should live here. I'll make a comment down below.


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 490 at r1 (raw file):

	// Stage workload on all nodes as the load node to run workload is chosen
	// randomly.
	h.runner.cluster.Put(h.ctx, workloadPath, "./workload", h.runner.crdbNodes)

What's the reason we need to use ./workload instead of ./cockroach workload?


pkg/cmd/roachtest/tests/versionupgrade.go line 134 at r1 (raw file):

		func(ctx context.Context, l *logger.Logger, rng *rand.Rand, h *mixedversion.Helper) error {
			initWorkload.Do(func() {
				h.InitLegacyWorkload(t.DeprecatedWorkload(), "schemachange")

Instead of using the sync.Once, I think it makes more sense to call mvt.OnStartup and init the workload there.


pkg/cmd/roachtest/tests/versionupgrade.go line 142 at r1 (raw file):

				fmt.Sprintf("--concurrency %d", 2),
				fmt.Sprintf("{pgurl:1-%d}", len(c.All())),
			}

Nit: consider using commandbuilder, we've been trying to use more structured data types for commands in roachtest.


pkg/cmd/roachtest/tests/versionupgrade.go line 143 at r1 (raw file):

				fmt.Sprintf("{pgurl:1-%d}", len(c.All())),
			}
			return h.Run(rng, runCmd...)

Please use cluster.RunE. We want to return errors instead of calling t.Fatal in these callbacks.

Copy link
Collaborator Author

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @renatolabs)


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 480 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

I'm not convinced this function needs to be part of *Helper (I'd like to only add functions to this struct once there are many call sites that would benefit from it). I know we have Query related functions here already, but one distinction is that when we need to run SQL in a test, we will always connect to one gateway node; for running commands, it's very common that commands will need to run on multiple nodes at the same time, and it's not great to have behavior/logging be different if we are running a command in one vs many nodes.

In the mixed-version closure, you can achieve the same behavior with

node := h.RandomNode(rng, c.All())
c.RunE(...)

I was debating if it should be in the helper at all. I'll move this code out.


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 490 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

What's the reason we need to use ./workload instead of ./cockroach workload?

The schemachange workload is not supported under the cockroach binary, that's something that we should get done. I'll get an issue open for it.


pkg/cmd/roachtest/tests/versionupgrade.go line 134 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Instead of using the sync.Once, I think it makes more sense to call mvt.OnStartup and init the workload there.

Good point I'll switch this.


pkg/cmd/roachtest/tests/versionupgrade.go line 143 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Please use cluster.RunE. We want to return errors instead of calling t.Fatal in these callbacks.

The run wrapper was doing this, but I"ll remove it and put the calls here directly on cluster.

Copy link
Collaborator Author

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @renatolabs)


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 487 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

I also believe this function is doing something too specific to warrant a function in *Helper. In addition, it sounds like a function that the caller would always need to make sure to call just once. This is not quite the type of function that should live here. I'll make a comment down below.

Done.

@fqazi fqazi requested a review from renatolabs March 17, 2023 18:35
Copy link
Collaborator

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi and @herkolategan)


pkg/cmd/roachtest/tests/versionupgrade.go line 112 at r2 (raw file):

		if err := c.PutE(ctx, t.L(), t.DeprecatedWorkload(), "./workload", c.All()); err != nil {
			return err
		}

Probably cleaner to stash ./workload "out-of-band" -- i.e., you can just call c.Put() (no need for PutE) before the mvt := mixedversion.NewTest line. If that fails, we don't waste time setting up the mixed-version test, and it also simplifies this closure too.


pkg/cmd/roachtest/tests/versionupgrade.go line 113 at r2 (raw file):

			return err
		}
		return c.RunE(ctx, c.All(), "./workload init", "schemachange")

Nit: since these are all consts, probably reads better as a single string: c.RunE(ctx, c.All(), "./workload init schemachange")


pkg/cmd/roachtest/tests/versionupgrade.go line 145 at r2 (raw file):

			runCmd.Flag("max-ops", 10)
			runCmd.Flag("concurrency", 2)
			runCmd.Arg("{pgurl:1-%d}", len(c.All()))

Nit: these can be chained

runCmd := roachtestutil.NewCommand("...")
    .Flag("verbose", 1)
    .Flag("max-ops", 10)

pkg/cmd/roachtest/tests/versionupgrade.go line 146 at r2 (raw file):

			runCmd.Flag("concurrency", 2)
			runCmd.Arg("{pgurl:1-%d}", len(c.All()))
			randomNode := c.All().RandNode()

We should use h.RandomNode() here (i.e., using the rng provided as parameter instead of the global rand).

Copy link
Collaborator Author

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @renatolabs)


pkg/cmd/roachtest/tests/versionupgrade.go line 112 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Probably cleaner to stash ./workload "out-of-band" -- i.e., you can just call c.Put() (no need for PutE) before the mvt := mixedversion.NewTest line. If that fails, we don't waste time setting up the mixed-version test, and it also simplifies this closure too.

Done.


pkg/cmd/roachtest/tests/versionupgrade.go line 146 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

We should use h.RandomNode() here (i.e., using the rng provided as parameter instead of the global rand).

Done.

Previously, due to flakes we disabled schema changes inside
the version update test. This patch re-enables them, since
we are confident that the workload itslef is now stable in a
mixed version state.

Fixes: cockroachdb#58489
Release note: None
Copy link
Collaborator

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Thank you for adding the schemachange workload back!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan)

@fqazi
Copy link
Collaborator Author

fqazi commented Mar 17, 2023

@renatolabs @chengxiong-ruan TFTR!

@fqazi
Copy link
Collaborator Author

fqazi commented Mar 17, 2023

This will need some tweaks on the workload side, we don't handle a specific version case for CREATE's properly. It should be minor will try and validate it a few times on Monday (otherwise there is a risk of intermittent failures).

Previously, when the randomized declarative schema changer
was run during version upgrade we the descriptor ID generator
can be temporarily unavailable between 22.2 and 23.1 This
patch, allows this error temporarily for this workload.

Epic: none

Release note: None
@fqazi
Copy link
Collaborator Author

fqazi commented Mar 20, 2023

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 20, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 20, 2023

Build succeeded:

@craig craig bot merged commit cb58230 into cockroachdb:master Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: schemaChangeStep is skipped in acceptance/version-upgrade
4 participants