Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: wait for splits before starting worker for kv50/rangelookups/relocate #39122

Merged
merged 2 commits into from
Aug 12, 2019

Conversation

jeffrey-xiao
Copy link
Contributor

@jeffrey-xiao jeffrey-xiao commented Jul 26, 2019

Relocates were failing with cannot add placeholder, have an existing placeholder due to concurrent splits when splitting the tables during workload run. This commit adds a split step during to workload init to prevent these concurrent splits.

Release note: None

Fixes #39021.

Alternatively, we can also add the error to isExpectedRelocateError but that doesn't seem ideal.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Makes sense. Here you're using the splits as a proxy for whether the workload has finished its setup. There aren't too many thing which run concurrently with workload but I wonder if we have this sort of problem in a more subtle way in other places. A more complicated solution here could involve communicating with the workload process somehow, perhaps by having it create a file or expose its state. This works.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jeffrey-xiao and @nvanbenschoten)


pkg/cmd/roachtest/kv.go, line 570 at r1 (raw file):

		})

		// We only start running the workers when the initial splits have been

nit: you could use the retry package here. The defaults are pretty sane.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. One question I'm left with is whether we can add the --splits flag to the init step and use the doneInit chan like we were before?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @jeffrey-xiao)


pkg/cmd/roachtest/kv.go, line 557 at r2 (raw file):

			defer close(doneWorkload)
			concurrency := ifLocal("", " --concurrency="+fmt.Sprint(nodes*64))
			splits := " --splits=" + fmt.Sprint(splits)

nit: strconv.Itoa(splits)


pkg/cmd/roachtest/kv.go, line 574 at r2 (raw file):

		// We only start running the workers when the initial splits have been
		// created. The relocate worker will fail with "cannot add placeholder,
		// have an existing placeholder" if there concurrent splits happening.

"if there are"


pkg/cmd/roachtest/kv.go, line 576 at r2 (raw file):

		// have an existing placeholder" if there concurrent splits happening.
		retryOpts := base.DefaultRetryOptions()
		for retryable := retry.StartWithCtx(ctx, retryOpts); retryable.Next(); {

Add a t.Status so that we know what's going on if this step gets stuck. Once you do that, we'll want to add another one once this finishes.

Copy link
Contributor Author

@jeffrey-xiao jeffrey-xiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the previous approach is that the splits happen as part of the run command of the workload and not the init step (See https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/cli/run.go#L351-L356).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)

@nvanbenschoten
Copy link
Member

It's pretty bad that workload init takes a --splits flag but doesn't actually perform any splits. @danhhz do you see any issue with splitting tables in runInit? We already do so in fixturesLoad and fixturesImport.

@jeffrey-xiao jeffrey-xiao changed the title roachtest: fix rangelookup roachtest with relocate workload roachtest: wait for splits before starting worker for kv50/rangelookups/relocate Aug 8, 2019
@jeffrey-xiao jeffrey-xiao force-pushed the fix-range-lookup-roachtest branch 3 times, most recently from d916cfd to 8fbbd5b Compare August 8, 2019 17:48
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @jeffrey-xiao, and @nvanbenschoten)


pkg/workload/workloadsql/workloadsql.go, line 43 at r3 (raw file):

	}

	const splitConcurrency = 384 // TODO(dan): Don't hardcode this.

This seems like a good place for this, but we now have a number of code paths that will call Split multiple times. We should audit those paths and figure out how to avoid the duplication. For instance, see backupRestoreTestSetupWithParams, fixturesLoad, fixturesImport, runRun when*doInit || *drop, etc.

@danhhz
Copy link
Contributor

danhhz commented Aug 8, 2019

It's pretty bad that workload init takes a --splits flag but doesn't actually perform any splits. @danhhz do you see any issue with splitting tables in runInit? We already do so in fixturesLoad and fixturesImport.

Ah, sorry I missed this! I'd actually like to go even further and split only in init and not in run (unless --init is also specified, but I'd also like to get rid of the --init flag on run)

@jeffrey-xiao jeffrey-xiao force-pushed the fix-range-lookup-roachtest branch 4 times, most recently from e2a52c0 to 44bc0ad Compare August 8, 2019 22:27
This change removes some code duplication with splitting tables before
running a workload and also resolves the confusion around init taking a
--splits flag, but not performing any splits.

Release note: None
…ps/relocate

Relocates were failing due to concurrent splits that were a part of the
kv initialization step. This commit waits until the initialization and
split step has completed before starting the workers.

Release note: None
Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workload changes lgtm. i mentioned offline that there might be some weird fallout from this, which is why i hadn't done it myself, but @jeffrey-xiao indicated he was okay with that

@jeffrey-xiao
Copy link
Contributor Author

TFTRs!

bors r+

craig bot pushed a commit that referenced this pull request Aug 12, 2019
39122: roachtest: wait for splits before starting worker for kv50/rangelookups/relocate r=jeffrey-xiao a=jeffrey-xiao

Relocates were failing with `cannot add placeholder, have an existing placeholder` due to concurrent splits when splitting the tables during `workload run`. This commit adds a split step during to `workload init` to prevent these concurrent splits.

Release note: None

Fixes #39021. 

Alternatively, we can also add the error to `isExpectedRelocateError` but that doesn't seem ideal.

Co-authored-by: Jeffrey Xiao <jeffrey.xiao1998@gmail.com>
@craig
Copy link
Contributor

craig bot commented Aug 12, 2019

Build succeeded

@craig craig bot merged commit 5b43c41 into cockroachdb:master Aug 12, 2019
@jeffrey-xiao jeffrey-xiao deleted the fix-range-lookup-roachtest branch August 15, 2019 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: kv50/rangelookups/relocate/nodes=8 failed
5 participants