-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: wait for splits before starting worker for kv50/rangelookups/relocate #39122
roachtest: wait for splits before starting worker for kv50/rangelookups/relocate #39122
Conversation
a1feb7d
to
3109a04
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Here you're using the splits as a proxy for whether the workload has finished its setup. There aren't too many thing which run concurrently with workload but I wonder if we have this sort of problem in a more subtle way in other places. A more complicated solution here could involve communicating with the workload process somehow, perhaps by having it create a file or expose its state. This works.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jeffrey-xiao and @nvanbenschoten)
pkg/cmd/roachtest/kv.go, line 570 at r1 (raw file):
}) // We only start running the workers when the initial splits have been
nit: you could use the retry package here. The defaults are pretty sane.
3109a04
to
7de990a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this. One question I'm left with is whether we can add the --splits
flag to the init
step and use the doneInit
chan like we were before?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @jeffrey-xiao)
pkg/cmd/roachtest/kv.go, line 557 at r2 (raw file):
defer close(doneWorkload) concurrency := ifLocal("", " --concurrency="+fmt.Sprint(nodes*64)) splits := " --splits=" + fmt.Sprint(splits)
nit: strconv.Itoa(splits)
pkg/cmd/roachtest/kv.go, line 574 at r2 (raw file):
// We only start running the workers when the initial splits have been // created. The relocate worker will fail with "cannot add placeholder, // have an existing placeholder" if there concurrent splits happening.
"if there are"
pkg/cmd/roachtest/kv.go, line 576 at r2 (raw file):
// have an existing placeholder" if there concurrent splits happening. retryOpts := base.DefaultRetryOptions() for retryable := retry.StartWithCtx(ctx, retryOpts); retryable.Next(); {
Add a t.Status
so that we know what's going on if this step gets stuck. Once you do that, we'll want to add another one once this finishes.
7de990a
to
8e222ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with the previous approach is that the splits happen as part of the run command of the workload and not the init step (See https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/cli/run.go#L351-L356).
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)
It's pretty bad that |
d916cfd
to
8fbbd5b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @jeffrey-xiao, and @nvanbenschoten)
pkg/workload/workloadsql/workloadsql.go, line 43 at r3 (raw file):
} const splitConcurrency = 384 // TODO(dan): Don't hardcode this.
This seems like a good place for this, but we now have a number of code paths that will call Split multiple times. We should audit those paths and figure out how to avoid the duplication. For instance, see backupRestoreTestSetupWithParams
, fixturesLoad
, fixturesImport
, runRun
when*doInit || *drop
, etc.
Ah, sorry I missed this! I'd actually like to go even further and split only in init and not in run (unless --init is also specified, but I'd also like to get rid of the --init flag on run) |
e2a52c0
to
44bc0ad
Compare
This change removes some code duplication with splitting tables before running a workload and also resolves the confusion around init taking a --splits flag, but not performing any splits. Release note: None
…ps/relocate Relocates were failing due to concurrent splits that were a part of the kv initialization step. This commit waits until the initialization and split step has completed before starting the workers. Release note: None
44bc0ad
to
5b43c41
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
workload changes lgtm. i mentioned offline that there might be some weird fallout from this, which is why i hadn't done it myself, but @jeffrey-xiao indicated he was okay with that
TFTRs! bors r+ |
39122: roachtest: wait for splits before starting worker for kv50/rangelookups/relocate r=jeffrey-xiao a=jeffrey-xiao Relocates were failing with `cannot add placeholder, have an existing placeholder` due to concurrent splits when splitting the tables during `workload run`. This commit adds a split step during to `workload init` to prevent these concurrent splits. Release note: None Fixes #39021. Alternatively, we can also add the error to `isExpectedRelocateError` but that doesn't seem ideal. Co-authored-by: Jeffrey Xiao <jeffrey.xiao1998@gmail.com>
Build succeeded |
Relocates were failing with
cannot add placeholder, have an existing placeholder
due to concurrent splits when splitting the tables duringworkload run
. This commit adds a split step during toworkload init
to prevent these concurrent splits.Release note: None
Fixes #39021.
Alternatively, we can also add the error to
isExpectedRelocateError
but that doesn't seem ideal.