storage: WaitForFullReplication hangs and causes test flakes #40805
Comments
@irfansharif any chance you could take a look at this? I've also seen
fail where it seems like we just don't move a replica we need to move (the SucceedsSoon times out). That may or may not be related. That test is currently pretty flaky, with fatal errors today due to things fixed in #40751, but it eventually fails and it looks like we're just not doing anything. I verified that the replica which needs to move does indeed get the right zone config.
This change demonstrates a totally reliable flake during repartitioning. My guess is this implies that the intermittent flakes we've observed in cockroachdb#40805 are due to the server's automatic upgrade happening late in the test.

Take this diff and run:

```
make test PKG=./pkg/ccl/partitionccl TESTS=Repartition TESTFLAGS=-v 2>&1 | tee out.$(date +%s)
```

Release Justification: definitely don't release this, it just repros a failure.

Release note: None
Turns out my above comment is totally unrelated, see #40823.
Taking a look at this now.
The 10s or so you've seen where we wait for full replication seems typical, and is something #38565 is looking at.
Just a timeout. We seem to have bumped this up in #40838.
`TestParallel/subquery_retry_multinode` timed out and caused a build to fail, and the stack traces show that it seemed to have gotten stuck on `TestCluster.WaitForFullReplication()`. There were about 10 seconds of `testutils/testcluster/testcluster.go:718 [n1,s1] has 1 underreplicated ranges` in the logs.

#38565 is potentially related.
Test logs (internal): https://drive.google.com/file/d/1kQeirJNVxZlUUtgT_SpRYjlfw7U1tw-m/view?usp=sharing