testutils/lint/passes/timer: Test failed under stress #39616

Closed
cockroach-teamcity opened this issue Aug 13, 2019 · 0 comments · Fixed by #39666
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/14f9c378040b933900a84d11045f1750440b2c71

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=Test PKG=github.com/cockroachdb/cockroach/pkg/testutils/lint/passes/timer TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1436669&tab=buildLog

=== RUN   Test
--- FAIL: Test (0.07s)
    analysistest.go:103: loading [a]: go [list -e -json -compiled=true -test=true -export=false -deps=true -find=false -- a]: exit status 1: go build a: failed to cache compiled Go files

@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Aug 13, 2019
@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Aug 13, 2019
craig bot pushed a commit that referenced this issue Aug 14, 2019
39640: storage: unify replica addition and removal paths r=nvanbenschoten a=tbg

This continues the reworking of the various replication change APIs with
the goal of allowing a) testing of general atomic replication changes b)
issuing replica swaps from the replicate queue (in 19.2).

For previous steps, see:

#39485
#39611

This change is not a pure plumbing PR. Instead, it unifies
`(*Replica).addReplica` and `(*Replica).removeReplica` into a method that
can do both, `(*Replica).addAndRemoveReplicas`.

Given a slice of ReplicationChanges, this method first adds learner
replicas corresponding to the desired new voters. After having sent
snapshots to all of them, the method issues a configuration change that
atomically
- upgrades all learners to voters
- removes any undesired replicas.

Note that no atomic membership changes are *actually* carried out yet. This
is because the callers of `addAndRemoveReplicas` pass in only a single
change (i.e. an addition or removal), which the method also verifies.
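
The ordering described above can be sketched roughly as follows. This is only an illustration of the three phases (learners, snapshots, one final configuration change) and of the single-change check. The types `ChangeType` and `ReplicationChange`, the function `planChanges`, and the step strings are hypothetical stand-ins, not the actual definitions or the body of `addAndRemoveReplicas`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ChangeType and ReplicationChange are hypothetical stand-ins used only
// for this sketch; they are not CockroachDB's real definitions.
type ChangeType int

const (
	AddReplica ChangeType = iota
	RemoveReplica
)

type ReplicationChange struct {
	Type   ChangeType
	Target int // hypothetical store ID
}

// planChanges lists, in order, the steps the PR description outlines.
// Like the method described above, it rejects anything other than a
// single change for now.
func planChanges(chgs []ReplicationChange) ([]string, error) {
	if len(chgs) != 1 {
		return nil, errors.New("expected exactly one replication change")
	}
	var steps []string
	// Phase 1: add a learner for every desired new voter.
	for _, c := range chgs {
		if c.Type == AddReplica {
			steps = append(steps, fmt.Sprintf("add learner s%d", c.Target))
		}
	}
	// Phase 2: send a snapshot to each learner.
	for _, c := range chgs {
		if c.Type == AddReplica {
			steps = append(steps, fmt.Sprintf("send snapshot to s%d", c.Target))
		}
	}
	// Phase 3: a single configuration change that promotes the learners
	// and removes the undesired replicas together.
	var ops []string
	for _, c := range chgs {
		switch c.Type {
		case AddReplica:
			ops = append(ops, fmt.Sprintf("promote s%d", c.Target))
		case RemoveReplica:
			ops = append(ops, fmt.Sprintf("remove s%d", c.Target))
		}
	}
	steps = append(steps, "config change: "+strings.Join(ops, ", "))
	return steps, nil
}

func main() {
	steps, err := planChanges([]ReplicationChange{{Type: AddReplica, Target: 4}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range steps {
		fmt.Println(s)
	}
}
```

Once callers are allowed to pass several changes, the length check would go away and phase 3 would carry both promotions and removals in one step.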

Three pieces are missing after this PR: First, we need to be able to
instruct raft to carry out atomic configuration changes:

https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451

which in particular requires being able to store the ConfState
corresponding to a joint configuration in the unreplicated local state
(under a new key).

Second, we must pass the slice of changes handed to
`AdminChangeReplicas` through to `addAndRemoveReplicas` without unrolling
it first, see:

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891

and

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325

Third, we must teach the replicate queue to issue the "atomic swaps";
this is the reason we're introducing atomic membership changes in the first
place.

Release note: None

39656: kv: init heartbeat txn log tag later r=nvanbenschoten a=tbg

At init() time, the txn proto has not been populated yet.
Found while investigating #39652.

This change strikes me as clunky, but I don't have the bandwidth to dig deeper
right now.

Release note: None

39666: testutils/lint/passes: disable under nightly stress r=mjibson a=mjibson

Under stress, these tests fail with "go build a: failed to cache compiled Go files".

Fixes #39616
Fixes #39541
Fixes #39479

Release note: None
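
For readers unfamiliar with the pattern, here is a minimal sketch of how a test might be skipped under nightly stress. It is an assumption about the general shape of the fix, not the code from #39666: the environment variable name `COCKROACH_NIGHTLY_STRESS` and the helper `underNightlyStress` are illustrative, and in a real test the skip would be a call to `t.Skip` instead of a print.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// nightlyStressEnv is an assumed variable name, used here for illustration.
const nightlyStressEnv = "COCKROACH_NIGHTLY_STRESS"

// underNightlyStress reports whether the (assumed) nightly stress variable
// is set to a true value. The lookup function is injected so the check can
// be exercised without touching the real environment.
func underNightlyStress(lookup func(string) (string, bool)) bool {
	v, ok := lookup(nightlyStressEnv)
	if !ok {
		return false
	}
	b, err := strconv.ParseBool(v)
	return err == nil && b
}

func main() {
	// In a real test this would be: if underNightlyStress(...) { t.Skip(...) }
	if underNightlyStress(os.LookupEnv) {
		fmt.Println("skipping: Go build cache is unreliable under stress")
		return
	}
	fmt.Println("running lint pass test")
}
```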

39669: rpc: use gRPC enforced minimum keepalive timeout r=knz a=ajwerner

Before this commit we'd get the following annoying log message from gRPC
every time we created a new connection, telling us that our setting was being
ignored.

```
Adjusting keepalive ping interval to minimum period of 10s
```

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig craig bot closed this as completed in #39666 Aug 14, 2019