roachtest: version/mixed/nodes=5 failed #41145

Closed
cockroach-teamcity opened this issue Sep 26, 2019 · 2 comments · Fixed by #41148

Comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/77f26d185efb436aaac88243de19a27caa5da9b6

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=version/mixed/nodes=5 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1509340&tab=artifacts#/version/mixed/nodes=5

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190926-1509340/version/mixed/nodes=5/run_1
	cluster.go:2143,version.go:233,version.go:246,test_runner.go:689: unexpected node event: 2: dead

@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Sep 26, 2019
@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Sep 26, 2019
@jordanlewis
Member

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x1bd9e44]

goroutine 176 [running]:
panic(0x3b493a0, 0x6f39c70)
	/usr/local/go/src/runtime/panic.go:565 +0x2c5 fp=0xc003608c90 sp=0xc003608c00 pc=0x78d165
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc000025560, 0x494fca0, 0xc0007181e0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:181 +0x121 fp=0xc003608cf0 sp=0xc003608c90 pc=0x12d0211
runtime.call32(0x0, 0x417d478, 0xc00049a490, 0x1800000018)
	/usr/local/go/src/runtime/asm_amd64.s:519 +0x3b fp=0xc003608d20 sp=0xc003608cf0 pc=0x7bb1db
panic(0x3b493a0, 0x6f39c70)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5 fp=0xc003608db0 sp=0xc003608d20 pc=0x78d055
runtime.panicmem(...)
	/usr/local/go/src/runtime/panic.go:82
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:390 +0x411 fp=0xc003608de0 sp=0xc003608db0 pc=0x7a2cc1
github.com/cockroachdb/cockroach/pkg/storage.(*replicaAppBatch).runPreApplyTriggers(0xc0009370c8, 0x494fca0, 0xc003a93f50, 0xc00c6f2008, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:679 +0x374 fp=0xc003609050 sp=0xc003608de0 pc=0x1bd9e44
github.com/cockroachdb/cockroach/pkg/storage.(*replicaAppBatch).Stage(0xc0009370c8, 0x492de20, 0xc00c6f2008, 0xc0036093c8, 0x1bd526b, 0xc0009372a8, 0xc0009372d8)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:468 +0x2fd fp=0xc003609390 sp=0xc003609050 pc=0x1bd8f8d
github.com/cockroachdb/cockroach/pkg/storage/apply.Batch.Stage-fm(0x492de20, 0xc00c6f2008, 0xc00c6f2008, 0xc0009372a8, 0x498a500, 0xc0009372a8)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:69 +0x43 fp=0xc0036093d8 sp=0xc003609390 pc=0x1b3f803
github.com/cockroachdb/cockroach/pkg/storage/apply.mapCmdIter(0x498a500, 0xc0009372a8, 0xc0036094c8, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/cmd.go:162 +0x11b fp=0xc003609438 sp=0xc0036093d8 pc=0x1b3e73b
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).applyOneBatch(0xc003609908, 0x494fca0, 0xc003a93f50, 0x498a500, 0xc000937278, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:270 +0x15d fp=0xc0036094f0 sp=0xc003609438 pc=0x1b3f1fd
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).ApplyCommittedEntries(0xc003609908, 0x494fca0, 0xc003a93f50, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:247 +0xcf fp=0xc003609548 sp=0xc0036094f0 pc=0x1b3f01f
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc000937000, 0x494fca0, 0xc003a93f50, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:760 +0xdbd fp=0xc003609c78 sp=0xc003609548 pc=0x1c06d6d
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x494fca0, 0xc003a93f50, 0xc000937000, 0x494fca0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3761 +0x12e fp=0xc003609d38 sp=0xc003609c78 pc=0x1c7bf7e
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc000b50700, 0x494fca0, 0xc003a93f50, 0xc003bcc800, 0xc003609ec0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3514 +0x16b fp=0xc003609dc0 sp=0xc003609d38 pc=0x1c4422b
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc000b50700, 0x494fca0, 0xc0007181e0, 0x48)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3750 +0x1f4 fp=0xc003609ef8 sp=0xc003609dc0 pc=0x1c44f44
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc001092700, 0x494fca0, 0xc0007181e0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:238 +0x214 fp=0xc003609f50 sp=0xc003609ef8 pc=0x1c31264
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x494fca0, 0xc0007181e0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:161 +0x3e fp=0xc003609f78 sp=0xc003609f50 pc=0x1c7665e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc000436f70, 0xc000025560, 0xc000436f60)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:196 +0xfb fp=0xc003609fc8 sp=0xc003609f78 pc=0x12d23ab
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc003609fd0 sp=0xc003609fc8 pc=0x7bcef1
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:189 +0xa8

This seems new.

@nvanbenschoten
Member

Yeah, it looks like we missed part of the upgrade path around replicated ChangeReplicas triggers. The fix is trivial, but I'm going to use this as an opportunity to introduce some unit testing in this area now that it's decently decomposed.
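
For anyone following along, here is a minimal sketch of the kind of guard that avoids this class of crash. It is not the actual patch. The types are simplified, hypothetical stand-ins for the real ones, and it assumes that a trigger written by an older version carries only the deprecated field and leaves the newer pointer field nil. The helper name nextReplicaID is also made up for illustration.

```go
package main

import "fmt"

// ReplicaID, RangeDescriptor and ChangeReplicasTrigger are simplified,
// hypothetical stand-ins, not the real CockroachDB definitions.
type ReplicaID int32

type RangeDescriptor struct {
	NextReplicaID ReplicaID
}

// ChangeReplicasTrigger models a trigger that can arrive in two shapes.
// Assumption: newer writers populate Desc, while older writers leave it nil
// and only set DeprecatedNextReplicaID.
type ChangeReplicasTrigger struct {
	Desc                    *RangeDescriptor
	DeprecatedNextReplicaID ReplicaID
}

// nextReplicaID (hypothetical helper) reads the next replica ID without
// dereferencing a nil Desc. It prefers the new field and falls back to the
// deprecated one when Desc is absent.
func nextReplicaID(t ChangeReplicasTrigger) ReplicaID {
	if t.Desc != nil {
		return t.Desc.NextReplicaID
	}
	return t.DeprecatedNextReplicaID
}

func main() {
	// An entry shaped like one from an older version: no Desc.
	legacy := ChangeReplicasTrigger{DeprecatedNextReplicaID: 7}
	// An entry shaped like one from the current version: Desc is set.
	current := ChangeReplicasTrigger{Desc: &RangeDescriptor{NextReplicaID: 9}}

	fmt.Println(nextReplicaID(legacy), nextReplicaID(current))
}
```

Reading t.Desc.NextReplicaID unconditionally on the legacy value would panic with the same nil pointer dereference seen in the trace above. A table-driven unit test over both shapes is one way to keep the upgrade path covered.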

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Sep 26, 2019
…atedNextReplicaID

Fixes cockroachdb#41145.

This bug was introduced in cockroachdb#40892.

This may force us to pick a new SHA for the beta. Any ChangeReplicas
Raft entry from 19.1 or before is going to crash a node without it.

Release justification: fixes a crash in mixed version clusters.

Release note: None
craig bot pushed a commit that referenced this issue Sep 27, 2019
41148: storage: don't crash when applying ChangeReplicas trigger with DeprecatedNextReplicaID r=nvanbenschoten a=nvanbenschoten

Fixes #41145.

This bug was introduced in #40892.

This may force us to pick a new SHA for the beta. Any ChangeReplicas
Raft entry from 19.1 or before is going to crash a node without it.

Release justification: fixes a crash in mixed version clusters.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in 15f5b81 Sep 27, 2019
4 participants