roachtest: import/tpch/nodes=32 failed #34398

Closed
cockroach-teamcity opened this issue Jan 30, 2019 · 2 comments
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/395d842feb97c5bd8cad2b32b71a5156c03061eb

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1115923&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1585,import.go:150: unexpected node event: 16: dead

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Jan 30, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Jan 30, 2019
@tbg

tbg commented Jan 30, 2019

The input RaftStatus was nil. A comment I wrote claimed that this can't happen:

RaftStatus *raft.Status // never nil

It's definitely a lie because we sometimes return trivial truncateDecisions:

	if raftStatus == nil {
		if log.V(6) {
			log.Infof(ctx, "the raft group doesn't exist for r%d", rangeID)
		}
		return &truncateDecision{}, nil
	}

I think String() wasn't called on truncateDecisions that don't cause a truncation, but we recently changed that (#34266). Easy to fix.

E190130 09:36:20.286516 52416 util/log/crash_reporting.go:202  [n22,raftlog,s22,r786/1:/Table/53/1/1081{0310…-1991…}] a panic has occurred!
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x190242d]

goroutine 52416 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0001c4a20, 0x3898dc0, 0xc005d8be30)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:183 +0x11f
panic(0x2d24dc0, 0x546ff60)
        /usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/cockroachdb/cockroach/pkg/storage.(*truncateDecision).raftSnapshotsForIndex(0xc0064887d0, 0x0, 0xc02d188000)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:226 +0x3d
github.com/cockroachdb/cockroach/pkg/storage.(*truncateDecision).NumNewRaftSnapshots(0xc0064887d0, 0xc006d30e00)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:254 +0x34
github.com/cockroachdb/cockroach/pkg/storage.(*truncateDecision).String(0xc0064887d0, 0xc008ad5a00, 0xc008184000)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:273 +0x24a
github.com/cockroachdb/cockroach/pkg/storage.(*raftLogQueue).process(0xc0006362c0, 0x3898d80, 0xc008ad5a40, 0xc008184000, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:449 +0x3d0
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processReplica(0xc0009d8000, 0x3898d80, 0xc008ad5a40, 0xc008184000, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:751 +0x345
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processLoop.func1.2(0x3898dc0, 0xc005d8be30)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:645 +0xb8
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc0001c4a20, 0x3898dc0, 0xc005d8be30, 0xc0070fe0c0, 0x2b, 0x0, 0x0, 0xc006d30de0)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:323 +0xe6
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:318 +0x134
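For illustration, the failure mode described above (a zero-valued truncateDecision whose nil RaftStatus is dereferenced while formatting it) can be reproduced and guarded against in a few lines. This is a minimal sketch, not the code from raft_log_queue.go and not necessarily the change made in #34399: the raftStatus type, its Progress map, the NewFirstIndex field, and the numNewRaftSnapshots method are simplified, hypothetical stand-ins.

```go
package main

import "fmt"

// raftStatus is a hypothetical stand-in for raft.Status.
type raftStatus struct {
	// Progress maps a follower ID to the next log index it needs.
	Progress map[uint64]uint64
}

// truncateDecision is a simplified stand-in for the real struct.
type truncateDecision struct {
	RaftStatus    *raftStatus // nil for a trivial (zero-valued) decision
	NewFirstIndex uint64
}

// numNewRaftSnapshots counts followers that would fall behind the
// truncation point. Without the nil check, reading
// td.RaftStatus.Progress on a trivial decision panics with a nil
// pointer dereference.
func (td *truncateDecision) numNewRaftSnapshots() int {
	if td.RaftStatus == nil {
		return 0
	}
	n := 0
	for _, next := range td.RaftStatus.Progress {
		if next < td.NewFirstIndex {
			n++
		}
	}
	return n
}

// String formats the decision; it must be safe to call on a trivial one.
func (td *truncateDecision) String() string {
	return fmt.Sprintf("truncate to index %d; implies %d new Raft snapshots",
		td.NewFirstIndex, td.numNewRaftSnapshots())
}

func main() {
	trivial := &truncateDecision{} // what the early return above hands back
	fmt.Println(trivial.String())
}
```

Putting the guard in the helper rather than in String() means any other caller of the helper is covered too.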

@RaduBerinde

Some nodes from cyan hit the same crash.

craig bot pushed a commit that referenced this issue Jan 30, 2019
33474: docs: add RFC to expose follower reads to clients r=ajwerner a=ajwerner

Relates to #16593.
Release note: None

34399: storage: fix NPE while printing trivial truncateDecision r=bdarnell a=tbg

Fixes #34398.

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@craig craig bot closed this as completed in #34399 Jan 30, 2019
andreimatei pushed a commit to andreimatei/cockroach that referenced this issue Jan 31, 2019