Panic in TPC-C restore on 15 node cluster #34406

Closed
awoods187 opened this issue Jan 30, 2019 · 1 comment
Labels
S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors

Comments

@awoods187
Contributor

Describe the problem

A node died while importing the TPC-C 10k-warehouse fixture on a 15-node cluster with load-based splitting turned off. In the shell I saw:

Andrews-MBP-2:pm andrewwoods$ roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=10000 --db=tpcc"
Error: importing fixture: importing table customer: pq: communication error: rpc error: code = Canceled desc = context canceled
Error:  exit status 1

To Reproduce

  1. export CLUSTER=andy-nolbs
  2. roachprod create $CLUSTER -n 16 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
  3. roachprod run $CLUSTER:1-15 -- 'sudo umount /mnt/data1; sudo mount -o discard,defaults,nobarrier /dev/nvme1n1 /mnt/data1/; mount | grep /mnt/data1'
  4. roachprod stage $CLUSTER:1-15 cockroach
  5. roachprod stage $CLUSTER:16 workload
  6. roachprod start $CLUSTER:1-15 --racks=5
  7. roachprod sql $CLUSTER:1
  8. SET CLUSTER SETTING kv.range_split.by_load_enabled = false;
  9. roachprod adminurl --open $CLUSTER:1
  10. roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=10000 --db=tpcc"

Expected behavior
The TPC-C import completes without any node crashing.

Additional data / screenshots

Environment:
v2.2.0-alpha.20181217-936-g395d842

* ERROR: [n10,raftlog,s10,r1265/3:/Table/57/1/13{36/9/…-42/5/…}] a panic has occurred!
*
runtime error: invalid memory address or nil pointer dereference

goroutine 13092 [running]:
runtime/debug.Stack(0x3898dc0, 0xc008bb4120, 0xc000000003)
	/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
github.com/cockroachdb/cockroach/pkg/util/log.ReportPanic(0x3898dc0, 0xc008bb4120, 0xc00010d300, 0x2d24dc0, 0x546ff60, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/crash_reporting.go:212 +0xa6
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc000557290, 0x3898dc0, 0xc008bb4120)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:181 +0xe4
panic(0x2d24dc0, 0x546ff60)
	/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/cockroachdb/cockroach/pkg/storage.(*truncateDecision).raftSnapshotsForIndex(0xc00a8c4910, 0x0, 0xc00a8460c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:226 +0x3d
github.com/cockroachdb/cockroach/pkg/storage.(*truncateDecision).NumNewRaftSnapshots(0xc00a8c4910, 0xc00fa19d40)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:254 +0x34
github.com/cockroachdb/cockroach/pkg/storage.(*truncateDecision).String(0xc00a8c4910, 0xc009218b00, 0xc00b881b00)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:273 +0x24a
github.com/cockroachdb/cockroach/pkg/storage.(*raftLogQueue).process(0xc000ca4040, 0x3898d80, 0xc009218b40, 0xc00b881b00, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:449 +0x3d0
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processReplica(0xc00077ed00, 0x3898d80, 0xc009218b40, 0xc00b881b00, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:751 +0x345
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processLoop.func1.2(0x3898dc0, 0xc008bb4120)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:645 +0xb8
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc000557290, 0x3898dc0, 0xc008bb4120, 0xc00be3c1b0, 0x2b, 0x0, 0x0, 0xc00fa19d20)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:323 +0xe6
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:318 +0x134
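
Reading the trace above: the nil pointer dereference happens in raftSnapshotsForIndex, reached through NumNewRaftSnapshots and String() while raftLogQueue.process formats the truncate decision. The trace alone does not show which value was nil. The Go sketch below only illustrates that general shape (a String() method that dereferences a pointer field which may be nil) and a defensive guard. Every type and function name in it is a hypothetical stand-in; it is not the CockroachDB code and not necessarily how the fix works.

```go
package main

import "fmt"

// raftStatus and truncateDecision are hypothetical stand-ins for
// illustration; they are not the CockroachDB types from the trace.
type raftStatus struct {
	progress map[uint64]uint64 // replica ID -> next log index to send
}

type truncateDecision struct {
	status   *raftStatus // assumed to be nil when no status is available
	newIndex uint64
}

// snapshotsForIndexUnsafe counts followers whose next index is below index.
// It dereferences d.status unconditionally, so it panics when status is nil.
func (d *truncateDecision) snapshotsForIndexUnsafe(index uint64) int {
	n := 0
	for _, next := range d.status.progress {
		if next < index {
			n++
		}
	}
	return n
}

// snapshotsForIndex is the guarded variant: a missing status means there
// is nothing to count, so it returns 0 instead of panicking.
func (d *truncateDecision) snapshotsForIndex(index uint64) int {
	if d.status == nil {
		return 0
	}
	return d.snapshotsForIndexUnsafe(index)
}

// String formats the decision for logging. Because it runs on a logging
// path, a panic in here takes down the caller even though no real work failed.
func (d *truncateDecision) String() string {
	return fmt.Sprintf("truncate to index %d, %d new snapshot(s)",
		d.newIndex, d.snapshotsForIndex(d.newIndex))
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	d := &truncateDecision{newIndex: 10} // status left nil on purpose
	fmt.Println("unguarded call panics:", panics(func() { d.snapshotsForIndexUnsafe(10) }))
	fmt.Println(d) // uses the guarded path via String()
}
```

The guard lives in the helper rather than in String() so that every caller is covered, not just the logging path.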
@tbg
Member

tbg commented Jan 30, 2019

Thanks for reporting, @awoods187. This will be fixed once #34399 lands.

@tbg tbg closed this as completed Jan 30, 2019
@awoods187 awoods187 added the S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors label Mar 8, 2019