Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/partitionccl: TestRepartitioning failed under stress #43707

Closed
cockroach-teamcity opened this issue Jan 3, 2020 · 7 comments · Fixed by #48245
Closed

ccl/partitionccl: TestRepartitioning failed under stress #43707

cockroach-teamcity opened this issue Jan 3, 2020 · 7 comments · Fixed by #48245
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

Comments

@cockroach-teamcity
Copy link
Member

SHA: https://github.com/cockroachdb/cockroach/commits/5b73cd419b514452492226c0ea1b19639356ebb8

Parameters:

TAGS=
GOFLAGS=

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1671818&tab=buildLog

              [n1,client=127.0.0.1:36422,user=root] executing: SELECT count(*) FROM "multi col list partitioning - DEFAULT" WHERE (a = 5) AND (b > 8) in state: Open
              [n1,client=127.0.0.1:36422,user=root] planning starts: SELECT
              [n1,client=127.0.0.1:36422,user=root] generating optimizer plan
              [n1,client=127.0.0.1:36422,user=root] added table 'data.public."multi col list partitioning - DEFAULT"' to table collection
              [n1,client=127.0.0.1:36422,user=root] query cache hit
              [n1,client=127.0.0.1:36422,user=root] planning ends
              [n1,client=127.0.0.1:36422,user=root] checking distributability
              [n1,client=127.0.0.1:36422,user=root] will distribute plan: true
              [n1,client=127.0.0.1:36422,user=root] execution starts: distributed engine
               === SPAN START: consuming rows ===
              [n1,client=127.0.0.1:36422,user=root] creating DistSQL plan with isLocal=false
              [n1,client=127.0.0.1:36422,user=root] querying next range at /Table/68/1/5/9
              [n1,client=127.0.0.1:36422,user=root] running DistSQL plan
               === SPAN START: flow ===
              [n1,client=127.0.0.1:36422,user=root] starting (0 processors, 0 startables)
               === SPAN START: table reader ===
            cockroach.processorid: 0
            cockroach.stat.tablereader.input.rows: 0
            cockroach.stat.tablereader.stalltime: 0s
              [n1,client=127.0.0.1:36422,user=root] starting scan with limitBatches true
              [n1,client=127.0.0.1:36422,user=root] Scan /Table/68/1/{5/9-6}
               === SPAN START: txn coordinator send ===
               === SPAN START: dist sender send ===
              [n1,client=127.0.0.1:36422,user=root,txn=15adfab2] querying next range at /Table/68/1/5/9
              [n1,client=127.0.0.1:36422,user=root,txn=15adfab2] r76: sending batch 1 Scan to (n1,s1):1
              [n1,client=127.0.0.1:36422,user=root,txn=15adfab2] sending request to local client
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n1] 1 Scan
              [n1,s1] executing 1 requests
              [n1,s1,r76/1:/Table/68/1/{5/9-6}] read-only path
              [n1,s1,r76/1:/Table/68/1/{5/9-6}] read has no clock uncertainty
              [n1,s1,r76/1:/Table/68/1/{5/9-6}] acquire latches
              [n1,s1,r76/1:/Table/68/1/{5/9-6}] waited 3.319µs to acquire latches
              [n1,s1,r76/1:/Table/68/1/{5/9-6}] waiting for read lock
              [n1,s1,r76/1:/Table/68/1/{5/9-6}] read completed
               === SPAN START: count rows ===
            cockroach.processorid: 1
            cockroach.stat.aggregator.input.rows: 0
            cockroach.stat.aggregator.mem.max: 0 B
            cockroach.stat.aggregator.stalltime: 36µs
              [n1,client=127.0.0.1:36422,user=root] execution ends
              [n1,client=127.0.0.1:36422,user=root] rows affected: 1
              [n1,client=127.0.0.1:36422,user=root] AutoCommit. err: <nil>
              [n1,client=127.0.0.1:36422,user=root] releasing 1 tables
              [n1,client=127.0.0.1:36422,user=root] [NoTxn pos:4715] executing ExecStmt: SET TRACING = off
              [n1,client=127.0.0.1:36422,user=root] executing: SET TRACING = off in state: NoTxn
            goroutine 41406 [running]:
            runtime/debug.Stack(0xa7a358200, 0xc0015b5f20, 0x375cc20)
            	/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x37c9c40, 0xc003a8db00, 0xc0015b5ef0)
            	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:49 +0x103
            github.com/cockroachdb/cockroach/pkg/ccl/partitionccl.TestRepartitioning.func1(0xc003a8db00)
            	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/partitionccl/partition_test.go:1360 +0xaa1
            testing.tRunner(0xc003a8db00, 0xc003608c30)
            	/usr/local/go/src/testing/testing.go:827 +0xbf
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:878 +0x35c

@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Jan 3, 2020
@cockroach-teamcity
Copy link
Member Author

(ccl/partitionccl).TestRepartitioning failed on master@ae3bf3e4f9a32f491f09846750ad88bef2be1c5d:

Fatal error:

F200116 19:03:53.292122 979805 kv/txn_interceptor_span_refresher.go:149  [n2,txn-hb=6b3875ba] unexpected batch read timestamp: 1579201432.743908957,0. Expected refreshed timestamp: 1579201432.825244580,1. ba: [txn: 6b3875ba], EndTxn(commit:false) [/Min] tsflex:false

Stack:

goroutine 979805 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x9a3c401, 0x0, 0x0, 0xc0042f4540)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xc6
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x906eb60, 0xc000000004, 0x87b05d5, 0x24, 0x95, 0xc003728b60, 0xc1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:211 +0xbe3
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x67c9500, 0xc00274e6f0, 0xc000000004, 0x2, 0x5c15950, 0x4d, 0xc003a79568, 0x3, 0x3)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x291
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x67c9500, 0xc00274e6f0, 0x1, 0xc000000004, 0x5c15950, 0x4d, 0xc004ca3568, 0x3, 0x3)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:155
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).SendLocked(0xc0033259c0, 0x67c9500, 0xc00274e6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:149 +0xb2d
github.com/cockroachdb/cockroach/pkg/kv.(*txnPipeliner).SendLocked(0xc003325910, 0x67c9500, 0xc00274e6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_pipeliner.go:208 +0x1ea
github.com/cockroachdb/cockroach/pkg/kv.(*txnSeqNumAllocator).SendLocked(0xc0033258f0, 0x67c9500, 0xc00274e6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_seq_num_allocator.go:104 +0x2c3
github.com/cockroachdb/cockroach/pkg/kv.(*txnHeartbeater).abortTxnAsyncLocked.func1(0x67c9500, 0xc00274e6f0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_heartbeater.go:385 +0x1c9
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc004994280, 0x67c9500, 0xc00274e6f0, 0xc0041727e0, 0x24, 0x0, 0x0, 0xc004cb5e60)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:322 +0x163
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:317 +0x14b
Log preceding fatal error

I200116 19:03:49.614289 661519 storage/queue.go:524  [n1,s1] rate limited in MaybeAdd (merge): throttled on async limiting semaphore
I200116 19:03:49.745837 973096 sql/lease.go:921  new lease: 74("multi col list partitioning - DEFAULT") ver=2:1579201741.699824728,0, refcount=0
I200116 19:03:49.759757 665056 sql/lease.go:326  [n1,client=127.0.0.1:42012,user=root,scExec] waiting for 1 leases to expire: desc=[{multi col list partitioning - DEFAULT 74 1}]
I200116 19:03:49.885801 665056 sql/sqlbase/structured.go:1475  [n1,client=127.0.0.1:42012,user=root,scExec] publish: descID=74 (multi col list partitioning - DEFAULT) version=3 mtime=1970-01-01 00:00:00 +0000 UTC
I200116 19:03:50.117644 665056 sql/lease.go:326  [n1,client=127.0.0.1:42012,user=root,scExec] waiting for 1 leases to expire: desc=[{multi col list partitioning - DEFAULT 74 2}]
I200116 19:03:50.139517 974076 sql/lease.go:921  new lease: 74("multi col list partitioning - DEFAULT") ver=3:1579201741.700824728,0, refcount=0
I200116 19:03:50.553947 665056 sql/sqlbase/structured.go:1475  [n1,client=127.0.0.1:42012,user=root] publish: descID=74 (multi col list partitioning - DEFAULT) version=4 mtime=1970-01-01 00:00:00 +0000 UTC
I200116 19:03:50.755486 665056 sql/event_log.go:132  [n1,client=127.0.0.1:42012,user=root] Event: "alter_table", target: 74, info: {TableName:data.public."multi col list partitioning - DEFAULT" Statement:ALTER TABLE "multi col list partitioning - DEFAULT" PARTITION BY LIST (a, b) (PARTITION p34 VALUES IN ((3, 4)), PARTITION p57 VALUES IN ((5, 7)), PARTITION p58 VALUES IN ((5, 8)), PARTITION p5d VALUES IN ((5, DEFAULT))) User:root MutationID:0 CascadeDroppedViews:[]}
I200116 19:03:50.852857 975717 sql/lease.go:921  new lease: 74("multi col list partitioning - DEFAULT") ver=4:1579201742.225461800,0, refcount=0
I200116 19:03:50.919579 976010 storage/replica_command.go:1706  [n2,s2,r129/2:/Table/74/1{-/3/4}] change replicas (add [(n3,s3):3LEARNER] remove []): existing descriptor r129:/Table/74/1{-/3/4} [(n2,s2):2, next=3, gen=84]
I200116 19:03:50.932058 665056 sql/lease.go:326  [n1,client=127.0.0.1:42012,user=root,scExec] waiting for 1 leases to expire: desc=[{multi col list partitioning - DEFAULT 74 3}]
I200116 19:03:51.038029 976065 storage/replica_command.go:1706  [n3,s3,r128/1:/{Table/74/2-Max}] change replicas (add [(n2,s2):2LEARNER] remove []): existing descriptor r128:/{Table/74/2-Max} [(n3,s3):1, next=2, gen=84]
I200116 19:03:51.071126 976010 storage/replica_raft.go:248  [n2,s2,r129/2:/Table/74/1{-/3/4}] proposing SIMPLE(l3) ADD_REPLICA[(n3,s3):3LEARNER]: after=[(n2,s2):2 (n3,s3):3LEARNER] next=4
I200116 19:03:51.118000 976010 storage/store_snapshot.go:973  [n2,s2,r129/2:/Table/74/1{-/3/4}] sending LEARNER snapshot f7f49ddd at applied index 38
I200116 19:03:51.120248 976010 storage/store_snapshot.go:1016  [n2,s2,r129/2:/Table/74/1{-/3/4}] streamed snapshot to (n3,s3):3LEARNER: kv pairs: 12, log entries: 0, rate-limit: 8.0 MiB/sec, 0.02s
I200116 19:03:51.121182 976516 storage/raft_snapshot_queue.go:169  [n2,raftsnapshot,s2,r129/2:/Table/74/1{-/3/4}] skipping snapshot; replica is likely a learner in the process of being added: (n3,s3):3LEARNER
I200116 19:03:51.132883 976472 storage/replica_raftstorage.go:785  [n3,s3,r129/3:{-}] applying LEARNER snapshot [id=f7f49ddd index=38]
I200116 19:03:51.222363 976065 storage/replica_raft.go:248  [n3,s3,r128/1:/{Table/74/2-Max}] proposing SIMPLE(l2) ADD_REPLICA[(n2,s2):2LEARNER]: after=[(n3,s3):1 (n2,s2):2LEARNER] next=3
I200116 19:03:51.244928 976685 storage/raft_snapshot_queue.go:169  [n3,raftsnapshot,s3,r128/1:/{Table/74/2-Max}] skipping snapshot; replica is likely a learner in the process of being added: (n2,s2):2LEARNER
I200116 19:03:51.250584 976065 storage/store_snapshot.go:973  [n3,s3,r128/1:/{Table/74/2-Max}] sending LEARNER snapshot 7f49a99b at applied index 19
I200116 19:03:51.257282 976065 storage/store_snapshot.go:1016  [n3,s3,r128/1:/{Table/74/2-Max}] streamed snapshot to (n2,s2):2LEARNER: kv pairs: 7, log entries: 0, rate-limit: 8.0 MiB/sec, 0.01s
I200116 19:03:51.267101 976472 storage/replica_raftstorage.go:806  [n3,s3,r129/3:/Table/74/1{-/3/4}] applied LEARNER snapshot [total=134ms ingestion=4@130ms id=f7f49ddd index=38]
I200116 19:03:51.278671 976734 storage/replica_raftstorage.go:785  [n2,s2,r128/2:{-}] applying LEARNER snapshot [id=7f49a99b index=19]
I200116 19:03:51.350173 976734 storage/replica_raftstorage.go:806  [n2,s2,r128/2:/{Table/74/2-Max}] applied LEARNER snapshot [total=64ms ingestion=4@33ms id=7f49a99b index=19]
I200116 19:03:51.358288 976010 storage/replica_command.go:1706  [n2,s2,r129/2:/Table/74/1{-/3/4}] change replicas (add [(n3,s3):3] remove []): existing descriptor r129:/Table/74/1{-/3/4} [(n2,s2):2, (n3,s3):3LEARNER, next=4, gen=85]
I200116 19:03:51.363476 976065 storage/replica_command.go:1706  [n3,s3,r128/1:/{Table/74/2-Max}] change replicas (add [(n2,s2):2] remove []): existing descriptor r128:/{Table/74/2-Max} [(n3,s3):1, (n2,s2):2LEARNER, next=3, gen=85]
I200116 19:03:51.462894 664732 storage/queue.go:524  [n3,s3] rate limited in MaybeAdd (merge): throttled on async limiting semaphore
I200116 19:03:51.590206 976010 storage/replica_raft.go:248  [n2,s2,r129/2:/Table/74/1{-/3/4}] proposing SIMPLE(v3) ADD_REPLICA[(n3,s3):3]: after=[(n2,s2):2 (n3,s3):3] next=4
I200116 19:03:51.599293 976065 storage/replica_raft.go:248  [n3,s3,r128/1:/{Table/74/2-Max}] proposing SIMPLE(v2) ADD_REPLICA[(n2,s2):2]: after=[(n3,s3):1 (n2,s2):2] next=3
I200116 19:03:51.645191 663801 storage/queue.go:524  [n2,s2] rate limited in MaybeAdd (merge): throttled on async limiting semaphore
I200116 19:03:51.841090 977487 storage/replica_command.go:1706  [n2,s2,r128/2:/{Table/74/2-Max}] change replicas (add [] remove [(n3,s3):1VOTER_DEMOTING]): existing descriptor r128:/{Table/74/2-Max} [(n3,s3):1, (n2,s2):2, next=3, gen=86]
I200116 19:03:51.890877 977552 storage/replica_command.go:1706  [n3,replicate,s3,r129/3:/Table/74/1{-/3/4}] change replicas (add [] remove [(n2,s2):2VOTER_DEMOTING]): existing descriptor r129:/Table/74/1{-/3/4} [(n2,s2):2, (n3,s3):3, next=4, gen=86]
I200116 19:03:52.206191 977552 storage/replica_raft.go:248  [n3,s3,r129/3:/Table/74/1{-/3/4}] proposing ENTER_JOINT(r2 l2) REMOVE_REPLICA[(n2,s2):2VOTER_DEMOTING]: after=[(n2,s2):2VOTER_DEMOTING (n3,s3):3] next=4
I200116 19:03:52.232895 977552 storage/replica_command.go:1706  [n3,replicate,s3,r129/3:/Table/74/1{-/3/4}] change replicas (add [] remove []): existing descriptor r129:/Table/74/1{-/3/4} [(n2,s2):2VOTER_DEMOTING, (n3,s3):3, next=4, gen=87]
I200116 19:03:52.259440 977487 storage/replica_command.go:1706  [n2,s2,r128/2:/{Table/74/2-Max}] change replicas (add [] remove [(n3,s3):1VOTER_DEMOTING]): existing descriptor r128:/{Table/74/2-Max} [(n3,s3):1, (n2,s2):2, next=3, gen=86]
I200116 19:03:52.351648 977552 storage/replica_raft.go:248  [n3,s3,r129/3:/Table/74/1{-/3/4}] proposing LEAVE_JOINT: after=[(n2,s2):2LEARNER (n3,s3):3] next=4
I200116 19:03:52.379410 977552 storage/replica_command.go:1706  [n3,replicate,s3,r129/3:/Table/74/1{-/3/4}] change replicas (add [] remove [(n2,s2):2LEARNER]): existing descriptor r129:/Table/74/1{-/3/4} [(n2,s2):2LEARNER, (n3,s3):3, next=4, gen=88]
E200116 19:03:52.411536 975864 storage/queue.go:1035  [n3,merge,s3,r120/1:/Table/74{-/1}] while carrying out changes [{REMOVE_REPLICA n2,s2}]: change replicas of r129 failed: descriptor changed: [expected] r129:/Table/74/1{-/3/4} [(n2,s2):2, (n3,s3):3, next=4, gen=86] != [actual] r129:/Table/74/1{-/3/4} [(n2,s2):2VOTER_DEMOTING, (n3,s3):3, next=4, gen=87]
I200116 19:03:52.473516 978368 storage/replica_raft.go:248  [n3,s3,r128/1:/{Table/74/2-Max}] proposing ENTER_JOINT(r1 l1) REMOVE_REPLICA[(n3,s3):1VOTER_DEMOTING]: after=[(n3,s3):1VOTER_DEMOTING (n2,s2):2] next=3
E200116 19:03:52.474359 978368 storage/replica_raft.go:268  [n3,s3,r128/1:/{Table/74/2-Max}] received invalid ChangeReplicasTrigger ENTER_JOINT(r1 l1) REMOVE_REPLICA[(n3,s3):1VOTER_DEMOTING]: after=[(n3,s3):1VOTER_DEMOTING (n2,s2):2] next=3 to remove self (leaseholder)
I200116 19:03:52.507128 977552 storage/replica_raft.go:248  [n3,s3,r129/3:/Table/74/1{-/3/4}] proposing SIMPLE(r2) REMOVE_REPLICA[(n2,s2):2LEARNER]: after=[(n3,s3):3] next=4
I200116 19:03:52.521661 663739 storage/store_remove_replica.go:129  [n2,s2,r129/2:/Table/74/1{-/3/4}] removing replica r129/2
E200116 19:03:52.526529 976103 storage/queue.go:1035  [n2,merge,s2,r126/2:/Table/74/{1/6-2}] while carrying out changes [{REMOVE_REPLICA n3,s3}]: change replicas of r128 failed: storage/replica_raft.go:269: [n3,s3,r128/1:/{Table/74/2-Max}]: received invalid ChangeReplicasTrigger ENTER_JOINT(r1 l1) REMOVE_REPLICA[(n3,s3):1VOTER_DEMOTING]: after=[(n3,s3):1VOTER_DEMOTING (n2,s2):2] next=3 to remove self (leaseholder)
I200116 19:03:52.527600 978505 storage/replica_command.go:1706  [n3,replicate,s3,r128/1:/{Table/74/2-Max}] change replicas (add [] remove [(n2,s2):2VOTER_DEMOTING]): existing descriptor r128:/{Table/74/2-Max} [(n3,s3):1, (n2,s2):2, next=3, gen=86]
I200116 19:03:52.575047 978484 storage/replica_command.go:672  [n3,merge,s3,r120/1:/Table/74{-/1}] initiating a merge of r129:/Table/74/1{-/3/4} [(n3,s3):3, next=4, gen=89] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
I200116 19:03:52.661551 665056 sql/event_log.go:132  [n1,client=127.0.0.1:42012,user=root] Event: "set_zone_config", target: 74, info: {Target:INDEX data.public."multi col list partitioning - DEFAULT"@primary Config: Options:constraints = '[+n1]' User:root}
I200116 19:03:52.740007 661653 server/status/runtime.go:498  [n1] runtime stats: 1.9 GiB RSS, 807 goroutines, 74 MiB/32 MiB/127 MiB GO alloc/idle/total, 98 MiB/151 MiB CGO alloc/total, 2333.3 CGO/sec, 325.3/31.2 %(u/s)time, 0.5 %gc (11x), 1.8 MiB/1.8 MiB (r/w)net
I200116 19:03:52.874479 978848 storage/replica_command.go:1706  [n2,s2,r128/2:/{Table/74/2-Max}] change replicas (add [] remove [(n3,s3):1VOTER_DEMOTING]): existing descriptor r128:/{Table/74/2-Max} [(n3,s3):1, (n2,s2):2, next=3, gen=86]
I200116 19:03:52.906867 978505 storage/replica_command.go:1706  [n3,replicate,s3,r128/1:/{Table/74/2-Max}] change replicas (add [] remove [(n2,s2):2VOTER_DEMOTING]): existing descriptor r128:/{Table/74/2-Max} [(n3,s3):1, (n2,s2):2, next=3, gen=86]
I200116 19:03:52.982365 664684 storage/store_remove_replica.go:129  [n3,s3,r120/1:/Table/74{-/1}] removing replica r129/3

Repro

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestRepartitioning PKG=./pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@danhhz
Copy link
Contributor

danhhz commented Jan 16, 2020

cc @andreimatei looks like you recently upgraded this error to a Fatal?

@andreimatei
Copy link
Contributor

yeah, let me see. I've seen this crash somewhere else too, so maybe I should revert something.

@tbg tbg added the branch-master Failures and bugs on the master branch. label Jan 22, 2020
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jan 27, 2020
Before this patch, the refresher interceptor was erroneously asserting
its tracking of the refreshed timestamp is in sync with the
TxnCoordSender. It may, in fact, not be in sync in edge cases where a
refresh succeeded but the TxnCoordSender doesn't hear about that
success.

Touches cockroachdb#38156
Touches cockroachdb#41941
Touches cockroachdb#43707

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jan 29, 2020
Before this patch, the refresher interceptor was erroneously asserting
its tracking of the refreshed timestamp is in sync with the
TxnCoordSender. It may, in fact, not be in sync in edge cases where a
refresh succeeded but the TxnCoordSender doesn't hear about that
success.

Touches cockroachdb#38156
Touches cockroachdb#41941
Touches cockroachdb#43707

Release note: None
craig bot pushed a commit that referenced this issue Jan 30, 2020
44407: storage: improve the migration away from txn.DeprecatedOrigTimestamp r=andreimatei a=andreimatei

19.2 doesn't generally set txn.ReadTimestamp. Instead, it sets
txn.DeprecatedOrigTimestamp. Before this patch, all code dealing with
txn.ReadTimestamp had to deal with the possibility of it not being set.
This is fragile; I recently forgot to deal with it in a patch.
This patch sets txn.ReadTimestamp to txn.DeprecatedOrigTimestamp when it
wasn't set, thereby releaving most other code of they worry.

This comes at the cost of an extra txn clone for requests coming from
19.2 nodes.

Release note: None

44428: storage: fix handling of refreshed timestamp r=andreimatei a=andreimatei

Before this patch, the refresher interceptor was erroneously asserting
its tracking of the refreshed timestamp is in sync with the
TxnCoordSender. It may, in fact, not be in sync in edge cases where a
refresh succeeded but the TxnCoordSender doesn't hear about that
success.

Touches #38156
Touches #41941
Touches #43707

Release note: None

44503: roachpb: fix txn.Update() commutativity r=andreimatei a=andreimatei

Updates to the WriteTooOld field were not commutative. This patch fixes
that, by clarifying that the transaction with the higher ReadTimestamp
gets to dictate the WriteTooOld value.
I'm not sure what consequences this used to have, besides allowing for
the confusing case where the server would receive a request with the
WriteTooOld flag set, but with the ReadTimestamp==WriteTimestamp.  A
future commit introduces a sanity assertion that all the requests with
the WTO flag have a bumped WriteTimestamp.

Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@cockroach-teamcity
Copy link
Member Author

(ccl/partitionccl).TestRepartitioning failed on master@120d53fc17f1f027c6075a2050f2991e369c5293:

               === SPAN START: [async] drain ===
               === SPAN START: table reader ===
            cockroach.processorid: 0
            cockroach.stat.tablereader.bytes.read: 0 B
            cockroach.stat.tablereader.input.rows: 0
            cockroach.stat.tablereader.stalltime: 4.704ms
              [n3] starting scan with limitBatches true
              [n3] Scan /Table/100/{2/5-3}
               === SPAN START: txn coordinator send ===
               === SPAN START: dist sender send ===
              [n3,txn=010fabee] querying next range at /Table/100/2/5
              [n3,txn=010fabee] r188: sending batch 1 Scan to (n3,s3):1
              [n3,txn=010fabee] sending request to local client
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n3] 1 Scan
              [n3,s3] executing 1 requests
              [n3,s3,r188/1:/Table/100/{2/5-3}] read-only path
              [n3,s3,r188/1:/Table/100/{2/5-3}] sequencing request
              [n3,s3,r188/1:/Table/100/{2/5-3}] acquiring latches
              [n3,s3,r188/1:/Table/100/{2/5-3}] scanning lock table for conflicting locks
              [n3,s3,r188/1:/Table/100/{2/5-3}] waiting for read lock
              [n3,s3,r188/1:/Table/100/{2/5-3}] read completed
              [n3,txn=010fabee] recording span to refresh: /Table/100/{2/5-3}
               === SPAN START: count rows ===
            cockroach.processorid: 1
            cockroach.stat.aggregator.input.rows: 0
            cockroach.stat.aggregator.mem.max: 0 B
            cockroach.stat.aggregator.stalltime: 465µs
               === SPAN START: flow ===
              [n1,client=127.0.0.1:48826,user=root] starting (0 processors, 0 startables)
               === SPAN START: noop ===
            cockroach.processorid: 2
              [n1,client=127.0.0.1:48826,user=root] execution ends
              [n1,client=127.0.0.1:48826,user=root] rows affected: 1
              [n1,client=127.0.0.1:48826,user=root] AutoCommit. err: <nil>
              [n1,client=127.0.0.1:48826,user=root] releasing 1 tables
               === SPAN START: exec stmt ===
              [n1,client=127.0.0.1:48826,user=root] [NoTxn pos:11062] executing ExecStmt: SET TRACING = off
              [n1,client=127.0.0.1:48826,user=root] executing: SET TRACING = off in state: NoTxn
            goroutine 1267583 [running]:
            runtime/debug.Stack(0xc005db5b00, 0x6af5f80, 0xc009b218e0)
            	/usr/local/go/src/runtime/debug/stack.go:24 +0xab
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x6bf03c0, 0xc006f3b200, 0xc005db5b00)
            	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:37 +0x87
            github.com/cockroachdb/cockroach/pkg/ccl/partitionccl.TestRepartitioning.func1(0xc006f3b200)
            	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/partitionccl/partition_test.go:1363 +0xb41
            testing.tRunner(0xc006f3b200, 0xc005eb38c0)
            	/usr/local/go/src/testing/testing.go:909 +0x19a
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:960 +0x652

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestRepartitioning PKG=./pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/partitionccl).TestRepartitioning failed on master@c473f40078994551cebcbe00fdbf1fa388957658:

            cockroach.stat.tablereader.bytes.read: 0 B
            cockroach.stat.tablereader.input.rows: 0
            cockroach.stat.tablereader.stalltime: 9.104ms
              [n3] starting scan with limitBatches true
              [n3] Scan /Table/100/2/{!NULL-3}
               === SPAN START: txn coordinator send ===
               === SPAN START: [async] drain ===
               === SPAN START: dist sender send ===
              [n3,txn=7c7c43a5] querying next range at /Table/100/2/!NULL
              [n3,txn=7c7c43a5] r190: sending batch 1 Scan to (n3,s3):1
              [n3,txn=7c7c43a5] sending request to local client
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n3] 1 Scan
               === SPAN START: flow ===
              [n3,s3] executing 1 requests
              [n3,s3,r190/1:/Table/100/2{-/3}] read-only path
              [n3,s3,r190/1:/Table/100/2{-/3}] sequencing request
              [n1,client=127.0.0.1:37614,hostssl,user=root] starting (0 processors, 0 startables)
              [n3,s3,r190/1:/Table/100/2{-/3}] acquiring latches
               === SPAN START: noop ===
            cockroach.processorid: 2
              [n3] Consumer sent handshake. Consuming flow scheduled: false
              [n3,s3,r190/1:/Table/100/2{-/3}] scanning lock table for conflicting locks
              [n3,s3,r190/1:/Table/100/2{-/3}] waiting for read lock
              [n3] Consumer sent handshake. Consuming flow scheduled: true
              [n3,s3,r190/1:/Table/100/2{-/3}] read completed
              [n3,txn=7c7c43a5] recording span to refresh: /Table/100/2/{!NULL-3}
               === SPAN START: count rows ===
            cockroach.processorid: 1
            cockroach.stat.aggregator.input.rows: 0
            cockroach.stat.aggregator.mem.max: 0 B
            cockroach.stat.aggregator.stalltime: 459µs
              [n1,client=127.0.0.1:37614,hostssl,user=root] execution ends
              [n1,client=127.0.0.1:37614,hostssl,user=root] rows affected: 1
              [n1,client=127.0.0.1:37614,hostssl,user=root] AutoCommit. err: <nil>
              [n1,client=127.0.0.1:37614,hostssl,user=root] releasing 1 tables
               === SPAN START: exec stmt ===
              [n1,client=127.0.0.1:37614,hostssl,user=root] [NoTxn pos:10354] executing ExecStmt: SET TRACING = off
              [n1,client=127.0.0.1:37614,hostssl,user=root] executing: SET TRACING = off in state: NoTxn
            goroutine 1276413 [running]:
            runtime/debug.Stack(0xc004c19800, 0x6c1d540, 0xc009105da0)
            	/usr/local/go/src/runtime/debug/stack.go:24 +0xab
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x6d18ec0, 0xc00368ae00, 0xc004c19800)
            	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:37 +0x87
            github.com/cockroachdb/cockroach/pkg/ccl/partitionccl.TestRepartitioning.func1(0xc00368ae00)
            	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/partitionccl/partition_test.go:1363 +0xb41
            testing.tRunner(0xc00368ae00, 0xc0053ba6f0)
            	/usr/local/go/src/testing/testing.go:909 +0x19a
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:960 +0x652

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestRepartitioning PKG=./pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/partitionccl).TestRepartitioning failed on master@e1e26af4e20a4e7be3b4b157e6d7c72fbf44c10d:

              [n1,client=127.0.0.1:54488,hostssl,user=root] querying next range at /Table/76/1/6
              [n1,client=127.0.0.1:54488,hostssl,user=root] running DistSQL plan
               === SPAN START: flow ===
              [n1,client=127.0.0.1:54488,hostssl,user=root] starting (0 processors, 0 startables)
               === SPAN START: table reader ===
            cockroach.processorid: 0
            cockroach.stat.tablereader.bytes.read: 0 B
            cockroach.stat.tablereader.input.rows: 0
            cockroach.stat.tablereader.stalltime: 252µs
              [n1,client=127.0.0.1:54488,hostssl,user=root] starting scan with limitBatches true
              [n1,client=127.0.0.1:54488,hostssl,user=root] Scan /Table/76/{1/6-2}
               === SPAN START: txn coordinator send ===
               === SPAN START: dist sender send ===
              [n1,client=127.0.0.1:54488,hostssl,user=root,txn=0d64ea03] querying next range at /Table/76/1/6
              [n1,client=127.0.0.1:54488,hostssl,user=root,txn=0d64ea03] r87: sending batch 1 Scan to (n1,s1):1
              [n1,client=127.0.0.1:54488,hostssl,user=root,txn=0d64ea03] sending request to local client
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n1] 1 Scan
              [n1,s1] executing 1 requests
              [n1,s1,r87/1:/Table/76/{1/5-2}] read-only path
              [n1,s1,r87/1:/Table/76/{1/5-2}] sequencing request
              [n1,s1,r87/1:/Table/76/{1/5-2}] acquiring latches
              [n1,s1,r87/1:/Table/76/{1/5-2}] scanning lock table for conflicting locks
              [n1,s1,r87/1:/Table/76/{1/5-2}] read has no clock uncertainty
              [n1,s1,r87/1:/Table/76/{1/5-2}] waiting for read lock
              [n1,s1,r87/1:/Table/76/{1/5-2}] read completed
              [n1,client=127.0.0.1:54488,hostssl,user=root,txn=0d64ea03] recording span to refresh: /Table/76/{1/6-2}
               === SPAN START: count rows ===
            cockroach.processorid: 1
            cockroach.stat.aggregator.input.rows: 0
            cockroach.stat.aggregator.mem.max: 0 B
            cockroach.stat.aggregator.stalltime: 42µs
              [n1,client=127.0.0.1:54488,hostssl,user=root] execution ends
              [n1,client=127.0.0.1:54488,hostssl,user=root] rows affected: 1
              [n1,client=127.0.0.1:54488,hostssl,user=root] AutoCommit. err: <nil>
              [n1,client=127.0.0.1:54488,hostssl,user=root] releasing 1 tables
               === SPAN START: exec stmt ===
              [n1,client=127.0.0.1:54488,hostssl,user=root] [NoTxn pos:6534] executing ExecStmt: SET TRACING = off
              [n1,client=127.0.0.1:54488,hostssl,user=root] executing: SET TRACING = off in state: NoTxn
            goroutine 472176 [running]:
            runtime/debug.Stack(0xc0073e2000, 0x4535040, 0xc0051c2100)
            	/usr/local/go/src/runtime/debug/stack.go:24 +0x9d
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x4623820, 0xc004d56c00, 0xc0073e2000)
            	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:37 +0x6b
            github.com/cockroachdb/cockroach/pkg/ccl/partitionccl.TestRepartitioning.func1(0xc004d56c00)
            	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/partitionccl/partition_test.go:1363 +0x971
            testing.tRunner(0xc004d56c00, 0xc004ab9290)
            	/usr/local/go/src/testing/testing.go:909 +0xc9
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:960 +0x350

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestRepartitioning PKG=./pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/partitionccl).TestRepartitioning failed on master@3b612692db93aa7c87493705e1fad85c9c664f6c:

Fatal error:

F200430 09:47:57.804400 1121587 kv/kvclient/kvcoord/txn_coord_sender.go:809  [n3,merge,s3,r184/1:/Table/86/1/3{-/4},txn=8b7a1658] unexpected error with ABORTED txn: (*roachpb.internalError) cannot write undeclared span /Local/RangeID/184/r/AbortSpan/"8b7a1658-45d2-4fc0-8a3f-bf52a88cb5e9"
declared:
read local: /Local/RangeID/184/r/AbortSpan/"8b7a1658-45d2-4fc0-8a3f-bf52a88cb5e9" at 0,0
read local: /Local/Range/Table/86/1/3/RangeDescriptor at 0,0
write global: /Meta2/Table/86/1/3/4 at 1588240076.350121220,0
write global: /Meta2/Table/86/1/4 at 1588240076.350121220,0
write global: /Table/13/1/2020-04-30T09:47:56.350121Z/551058398190108675/0 at 1588240076.350121220,0
write global: /Table/13/1/2020-04-30T09:47:56.350121Z/551058398190108675/2/1 at 1588240076.350121220,0
write global: /Table/13/1/2020-04-30T09:47:56.350121Z/551058398190108675/3/1 at 1588240076.350121220,0
write global: /Table/13/1/2020-04-30T09:47:56.350121Z/551058398190108675/4/1 at 1588240076.350121220,0
write global: /Table/13/1/2020-04-30T09:47:56.350121Z/551058398190108675/5/1 at 1588240076.350121220,0
write global: /Table/13/1/2020-04-30T09:47:56.350121Z/551058398190108675/6/1 at 1588240076.350121220,0
write local: /Local/Range/Table/86/1/3/RangeDescriptor at 0,0
write local: /Local/Range/Table/86/1/3/Transaction/"8b7a1658-45d2-4fc0-8a3f-bf52a88cb5e9" at 0,0
write local: /Local/Range/Table/86/1/3/4/RangeDescriptor at 0,0

stack:

Stack:

goroutine 1121587 [running]:
runtime/debug.Stack(0xc0002d22b0, 0xd, 0x10)
	/usr/local/go/src/runtime/debug/stack.go:24 +0xab
github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).checkAllowed(0xc00372a360, 0x1, 0xc0003316c0, 0x1c, 0x20, 0x0, 0x0, 0x0, 0x622f738, 0xc00387f270, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:293 +0x3ad
github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).CheckAllowed(0xc00372a360, 0x1, 0xc0003316c0, 0x1c, 0x20, 0x0, 0x0, 0x0, 0xc002b98dc0, 0x7f29d2a180af)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:224 +0xb5
github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.spanSetWriter.Clear(0x7f2a05bf4ff0, 0xc002051b00, 0xc00372a360, 0x1, 0x0, 0x0, 0xc0003316c0, 0x1c, 0x20, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/batch.go:386 +0xdd
github.com/cockroachdb/cockroach/pkg/storage.mvccPutInternal(0x7023280, 0xc004dc6330, 0x7f2a05b8f748, 0xc001ad0b40, 0x70a5940, 0xc002b98dc0, 0xc00460ca80, 0xc0003316c0, 0x1c, 0x20, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:1463 +0x6dd
github.com/cockroachdb/cockroach/pkg/storage.mvccPutUsingIter(0x7023280, 0xc004dc6330, 0x7f2a05b8f748, 0xc001ad0b40, 0x70a5940, 0xc002b98dc0, 0xc00460ca80, 0xc0003316c0, 0x1c, 0x20, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:1254 +0x18f
github.com/cockroachdb/cockroach/pkg/storage.MVCCDelete(0x7023280, 0xc004dc6330, 0x7f2a05b8f6b0, 0xc001ad0b40, 0xc00460ca80, 0xc0003316c0, 0x1c, 0x20, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:1224 +0x26c
github.com/cockroachdb/cockroach/pkg/kv/kvserver/abortspan.(*AbortSpan).Del(0xc005cd3500, 0x7023280, 0xc004dc6330, 0x7f2a05b8f6b0, 0xc001ad0b40, 0xc00460ca80, 0xc04fd24558167a8b, 0xe9b58ca852bf3f8a, 0x7f2a05cf2101, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/abortspan/abortspan.go:125 +0xfb
github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.UpdateAbortSpan(0x7023280, 0xc004dc6330, 0x70b6a60, 0xc0007bdea0, 0x7f2a05b8f6b0, 0xc001ad0b40, 0xc00460ca80, 0xc04fd24558167a8b, 0xe9b58ca852bf3f8a, 0xc0054b99d0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/transaction.go:102 +0x466
github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.resolveLocalLocks(0x7023280, 0xc004dc6330, 0xc0005aeaf0, 0x7f2a05b8f6b0, 0xc001ad0b40, 0xc00460ca80, 0xc002973dd0, 0xc00564c5a0, 0x70b6a60, 0xc0007bdea0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/cmd_end_transaction.go:556 +0x5c4
github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.EndTxn(0x7023280, 0xc004dc6330, 0x7f2a05b8f6b0, 0xc001ad0b40, 0x70b6a60, 0xc0007bdea0, 0x160a8fef85e2f75b, 0x1, 0x300000003, 0x1, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/cmd_end_transaction.go:239 +0x23a8
github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateCommand(0x7023280, 0xc004dc6330, 0xc0054b99b0, 0x8, 0x0, 0x7f2a05b8f6b0, 0xc001ad0b40, 0x70b6a60, 0xc0007bdea0, 0xc00460ca80, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_evaluate.go:471 +0x2bf
github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateBatch(0x7023280, 0xc004dc6330, 0xc0054b99b0, 0x8, 0x7f2a05b8f6b0, 0xc001ad0b40, 0x70b6a60, 0xc0007bdea0, 0xc00460ca80, 0xc001b85000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_evaluate.go:241 +0x492
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateWriteBatchWrapper(0xc00301c000, 0x7023280, 0xc004dc6330, 0xc0054b99b0, 0x8, 0x70b6a60, 0xc0007bdea0, 0xc00460ca80, 0xc001b85000, 0xc00372a360, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_write.go:555 +0x155
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateWriteBatchWithServersideRefreshes(0xc00301c000, 0x7023280, 0xc004dc6330, 0xc0054b99b0, 0x8, 0x70b6a60, 0xc0007bdea0, 0xc00460ca80, 0xc001b85000, 0xc00372a360, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_write.go:524 +0x177
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateWriteBatch(0xc00301c000, 0x7023280, 0xc004dc6330, 0xc0054b99b0, 0x8, 0xc001b85000, 0xc00372a360, 0x0, 0x10, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_write.go:347 +0x1f0
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateProposal(0xc00301c000, 0x7023280, 0xc004dc6330, 0xc0054b99b0, 0x8, 0xc001b85000, 0xc00372a360, 0x0, 0xc003885e00, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:731 +0x13c
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).requestToProposal(0xc00301c000, 0x7023280, 0xc004dc6330, 0xc0054b99b0, 0x8, 0xc001b85000, 0xc00372a360, 0xc000a672d0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:851 +0x9f
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evalAndPropose(0xc00301c000, 0x7023280, 0xc004dc6330, 0xc001b85000, 0xc0046bb340, 0xc003886c60, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:73 +0x10b
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeWriteBatch(0xc00301c000, 0x7023280, 0xc004dc6330, 0xc001b85000, 0x160a8fef617ec4df, 0x0, 0x0, 0x300000003, 0x1, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_write.go:133 +0xa5b
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries(0xc00301c000, 0x7023280, 0xc004dc6330, 0xc001b85000, 0x622f248, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:247 +0x5b7
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sendWithRangeID(0xc00301c000, 0x7023280, 0xc004dc6330, 0xb8, 0xc001b85000, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:94 +0xa8c
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).Send(0xc00301c000, 0x7023280, 0xc004dc6300, 0x160a8fef85e2f75b, 0x1, 0x300000003, 0x1, 0x0, 0xb8, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:36 +0xce
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).Send(0xc0015bce00, 0x7023280, 0xc004dc62a0, 0x160a8fef85e2f75b, 0x1, 0x300000003, 0x1, 0x0, 0xb8, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_send.go:202 +0x6c4
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).Send(0xc005283320, 0x7023280, 0xc004dc62a0, 0x0, 0x0, 0x300000003, 0x1, 0x0, 0xb8, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/stores.go:177 +0x206
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1(0x7023280, 0xc004dc62a0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:850 +0x26b
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc0015c03c0, 0x7023280, 0xc004dc62a0, 0x60c4cb9, 0x10, 0xc003888078, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:302 +0x151
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal(0xc0003cea00, 0x7023280, 0xc004dc62a0, 0xc001b84f80, 0xc004dc62a0, 0xe0bd34, 0xc000a672d0)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:838 +0x2af
github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch(0xc0003cea00, 0x7023280, 0xc004dc6270, 0xc001b84f80, 0xe0e12c, 0xc004dc6180, 0xd07eac)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:876 +0xb0
github.com/cockroachdb/cockroach/pkg/rpc.internalClientAdapter.Batch(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:537
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*grpcTransport).sendBatch(0xc0012dad40, 0x7023280, 0xc004dc6270, 0x3, 0x6fe2c40, 0xc001d6a9c0, 0x0, 0x0, 0x300000003, 0x1, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:199 +0x1d7
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*grpcTransport).SendNext(0xc0012dad40, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x300000003, 0x1, 0x0, 0xb8, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:168 +0x21c
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.raceTransport.SendNext(0x7043cc0, 0xc0012dad40, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0xb8, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport_race.go:78 +0x3fe
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendToReplicas(0xc004410b40, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0xb8, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1672 +0x348
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendRPC(0xc004410b40, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0xb8, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:466 +0x32c
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendSingleRange(0xc004410b40, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:548 +0x3e1
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendPartialBatch(0xc004410b40, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1429 +0x546
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).divideAndSendBatchToRanges(0xc004410b40, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1089 +0x1d90
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).Send(0xc004410b40, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:740 +0xa99
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnLockGatekeeper).SendLocked(0xc004f2bad0, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_lock_gatekeeper.go:86 +0x195
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnMetricRecorder).SendLocked(0xc004f2ba98, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_metric_recorder.go:46 +0xf4
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnCommitter).SendLocked(0xc004f2ba68, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go:190 +0x8ba
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).sendLockedWithRefreshAttempts(0xc004f2b9d0, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:247 +0x108
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).SendLocked(0xc004f2b9d0, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:182 +0x2cb
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnPipeliner).SendLocked(0xc004f2b910, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go:252 +0x1ea
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSeqNumAllocator).SendLocked(0xc004f2b8f0, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_seq_num_allocator.go:105 +0x2c3
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnHeartbeater).SendLocked(0xc004f2b850, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_heartbeater.go:172 +0xff
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*TxnCoordSender).Send(0xc004f2b680, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:503 +0x6b7
github.com/cockroachdb/cockroach/pkg/kv.(*DB).sendUsingSender(0xc0013db280, 0x7023240, 0xc000a672c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:739 +0x174
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Send(0xc0048177a0, 0x7023240, 0xc000a672c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:911 +0x1d2
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).rollback(0xc0048177a0, 0x7023240, 0xc000a672c0, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:709 +0x5fc
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).CleanupOnError(0xc0048177a0, 0x7023240, 0xc000a672c0, 0x6f9e7e0, 0xc006434640)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:603 +0x9e
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).AdminMerge(0xc00301c000, 0x7023240, 0xc000a672c0, 0xc0064d3a10, 0x3, 0x8, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:773 +0x3a6
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*mergeQueue).process(0xc005b54c00, 0x7023240, 0xc000a672c0, 0xc00301c000, 0xc003b2bc70, 0x60bd435, 0xd)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/merge_queue.go:326 +0x171c
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica.func1(0x7023240, 0xc000a672c0, 0xdf8475800, 0x7023240)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:955 +0x24e
github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x7023240, 0xc000a672c0, 0xc004d1cf60, 0x1f, 0xdf8475800, 0xc004531d98, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:135 +0xde
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica(0xc005929a20, 0x7023280, 0xc004b25f80, 0x708c520, 0xc00301c000, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:914 +0x3aa
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processLoop.func1.2(0x7023280, 0xc0051bdd10)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:842 +0x116
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc0015c03c0, 0x7023280, 0xc0051bdd10, 0xc002aad2f0, 0x29, 0x0, 0x0, 0xc0051bde60)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:323 +0x163
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:318 +0x14b
. ba: [txn: 8b7a1658], EndTxn(commit:false tsflex:true) [/Local/Range/Table/86/1/3/RangeDescriptor] . txn: meta={id=8b7a1658 key=/Local/Range/Table/86/1/3/RangeDescriptor pri=0.00061626 epo=0 ts=1588240076.350121220,0 min=1588240076.350121220,0 seq=11} lock=true stat=ABORTED rts=0,0 wto=false max=0,0 int=6.
goroutine 1121587 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xa4c4401, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xc6
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x9af6600, 0xc000000004, 0x920cb09, 0x27, 0x329, 0xc005f19000, 0x40ed)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:212 +0xe04
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x7023280, 0xc004dc6180, 0xc000000004, 0x2, 0x6169188, 0x3c, 0xc00388c2a0, 0x4, 0x4)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x291
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x7023280, 0xc004dc6180, 0x1, 0xc000000004, 0x6169188, 0x3c, 0xc00388c2a0, 0x4, 0x4)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:155
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*TxnCoordSender).updateStateLocked(0xc004f2b680, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:809 +0x720
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*TxnCoordSender).Send(0xc004f2b680, 0x7023280, 0xc004dc6180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:505 +0x74d
github.com/cockroachdb/cockroach/pkg/kv.(*DB).sendUsingSender(0xc0013db280, 0x7023240, 0xc000a672c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:739 +0x174
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Send(0xc0048177a0, 0x7023240, 0xc000a672c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:911 +0x1d2
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).rollback(0xc0048177a0, 0x7023240, 0xc000a672c0, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:709 +0x5fc
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).CleanupOnError(0xc0048177a0, 0x7023240, 0xc000a672c0, 0x6f9e7e0, 0xc006434640)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:603 +0x9e
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).AdminMerge(0xc00301c000, 0x7023240, 0xc000a672c0, 0xc0064d3a10, 0x3, 0x8, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:773 +0x3a6
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*mergeQueue).process(0xc005b54c00, 0x7023240, 0xc000a672c0, 0xc00301c000, 0xc003b2bc70, 0x60bd435, 0xd)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/merge_queue.go:326 +0x171c
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica.func1(0x7023240, 0xc000a672c0, 0xdf8475800, 0x7023240)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:955 +0x24e
github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x7023240, 0xc000a672c0, 0xc004d1cf60, 0x1f, 0xdf8475800, 0xc004531d98, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:135 +0xde
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica(0xc005929a20, 0x7023280, 0xc004b25f80, 0x708c520, 0xc00301c000, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:914 +0x3aa
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processLoop.func1.2(0x7023280, 0xc0051bdd10)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:842 +0x116
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc0015c03c0, 0x7023280, 0xc0051bdd10, 0xc002aad2f0, 0x29, 0x0, 0x0, 0xc0051bde60)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:323 +0x163
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:318 +0x14b

Log preceding fatal error

I200430 09:47:53.969319 1120174 kv/kvserver/replica_command.go:1602  [n3,replicate,s3,r185/1:/Table/86/1{-/3}] change replicas (add [(n2,s2):2LEARNER] remove []): existing descriptor r185:/Table/86/1{-/3} [(n3,s3):1, next=2, gen=122]
I200430 09:47:54.257085 538830 kv/kvserver/queue.go:581  [n2,s2] rate limited in MaybeAdd (merge): throttled on async limiting semaphore
I200430 09:47:54.430397 1119674 kv/kvserver/replica_raft.go:247  [n1,s1,r181/3:/Table/86/{1/5/7-2}] proposing ENTER_JOINT(r2 l2) REMOVE_REPLICA[(n2,s2):2VOTER_DEMOTING]: after=[(n2,s2):2VOTER_DEMOTING (n1,s1):3] next=4
I200430 09:47:54.466527 1120174 kv/kvserver/replica_raft.go:247  [n3,s3,r185/1:/Table/86/1{-/3}] proposing SIMPLE(l2) ADD_REPLICA[(n2,s2):2LEARNER]: after=[(n3,s3):1 (n2,s2):2LEARNER] next=3
I200430 09:47:54.472297 1119674 kv/kvserver/replica_command.go:1602  [n1,replicate,s1,r181/3:/Table/86/{1/5/7-2}] change replicas (add [] remove []): existing descriptor r181:/Table/86/{1/5/7-2} [(n2,s2):2VOTER_DEMOTING, (n1,s1):3, next=4, gen=123]
I200430 09:47:54.553393 1121057 kv/kvserver/raft_snapshot_queue.go:126  [n3,raftsnapshot,s3,r185/1:/Table/86/1{-/3}] skipping snapshot; replica is likely a learner in the process of being added: (n2,s2):2LEARNER
I200430 09:47:54.563240 539701 kv/kvserver/queue.go:581  [n3,s3] rate limited in MaybeAdd (gc): throttled on async limiting semaphore
I200430 09:47:54.569403 1120174 kv/kvserver/store_snapshot.go:932  [n3,replicate,s3,r185/1:/Table/86/1{-/3}] sending LEARNER snapshot c38afa21 at applied index 16
I200430 09:47:54.589349 1120174 kv/kvserver/store_snapshot.go:975  [n3,replicate,s3,r185/1:/Table/86/1{-/3}] streamed snapshot to (n2,s2):2LEARNER: kv pairs: 25, log entries: 0, rate-limit: 8.0 MiB/sec, 0.05s
I200430 09:47:54.603805 1119674 kv/kvserver/replica_raft.go:247  [n1,s1,r181/3:/Table/86/{1/5/7-2}] proposing LEAVE_JOINT: after=[(n2,s2):2LEARNER (n1,s1):3] next=4
I200430 09:47:54.608112 539701 kv/kvserver/queue.go:581  [n3,s3] rate limited in MaybeAdd (replicate): throttled on async limiting semaphore
I200430 09:47:54.649904 1121017 kv/kvserver/replica_raftstorage.go:780  [n2,s2,r185/2:{-}] applying LEARNER snapshot [id=c38afa21 index=16]
I200430 09:47:54.656644 1119674 kv/kvserver/replica_command.go:1602  [n1,replicate,s1,r181/3:/Table/86/{1/5/7-2}] change replicas (add [] remove [(n2,s2):2LEARNER]): existing descriptor r181:/Table/86/{1/5/7-2} [(n2,s2):2LEARNER, (n1,s1):3, next=4, gen=124]
I200430 09:47:54.842483 1121017 kv/kvserver/replica_raftstorage.go:801  [n2,s2,r185/2:/Table/86/1{-/3}] applied LEARNER snapshot [total=192ms ingestion=4@14ms id=c38afa21 index=16]
I200430 09:47:54.860307 1120174 kv/kvserver/replica_command.go:1602  [n3,replicate,s3,r185/1:/Table/86/1{-/3}] change replicas (add [(n2,s2):2] remove []): existing descriptor r185:/Table/86/1{-/3} [(n3,s3):1, (n2,s2):2LEARNER, next=3, gen=123]
I200430 09:47:55.092035 1119674 kv/kvserver/replica_raft.go:247  [n1,s1,r181/3:/Table/86/{1/5/7-2}] proposing SIMPLE(r2) REMOVE_REPLICA[(n2,s2):2LEARNER]: after=[(n1,s1):3] next=4
I200430 09:47:55.119755 538727 kv/kvserver/store_remove_replica.go:122  [n2,s2,r181/2:/Table/86/{1/5/7-2}] removing replica r181/2
I200430 09:47:55.131124 1120174 kv/kvserver/replica_raft.go:247  [n3,s3,r185/1:/Table/86/1{-/3}] proposing SIMPLE(v2) ADD_REPLICA[(n2,s2):2]: after=[(n3,s3):1 (n2,s2):2] next=3
I200430 09:47:55.280791 1121935 kv/kvserver/replica_command.go:1602  [n2,s2,r178/2:/Table/86/1/{3/4-4}] change replicas (add [(n3,s3):3LEARNER] remove []): existing descriptor r178:/Table/86/1/{3/4-4} [(n2,s2):2, next=3, gen=115]
I200430 09:47:55.369924 540189 sql/event_log.go:132  [n1,client=127.0.0.1:55546,hostssl,user=root] Event: "set_zone_config", target: 86, info: {Target:PARTITION p57 OF INDEX data.public."multi col range partitioning - MAXVALUE"@primary Config: Options:constraints = '[+n2]' User:root}
I200430 09:47:55.371395 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE a < 3
I200430 09:47:55.741308 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE (a, b) >= (5, 6) AND (a, b) < (5, 7)
I200430 09:47:55.788769 1121935 kv/kvserver/replica_raft.go:247  [n2,s2,r178/2:/Table/86/1/{3/4-4}] proposing SIMPLE(l3) ADD_REPLICA[(n3,s3):3LEARNER]: after=[(n2,s2):2 (n3,s3):3LEARNER] next=4
I200430 09:47:55.821118 1122493 kv/kvserver/raft_snapshot_queue.go:126  [n2,raftsnapshot,s2,r178/2:/Table/86/1/{3/4-4}] skipping snapshot; replica is likely a learner in the process of being added: (n3,s3):3LEARNER
I200430 09:47:55.836202 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE a < 3
I200430 09:47:55.839744 1121935 kv/kvserver/store_snapshot.go:932  [n2,s2,r178/2:/Table/86/1/{3/4-4}] sending LEARNER snapshot 5008b9c7 at applied index 46
I200430 09:47:55.842630 1121935 kv/kvserver/store_snapshot.go:975  [n2,s2,r178/2:/Table/86/1/{3/4-4}] streamed snapshot to (n3,s3):3LEARNER: kv pairs: 14, log entries: 0, rate-limit: 8.0 MiB/sec, 0.03s
I200430 09:47:55.882403 1122565 kv/kvserver/replica_raftstorage.go:780  [n3,s3,r178/3:{-}] applying LEARNER snapshot [id=5008b9c7 index=46]
I200430 09:47:55.922393 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE a >= 3 AND a < 4
I200430 09:47:55.981785 1122565 kv/kvserver/replica_raftstorage.go:801  [n3,s3,r178/3:/Table/86/1/{3/4-4}] applied LEARNER snapshot [total=99ms ingestion=4@52ms id=5008b9c7 index=46]
I200430 09:47:55.996606 1121935 kv/kvserver/replica_command.go:1602  [n2,s2,r178/2:/Table/86/1/{3/4-4}] change replicas (add [(n3,s3):3] remove []): existing descriptor r178:/Table/86/1/{3/4-4} [(n2,s2):2, (n3,s3):3LEARNER, next=4, gen=116]
I200430 09:47:56.051358 1122756 kv/kvserver/replica_command.go:1602  [n2,replicate,s2,r185/2:/Table/86/1{-/3}] change replicas (add [] remove [(n3,s3):1VOTER_DEMOTING]): existing descriptor r185:/Table/86/1{-/3} [(n3,s3):1, (n2,s2):2, next=3, gen=124]
I200430 09:47:56.121614 1121935 kv/kvserver/replica_raft.go:247  [n2,s2,r178/2:/Table/86/1/{3/4-4}] proposing SIMPLE(v3) ADD_REPLICA[(n3,s3):3]: after=[(n2,s2):2 (n3,s3):3] next=4
I200430 09:47:56.273860 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE a < 3
I200430 09:47:56.319430 1122756 kv/kvserver/replica_raft.go:247  [n2,s2,r185/2:/Table/86/1{-/3}] proposing ENTER_JOINT(r1 l1) REMOVE_REPLICA[(n3,s3):1VOTER_DEMOTING]: after=[(n3,s3):1VOTER_DEMOTING (n2,s2):2] next=3
I200430 09:47:56.365391 1122756 kv/kvserver/replica_command.go:1602  [n2,replicate,s2,r185/2:/Table/86/1{-/3}] change replicas (add [] remove []): existing descriptor r185:/Table/86/1{-/3} [(n3,s3):1VOTER_DEMOTING, (n2,s2):2, next=3, gen=125]
I200430 09:47:56.455989 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE a >= 3 AND a < 4
I200430 09:47:56.486201 1122756 kv/kvserver/replica_raft.go:247  [n2,s2,r185/2:/Table/86/1{-/3}] proposing LEAVE_JOINT: after=[(n3,s3):1LEARNER (n2,s2):2] next=3
I200430 09:47:56.511744 1122756 kv/kvserver/replica_command.go:1602  [n2,replicate,s2,r185/2:/Table/86/1{-/3}] change replicas (add [] remove [(n3,s3):1LEARNER]): existing descriptor r185:/Table/86/1{-/3} [(n3,s3):1LEARNER, (n2,s2):2, next=3, gen=126]
I200430 09:47:56.532638 1121587 kv/kvserver/replica_command.go:663  [n3,merge,s3,r184/1:/Table/86/1/3{-/4}] initiating a merge of r178:/Table/86/1/{3/4-4} [(n2,s2):2, (n3,s3):3, next=4, gen=117] into this range (lhs+rhs has (size=0 B+0 B=0 B qps=0.00+0.00=0.00qps) below threshold (size=128 MiB, qps=1250.00))
I200430 09:47:56.620095 1123332 kv/kvserver/replica_command.go:1602  [n3,replicate,s3,r178/3:/Table/86/1/{3/4-4}] change replicas (add [] remove [(n2,s2):2VOTER_DEMOTING]): existing descriptor r178:/Table/86/1/{3/4-4} [(n2,s2):2, (n3,s3):3, next=4, gen=117]
I200430 09:47:56.722541 1122756 kv/kvserver/replica_raft.go:247  [n2,s2,r185/2:/Table/86/1{-/3}] proposing SIMPLE(r1) REMOVE_REPLICA[(n3,s3):1LEARNER]: after=[(n2,s2):2] next=3
I200430 09:47:56.737044 536668 server/status/runtime.go:498  [n1] runtime stats: 2.0 GiB RSS, 907 goroutines, 100 MiB/68 MiB/150 MiB GO alloc/idle/total, 101 MiB/153 MiB CGO alloc/total, 1992.3 CGO/sec, 240.5/22.8 %(u/s)time, 0.1 %gc (6x), 2.4 MiB/2.4 MiB (r/w)net
I200430 09:47:56.768339 539653 kv/kvserver/store_remove_replica.go:122  [n3,s3,r185/1:/Table/86/1{-/3}] removing replica r185/1
I200430 09:47:56.842911 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE a >= 3 AND a < 4
I200430 09:47:57.224203 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE a >= 4 AND (a, b) < (5, 6)
I200430 09:47:57.472915 1123332 kv/kvserver/replica_raft.go:247  [n3,s3,r178/3:/Table/86/1/{3/4-4}] proposing ENTER_JOINT(r2 l2) REMOVE_REPLICA[(n2,s2):2VOTER_DEMOTING]: after=[(n2,s2):2VOTER_DEMOTING (n3,s3):3] next=4
I200430 09:47:57.515661 1123332 kv/kvserver/replica_command.go:1602  [n3,replicate,s3,r178/3:/Table/86/1/{3/4-4}] change replicas (add [] remove []): existing descriptor r178:/Table/86/1/{3/4-4} [(n2,s2):2VOTER_DEMOTING, (n3,s3):3, next=4, gen=118]
I200430 09:47:57.522048 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE a < 3
I200430 09:47:57.740569 1081813 ccl/partitionccl/partition_test.go:218  query: SELECT count(*) FROM "multi col range partitioning - MAXVALUE" WHERE a >= 3 AND a < 4

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestRepartitioning PKG=./pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

craig bot pushed a commit that referenced this issue Apr 30, 2020
48190: sql: inject tenant ID in sqlServerArgs, pass through ExecutorConfig r=nvanbenschoten a=nvanbenschoten

Fixes #47903.
Informs #48123.

Also known as "the grand plumbing", this change replaces a few instances of `TODOSQLCodec` in `pkg/sql/sqlbase/index_encoding.go` and watches the house of cards fall apart. It then glues the world back together, this time using a properly injected tenant-bound SQLCodec to encode and decode all SQL table keys.

A tenant ID field is added to `sqlServerArgs`. This is used to construct a tenant-bound `keys.SQLCodec` during server creation. This codec morally lives on the `sql.ExecutorConfig`. In practice, it is also copied onto `tree.EvalContext` and `execinfra.ServerConfig` to help carry it around. SQL code is adapted to use this codec whenever it needs to encode or decode keys.

If all tests pass after this refactor, there is a good chance it got things right. This is because any use of an uninitialized SQLCodec will panic immediately when the codec is first used. This was helpful in ensuring that it was properly plumbed everywhere.

48245: kv: declare write access to AbortSpan on all aborting EndTxn reqs r=nvanbenschoten a=nvanbenschoten

Fixes #43707.
Fixes #48046.
Fixes #48189.

Part of the change made by #42765 was to clear AbortSpan entries on
non-poisoning, aborting EndTxn requests. Specifically, this change was
made in 1328787. The change forgot to update the corresponding span
declaration logic to reflect the fact that we were now writing to the
AbortSpan in cases where we previously weren't.

This was triggering an assertion in race builds that tried to catch this
kind of undeclared span access. The assertion failure was very rare
because it required the following conditions to all be met:
1. running a test with the race detector enabled
2. a txn (A) must have been aborted by another txn (B)
3. txn B must have cleared an intent on txn A's transaction record range
4. txn A must have noticed and issued a non-poisoning EndTxn(ABORT)

We should backport this when we get a change (once v20.1.0 has
stabilized), but I don't expect that this could actually cause any
issues. The AbortSpan update was strictly a matter of performance and we
should never be racing with another request that is trying to read the
same AbortSpan entry.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in 18f8836 Apr 30, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 4, 2020
Fixes cockroachdb#43707.
Fixes cockroachdb#48046.
Fixes cockroachdb#48189.

Part of the change made by cockroachdb#42765 was to clear AbortSpan entries on
non-poisoning, aborting EndTxn requests. Specifically, this change was
made in 1328787. The change forgot to update the corresponding span
declaration logic to reflect the fact that we were now writing to the
AbortSpan in cases where we previously weren't.

This was triggering an assertion in race builds that tried to catch this
kind of undeclared span access. The assertion failure was very rare
because it required the following conditions to all be met:
1. running a test with the race detector enabled
2. a txn (A) must have been aborted by another txn (B)
3. txn B must have cleared an intent on txn A's transaction record range
4. txn A must have noticed and issued a non-poisoning EndTxn(ABORT)

We should backport this when we get a change (once v20.1.0 has
stabilized), but I don't expect that this could actually cause any
issues. The AbortSpan update was strictly a matter of performance and we
should never be racing with another request that is trying to read the
same AbortSpan entry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants