Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: sysbench/oltp_update_index/nodes=3/cpu=32/conc=256 failed #97389

Closed
cockroach-teamcity opened this issue Feb 21, 2023 · 9 comments
Closed
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Feb 21, 2023

roachtest.sysbench/oltp_update_index/nodes=3/cpu=32/conc=256 failed with artifacts on master @ dd2749ae4ab61eed2f99238acb74e8d3c6b4cb1d:

test artifacts and logs in: /artifacts/sysbench/oltp_update_index/nodes=3/cpu=32/conc=256/run_1
(cluster.go:1956).Run: output in run_155040.686266349_n4_sysbench-dbdriverpgs: sysbench \
		--db-driver=pgsql \
		--pgsql-host={pghost:1} \
		--pgsql-port=26257 \
		--pgsql-user=root \
		--pgsql-password= \
		--pgsql-db=sysbench \
		--report-interval=1 \
		--time=600 \
		--threads=256 \
		--tables=10 \
		--table_size=10000000 \
		--auto_inc=false \
		oltp_update_index prepare returned: context canceled
(monitor.go:127).Wait: monitor failure: monitor command failure: unexpected node event: 3: dead (exit status 7)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=32 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-24671

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Feb 21, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Feb 21, 2023
@blathers-crl blathers-crl bot added the T-testeng TestEng Team label Feb 21, 2023
@renatolabs
Copy link
Contributor

renatolabs commented Feb 21, 2023

Node 3 panics with "use of Span after Finish":

E230221 15:55:05.370166 358 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n3] 2253 +panic: ‹use of Span after Finish. Span: dist sender. Finish previously called at: <stack not captured. Set debugUseAfterFinish>›

The dist sender span is created on dist_sender_rangefeed.go [1], owned by Replication, so I'm tagging @cockroachdb/repl-prs on this one.

[1]

ctx, sp := tracing.EnsureChildSpan(ctx, ds.AmbientContext.Tracer, "dist sender")

@blathers-crl
Copy link

blathers-crl bot commented Feb 21, 2023

cc @cockroachdb/replication

@renatolabs
Copy link
Contributor

FYI, I believe this happened again in this failure.

@pav-kv pav-kv added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. labels Feb 24, 2023
@erikgrinaker erikgrinaker removed the T-testeng TestEng Team label Feb 27, 2023
@erikgrinaker
Copy link
Contributor

@aliher1911 Can you have a look at this when you get a chance?

@erikgrinaker
Copy link
Contributor

These are only fatal in test builds. Downgrading to GA blocker.

@erikgrinaker erikgrinaker added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 1, 2023
@aliher1911
Copy link
Contributor

The underlying failure here is "Was the raft log corrupted, truncated, or lost?" as in #97926. I'm checking why we have a problem with dist sender span because of this.

@aliher1911
Copy link
Contributor

In this case this is replica 2 on node 3 which is lagging behind

  "raft_state": {
    "replica_id": 1,
    "hard_state": {
      "term": 6,
      "vote": 1,
      "commit": 1272
    },
    "lead": 1,
    "state": "StateLeader",
    "applied": 1272,
    "progress": {
      "1": {
        "match": 1272,
        "next": 1273,
        "state": "StateReplicate"
      },
      "2": {
        "match": 1085,
        "next": 1086,
        "state": "StateProbe",
        "paused": true
      },
      "3": {
        "match": 1272,
        "next": 1273,
        "state": "StateReplicate"
      }
    }
  },

There are lots of log lines like

W230221 15:55:03.236211 3095 kv/kvserver/store_raft.go:315 ⋮ [T1] 2159  raft receive queue for r134 is full

prior to node panic-ing. Mind they are for other logs, not the one that caused panic.

This is tangental to why context/span was incorrectly closed when shutting down listeners.

@aliher1911
Copy link
Contributor

This failure (with context) is the same as #97032 and is likely fixed by #97555

@erikgrinaker
Copy link
Contributor

This is indeed a dupe of #97926, marking as such.

F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239  tocommit(1148) is out of range [lastIndex(1085)]. Was the raft log corrupted, truncated, or lost?
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !goroutine 432 [running]:
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x1)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0x89
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc000e5d640, {{{0xc00359c5a0, 0x24}, {0x0, 0x0}, {0x58e38be, 0x1}, {0x58e38c0, 0x1}}, 0x1745e1fb36cb8a34, ...})
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/util/log/clog.go:262 +0x97
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x6d0b9a8, 0xc01064cff0}, 0x3, 0x4, 0x0, 0x0?, {0x5a296a2, 0x5d}, {0xc001b877a0, 0x2, ...})
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x645
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/util/log.FatalfDepth(...)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/bazel-out/k8-opt/bin/pkg/util/log/log_channels_generated.go:920
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftLogger).Panicf(0xc0005bf820, {0x5a296a2, 0x5d}, {0xc001b877a0, 0x2, 0x2})
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/raft.go:118 +0xcd
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !go.etcd.io/raft/v3.(*raftLog).commitTo(0xc0017fedc0, 0x47c)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/log.go:324 +0x103
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !go.etcd.io/raft/v3.(*raftLog).maybeAppend(0xc0017fedc0, 0x47c, 0x0, 0x48f, {0x0?, 0x0, 0x0})
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/log.go:124 +0x2d4
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !go.etcd.io/raft/v3.(*raft).handleAppendEntries(0xc0030fed80, {0x3, 0x2, 0x1, 0x6, 0x0, 0x47c, {0x0, 0x0, 0x0}, ...})
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/raft.go:1651 +0x77
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !go.etcd.io/raft/v3.stepFollower(0xc0030fed80, {0x3, 0x2, 0x1, 0x6, 0x0, 0x47c, {0x0, 0x0, 0x0}, ...})
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/raft.go:1606 +0x2f8
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !go.etcd.io/raft/v3.(*raft).Step(0xc0030fed80, {0x3, 0x2, 0x1, 0x6, 0x0, 0x47c, {0x0, 0x0, 0x0}, ...})
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/raft.go:1155 +0xed5
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !go.etcd.io/raft/v3.(*RawNode).Step(0x0?, {0x3, 0x2, 0x1, 0x6, 0x0, 0x47c, {0x0, 0x0, 0x0}, ...})
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/rawnode.go:123 +0x138
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).stepRaftGroup.func1(0xc00b2fc690)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:586 +0x278
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).withRaftGroupLocked.func1(...)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1934
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).withRaftGroupLocked(0xc004c9e000, 0x60, 0x91)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1935 +0x26a
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).withRaftGroup(0xc004c9e000, 0x0?, 0xc005ed59e8?)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1980 +0x92
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).stepRaftGroup(0x6d0b9a8?, 0xc01064d020?)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:564 +0x45
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processRaftRequestWithReplica(0x3fc3333333333333?, {0x6d0b9a8, 0xc01064d020}, 0xc004c9e000, 0xc001f3aa80)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:387 +0x3f9
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processRequestQueue.func1({0xc005ed5d60?, 0x6d0b9a8?}, 0xc0039270e0?)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:589 +0x38
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).withReplicaForRequest(0xc0011668b0?, {0x6d0b9a8, 0xc0039270e0}, 0xc001f3aa80, 0xc005ed5e20)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:342 +0xe7
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processRequestQueue(0xc001166000, {0x6d0b9a8, 0xc0039270e0}, 0x656c646e61486564?)
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:587 +0x1c9
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker(0xc0038f8a00, {0x6d0b9a8, 0xc0039270e0})
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:314 +0x1ea
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x146
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !   github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:461 +0x43b
F230221 15:55:05.337727 432 go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n3,s3,r145/2:‹/Table/11{0/1/5149…-1}›] 2239 !

@erikgrinaker erikgrinaker added X-duplicate Closed as a duplicate of another issue. and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Mar 7, 2023
@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. and removed X-duplicate Closed as a duplicate of another issue. GA-blocker labels Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

No branches or pull requests

5 participants