Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: use-span-after-finish panic during drain #97032

Closed
stevendanna opened this issue Feb 13, 2023 · 1 comment
Closed

server: use-span-after-finish panic during drain #97032

stevendanna opened this issue Feb 13, 2023 · 1 comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@stevendanna
Copy link
Collaborator

stevendanna commented Feb 13, 2023

Describe the problem

When attempting to write a roachtest that drains a server, I occasionally see:

E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +(1) attached stack trace
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  -- stack trace:
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | runtime.gopanic
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       GOROOT/src/runtime/panic.go:884
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).detectUseAfterFinish
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:181
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).RecordingType
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:441
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).IsVerbose
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:449
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/util/log.getSpanOrEventLog
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/util/log/trace.go:92
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/util/log/channels.go:101
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/util/log.logfDepth
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/util/log.Infof
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/bazel-out/darwin-fastbuild/bin/pkg/util/log/log_channels_generated.go:277
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/server.startListenRPCAndSQL.func2
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/server/start_listen.go:103
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cmux.(*cMux).handleErr
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cmux/external/com_github_cockroachdb_cmux/cmux.go:167
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cmux.(*cMux).Serve
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cmux/external/com_github_cockroachdb_cmux/cmux.go:126
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/server.startListenRPCAndSQL.func6.3.1
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/server/start_listen.go:180
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | sync.(*Once).doSlow
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       GOROOT/src/sync/once.go:74
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | sync.(*Once).Do
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       GOROOT/src/sync/once.go:65
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/server.startListenRPCAndSQL.func6.3
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/server/start_listen.go:179
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  | runtime.goexit
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +  |       GOROOT/src/runtime/asm_amd64.s:1594
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +Wraps: (2) panic: ‹use of Span after Finish. Span: storage.RaftTransport: processing queue. Finish previously called at: <stack not captured. Set debugUseAfterFinish>›
E230213 12:01:50.515135 265 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n2] 232 +Error types: (1) *withstack.withStack (2) *errutil.leafError
E230213 12:01:50.515321 122 server/server_sql.go:476 ⋮ [T1,n2] 233  failed to run update of instance with new session ID: node unavailable; try another peer

It looks like this is caused by:

	// cmux auto-retries Accept() by default. Tell it
	// to stop doing work if we see a request to shut down.
	m.HandleError(func(err error) bool {
		select {
		case <-stopper.ShouldQuiesce():
			log.Infof(ctx, "server shutting down: instructing cmux to stop accepting")
			return false
		default:
			return true
		}
	})

Perhaps we should have used workersCtx there.

Jira issue: CRDB-24478

@stevendanna stevendanna added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Feb 13, 2023
stevendanna added a commit to stevendanna/cockroach that referenced this issue Feb 23, 2023
Since the error handler outlives the startup, it should be using the
workersCtx.

I suspect this is the cause of cockroachdb#97032 but I haven't been able to
reproduce that locally in a unit test.

Fixes cockroachdb#93032

Epic: none

Release note: none
craig bot pushed a commit that referenced this issue Feb 23, 2023
97555: server: use workersCtx in cmux error handler r=knz a=stevendanna

Since the error handler outlives the startup, it should be using the workersCtx.

I suspect this is the cause of #97032 but I haven't been able to reproduce that locally in a unit test.

Fixes #93032

Epic: none

Release note: none

Co-authored-by: Steven Danna <danna@cockroachlabs.com>
@stevendanna
Copy link
Collaborator Author

I believe we fixed this in #97555

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

No branches or pull requests

1 participant