context: cancelCtx exclusive lock causes extreme contention #42564
Comments
Thank you for raising this issue. Can you please provide more information: a sample program which demonstrates this, or, if that is not possible, the pprof trace you used to identify the lock contention? Thank you.
If you have 64 hardware threads,
As a workaround, you can guard the slow Err() call with a non-blocking read of the Done() channel:

```go
done := ctx.Done()
for something {
    select {
    case <-done:
        return ctx.Err()
    default:
    }
}
```
FWIW, this just showed up in some of our profiles recently. It was only 1% of our overall CPU time, but I expected 0. Here's a benchmark:

```go
func BenchmarkContextCancelDone(b *testing.B) {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            select {
            case <-ctx.Done():
            default:
            }
        }
    })
}
```

Profile:
If you cache the Done() channel, then it's much faster:
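The faster numbers themselves aren't preserved in this copy; purely as an illustration, a cached-channel variant of the benchmark above might look like the following (the benchmark name is made up, not the commenter's code):

```go
package context_test

import (
    "context"
    "testing"
)

// BenchmarkContextCancelDoneCached is an illustrative variant of the benchmark
// above: it fetches ctx.Done() once, so the hot loop is just a non-blocking
// channel receive and never calls Done(), which acquired the context's mutex
// in the Go versions discussed here.
func BenchmarkContextCancelDoneCached(b *testing.B) {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    done := ctx.Done() // cache the channel outside the hot loop
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            select {
            case <-done:
            default:
            }
        }
    })
}
```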
Looks like this is due to @josharian's 986768d. @josharian elsewhere suggested we could switch /cc @crawshaw
And fortunately #39351. I'll mark this as Go 1.17, with the assumption that that'll work out.
(It'd probably need to
Change https://golang.org/cl/288193 mentions this issue:
This doesn't need compare and swap; we already have a mutex available for other reasons to guard writes. I took a first stab in CL 288193, but apparently I screwed something up. Will aim to fix next week...
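A minimal sketch of the pattern that comment points at — lock-free reads of the done channel via sync/atomic, with creation and closing still guarded by the mutex the struct already holds. This is only an illustration with made-up names, not the actual context internals or the contents of CL 288193:

```go
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

// lazyDone is an illustrative sketch: reads of the lazily created done channel
// go through an atomic load, while creation and closing stay behind a mutex
// the owning struct already holds for other state.
type lazyDone struct {
    mu       sync.Mutex
    done     atomic.Value // holds a chan struct{}, created lazily
    canceled bool         // guarded by mu; keeps cancel idempotent
}

// Done returns the channel, creating it on first use. The fast path is a
// single atomic load, so concurrent readers no longer serialize on mu.
func (d *lazyDone) Done() <-chan struct{} {
    if ch := d.done.Load(); ch != nil {
        return ch.(chan struct{})
    }
    d.mu.Lock()
    defer d.mu.Unlock()
    if ch := d.done.Load(); ch != nil { // re-check under the lock
        return ch.(chan struct{})
    }
    ch := make(chan struct{})
    d.done.Store(ch)
    return ch
}

// cancel closes the channel, creating it already closed if nobody asked for
// it yet. Writes happen under mu, which the struct needs for other state anyway.
func (d *lazyDone) cancel() {
    d.mu.Lock()
    defer d.mu.Unlock()
    if d.canceled {
        return
    }
    d.canceled = true
    if ch := d.done.Load(); ch != nil {
        close(ch.(chan struct{}))
        return
    }
    ch := make(chan struct{})
    close(ch)
    d.done.Store(ch)
}

func main() {
    var d lazyDone
    d.cancel()
    select {
    case <-d.Done():
        fmt.Println("done channel observed closed")
    default:
        fmt.Println("unexpected: done channel not closed")
    }
}
```

The double-checked atomic Load keeps readers off the mutex entirely once the channel exists, which is where the contention described in this issue comes from.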
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

What did you do?

Passed a single context with cancellation to a bunch of goroutines. These goroutines had a cold-path compute task, interlaced with calls to context.Err() to detect cancellation. The loop looks something like:
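(The snippet itself isn't preserved in this copy of the issue; the following is only a rough reconstruction from the description above, with hypothetical names — worker, coldPathCompute — standing in for the real code.)

```go
package main

import (
    "context"
    "sync"
)

// coldPathCompute is a hypothetical stand-in for the report's compute task.
func coldPathCompute() {}

// worker interleaves the compute step with a ctx.Err() call on every
// iteration; at the time of this report, cancelCtx.Err() acquired a
// sync.Mutex, so 64 of these goroutines contend on a single lock.
func worker(ctx context.Context) {
    for {
        if ctx.Err() != nil {
            return
        }
        coldPathCompute()
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    var wg sync.WaitGroup
    for i := 0; i < 64; i++ { // one shared context across many goroutines
        wg.Add(1)
        go func() {
            defer wg.Done()
            worker(ctx)
        }()
    }
    cancel() // all workers observe cancellation via ctx.Err()
    wg.Wait()
}
```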
What did you expect to see?
A bit of a slowdown from the context check maybe?
What did you see instead?
Slightly over 50% of the CPU time was spent in runtime.findrunnable. The cancelCtx struct uses a sync.Mutex, and due to extreme lock contention (64 CPU threads spamming it), this was triggering lockSlow. From poking at pprof, it appears that about 86% of CPU time was spent in functions related to this lock acquire.

I was able to work around this by adding a counter and checking it less frequently. However, I do not think that this is an intended performance degradation path. Theoretically this could be made more efficient with sync/atomic, although I think a sync.RWMutex would still be more than sufficient.
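For concreteness, here is a minimal sketch of the counter workaround mentioned above, assuming a hypothetical compute step and an arbitrary check interval (neither is taken from the report):

```go
package main

import (
    "context"
    "time"
)

// coldPathCompute is a hypothetical stand-in for the report's compute task.
func coldPathCompute() {}

// checkEvery is an arbitrary illustrative interval, not taken from the report.
const checkEvery = 1024

// workerWithCounter only calls ctx.Err() every checkEvery iterations, trading
// a small delay in noticing cancellation for far fewer hits on the contended lock.
func workerWithCounter(ctx context.Context) error {
    for i := 0; ; i++ {
        if i%checkEvery == 0 {
            if err := ctx.Err(); err != nil {
                return err
            }
        }
        coldPathCompute()
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
    defer cancel()
    _ = workerWithCounter(ctx)
}
```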