memmetrics: simplify locking and solve data race #209

Open
wants to merge 4 commits into master

Conversation

@sbunce commented Nov 5, 2020

This package was updating counters concurrently because of incorrect
locking. There's not really a reason to have two separate locks or try
to optimize with RW locks.

This change replaces the two RW locks with one exclusive lock. The
RTMetrics struct methods always acquire this exclusive lock. This is
simple and will be easier to keep correct as the code evolves.
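Roughly, the shape of the change (a sketch with illustrative field names, not the exact oxy source):

// Before (sketch): two read-write locks guarded different pieces of state,
// which let updates race on the counters.
//
// After (sketch): one exclusive lock, held by every method for its whole body.
type RTMetrics struct {
	lock sync.Mutex
	// counters, histograms, ...
}

// Record updates all counters under the single exclusive lock.
// (Signature follows the benchmark calls later in this thread; the body is illustrative.)
func (m *RTMetrics) Record(code int, d time.Duration) error {
	m.lock.Lock()
	defer m.lock.Unlock()
	// update total/per-code counters and the latency histogram here
	return nil
}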

@sbunce marked this pull request as ready for review November 5, 2020 23:22
@dustin-decker left a comment

LGTM

On a 3 GHz Intel CPU we do zero allocations and a record takes 255 ns.
@sbunce (Author) commented Nov 6, 2020

Here is the performance before/after with the benchmark I added.

We went from 280 ns/op to 255 ns/op on my 3 GHz Intel CPU, so switching from RWMutex to Mutex actually made things slightly faster. The benchmark I added isn't concurrent, but it is enough to show that this change doesn't introduce a performance problem: at 255 ns per record we can take the lock roughly 3.9 million times per second, which is far higher than any realistic HTTP request rate.
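For reference, a sketch of what a non-concurrent BenchmarkRecord could look like (helper names are taken from the concurrent snippet further down this thread; this is not necessarily the exact benchmark added in the PR):

func BenchmarkRecord(b *testing.B) {
	b.ReportAllocs()

	rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
	require.NoError(b, err)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Steady-state recording of an already-seen status code does not allocate.
		rr.Record(200, time.Second)
	}
}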

Before this change.

seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -count=1 -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20    	 4032144	       280 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/vulcand/oxy/memmetrics	1.463s

After this change.

seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -count=1 -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20    	 4430594	       253 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/vulcand/oxy/memmetrics	1.431s

@sylr (Contributor) commented Nov 7, 2020

Maybe add a concurrent benchmark?

func BenchmarkRecordConcurrently(b *testing.B) {
	b.ReportAllocs()

	rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
	require.NoError(b, err)

	// Warm up the metrics. Recording a new status code can allocate, but in the
	// steady state recording a code is cheap; we want to measure the steady state.
	const codes = 100
	for code := 0; code < codes; code++ {
		rr.Record(code, time.Second)
	}

	concurrency := runtime.NumCPU() * 2
	wg := sync.WaitGroup{}
	wg.Add(concurrency)

	b.ResetTimer()
	for i := 0; i < concurrency; i++ {
		go func() {
			for j := 0; j < b.N; j++ {
				rr.Record(j%codes, time.Second)
			}
			wg.Done()
		}()
	}

	wg.Wait()
}

@sbunce (Author) commented Nov 9, 2020

Our IT guys upgraded my CPU and I didn't even know it. I now have 10 cores / 20 threads at 3.3 GHz (the previous benchmarks were also run on this CPU):
Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz

Good idea to add a concurrent benchmark, since the context in which this package is used should be highly concurrent.

I switched the benchmark so that b.N is divided among the goroutines, which makes for more of an apples-to-apples comparison: the same number of results are recorded in the concurrent and non-concurrent tests. I also set the concurrency (the number of goroutines in the benchmark) equal to the number of CPUs (hardware threads). The goroutines end up with affinity to a hardware thread, so extra concurrency would only add scheduler overhead; setting concurrency to double the number of CPUs made only about a 40 ns/op difference.
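A sketch of that shape (dividing b.N among the goroutines; not necessarily the exact code in the PR):

func BenchmarkRecordConcurrently(b *testing.B) {
	b.ReportAllocs()

	rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
	require.NoError(b, err)

	// Warm up so we measure the steady state, as in the earlier snippet.
	const codes = 100
	for code := 0; code < codes; code++ {
		rr.Record(code, time.Second)
	}

	// One goroutine per hardware thread; b.N is split between them so the
	// total number of records matches the non-concurrent benchmark.
	concurrency := runtime.NumCPU()
	b.Logf("NumCPU: %d, Concurrency: %d, GOMAXPROCS: %d",
		runtime.NumCPU(), concurrency, runtime.GOMAXPROCS(0))

	wg := sync.WaitGroup{}
	wg.Add(concurrency)

	b.ResetTimer()
	for i := 0; i < concurrency; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < b.N/concurrency; j++ {
				rr.Record(j%codes, time.Second)
			}
		}()
	}
	wg.Wait()
}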

The poor scaling of RWMutex with CPU count is explained in golang/go#17973.

Here is before the locking change. We observe a 6.8x slowdown with concurrency.

seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20                	 4036707	       281 ns/op	       0 B/op	       0 allocs/op
BenchmarkRecordConcurrently-20    	  919545	      1929 ns/op	       0 B/op	       0 allocs/op
--- BENCH: BenchmarkRecordConcurrently-20
    roundtrip_test.go:48: NumCPU: 20, Concurrency: 20, GOMAXPROCS: 20
PASS
ok  	github.com/vulcand/oxy/memmetrics	3.260s

Here is after the locking change. We observe a 4x slowdown with concurrency. We're roughly twice as efficient as the old code with RWMutex.

seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20                	 4408964	       255 ns/op	       0 B/op	       0 allocs/op
BenchmarkRecordConcurrently-20    	 1221457	      1009 ns/op	       0 B/op	       0 allocs/op
--- BENCH: BenchmarkRecordConcurrently-20
    roundtrip_test.go:48: NumCPU: 20, Concurrency: 20, GOMAXPROCS: 20
PASS
ok  	github.com/vulcand/oxy/memmetrics	3.669s

@ldez self-requested a review November 9, 2020 18:49
@dreamstick commented Apr 3, 2023

So this issue has not been resolved yet? I have the same problem, and I think it may cause inaccurate statistics. Why not resolve this issue?
