add BenchmarkARCCacheConcurrentOps #70

mvdan · 2021-04-28T12:54:18Z

No description provided.

mvdan · 2021-04-28T13:11:28Z

Sample output:

$ go test -run=- -bench=ARCCacheConcurrentOps -cpu=1,64,1024 -benchtime=1s
goos: linux
goarch: amd64
pkg: github.com/ipfs/go-ipfs-blockstore
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
BenchmarkARCCacheConcurrentOps/PutDelete              	  972183	      1082 ns/op	     309 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/PutDelete-64           	  944838	      1260 ns/op	     309 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/PutDelete-1024         	  858927	      1386 ns/op	     315 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/GetDelete              	 2808345	       423.0 ns/op	     113 B/op	       3 allocs/op
BenchmarkARCCacheConcurrentOps/GetDelete-64           	 1969640	       595.9 ns/op	     113 B/op	       3 allocs/op
BenchmarkARCCacheConcurrentOps/GetDelete-1024         	 1826703	       637.6 ns/op	     117 B/op	       3 allocs/op
BenchmarkARCCacheConcurrentOps/GetPut                 	 1171681	      1025 ns/op	     313 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/GetPut-64              	 1082220	      1115 ns/op	     313 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/GetPut-1024            	  962059	      1246 ns/op	     319 B/op	       7 allocs/op

Note that I capped my CPU frequency at 70%, to prevent turbo boost and throttling from messing with the numbers.

Main changes from #65:

- One single benchmark, which helps reuse code.
- One sub-benchmark for each pair of racy operations.
- All sub-benchmarks get a new ARC cache, pre-filled with half of the 4k dummy blocks.
- All concurrency comes from the goroutines spawned by the parallel benchmark, whose number is controlled by -cpu.
- Each of the benchmark goroutines randomly performs an operation (e.g. Put or Delete) on a random dummy block.
- No artificial delays on the store. Those were added in #66 ("fix(arc): Per-CID locking. Map CID to lock."), but it's not clear to me why we need them.
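The structure described in the list above can be sketched as a minimal, self-contained program. This is not the code from this PR. `toyCache` is a hypothetical mutex-guarded map standing in for the ARC-cached blockstore, and `benchOpPair`, `opPut`, `opGet` and `opDelete` are illustrative names. What it keeps is the described shape: a fresh cache per sub-benchmark, pre-filled with half of 4k dummy blocks, and each `RunParallel` goroutine applying one of two operations at random to a random block.

```go
package main

import (
	"math/rand"
	"strconv"
	"sync"
	"sync/atomic"
	"testing"
)

// toyCache is a hypothetical stand-in for the ARC-cached blockstore:
// a plain mutex-guarded map. It is NOT the go-ipfs-blockstore API.
type toyCache struct {
	mu sync.Mutex
	m  map[string][]byte
}

func newToyCache() *toyCache { return &toyCache{m: make(map[string][]byte)} }

func (c *toyCache) Put(k string, v []byte) {
	c.mu.Lock()
	c.m[k] = v
	c.mu.Unlock()
}

func (c *toyCache) Get(k string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.m[k]
	return v, ok
}

func (c *toyCache) Delete(k string) {
	c.mu.Lock()
	delete(c.m, k)
	c.mu.Unlock()
}

func (c *toyCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.m)
}

// op is one cache operation applied to a single dummy block.
type op func(c *toyCache, key string, val []byte)

func opPut(c *toyCache, k string, v []byte)    { c.Put(k, v) }
func opGet(c *toyCache, k string, _ []byte)    { c.Get(k) }
func opDelete(c *toyCache, k string, _ []byte) { c.Delete(k) }

const numBlocks = 4 << 10 // 4k dummy blocks, as in the PR description

// benchOpPair builds a fresh cache pre-filled with half of the dummy blocks,
// then every RunParallel goroutine repeatedly picks one of the two operations
// at random and applies it to a random block. The cache is returned only so
// callers can inspect its final state.
func benchOpPair(b *testing.B, first, second op) *toyCache {
	keys := make([]string, numBlocks)
	vals := make([][]byte, numBlocks)
	for i := range keys {
		keys[i] = "block-" + strconv.Itoa(i)
		vals[i] = []byte(keys[i])
	}
	c := newToyCache()
	for i := 0; i < numBlocks/2; i++ {
		c.Put(keys[i], vals[i])
	}

	var seed int64
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		// *rand.Rand is not safe for concurrent use, so each goroutine owns one.
		rng := rand.New(rand.NewSource(atomic.AddInt64(&seed, 1)))
		for pb.Next() {
			i := rng.Intn(numBlocks)
			if rng.Intn(2) == 0 {
				first(c, keys[i], vals[i])
			} else {
				second(c, keys[i], vals[i])
			}
		}
	})
	return c
}
```

In a `_test.go` file, each pair would then be one sub-benchmark of a single benchmark function, e.g. `b.Run("PutDelete", func(b *testing.B) { benchOpPair(b, opPut, opDelete) })`, so `-cpu` alone decides how many goroutines hit the cache.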

The number of blocks is hard-coded right now, but it could be turned into a flag if we want. The nice thing about reusing -cpu for concurrency is that it integrates very well with how one runs benchmarks and sees their results, and we don't have to worry about spawning extra goroutines of our own.
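Reusing -cpu works because `go test -cpu=N` runs each benchmark with GOMAXPROCS set to N, and `b.RunParallel` starts GOMAXPROCS worker goroutines by default. A small standalone check of that relationship is sketched below. `goroutinesPerRunParallel` is a hypothetical helper, and it calls `testing.Benchmark` outside of `go test`, which requires `testing.Init` to be called first.

```go
package main

import (
	"runtime"
	"sync/atomic"
	"testing"
)

// goroutinesPerRunParallel reports how many worker goroutines a single
// b.RunParallel call starts when GOMAXPROCS is set to procs, which is what
// `go test -cpu=procs` does for each benchmark run.
func goroutinesPerRunParallel(procs int) int64 {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev)

	var maxStarted int64
	testing.Benchmark(func(b *testing.B) {
		var started int64
		b.RunParallel(func(pb *testing.PB) {
			atomic.AddInt64(&started, 1) // one increment per worker goroutine
			for pb.Next() {
				// Empty body: only the number of workers matters here.
			}
		})
		// RunParallel waits for all workers, so started is final here.
		if started > maxStarted {
			maxStarted = started
		}
	})
	return maxStarted
}
```

With procs set to 1 and 64 this should report 1 and 64 workers. `b.SetParallelism(p)` would multiply that count by p, but the benchmark in this PR does not need it, so no extra goroutines have to be managed by hand.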

I ran the benchmarks with 4k blocks and 1, 64, and 1024 concurrent goroutines. The numbers are fairly similar across goroutine counts, with time/op slowly increasing as the number of goroutines grows. They are also very stable: they barely change when running for longer with -benchtime=3s, and six consecutive runs with -count=6 show little variance under benchstat:

name                                  time/op
ARCCacheConcurrentOps/PutDelete       1.09µs ± 1%
ARCCacheConcurrentOps/PutDelete-64    1.26µs ± 1%
ARCCacheConcurrentOps/PutDelete-1024  1.39µs ± 0%
ARCCacheConcurrentOps/GetDelete        420ns ± 1%
ARCCacheConcurrentOps/GetDelete-64     602ns ± 1%
ARCCacheConcurrentOps/GetDelete-1024   639ns ± 1%
ARCCacheConcurrentOps/GetPut          1.04µs ± 1%
ARCCacheConcurrentOps/GetPut-64       1.12µs ± 0%
ARCCacheConcurrentOps/GetPut-1024     1.23µs ± 0%

@frrist @warpfork @Stebalien @iand I'd love your thoughts. If the approach sounds good, I can rebase the other PRs atop this one and post before/after benchmark numbers for each one of them on the original issue thread.

mvdan · 2021-04-28T13:14:03Z

Also, there's no commit message for now, but I'll write one when I move this out of draft state. The code is also pretty well documented.

iand · 2021-04-28T13:49:50Z

The timings per op seem sensitive to the number of blocks, which I don't have a good explanation for. For example, I changed to numBlocks = 4 << 14 and ns/op doubled or more across the board, while bytes and allocs per op remained the same.

Maybe the default of 4k blocks is small enough to cause contention on the blocks within the 1s timeframe, but 64k is not. I don't know anything about the locking strategy already in place for the blockstore, so I am just guessing.

mvdan · 2021-04-28T13:59:52Z

Fewer blocks will cause more collisions. Some pairs of ops like Get+Delete also gradually remove all blocks from the cache, so the smaller the number of blocks, the quicker that happens.
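The draining effect can be illustrated with a simplified, sequential model. Assumptions: a plain Go map stands in for the cache, ARC eviction and concurrency are ignored, and `remainingAfter` is a hypothetical helper. Starting from half of the blocks present, each step performs either a Get (which never adds a block) or a Delete on a random block, so the set can only shrink, and it shrinks faster when there are fewer blocks to choose from.

```go
package main

import "math/rand"

// remainingAfter simulates ops random Get/Delete operations over numBlocks
// block IDs, starting with the first numBlocks/2 present, and returns how
// many blocks are left. This is a toy model, not the ARC cache.
func remainingAfter(numBlocks, ops int, seed int64) int {
	rng := rand.New(rand.NewSource(seed))
	present := make(map[int]bool, numBlocks)
	for i := 0; i < numBlocks/2; i++ {
		present[i] = true
	}
	for n := 0; n < ops; n++ {
		k := rng.Intn(numBlocks)
		if rng.Intn(2) == 0 {
			_ = present[k] // Get: a read never adds a block
		} else {
			delete(present, k) // Delete: removes the block if present
		}
	}
	return len(present)
}
```

With the same number of operations, the 4 << 10 set ends up far emptier than the 4 << 14 set, so pairs like Get+Delete run against a very different cache state depending on the block count.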

I'm also not particularly familiar with the internal architecture or what a good default for the number of blocks should be. The original PR used one million, which definitely feels too large: even with 1k goroutines and tens of thousands of iterations, the chances of concurrent collisions are practically zero. So I felt we needed a number of blocks closer to the number of concurrent goroutines.


mvdan marked this pull request as draft April 28, 2021 12:54

mvdan force-pushed the mvdan/arc-cache-bench branch from 5056b98 to f1f191b April 28, 2021 13:14

mvdan mentioned this pull request Apr 29, 2021

Delete race in ARC cache #64

Closed

mvdan marked this pull request as ready for review May 3, 2021 20:35

mvdan changed the title from "WIP: add BenchmarkARCCacheConcurrentOps" to "add BenchmarkARCCacheConcurrentOps" May 3, 2021

frrist mentioned this pull request May 4, 2021

test: add simple arc cache benchmarks #65

Closed


WIP: add BenchmarkARCCacheConcurrentOps

b3408ff

Stebalien force-pushed the mvdan/arc-cache-bench branch from f1f191b to b3408ff May 4, 2021 22:42

Stebalien approved these changes May 4, 2021


Stebalien merged commit 10b7bf0 into master May 4, 2021

Stebalien deleted the mvdan/arc-cache-bench branch May 4, 2021 22:43

aschmahmann mentioned this pull request Dec 13, 2021

Release v0.12 ipfs/kubo#8344

Closed


Jorropo pushed a commit to ipfs/go-libipfs-rapide that referenced this pull request Mar 23, 2023

Merge pull request ipfs/go-ipfs-blockstore#70 from ipfs/mvdan/arc-cache-bench

e74b9b5

add BenchmarkARCCacheConcurrentOps This commit was moved from ipfs/go-ipfs-blockstore@10b7bf0

