-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CborReader and CborWriter #67
Conversation
Adds pools for small buffers used for reading cbor headers and for larger buffers used when reading strings. This trades off some cpu for less pressure on the garbage collector. The benchmarks show a notable increase in cpu but allocations are amortized to near zero in many cases. Internalising the management of scratch buffers simplifies the code and allows removal/deprecation of duplicate implementations for several functions. Users will need to re-generate marshaling methods to benefit from the removal of scratch buffers from those methods. Benchstat comparison with master: name old time/op new time/op delta Marshaling-8 564ns ± 1% 1123ns ± 3% +99.16% (p=0.000 n=9+10) Unmarshaling-8 2.75µs ± 5% 3.53µs ± 4% +28.63% (p=0.000 n=10+10) LinkScan-8 744ns ± 0% 1694ns ± 1% +127.69% (p=0.000 n=10+9) Deferred-8 1.68µs ± 1% 3.90µs ± 0% +131.76% (p=0.000 n=10+9) MapMarshaling-8 317ns ± 0% 667ns ± 2% +110.55% (p=0.000 n=9+10) MapUnmarshaling-8 2.71µs ±10% 3.33µs ± 3% +23.26% (p=0.000 n=10+10) name old alloc/op new alloc/op delta Marshaling-8 160B ± 0% 0B -100.00% (p=0.000 n=10+10) Unmarshaling-8 3.44kB ± 0% 1.96kB ± 0% -42.94% (p=0.000 n=10+10) LinkScan-8 112B ± 0% 0B -100.00% (p=0.000 n=10+10) Deferred-8 88.0B ± 0% 72.0B ± 0% -18.18% (p=0.000 n=10+10) MapMarshaling-8 48.0B ± 0% 2.0B ± 0% -95.83% (p=0.000 n=10+10) MapUnmarshaling-8 2.53kB ± 0% 1.54kB ± 0% -39.15% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Marshaling-8 10.0 ± 0% 0.0 -100.00% (p=0.000 n=10+10) Unmarshaling-8 43.0 ± 0% 21.0 ± 0% -51.16% (p=0.000 n=10+10) LinkScan-8 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) Deferred-8 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10) MapMarshaling-8 5.00 ± 0% 2.00 ± 0% -60.00% (p=0.000 n=10+10) MapUnmarshaling-8 56.0 ± 0% 27.0 ± 0% -51.79% (p=0.000 n=10+10)
for i := range buf { | ||
buf[i] = 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we don't really need this, do we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(although I guess it likely doesn't matter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good practice to clear the buffer before returning to pool. Although we're currently careful not to read past bounds so it's not strictly necessary.
174d96c
to
da8b264
Compare
This gives a small improvement in some cases, but an 8% improvement in map marshalling speed.
It doesn't look like this escapes, but removing this saves ~5% for LinkScan, Deferred, and MapMarshaling.
utils.go
Outdated
@@ -614,12 +632,15 @@ func bufToCid(buf []byte) (cid.Cid, error) { | |||
var byteArrZero = []byte{0} | |||
|
|||
func WriteCid(w io.Writer, c cid.Cid) error { | |||
if cw, ok := w.(*CborWriter); ok { | |||
w = cw // take advantage of cbor writer scratch buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I should call NewCborWriter here
Updated comparison with master shows a much better time delta for the Marshaling benchmark
|
Results from go-hamt-ipld, comparison against master of cbor-gen. Find benchmark exercises unmarshaling pretty well.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Also, I'm seeing different perf numbers when comparing to master:
What version of go are you using? (I'm using 1.18, on an intel i7 laptop). |
1.18, Linux, Intel i7 desktop |
Gives a better tradeoff between cpu and allocations than #65 so I am closing that PR in favour of this one.
Adds CborReader and CborWriter that contain a small buffer for optimizing CBOR header reads and writes. In contrast to #65 no pool is used for header buffers, instead the reader or writer reuses a single buffer over multiple operations. A package level pool is used for 8k buffers used when reading strings. The benchmarks show an increase in cpu but allocations are substantially reduced.
Benchstat comparison with master:
There's probably some additional opportunities for refactoring around the reader/writers but I'm leaviing this change focussed on the initial performance improvements.