You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
convertVarIntToBytes is heavy handed, burns at least 48Bytes, CPU cycles and causes convertLeafOp to run in microseconds instead of nanoseconds, and could take in a byte array
#344
Closed
odeke-em opened this issue
Dec 13, 2020
· 3 comments
· Fixed by #352
Noticed a bunch of code that invokes convertVarIntToByteswhose internals and seems to be heavily used in the code base, stemming from (*ImmutableTree).GetNonMembership. Looking at some hits
Notice in there that for every invocation, we create a bytes.Buffer, use it and then discard it :-), that's at least 48Bytes gone, as well as garbage collection penalty. It is heavily use by convertLeafOp and others.
Remedy
Given the successive hits identified above e.g. if proof_ics23.go
I think we can definitely do better. The functions reuse code but almost pedantically to the point of just a varint encoding needing to be written to a writer then back to a slice. We can fix this by perhaps reusing a byte array and updating the signature for convertVarintToBytes to take a byte array that's allocated once, and then reused in all those call sites. We'll pass in a byte array and not a byte slice to ensure no need for bounds checking. Perhaps this diff
diff --git a/proof_ics23.go b/proof_ics23.go
index 293270d..43c36df 100644
--- a/proof_ics23.go+++ b/proof_ics23.go@@ -1,7 +1,7 @@
package iavl
import (
- "bytes"+ "encoding/binary"
"fmt"
ics23 "github.com/confio/ics23/go"
@@ -95,9 +95,10 @@ func convertExistenceProof(p *RangeProof, key, value []byte) (*ics23.ExistencePr
func convertLeafOp(version int64) *ics23.LeafOp {
// this is adapted from iavl/proof.go:proofLeafNode.Hash()
- prefix := convertVarIntToBytes(0)- prefix = append(prefix, convertVarIntToBytes(1)...)- prefix = append(prefix, convertVarIntToBytes(version)...)+ var varintBuf [binary.MaxVarintLen64]byte+ prefix := convertVarIntToBytes(0, varintBuf)+ prefix = append(prefix, convertVarIntToBytes(1, varintBuf)...)+ prefix = append(prefix, convertVarIntToBytes(version, varintBuf)...)
return &ics23.LeafOp{
Hash: ics23.HashOp_SHA256,
@@ -109,6 +110,7 @@ func convertLeafOp(version int64) *ics23.LeafOp {
// we cannot get the proofInnerNode type, so we need to do the whole path in one function
func convertInnerOps(path PathToLeaf) []*ics23.InnerOp {
+ var varintBuf [binary.MaxVarintLen64]byte
steps := make([]*ics23.InnerOp, 0, len(path))
// lengthByte is the length prefix prepended to each of the sha256 sub-hashes
@@ -118,9 +120,9 @@ func convertInnerOps(path PathToLeaf) []*ics23.InnerOp {
// we want to go up from the leaf to the root
for i := len(path) - 1; i >= 0; i-- {
// this is adapted from iavl/proof.go:proofInnerNode.Hash()
- prefix := convertVarIntToBytes(int64(path[i].Height))- prefix = append(prefix, convertVarIntToBytes(path[i].Size)...)- prefix = append(prefix, convertVarIntToBytes(path[i].Version)...)+ prefix := convertVarIntToBytes(int64(path[i].Height), varintBuf)+ prefix = append(prefix, convertVarIntToBytes(path[i].Size, varintBuf)...)+ prefix = append(prefix, convertVarIntToBytes(path[i].Version, varintBuf)...)
var suffix []byte
if len(path[i].Left) > 0 {
@@ -147,12 +149,7 @@ func convertInnerOps(path PathToLeaf) []*ics23.InnerOp {
return steps
}
-func convertVarIntToBytes(orig int64) []byte {- buf := new(bytes.Buffer)- err := encodeVarint(buf, orig)- // write should not fail- if err != nil {- panic(err)- }- return buf.Bytes()+func convertVarIntToBytes(orig int64, buf [binary.MaxVarintLen64]byte) []byte {+ n := binary.PutVarint(buf[:], orig)+ return buf[:n]
}
Thanks @alessio@erikgrinaker. Sure, I've mailed #352. I prefer to let the developers mail the PRs despite my bug reports being full PRs, so that I can de-pollute commits and can be detached and let the folks running the projects have control over commits :-) It's odd to explain but it ensures
that I am offering tips that can be ignored if the maintainers chose to.
Noticed while auditing and profiling dependencies of cosmos-sdk, that
convertVarIntToBytes, while reusing already implemented code, it
was expensively creating a bytes.Buffer (40B on 64-bit architectures)
returning a result and discarding it, yet that code was called
3 times successively at least.
By reusing a byte array (not a slice, to ensure bounds checks
eliminations by the compiler), we are able to dramatically improve
performance, taking it from ~4µs down to 850ns (~4.5X reduction),
reduce allocations by >=~80% in every dimension:
```shell
$ benchstat before.txt after.txt
name old time/op new time/op delta
ConvertLeafOp-8 3.90µs ± 1% 0.85µs ± 4% -78.12% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
ConvertLeafOp-8 5.18kB ± 0% 0.77kB ± 0% -85.19% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
ConvertLeafOp-8 120 ± 0% 24 ± 0% -80.00% (p=0.000 n=10+10)
```
Fixes#344
Noticed a bunch of code that invokes convertVarIntToByteswhose internals and seems to be heavily used in the code base, stemming from (*ImmutableTree).GetNonMembership. Looking at some hits
$ git grep -n 'convertVarIntToBytes(' proof_ics23.go:98: prefix := convertVarIntToBytes(0) proof_ics23.go:99: prefix = append(prefix, convertVarIntToBytes(1)...) proof_ics23.go:100: prefix = append(prefix, convertVarIntToBytes(version)...) proof_ics23.go:121: prefix := convertVarIntToBytes(int64(path[i].Height)) proof_ics23.go:122: prefix = append(prefix, convertVarIntToBytes(path[i].Size)...) proof_ics23.go:123: prefix = append(prefix, convertVarIntToBytes(path[i].Version)...) proof_ics23.go:150:func convertVarIntToBytes(orig int64) []byte {
iavl/proof_ics23.go
Lines 150 to 158 in 903e53d
Notice in there that for every invocation, we create a bytes.Buffer, use it and then discard it :-), that's at least 48Bytes gone, as well as garbage collection penalty. It is heavily use by convertLeafOp and others.
Remedy
Given the successive hits identified above e.g. if proof_ics23.go
iavl/proof_ics23.go
Lines 121 to 123 in 903e53d
which is 48 * 3 = 144 bytes at least wasted
and same for
iavl/proof_ics23.go
Lines 98 to 100 in 903e53d
I think we can definitely do better. The functions reuse code but almost pedantically to the point of just a varint encoding needing to be written to a writer then back to a slice. We can fix this by perhaps reusing a byte array and updating the signature for convertVarintToBytes to take a byte array that's allocated once, and then reused in all those call sites. We'll pass in a byte array and not a byte slice to ensure no need for bounds checking. Perhaps this diff
Results
Given this benchmark
The results are dramatic, the code can now run in nanoseconds instead of microseconds, ~80% faster, with ~>=80% less allocations
Kindly cc-ing @alessio @erikgrinaker @marbar3778
The text was updated successfully, but these errors were encountered: