Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convertVarIntToBytes is heavy handed, burns at least 48Bytes, CPU cycles and causes convertLeafOp to run in microseconds instead of nanoseconds, and could take in a byte array #344

Closed
odeke-em opened this issue Dec 13, 2020 · 3 comments · Fixed by #352

Comments

@odeke-em
Copy link
Contributor

odeke-em commented Dec 13, 2020

Noticed a bunch of code that invokes convertVarIntToByteswhose internals and seems to be heavily used in the code base, stemming from (*ImmutableTree).GetNonMembership. Looking at some hits

$ git grep -n 'convertVarIntToBytes('
proof_ics23.go:98:      prefix := convertVarIntToBytes(0)
proof_ics23.go:99:      prefix = append(prefix, convertVarIntToBytes(1)...)
proof_ics23.go:100:     prefix = append(prefix, convertVarIntToBytes(version)...)
proof_ics23.go:121:             prefix := convertVarIntToBytes(int64(path[i].Height))
proof_ics23.go:122:             prefix = append(prefix, convertVarIntToBytes(path[i].Size)...)
proof_ics23.go:123:             prefix = append(prefix, convertVarIntToBytes(path[i].Version)...)
proof_ics23.go:150:func convertVarIntToBytes(orig int64) []byte {

iavl/proof_ics23.go

Lines 150 to 158 in 903e53d

func convertVarIntToBytes(orig int64) []byte {
buf := new(bytes.Buffer)
err := encodeVarint(buf, orig)
// write should not fail
if err != nil {
panic(err)
}
return buf.Bytes()
}

Notice in there that for every invocation, we create a bytes.Buffer, use it and then discard it :-), that's at least 48Bytes gone, as well as garbage collection penalty. It is heavily use by convertLeafOp and others.

Remedy

Given the successive hits identified above e.g. if proof_ics23.go

iavl/proof_ics23.go

Lines 121 to 123 in 903e53d

prefix := convertVarIntToBytes(int64(path[i].Height))
prefix = append(prefix, convertVarIntToBytes(path[i].Size)...)
prefix = append(prefix, convertVarIntToBytes(path[i].Version)...)

which is 48 * 3 = 144 bytes at least wasted

and same for

iavl/proof_ics23.go

Lines 98 to 100 in 903e53d

prefix := convertVarIntToBytes(0)
prefix = append(prefix, convertVarIntToBytes(1)...)
prefix = append(prefix, convertVarIntToBytes(version)...)

I think we can definitely do better. The functions reuse code but almost pedantically to the point of just a varint encoding needing to be written to a writer then back to a slice. We can fix this by perhaps reusing a byte array and updating the signature for convertVarintToBytes to take a byte array that's allocated once, and then reused in all those call sites. We'll pass in a byte array and not a byte slice to ensure no need for bounds checking. Perhaps this diff

diff --git a/proof_ics23.go b/proof_ics23.go
index 293270d..43c36df 100644
--- a/proof_ics23.go
+++ b/proof_ics23.go
@@ -1,7 +1,7 @@
 package iavl
 
 import (
-	"bytes"
+	"encoding/binary"
 	"fmt"
 
 	ics23 "github.com/confio/ics23/go"
@@ -95,9 +95,10 @@ func convertExistenceProof(p *RangeProof, key, value []byte) (*ics23.ExistencePr
 
 func convertLeafOp(version int64) *ics23.LeafOp {
 	// this is adapted from iavl/proof.go:proofLeafNode.Hash()
-	prefix := convertVarIntToBytes(0)
-	prefix = append(prefix, convertVarIntToBytes(1)...)
-	prefix = append(prefix, convertVarIntToBytes(version)...)
+	var varintBuf [binary.MaxVarintLen64]byte
+	prefix := convertVarIntToBytes(0, varintBuf)
+	prefix = append(prefix, convertVarIntToBytes(1, varintBuf)...)
+	prefix = append(prefix, convertVarIntToBytes(version, varintBuf)...)
 
 	return &ics23.LeafOp{
 		Hash:         ics23.HashOp_SHA256,
@@ -109,6 +110,7 @@ func convertLeafOp(version int64) *ics23.LeafOp {
 
 // we cannot get the proofInnerNode type, so we need to do the whole path in one function
 func convertInnerOps(path PathToLeaf) []*ics23.InnerOp {
+	var varintBuf [binary.MaxVarintLen64]byte
 	steps := make([]*ics23.InnerOp, 0, len(path))
 
 	// lengthByte is the length prefix prepended to each of the sha256 sub-hashes
@@ -118,9 +120,9 @@ func convertInnerOps(path PathToLeaf) []*ics23.InnerOp {
 	// we want to go up from the leaf to the root
 	for i := len(path) - 1; i >= 0; i-- {
 		// this is adapted from iavl/proof.go:proofInnerNode.Hash()
-		prefix := convertVarIntToBytes(int64(path[i].Height))
-		prefix = append(prefix, convertVarIntToBytes(path[i].Size)...)
-		prefix = append(prefix, convertVarIntToBytes(path[i].Version)...)
+		prefix := convertVarIntToBytes(int64(path[i].Height), varintBuf)
+		prefix = append(prefix, convertVarIntToBytes(path[i].Size, varintBuf)...)
+		prefix = append(prefix, convertVarIntToBytes(path[i].Version, varintBuf)...)
 
 		var suffix []byte
 		if len(path[i].Left) > 0 {
@@ -147,12 +149,7 @@ func convertInnerOps(path PathToLeaf) []*ics23.InnerOp {
 	return steps
 }
 
-func convertVarIntToBytes(orig int64) []byte {
-	buf := new(bytes.Buffer)
-	err := encodeVarint(buf, orig)
-	// write should not fail
-	if err != nil {
-		panic(err)
-	}
-	return buf.Bytes()
+func convertVarIntToBytes(orig int64, buf [binary.MaxVarintLen64]byte) []byte {
+	n := binary.PutVarint(buf[:], orig)
+	return buf[:n]
 }

Results

Given this benchmark

package iavl

import (
        "testing"
)       

var values = []struct {
        n    int64
        want int
}{      
        {0, 1},
        {1, 1},
        {100, 2},
        {127, 2},
        {128, 2},
        {1 << 29, 5},
        {-0, 1},
        {-1, 1},
        {-100, 2},
        {-127, 2},
        {-128, 2},
        {-1 << 29, 5},  
}
        
var sink interface{}
                
func BenchmarkConvertLeafOp(b *testing.B) { 
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
                for _, tt := range values {
                        sink = convertLeafOp(tt.n)
                }
        }
        if sink == nil {
                b.Fatal("Benchmark wasn't run")
        }               
        sink = nil
}

The results are dramatic, the code can now run in nanoseconds instead of microseconds, ~80% faster, with ~>=80% less allocations

$ benchstat before.txt after.txt 
name             old time/op    new time/op    delta
ConvertLeafOp-8    4.08µs ± 1%    0.83µs ± 0%  -79.59%  (p=0.000 n=9+9)

name             old alloc/op   new alloc/op   delta
ConvertLeafOp-8    5.18kB ± 0%    0.77kB ± 0%  -85.19%  (p=0.000 n=10+10)

name             old allocs/op  new allocs/op  delta
ConvertLeafOp-8       120 ± 0%        24 ± 0%  -80.00%  (p=0.000 n=10+10)

Kindly cc-ing @alessio @erikgrinaker @marbar3778

@alessio
Copy link

alessio commented Dec 13, 2020

Oh wow. Great catch indeed @odeke-em!

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Dec 13, 2020

Nice find, thanks! Feel like submitting a PR?

@AdityaSripal I think you mentioned when submitting this code that it was temporary and needed to be improved, want to do a pass over it?

@odeke-em
Copy link
Contributor Author

Thanks @alessio @erikgrinaker. Sure, I've mailed #352. I prefer to let the developers mail the PRs despite my bug reports being full PRs, so that I can de-pollute commits and can be detached and let the folks running the projects have control over commits :-) It's odd to explain but it ensures
that I am offering tips that can be ignored if the maintainers chose to.

erikgrinaker pushed a commit that referenced this issue Dec 17, 2020
Noticed while auditing and profiling dependencies of cosmos-sdk, that
convertVarIntToBytes, while reusing already implemented code, it
was expensively creating a bytes.Buffer (40B on 64-bit architectures)
returning a result and discarding it, yet that code was called
3 times successively at least.

By reusing a byte array (not a slice, to ensure bounds checks
eliminations by the compiler), we are able to dramatically improve
performance, taking it from ~4µs down to 850ns (~4.5X reduction),
reduce allocations by >=~80% in every dimension:

```shell
$ benchstat before.txt after.txt
name             old time/op    new time/op    delta
ConvertLeafOp-8    3.90µs ± 1%    0.85µs ± 4%  -78.12%  (p=0.000 n=10+10)

name             old alloc/op   new alloc/op   delta
ConvertLeafOp-8    5.18kB ± 0%    0.77kB ± 0%  -85.19%  (p=0.000 n=10+10)

name             old allocs/op  new allocs/op  delta
ConvertLeafOp-8       120 ± 0%        24 ± 0%  -80.00%  (p=0.000 n=10+10)
```

Fixes #344
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants