Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Robert/addr memory leak #8717

Merged
merged 15 commits into from
Mar 3, 2021
Merged

Robert/addr memory leak #8717

merged 15 commits into from
Mar 3, 2021

Conversation

robert-zaremba
Copy link
Collaborator

@robert-zaremba robert-zaremba commented Feb 26, 2021

Description

In #8694 cache for AccAddress.String() was added as a simple map without control and estimates how that map will grow. This leads to unmanageable memory growth, which could put a node under a risk.

In this PR,

  • we add a benchmark for the Address.String method.
  • we add an LRU cache instead of the map.
  • remove deferred calls
  • optimize the Address.Empty function

Before:

BenchmarkAccAddressString-4   	 4533326	       268 ns/op	     144 B/op	       3 allocs/op
BenchmarkAccAddressString-4   	 4592979	       283 ns/op	     144 B/op	       3 allocs/op

After:

BenchmarkAccAddressString-4   	 4180477	       279 ns/op	     144 B/op	       3 allocs/op
BenchmarkAccAddressString-4   	 4633738	       274 ns/op	     144 B/op	       3 allocs/op

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (see CONTRIBUTING.md)
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards.
  • Wrote unit and integration tests
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • Added relevant godoc comments.
  • Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer
  • Review Codecov Report in the comment section below once CI passes

@robert-zaremba
Copy link
Collaborator Author

BTW: we could add same thing for ValAddress and ConsAddress.
However, to be honest, I don't see much sense adding cache for Address. There are tonnes of other, better places for optimization without using a cache. Since it was added, I wanted to fix the memory issue.

@codecov
Copy link

codecov bot commented Feb 26, 2021

Codecov Report

Merging #8717 (ecc288e) into master (30f58b5) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8717      +/-   ##
==========================================
- Coverage   61.41%   61.37%   -0.04%     
==========================================
  Files         674      671       -3     
  Lines       38399    38336      -63     
==========================================
- Hits        23581    23529      -52     
- Misses      12335    12348      +13     
+ Partials     2483     2459      -24     
Impacted Files Coverage Δ
types/address.go 65.99% <100.00%> (-0.94%) ⬇️
x/ibc/core/02-client/abci.go 60.00% <0.00%> (-24.62%) ⬇️
client/keys/migrate.go 43.47% <0.00%> (-12.86%) ⬇️
x/upgrade/keeper/grpc_query.go 50.00% <0.00%> (-8.83%) ⬇️
x/upgrade/keeper/keeper.go 68.10% <0.00%> (-6.90%) ⬇️
x/upgrade/types/plan.go 83.78% <0.00%> (-3.18%) ⬇️
x/ibc/core/02-client/types/codec.go 77.52% <0.00%> (-2.48%) ⬇️
store/cachekv/store.go 87.12% <0.00%> (-1.05%) ⬇️
x/auth/tx/builder.go 75.59% <0.00%> (-0.79%) ⬇️
simapp/app.go 85.46% <0.00%> (ø)
... and 23 more


aa2 := AccAddress{}
return bytes.Equal(aa.Bytes(), aa2.Bytes())
return aa == nil || len(aa) == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

types/address.go Outdated Show resolved Hide resolved
types/address.go Outdated
var addMu sync.Mutex
var addrStrMemoize = make(map[string]string)
var addrStrMemoize, _ = simplelru.NewLRU(1000, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This initialization should be in an init() function, so that we can check the error returned and panic() if it is != nil.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the current design. That being said I agree with you - I will use init.

Copy link
Contributor

@alessio alessio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK on the concept from me

types/address.go Outdated Show resolved Hide resolved
Copy link
Contributor

@alessio alessio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

Copy link
Collaborator

@odeke-em odeke-em left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this change @robert-zaremba! My apologies for the late reply, I was swamped with a couple of things.

A few things:
a) Please make the []byte->string conversion zero cost and not incur allocations
b) Please revert to using defers, the Go compiler made defer dirt cheap as of Go1.13 in 2019 so these are no longer required, please see https://go.googlesource.com/proposal/+/refs/heads/master/design/34481-opencoded-defers.md which was also publicized for that release
c) To get a really good reason to benchmark the LRU just remember that even if each Bech32 address is 90 characters at most as per https://wiki.trezor.io/Bech32#:~:text=A%20Bech32%20string%20is%20at,The%20human%2Dreadable%20part.&text=The%20separator%2C%20which%20is%20always%20%221%22, to consume 1GiB of RAM, that would need to process ~12M Bech32 addresses per (1<<30)/90, am not sure how often this will happen
d) the consideration of the LRU's contention should be checked and this can be done by writing a benchmark with say 1006 (>LRU_SIZE) bech32 values, run the benchmark before and after this change, and then run benchstat before.txt after.txt

Thank you!

types/address.go Outdated Show resolved Hide resolved
types/address.go Outdated
Comment on lines 261 to 263
addrMu.Lock()
addrCache.Add(key, "")
addrMu.Unlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to add this, I'd move all this into the defer as it was before.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defer is doing extra allocation ;)
If we won't add a lock here, the race detector will complain.

types/address_bench_test.go Show resolved Hide resolved
types/address.go Show resolved Hide resolved
Comment on lines 19 to 23
var str string
for i := 0; i < b.N; i++ {
str = aa.String()
}
require.NotEmpty(b, str)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To better get a good benchmark in which code won't be optimized away, use a global variable that's an interface e.g.

var sink, revert interface{}
...
for i := 0; i < b.N; i++ {
   result := ...
   sink = result
}
if  sink == nil {
   b.Fatal("benchmark never ran")
}
sink = revert

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean a variable outside of any function? We are not using it in other benchmarks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well just because it wasn't done correctly before doesn't mean we shouldn't do it correctly, I certainly have submitted many other benchmarks where we set to a global sink and check that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked it locally - it doesn't change anything. Are you sure we need to trick the benchmark code?
My only objection is that we make the whole thing less readable and more complex without any observable effect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure about code at times getting optimized away and that being the only way to block that. Humbly, for a bit of clarity, I work on the Go project, across almost all the packages and I am bringing this advice from the many years of working on benchmarking related code :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But why to obfuscate more if there is no difference (I verified it)? The require.NoEmpty already assures that value must not be ignored.

@robert-zaremba
Copy link
Collaborator Author

We can make more sophisticated benchmarks later (feel free to create a task for that).
I will add the unsafe conversion - but I don't think this is needed (especially if we want to use defer - which will create a new function stack)

Copy link
Collaborator

@odeke-em odeke-em left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the next round of review, @robert-zaremba!

We can make more sophisticated benchmarks later (feel free to create a task for that).

Am not sure what you mean by "sophisticated benchmarks", we need to get conclusive data on the performance and results. I did describe above how to do it. This change is adding in an LRU, replacing a global cache, whose size will only matter if we hit millions of accounts. If the performance of the LRU is worse than for the global map at scale, then we perhaps have other problems.

For benchmarks, after you run them before and after, please use benchstathttps://pkg.go.dev/golang.org/x/tools/cmd/benchstat to give a summary of the comparisons otherwise humans can't tell what changed easily. Also to ensure you get good consistent runs, please use -count=5 or even more, to like -count=20.

The benchmark that I had suggested earlier on will give us conclusive results on the impacts and changes plus what performance improvements or adverse effects that this change brings.

Thanks for the PR.

types/address.go Show resolved Hide resolved
types/address.go Show resolved Hide resolved
@robert-zaremba
Copy link
Collaborator Author

robert-zaremba commented Mar 2, 2021

@odeke-em - adding more benchmarks is going out of scope and we should have a good real world scenario here. We can do it in other issue.

The intention of this PR was to remove the memory leak. I think it's better to have this update then rolling back the cache. I suggest to create a new issue with further benchmarks.

I added one more, non trivial address to the benchmark.

types/address.go Outdated Show resolved Hide resolved
Copy link
Contributor

@amaury1093 amaury1093 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

@robert-zaremba
Copy link
Collaborator Author

@odeke-em your lock on this PR is still active. Please comment if there is something wrong in this PR. Adding more benchmarks is, imho, out of scope - we would need to go into use-cases / scenario bench-marking to bring bore valuable information.

@odeke-em
Copy link
Collaborator

odeke-em commented Mar 3, 2021

Thanks for the ping, Robert! I’ll remove my review to avoid blocking you but I definitely think that without a proper and conclusive benchmark, we might not get the right results, as the point of using an LRU was to counteract hogging up memory that won’t get evicted. If the cost of using an LRU is worse that’s a problem.

@odeke-em odeke-em dismissed their stale review March 3, 2021 03:36

Differing opinions so unblocking author

@robert-zaremba
Copy link
Collaborator Author

If the cost of using an LRU is worse that’s a problem.

The benchmark (which extensively uses cache) shows that's not worse. Moreover the new code uses less allocations - this is due to avoiding defer and optimizing "empty" case.

@robert-zaremba robert-zaremba added the A:automerge Automatically merge PR once all prerequisites pass. label Mar 3, 2021
@mergify mergify bot merged commit 2864eb6 into master Mar 3, 2021
@mergify mergify bot deleted the robert/addr-memory-leak branch March 3, 2021 03:53
@colin-axner
Copy link
Contributor

The race-detector tests appears to begin failing with the merge of this pr. Maybe it is due to something else, but thought I'd mention it here

@robert-zaremba
Copy link
Collaborator Author

@colin-axner What exactly and Where do you see the test failing?
Here, in the merge checks nothing was failing.
When running make test-race I see the following things failing:

  • FAIL: TestListSnapshots (137.46s)
  • FAIL github.com/cosmos/cosmos-sdk/baseapp 468.134s

It's hard to digest the output of maket test-race. There are tonnes of lines related to leveldb....

The only interesting thing I found is the following warning:

WARNING: DATA RACE
Write at 0x00c000fbf470 by goroutine 43:
  container/list.(*List).move()
      /usr/lib/go-1.15/src/container/list/list.go:123 +0x28a
  container/list.(*List).MoveToFront()
      /usr/lib/go-1.15/src/container/list/list.go:188 +0x23f
  github.com/hashicorp/golang-lru/simplelru.(*LRU).Get()
      /home/robert/go/pkg/mod/github.com/hashicorp/golang-lru@v0.5.4/simplelru/lru.go:75 +0xed
  github.com/cosmos/cosmos-sdk/types.AccAddress.String()
      /home/robert/go/src/github.com/cosmos/cosmos-sdk/types/address.go:271 +0x152

This access is OK - because we are fine with a concurrent read, even if it's done during the read - we write data only once, and it never depends on what we read. I can add a mutex there if we want to mute that warning. What do you think?

@colin-axner
Copy link
Contributor

colin-axner commented Mar 3, 2021

Ah, I got a little confused by not looking at the full logs

I also see this (and similar errors in a couple other places):

2021-03-03T03:52:25.9594698Z --- FAIL: TestIntegrationTestSuite (102.02s)
2021-03-03T03:52:25.9595884Z     --- FAIL: TestIntegrationTestSuite/TestCLIMultisign (11.95s)
2021-03-03T03:52:25.9597152Z     --- FAIL: TestIntegrationTestSuite/TestCLISignBatch (4.64s)
2021-03-03T03:52:25.9598103Z     testing.go:1038: race detected during execution of test
2021-03-03T03:52:25.9598655Z FAIL
2021-03-03T03:52:25.9599405Z FAIL	github.com/cosmos/cosmos-sdk/x/auth/client/cli	102.710s

My main concern is having the race-detector become stuck on always failing. I never check the logs, so I imagine it could be dangerous to have a test which always fails. If someone does introduce a race condition, it could go unnoticed

I have little context on this pr, so will defer to the decisions of others here. Maybe the other test-race tests are a good enough to ensure race conditions don't get introduced

@robert-zaremba
Copy link
Collaborator Author

Yeah, the logs are huuuuge. I will add a check and let's see what will happen.

@alessio
Copy link
Contributor

alessio commented Mar 3, 2021

Oh wow - why was mergify allowed to merge something whose tests are not all green in the first place?

@marbar3778 can we do something to prevent this from happening again?

@alessio
Copy link
Contributor

alessio commented Mar 3, 2021

I think I know why. The race-detector-report job was not mandatory for PRs to be merged in.

EDIT: I fixed the branch configuration

@colin-axner
Copy link
Contributor

It should be noted that I don't think any pr's can be merged until the race detector is fixed

@alessio
Copy link
Contributor

alessio commented Mar 3, 2021

It should be noted that I don't think any pr's can be merged until the race detector is fixed

The branch configuration is now fixed, so PRs will be blocked until the race is fixed. Does that work for you?

@colin-axner
Copy link
Contributor

It is fine with me, but it might be blocking a 0.42.0 final release.

I'm hoping to remove IBC #8735 this week (so it is included in 0.42). That pr is just waiting on approval of #8624 which I guess now will wait on a race-detector fix

Overall though, I am most certainly in favor of fixing tests before merging more pr's and making the problem worse. Just hope we can do so soon

@robert-zaremba
Copy link
Collaborator Author

PR: #8767

@odeke-em odeke-em mentioned this pull request Mar 18, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A:automerge Automatically merge PR once all prerequisites pass. T:Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants