state: replace fastcache with gc-friendly structure #74

Merged
merged 2 commits into kaiachain:dev on Sep 9, 2024

Conversation

yoomee1313
Contributor

@yoomee1313 yoomee1313 commented Aug 22, 2024

Proposed changes

  • This PR ports a change from go-ethereum to fix the OOM issue that occurs when the debug API is called repeatedly (e.g., once every 10 seconds).

Types of changes

Please put an x in the boxes related to your change.

  • Bugfix
  • New feature or enhancement
  • Others

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING GUIDELINES doc
  • I have read the CLA and, as a first-time contributor, signed it by commenting "I have read the CLA Document and I hereby sign the CLA"
  • Lint and unit tests pass locally with my changes ($ make test)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Related issues

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...

@yoomee1313 yoomee1313 self-assigned this Aug 22, 2024
@yoomee1313 yoomee1313 marked this pull request as ready for review August 23, 2024 10:13
@yoomee1313 yoomee1313 mentioned this pull request Aug 25, 2024
Collaborator

@ian0371 ian0371 left a comment

  1. Are we going to replace all uses of the existing LRU cache (hashicorp/golang-lru) with this new LRU implementation in the future?
  2. Do we need blob_lru?

@yoomee1313
Contributor Author

yoomee1313 commented Sep 5, 2024

@ian0371 In my opinion, we can still consider using the hashicorp LRU implementation. In terms of usability it seems to be a matter of preference, since I don't see much difference in performance.
Check this comment: ethereum/go-ethereum#26162 (comment).
As for blob_lru, I don't know why it is named blob_lru, but this PR uses the SizeConstrainedCache struct.

Contributor

@hyunsooda hyunsooda left a comment

BasicLRU = LRU implementation (not thread-safe)
SizeConstrainedCache = wrapped BasicLRU (thread-safe)
Cache(lru.go) = Where is this to be used?
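
For readers following the layering described above, here is a minimal Go sketch of the idea, not the code from this PR. It assumes string keys and []byte values instead of generics, and the names basicLRU, sizeConstrainedCache, and newSizeConstrainedCache are hypothetical stand-ins: an unsynchronized recency list, wrapped by a mutex-guarded cache that evicts the least recently used entries once the total value size would exceed a byte budget.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is one key/value pair stored in the recency list.
type entry struct {
	key   string
	value []byte
}

// basicLRU (hypothetical name) tracks recency only and is NOT safe for concurrent use.
type basicLRU struct {
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> list element holding *entry
}

func newBasicLRU() *basicLRU {
	return &basicLRU{order: list.New(), items: make(map[string]*list.Element)}
}

// add inserts or overwrites a key and marks it most recently used.
func (c *basicLRU) add(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
}

// get returns the value and marks the key most recently used.
func (c *basicLRU) get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// remove deletes a key and returns its value, if present.
func (c *basicLRU) remove(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.Remove(el)
	delete(c.items, key)
	return el.Value.(*entry).value, true
}

// removeOldest evicts the least recently used entry.
func (c *basicLRU) removeOldest() ([]byte, bool) {
	el := c.order.Back()
	if el == nil {
		return nil, false
	}
	return c.remove(el.Value.(*entry).key)
}

// sizeConstrainedCache (hypothetical name) wraps basicLRU with a mutex and a byte budget.
type sizeConstrainedCache struct {
	mu      sync.Mutex
	lru     *basicLRU
	size    uint64 // total bytes of stored values
	maxSize uint64 // byte budget
}

func newSizeConstrainedCache(maxSize uint64) *sizeConstrainedCache {
	return &sizeConstrainedCache{lru: newBasicLRU(), maxSize: maxSize}
}

// Add stores value under key and reports whether other entries were evicted.
func (c *sizeConstrainedCache) Add(key string, value []byte) (evicted bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// Drop any previous value for this key so its bytes are not counted twice.
	if old, ok := c.lru.remove(key); ok {
		c.size -= uint64(len(old))
	}
	// Evict least recently used entries until the new value fits.
	for c.size+uint64(len(value)) > c.maxSize {
		old, ok := c.lru.removeOldest()
		if !ok {
			break // cache is empty; an oversized value is still stored
		}
		c.size -= uint64(len(old))
		evicted = true
	}
	c.lru.add(key, value)
	c.size += uint64(len(value))
	return evicted
}

// Get takes the lock too, because a lookup reorders the recency list.
func (c *sizeConstrainedCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.lru.get(key)
}

func main() {
	cache := newSizeConstrainedCache(8) // at most 8 bytes of values
	cache.Add("a", []byte("1234"))
	cache.Add("b", []byte("5678"))
	cache.Get("a")                         // touch "a" so "b" becomes the oldest
	evicted := cache.Add("c", []byte("9")) // needs room, so "b" is evicted
	_, okA := cache.Get("a")
	_, okB := cache.Get("b")
	fmt.Println(evicted, okA, okB) // prints: true true false
}
```

Because every entry is an ordinary heap object, memory freed by eviction becomes reclaimable by the Go garbage collector, which is the property the PR title refers to as "gc-friendly".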

@hyunsooda
Contributor

Just to clarify, it seems like #73 is key to resolving the OOM issue. However, it may not be directly related to the current OOM context. Could you confirm?

@yoomee1313
Contributor Author

> Cache(lru.go) = Where is this to be used?

This is the hashicorp LRU cache, and it is used for the codeSizeCache of the cachingDB database. It can be refactored to use BasicLRU later.

	return &cachingDB{
		db:            statedb.NewDatabaseWithNewCache(db, cacheConfig),
		codeSizeCache: getCodeSizeCache(),
		codeCache:     lru.NewSizeConstrainedCache[common.Hash, []byte](codeCacheSize),
	}

> Just to clarify, it seems like #73 is key to resolving the OOM issue. However, it may not be directly related to the current OOM context. Could you confirm?

Please check this comment: ethereum/go-ethereum#26092 (comment). fastcache is the main cause of the leak, and it seems to be triggered heavily by stateAtBlock. If your understanding differs from mine, please let me know.
FYI, in the heap profile of the leaking node, fastcache accounts for up to 30% of memory usage.
image

@hyunsooda
Contributor

> Cache(lru.go) = Where is this to be used?

The codeCache type is SizeConstrainedCache, defined in blob_lru.go. I was confused about where lru.Cache is used, and have now found it at https://github.com/kaiachain/kaia/pull/74/files#diff-37b6e7ed4f299b87b357b54376123393b34ee0699fc035f25bc0621627dcc6baR187. No function name or usage changed there, so no further change was needed for it.

> Please check this comment: ethereum/go-ethereum#26092 (comment). fastcache is the main cause of the leak, and it seems to be triggered heavily by stateAtBlock. If your understanding differs from mine, please let me know.
> FYI, in the heap profile of the leaking node, fastcache accounts for up to 30% of memory usage.

NewDatabaseWithExistingCache was dominated by trace* functions. Now I get it. Thanks.

@blukat29 blukat29 merged commit d09c294 into kaiachain:dev Sep 9, 2024
11 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 9, 2024

4 participants