
perf: Clear cacheKV RAM if its too large #15767

Merged: 4 commits merged into cosmos:main on May 11, 2023

Conversation

@ValarDragon (Contributor) commented Apr 9, 2023

Description

Clear the CacheKV store's RAM if it's too large. If the CacheKV store's caches are too large, we've likely done some linear-time workload.

The old design never cleared this memory, which meant significant RAM growth (the full genesis size) for the entire InitGenesis process; it's unclear to me whether this overhead persisted into subsequent block processing. (Sometimes it held 2x large portions of genesis state, if a module iterates and therefore also populates the sorted cache.) Most other RAM overheads during InitGenesis are at least module-local extra data.

This reduced peak Osmosis RAM usage during InitGenesis from a state export by 20+ GB, which significantly reduces RAM pressure.

This should also come in handy for Osmosis epoch blocks.
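
For readers unfamiliar with the store internals, here is a minimal, self-contained sketch of the idea. The cacheStore type, its field layout, and the main function are simplified stand-ins invented for illustration; the resetCaches name, the 100_000 threshold, and the map-clearing loops mirror the diff excerpts quoted further down, but this is not the actual store code.

package main

import "fmt"

// Simplified stand-ins for the store's internal types.
type cValue struct {
	value []byte
	dirty bool
}

type cacheStore struct {
	cache         map[string]*cValue
	unsortedCache map[string]struct{}
}

// resetCaches mirrors the shape of this PR's change: reallocate the maps when
// they have grown very large so the memory can be garbage collected, otherwise
// clear them in place and keep their capacity for reuse.
func (s *cacheStore) resetCaches() {
	if len(s.cache) > 100_000 {
		// Likely a linear-time workload (InitGenesis, an epoch block):
		// drop the oversized maps entirely.
		s.cache = make(map[string]*cValue)
		s.unsortedCache = make(map[string]struct{})
		return
	}
	// Small caches: the range-and-delete idiom clears without reallocating.
	for k := range s.cache {
		delete(s.cache, k)
	}
	for k := range s.unsortedCache {
		delete(s.unsortedCache, k)
	}
}

func main() {
	s := &cacheStore{
		cache:         map[string]*cValue{"a": {value: []byte("1"), dirty: true}},
		unsortedCache: map[string]struct{}{"a": {}},
	}
	s.resetCaches()
	fmt.Println(len(s.cache), len(s.unsortedCache)) // prints: 0 0
}

The point is that Write can copy out the dirty entries, reset these maps immediately, and let the garbage collector reclaim the memory before the underlying writes complete.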


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@ValarDragon requested a review from a team as a code owner, April 9, 2023 18:15
}
}

sort.Strings(keys)
store.deleteCaches()
Collaborator commented:

I think after the Write call, the whole cache store is discarded usually?

@ValarDragon (Contributor, Author) replied Apr 10, 2023:

Oh interesting. If so:

  • This is still a win for getting the data discarded earlier, rather than being held until the underlying DB's writes complete as well (see the toy sketch below).
  • We should then just always clear everything, rather than keep the old capacity-preservation code.
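
A toy demonstration of the first point (purely illustrative, not store code): once the only reference to a large map is dropped, the garbage collector can reclaim it even while later work, such as the underlying DB write, is still in progress.

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Build a large cache-like map, roughly mimicking InitGenesis load.
	cache := make(map[string][]byte, 1_000_000)
	for i := 0; i < 1_000_000; i++ {
		cache[fmt.Sprintf("key-%d", i)] = make([]byte, 32)
	}

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	cache = nil // discard the cache before the slow underlying write, not after
	runtime.GC()
	runtime.ReadMemStats(&after)

	fmt.Printf("heap before: %d MiB, after discarding: %d MiB\n",
		before.HeapAlloc>>20, after.HeapAlloc>>20)

	// ... the long-running underlying DB write would happen here ...
}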

@mark-rushakoff (Member) left a comment:

I see a couple of ways to improve readability here. I'm not familiar with the cachekv store yet, and I get the impression there are some subtleties I haven't fully identified.

keys := make([]string, 0, len(store.cache))
// Not the best. To reduce RAM pressure, we copy the values as well
// and clear out the old caches right after the copy.
sortedCache := make([]cEntry, 0, len(store.cache))
Member commented:

Just an observation: I assume this slice is often oversized, as it would ideally match the number of dirty entries in store.cache. That may be worth tuning in a separate PR.

Contributor (Author) replied:

In general, the entire CacheKV store should just be rewritten. We perennially run into performance issues due to some wack part of its design.

It's overly complicated, for little reason.
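
For reference, the sizing tweak from the observation above could look roughly like this (illustrative only, reusing the simplified cValue type from the sketch in the description; it trades a second pass over the map for an exactly-sized allocation):

type cEntry struct {
	key string
	val *cValue
}

// dirtyEntries copies only the dirty entries, sizing the slice exactly by
// counting them first instead of reserving capacity for the whole cache.
func dirtyEntries(cache map[string]*cValue) []cEntry {
	n := 0
	for _, v := range cache {
		if v.dirty {
			n++
		}
	}
	out := make([]cEntry, 0, n)
	for k, v := range cache {
		if v.dirty {
			out = append(out, cEntry{key: k, val: v})
		}
	}
	return out
}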

}
}

sort.Strings(keys)
store.deleteCaches()
Member commented:

I think this method would read easier if resetting the caches happened before assigning into sortedCache, even if you need to store len(store.cache) first.

Contributor (Author) replied:

I don't understand; you can only reset the caches once you've grabbed a copy of the data?

@ValarDragon (Contributor, Author) commented:

Friendly bump on this PR

@tac0turtle (Member) commented:

@kocubinski could we get your review on this?

@tac0turtle (Member) left a comment:

It looks good to me; it might be good to have a quick benchmark to test the difference.

@alexanderbez (Contributor) left a comment:

LGTM @ValarDragon. I would say the arbitrary 100k value would warrant a benchmark though.
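
A benchmark along the lines both reviewers are asking for might look like this (a sketch only, not code from the repo: newCacheKVStore is a hypothetical setup helper returning a cachekv store over an in-memory parent, and the entry counts are arbitrary):

package cachekv_test

import (
	"fmt"
	"testing"
)

func BenchmarkCacheKVWrite(b *testing.B) {
	for _, n := range []int{1_000, 100_000, 1_000_000} {
		b.Run(fmt.Sprintf("entries=%d", n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				b.StopTimer()
				store := newCacheKVStore() // hypothetical helper, not part of the PR
				for j := 0; j < n; j++ {
					store.Set([]byte(fmt.Sprintf("key-%d", j)), []byte("value"))
				}
				b.StartTimer()
				store.Write() // only the flush-and-reset path is timed
			}
		})
	}
}

Comparing results with the 100_000 threshold raised, lowered, or removed would show how much the clearing idiom versus reallocation actually matters.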

Comment on lines +108 to +110
for key := range store.cache {
	delete(store.cache, key)
}

Check warning (Code scanning / CodeQL): Iteration over map may be a possible source of non-determinism.
Comment on lines +111 to +113
for key := range store.unsortedCache {
	delete(store.unsortedCache, key)
}

Check warning (Code scanning / CodeQL): Iteration over map may be a possible source of non-determinism.
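
One note on these warnings (my observation, not from the thread): the flagged loops delete every key, so the end state is the same regardless of Go's randomized map iteration order, and the compiler optimizes this exact range-and-delete form into a map clear. A tiny standalone illustration:

package main

import "fmt"

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3}
	// Iteration order is random, but deleting every key is order-independent.
	for k := range m {
		delete(m, k)
	}
	fmt.Println(len(m)) // always prints 0
}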
@@ -93,6 +93,28 @@ func (store *Store) Delete(key []byte) {
store.setCacheValue(key, nil, true)
}

func (store *Store) resetCaches() {
	if len(store.cache) > 100_000 {
@elias-orijtech (Contributor) commented May 2, 2023:

In light of #15767 (comment), consider making this branch unconditional; that is, never use the cache-clearing idiom.

The clearing was introduced as part of dbb9923, so I wonder how much performance is gained by the clearing itself. If not much, that's another reason to stop clearing.
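
The unconditional variant being suggested might look like the following (illustrative only, reusing the simplified types from the sketch in the description; the merged change kept the size check):

// Always drop the maps so their backing memory can be garbage collected,
// regardless of how large they grew; no clearing idiom at all.
func (s *cacheStore) resetCaches() {
	s.cache = make(map[string]*cValue)
	s.unsortedCache = make(map[string]struct{})
}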

@@ -103,44 +125,40 @@ func (store *Store) Write() {
return
}

type cEntry struct {
	key string
	val *cValue
Contributor commented:

How about

type cEntry struct {
    key string
    value []byte
}

to shave a field off each cache entry? val.dirty is never referenced.

@ValarDragon added this pull request to the merge queue May 10, 2023
The github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 10, 2023
@tac0turtle added this pull request to the merge queue May 11, 2023
Merged via the queue into cosmos:main with commit d39f645 May 11, 2023
@tac0turtle deleted the dev/cachekv_clear_ram branch May 11, 2023 12:40