Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast, avoid keys sort #10486

Merged
merged 1 commit into from
Nov 8, 2021
Merged

perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast, avoid keys sort #10486

merged 1 commit into from
Nov 8, 2021

Conversation

odeke-em
Copy link
Collaborator

@odeke-em odeke-em commented Nov 2, 2021

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

  • Before
  • Memory profiles
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
  • CPU profiles
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
  • After
  • Memory profiles
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
  • CPU profiles
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

Fixes #10487
Updates evmos/ethermint#710

@odeke-em odeke-em changed the title store/cachekv: avoid a map lookup if unnecessary plus clear maps fast perf! store/cachekv: avoid a map lookup if unnecessary, clear maps fast, avoid keys sort Nov 2, 2021
@odeke-em odeke-em changed the title perf! store/cachekv: avoid a map lookup if unnecessary, clear maps fast, avoid keys sort perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast, avoid keys sort Nov 2, 2021
Copy link
Collaborator

@robert-zaremba robert-zaremba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the edits are passed to the root store and saved in DB, so we can't remove sorting.

store/cachekv/store.go Outdated Show resolved Hide resolved
@odeke-em odeke-em dismissed robert-zaremba’s stale review November 3, 2021 21:35

Reverted sort.Strings(keys) please take a look again.

@odeke-em odeke-em requested a review from ValarDragon November 3, 2021 21:40
@odeke-em
Copy link
Collaborator Author

odeke-em commented Nov 6, 2021

Kindly pinging y'all for a review/merge.

Copy link
Collaborator

@fedekunze fedekunze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, can you add a changelog entry?

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes #10487
Updates evmos/ethermint#710
@odeke-em
Copy link
Collaborator Author

odeke-em commented Nov 8, 2021

ACK, can you add a changelog entry?

Done, thank you @fedekunze!

@odeke-em odeke-em merged commit 5399e72 into cosmos:master Nov 8, 2021
@odeke-em odeke-em deleted the store-cachekv-reduce-unnecessary-conversions-and-memory-pressure-in-Store.Write branch November 8, 2021 23:49
Comment on lines +133 to +144
// Clear the cache using the map clearing idiom
// and not allocating fresh objects.
// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
for key := range store.cache {
delete(store.cache, key)
}
for key := range store.deleted {
delete(store.deleted, key)
}
for key := range store.unsortedCache {
delete(store.unsortedCache, key)
}
Copy link
Contributor

@ValarDragon ValarDragon Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect in the general case. I've had to actually undo this in the CacheKV store myself in the Osmosis branch in a couple places.

This preserves the old capacity of the map, which makes iterating over map entries / sorting / etc. take much longer. (hashmap iteration time is proportional to its capacity, not its number of entries)

So in the Osmosis case, or a chain upgrading case, if you have one very large blcok, then many short blocks, every short block will take similar store time for iteration as the long block does.

What we did in the Osmosis branch was only use a map clearing idiom if the map is sufficiently small, otherwise we just clear it.

(This made a difference between N^2 behavior and O(N) behavior in our codebase, due to iterator things. Potentially less crucial here, but I'd still caution against this as a premature optimization)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://go-review.googlesource.com/c/go/+/110055/, the struct reinitialized and memory is cleared. So the optimization is that we save on heap memory allocation. However, if one TX will consume looooot of memory then we may have a problem (eg: huge iterator).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@odeke-em can you help answer ^^

fedekunze pushed a commit to tharsis/cosmos-sdk that referenced this pull request Nov 15, 2021
fedekunze added a commit to tharsis/cosmos-sdk that referenced this pull request Nov 15, 2021
* IAVL

* perf: store/cachekv: Correct the naming and implementation for existing benchmarks (cosmos#10116)

## Description

Cref discussion in cosmos#10026, updates the CacheKV Benchmark naming and implementation to correspond to whats actually going on, and remove many irrelevant/incorrect components from being in the benchmarks timing.

Basically the old Benchmark's iterator creation was very flawed, and never hit the complex case, only repeatedly performing the best case performance of the iterator. Instead it really benchmarked iterator set and next operations.

This PR splits out the benchmarks for those two accordingly.

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [x] added `!` to the type prefix if API or client breaking change
- [x] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [x] provided a link to the relevant issue or specification
- [x] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [x] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [x] added a changelog entry to `CHANGELOG.md` - N/A imo
- [x] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [x] updated the relevant documentation or specification
- [x] reviewed "Files changed" and left comments if necessary
- [x] confirmed all CI checks have passed - lint failure unrelated

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed 
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)

* store cosmos#10486

* supply cosmos#10393

* supply cosmos#10393

* telemetry cosmos#10334

* module account

Co-authored-by: Marko <marbar3778@yahoo.com>
Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: likhita-809 <78951027+likhita-809@users.noreply.github.com>
blewater pushed a commit to e-money/cosmos-sdk that referenced this pull request Dec 8, 2021
…st (cosmos#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes cosmos#10487
Updates evmos/ethermint#710
@fedekunze
Copy link
Collaborator

@Mergifyio backport release/v0.45.x

mergify bot pushed a commit that referenced this pull request Dec 28, 2021
…st (#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes #10487
Updates evmos/ethermint#710

(cherry picked from commit 5399e72)

# Conflicts:
#	CHANGELOG.md
@mergify
Copy link
Contributor

mergify bot commented Dec 28, 2021

backport release/v0.45.x

✅ Backports have been created

@fedekunze fedekunze mentioned this pull request Dec 28, 2021
12 tasks
amaury1093 pushed a commit that referenced this pull request Jan 3, 2022
…st, avoid keys sort (backport #10486) (#10848)

* perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast (#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes #10487
Updates evmos/ethermint#710

(cherry picked from commit 5399e72)

# Conflicts:
#	CHANGELOG.md

* fix conflicts

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Aleksandr Bezobchuk <aleks.bezobchuk@gmail.com>
@faddat faddat mentioned this pull request Feb 28, 2022
8 tasks
JimLarson pushed a commit to agoric-labs/cosmos-sdk that referenced this pull request Jul 7, 2022
…st, avoid keys sort (backport cosmos#10486) (cosmos#10848)

* perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast (cosmos#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes cosmos#10487
Updates evmos/ethermint#710

(cherry picked from commit 5399e72)

# Conflicts:
#	CHANGELOG.md

* fix conflicts

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Aleksandr Bezobchuk <aleks.bezobchuk@gmail.com>
JeancarloBarrios pushed a commit to agoric-labs/cosmos-sdk that referenced this pull request Sep 28, 2024
…st, avoid keys sort (backport cosmos#10486) (cosmos#10848)

* perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast (cosmos#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes cosmos#10487
Updates evmos/ethermint#710

(cherry picked from commit 5399e72)

# Conflicts:
#	CHANGELOG.md

* fix conflicts

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Aleksandr Bezobchuk <aleks.bezobchuk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

store/cachekv: why should (*Store).Write sort the keys before deletion yet ordering doesn't seem important?
5 participants