Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: abstractions for snapshot and pruning; snapshot intervals eventually pruned; unit tests #11496

Merged

Conversation

p0mvn
Copy link
Member

@p0mvn p0mvn commented Mar 29, 2022

Description

Upstreaming: osmosis-labs#140

We've run into issues where attempting to prune a height under snapshot started being too frequent. To temporarily mitigate, we had to require node operators to have a large pruning-keep-recent.

This PR fixes this problem by avoiding pruning snapshot heights until after a snapshot is complete.

In addition, an abstraction for pruning (pruning/manager.go) was added. Also, unit tested rigorously from config to base app.

Marked as API breaking because changed the following:

  • snapshot options
  • pruning options
  • baseapp
  • type Committer interface
  • type Snapshotter interface

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@codecov
Copy link

codecov bot commented Mar 29, 2022

Codecov Report

Merging #11496 (e53691a) into master (d4dd444) will increase coverage by 0.23%.
The diff coverage is 73.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11496      +/-   ##
==========================================
+ Coverage   65.91%   66.14%   +0.23%     
==========================================
  Files         667      678      +11     
  Lines       70466    72010    +1544     
==========================================
+ Hits        46448    47632    +1184     
- Misses      21323    21598     +275     
- Partials     2695     2780      +85     
Impacted Files Coverage Δ
server/config/toml.go 66.66% <ø> (ø)
server/mock/store.go 19.38% <0.00%> (-0.83%) ⬇️
server/rollback.go 0.00% <0.00%> (ø)
simapp/simd/cmd/root.go 71.48% <0.00%> (-0.93%) ⬇️
store/iavl/store.go 67.31% <0.00%> (ø)
store/mem/store.go 82.35% <0.00%> (-5.15%) ⬇️
store/rootmulti/dbadapter.go 0.00% <0.00%> (ø)
store/types/store.go 70.73% <ø> (ø)
store/v2alpha1/mem/store.go 84.61% <0.00%> (ø)
store/v2alpha1/transient/store.go 86.66% <0.00%> (ø)
... and 31 more

@p0mvn p0mvn changed the title refactor: abstractions for snapshot and pruning; snapshot intervals eventually pruned; unit tests refactor!: abstractions for snapshot and pruning; snapshot intervals eventually pruned; unit tests Mar 30, 2022
@alexanderbez
Copy link
Contributor

@p0mvn is this ready for review?

@p0mvn
Copy link
Member Author

p0mvn commented Mar 31, 2022

@alexanderbez not yet, need to fix one more test. I will ping you once ready

@p0mvn p0mvn marked this pull request as ready for review March 31, 2022 18:21
@p0mvn
Copy link
Member Author

p0mvn commented Mar 31, 2022

The test I referred to above is broken on master so this is ready. @alexanderbez

I have some questions about v2alpha1. I did not refactor it. I think it might be best to do it in a separate PR to limit the scope. Would appreciate reviewer feedback on this

server/pruning.go Outdated Show resolved Hide resolved
server/pruning_test.go Outdated Show resolved Hide resolved
server/start.go Outdated Show resolved Hide resolved
server/start.go Outdated Show resolved Hide resolved

The strategies are configured in `app.toml`:
pruning = "< strategy >" # where the options are:
- `default`: only the last 100,000 states(approximately 1 week worth of state) are kept; pruning at 100 block intervals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We changed the default value some time ago. Did you change it back to 100k in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have 100k in Osmosis. I missed this difference when cherry-picking. Changed back to the updated default value of 362880.

What was the reason for this change?

Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty large PR to review in detail TBH. But since it was merged in Osmosis, I see no reason it can't be merged here.

@amaury1093 amaury1093 mentioned this pull request Apr 4, 2022
56 tasks
@p0mvn p0mvn force-pushed the roman/upstream/snapshot-pruning-refactor branch from 74d245d to b57d056 Compare April 6, 2022 00:26
@github-actions github-actions bot added the T: CI label Apr 6, 2022
@@ -1,31 +1,17 @@
name: Lint
# Lint runs golangci-lint over the entire cosmos-sdk repository
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we revert changing this file, its setup to not run if no go code is down to not clog and slow down ci for docs prs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an accidental change, removed now. Thanks for catching that

@p0mvn p0mvn force-pushed the roman/upstream/snapshot-pruning-refactor branch from 519c590 to b57d056 Compare April 6, 2022 17:13
@github-actions github-actions bot removed the T: CI label Apr 6, 2022
Copy link
Collaborator

@odeke-em odeke-em left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this change @p0mvn. I've added a review, please take a look.

if !ok {
return errors.New("rootmulti store is required")
}
if err := rms.GetPruning().Validate(); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simply do: return rms.GetPruning().Validate()
and that will remove the need for the 3+ lines below.

func (m *Manager) flushPruningSnapshotHeights(batch dbm.Batch) {
m.mx.Lock()
defer m.mx.Unlock()
bz := make([]byte, 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can preallocate this buffer by doing:
bz := make([]byte, 0, m.pruneSnapshotHeights.Len()*8)

pruning/manager.go Show resolved Hide resolved
}

func (m *Manager) flushPruningHeights(batch dbm.Batch) {
bz := make([]byte, 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bz := make([]byte, 0, len(m.pruneHeights)*8)

offset += 8
}

if pruneSnapshotHeights.Len() > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition will never fail given that offset which starts off as 0 is ALWAYS less than len(bz) which you just tested right above and have as the invariant for the loop. You will always add at least 1 element to the list.

snapshots/manager.go Show resolved Hide resolved
for key, store := range rs.stores {
// If the store is wrapped with an inter-block cache, we must first unwrap
// it to get the underlying IAVL store.
if store.GetStoreType() == types.StoreTypeIAVL {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reverse conditional this and avoid the nesting so

  if store... != types.StoreTypeIAVL. {
     continue
  }

  ...


store = rs.GetCommitKVStore(key)

if err := store.(*iavl.Store).DeleteVersions(pruningHeights...); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here if you'd like above reverse conditional

// GetStoreByName performs a lookup of a StoreKey given a store name typically
func (rs *Store) handlePruning(version int64) error {
rs.pruningManager.HandleHeight(version - 1) // we should never prune the current version.
if rs.pruningManager.ShouldPruneAtHeight(version) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reverse conditional this to return if !rs...

@@ -26,11 +21,18 @@ import (
"github.com/cosmos/cosmos-sdk/store/transient"
"github.com/cosmos/cosmos-sdk/store/types"
sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"

iavltree "github.com/cosmos/iavl"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this block imports as they were idiomatic before

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a "github.com/cosmos/cosmos-sdk/store/iavl" on top. So we need a rename to avoid the clash.

@alexanderbez
Copy link
Contributor

@odeke-em provided excellent feedback here. TY!

@p0mvn
Copy link
Member Author

p0mvn commented Apr 8, 2022

@odeke-em Thank you for your review. All comments are now addressed. Please take a look when you have time

Copy link
Collaborator

@odeke-em odeke-em left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my feedback @p0mvn! Nice work and LGTM!

server/start.go Outdated
@@ -25,13 +25,14 @@ import (
"github.com/cosmos/cosmos-sdk/client"
"github.com/cosmos/cosmos-sdk/client/flags"
"github.com/cosmos/cosmos-sdk/codec"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this newline though as this is the same block of imports :-)

snapshots/manager.go Show resolved Hide resolved
snapshots/manager.go Show resolved Hide resolved
@@ -559,7 +544,47 @@ func (rs *Store) GetKVStore(key types.StoreKey) types.KVStore {
return store
}

// GetStoreByName performs a lookup of a StoreKey given a store name typically
func (rs *Store) handlePruning(version int64) error {
rs.pruningManager.HandleHeight(version - 1) // we should never prune the current version.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please check that this "version" is always greater than 1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version can be 1. However, that is not a problem because we always require keep-recent > 2 so a 0 height never gets pruned. Just in case this ever gets changed, I added a guard against previousHeight == 0 in HandleHeight

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if version becomes negative? Checking for == 0 only guards against that single value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not possible. The only valid values are 0 and greater

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always a good idea to practice defensive programming. As version is of type int64 it's possible this method is called with a value of 0 or -1 or -123456. If HandleHeight has specific constraints on the values it accepts, then this method should enforce those constraints before calling the method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check to cover all possible inputs, table tests TestHandleHeight_Inputs to test all possible behaviors, and updated godoc

Comment on lines 946 to 947
err := rs.pruneStores()
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, you can make a compound one-liner

if err := rs.pruneStores(); err != nil {
   panic(err)
}

@p0mvn p0mvn force-pushed the roman/upstream/snapshot-pruning-refactor branch from 062847f to d43a785 Compare April 8, 2022 22:16
@p0mvn
Copy link
Member Author

p0mvn commented Apr 20, 2022

I tested all these changes on Osmosis v7.x by cherry-picking all relevant commits onto the desired branch. Works exactly as expected.

The summary of the tests can be seen here: osmosis-labs#184 (comment)

@alexanderbez Please let me know how we should proceed. I'm ready to merge this if you agree that the tests are acceptable

Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We need to undo the changelog changes. Then we can proceed with merging.

Also, tbh, I haven't fully reviewed this PR in complete detail -- it's just way too large. So I'm mainly giving an ACK as it's been seen to work reliably for Osmosis.

snapshots/manager.go Outdated Show resolved Hide resolved
@p0mvn p0mvn force-pushed the roman/upstream/snapshot-pruning-refactor branch from a65917f to 073b664 Compare April 21, 2022 14:10
CHANGELOG.md Outdated Show resolved Hide resolved
@alexanderbez
Copy link
Contributor

Will merge once CI passes 👍

alexanderbez and others added 2 commits April 21, 2022 14:52
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
@alexanderbez alexanderbez merged commit 42f8d45 into cosmos:master Apr 21, 2022
@alexanderbez alexanderbez deleted the roman/upstream/snapshot-pruning-refactor branch April 21, 2022 19:30
mkXultra added a commit to UnUniFi/chain that referenced this pull request Jul 11, 2022
mkXultra added a commit to UnUniFi/chain that referenced this pull request Jul 14, 2022
larry0x pushed a commit to larry0x/cosmos-sdk that referenced this pull request May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants