
Pruning Refactor #6475

Merged: 18 commits into master from bez/pruning-refactor on Jun 22, 2020
Conversation

alexanderbez
Contributor

@alexanderbez commented Jun 20, 2020

Description

In order to re-introduce the old pruning behavior while also attempting to address concerns around pruning, the following changes are made. Note that this process fundamentally requires a few client-facing, API-breaking changes. To backport this to the 0.38 line, we must be OK with this (@zmanian @jackzampolin).

Client Changes:

  • Remove the current pruning options, as the pruning behavior reverts back to SDK 0.37 [1]
    • Change "syncable" to "default" -- the former doesn't make any sense
    • Re-introduce the keep-recent and keep-every options
  • Introduce a new interval option to allow operators to decide when pruning actually happens (see the sketch below)
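
As a rough illustration of these options, the configuration might be modeled as follows. The type and field names here are assumptions for illustration, not the merged API, and the default values are placeholders:

```go
package pruning

// Options models the client-facing pruning settings described above.
// Field names and values are illustrative assumptions.
type Options struct {
	KeepRecent int64 // keep this many recent heights on disk
	KeepEvery  int64 // additionally keep every Nth height forever
	Interval   int64 // execute the pruning batch every N heights
}

// Default stands in for the "default" strategy (formerly "syncable"),
// mirroring the old SDK 0.37 behavior; the values are placeholders.
var Default = Options{KeepRecent: 100, KeepEvery: 50, Interval: 10}
```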

Protocol Changes:

  • Pruning no longer happens in iavl/store.go
  • Pruning now occurs directly at the root multi-store. For every Commit, we commit stores as we typically do. However, we bubble up the existing pruning check and execution from iavl/store.go to here.
  • Instead of potentially pruning a height at every block, we collect heights to prune in-memory and prune them in batches at every interval using MutableTree#DeleteVersions. This should yield at least a moderate performance gain, especially with some changes @erikgrinaker will be making to orphans.

Example:

With keepRecent=100;keepEvery=50;interval=10 and a current height of 500, we'd have the following heights on disk: 400-500, plus the snapshots 350, 300, 250, 200, 150, 100, and 50. When the next height, 501, is committed, we know the height falling out of the recent window needs to be pruned, but we don't prune it yet -- we keep that decision in memory. We do this for each height up to 510 and then delete the accumulated batch all at once. This repeats for 511-520, and so on. A sketch of this batching logic follows.
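
Roughly, the batching could look like this Go sketch. The Store and iavlStore shapes here are illustrative stand-ins rather than the SDK's actual types; only DeleteVersions is taken from the upcoming IAVL API:

```go
package pruning

// iavlStore stands in for an IAVL-backed store; DeleteVersions is the
// batched-deletion API from the upcoming IAVL release.
type iavlStore interface {
	DeleteVersions(versions ...int64) error
}

// Store is an illustrative stand-in for the root multi-store.
type Store struct {
	stores       []iavlStore
	pruneHeights []int64 // heights collected in memory, pending deletion
}

// handlePruning runs once per Commit with the height just committed.
func (rs *Store) handlePruning(version, keepRecent, keepEvery, interval int64) error {
	// The height falling out of the keep-recent window becomes a candidate.
	if candidate := version - keepRecent; candidate > 0 {
		// Multiples of keep-every are snapshot heights, kept forever.
		if keepEvery == 0 || candidate%keepEvery != 0 {
			rs.pruneHeights = append(rs.pruneHeights, candidate)
		}
	}

	// Execute the batch only at interval boundaries.
	if version%interval != 0 || len(rs.pruneHeights) == 0 {
		return nil
	}

	for _, s := range rs.stores {
		// One batched DeleteVersions call instead of per-block pruning.
		if err := s.DeleteVersions(rs.pruneHeights...); err != nil {
			return err
		}
	}
	rs.pruneHeights = rs.pruneHeights[:0]
	return nil
}
```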

[1] This is blocked on the next IAVL release, which includes DeleteVersions and the removal of the pruning behavior introduced in IAVL 0.13.

/cc @ethanfrey @erikgrinaker


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (see CONTRIBUTING.md)
  • Linked to GitHub issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards.
  • Wrote unit and integration tests
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • Added relevant godoc comments.
  • Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • Re-reviewed Files changed in the GitHub PR explorer

@alexanderbez added the S:blocked (Status: Blocked) and WIP labels on Jun 20, 2020
@codecov

codecov bot commented Jun 21, 2020

Codecov Report

Merging #6475 into master will increase coverage by 0.71%.
The diff coverage is 61.77%.

@@            Coverage Diff             @@
##           master    #6475      +/-   ##
==========================================
+ Coverage   55.60%   56.31%   +0.71%     
==========================================
  Files         457      470      +13     
  Lines       27440    28178     +738     
==========================================
+ Hits        15257    15869     +612     
- Misses      11083    11187     +104     
- Partials     1100     1122      +22     

@alexanderbez
Contributor Author

> What happens if I restart the node at height 505? Will it remember that heights 501-505 are to be pruned?

In the current design, no, as it's meant to be dead-simple using an ephemeral map. If interval is small enough (default 10, but could be 100), it should not be a problem. If the node ends up not pruning a few blocks, no harm really, right?

> An alternative design might be to simply call AvailableVersions() on the IAVL store and prune the relevant versions;

Can you elaborate? Do you mean retroactively calling that method on start and removing anything we missed since the (re)start?

@erikgrinaker
Contributor

erikgrinaker commented Jun 22, 2020

> > What happens if I restart the node at height 505? Will it remember that heights 501-505 are to be pruned?

> In the current design, no, as it's meant to be dead-simple using an ephemeral map. If interval is small enough (default 10, but could be 100), it should not be a problem. If the node ends up not pruning a few blocks, no harm really, right?

I'll leave that for the SDK team to judge, but as a user I might be surprised if it's not pruning what it's supposed to.

> > An alternative design might be to simply call AvailableVersions() on the IAVL store and prune the relevant versions;

> Can you elaborate? Do you mean retroactively calling that method on start and removing anything we missed since the (re)start?

If the pruning simply scans the available versions and deletes those that match the pruning predicate, then it will always remove what it's supposed to -- after a pruning run, the entire database is consistent with the current pruning settings.

Of course, it needs to be careful not to accidentally remove everything when the previous and current settings have no overlap at all -- I guess KeepRecent > 0 makes sure of that, but changing from e.g. KeepEvery:71 to KeepEvery:199 (both prime) would delete all historical versions beyond the most recent. A sketch of this scan-based approach follows.
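
Concretely, the scan could look like this. AvailableVersions and DeleteVersions exist on iavl.MutableTree; the predicate wiring around them is an illustrative assumption:

```go
package pruning

import "github.com/cosmos/iavl"

// pruneByPredicate re-derives the prunable set from what is actually on
// disk, so the database converges to the current settings after each run.
func pruneByPredicate(tree *iavl.MutableTree, latest, keepRecent, keepEvery int64) error {
	var toDelete []int64
	for _, v := range tree.AvailableVersions() {
		version := int64(v)
		if version > latest-keepRecent {
			continue // inside the keep-recent window
		}
		if keepEvery > 0 && version%keepEvery == 0 {
			continue // a snapshot height under the *current* settings
		}
		toDelete = append(toDelete, version)
	}
	// Pitfall from the discussion: if keepEvery changes (e.g. 71 -> 199),
	// old snapshot heights no longer satisfy the predicate and would all
	// be deleted here, leaving only the recent window.
	return tree.DeleteVersions(toDelete...)
}
```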

@alexanderbez
Contributor Author

That makes sense, @erikgrinaker, although for the concern you've laid out, I would be wary of retroactively applying a new pruning strategy to all previous heights. In addition, it could incur very costly IO.

> I'll leave that for the SDK team to judge, but as a user I might be surprised if it's not pruning what it's supposed to.

True, but we can clearly document this. To keep the mental model simple, I would have heights pruned based on the strategy in use at the time the height was committed. To do this, we could persist the heights to prune during Commit. During init, we would read them back and, if non-empty, remove them.

What do you think about this approach? I would either go with this or leave it as-is and document the corner case. A rough sketch of the persistence idea follows.
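
A rough sketch of that persistence, assuming a generic key-value database; the key name (s/pruneheights) and fixed-width encoding are illustrative, not the PR's actual on-disk format:

```go
package pruning

import "encoding/binary"

// kvDB is a stand-in for the underlying database interface.
type kvDB interface {
	Set(key, value []byte) error
	Get(key []byte) ([]byte, error)
}

var pruneHeightsKey = []byte("s/pruneheights")

// flushPruneHeights runs during Commit so pending heights survive a restart.
func flushPruneHeights(db kvDB, heights []int64) error {
	buf := make([]byte, 8*len(heights))
	for i, h := range heights {
		binary.BigEndian.PutUint64(buf[i*8:], uint64(h))
	}
	return db.Set(pruneHeightsKey, buf)
}

// loadPruneHeights runs during init to repopulate the in-memory list;
// anything read back gets pruned at the next interval.
func loadPruneHeights(db kvDB) ([]int64, error) {
	bz, err := db.Get(pruneHeightsKey)
	if err != nil || len(bz) == 0 {
		return nil, err
	}
	heights := make([]int64, 0, len(bz)/8)
	for i := 0; i+8 <= len(bz); i += 8 {
		heights = append(heights, int64(binary.BigEndian.Uint64(bz[i:])))
	}
	return heights, nil
}
```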

@erikgrinaker
Contributor

That makes sense, your call as to whether they should be persisted or not.

Member

@AdityaSripal left a comment


LGTM, a few nits and doc comments

Review threads on store/iavl/store.go and store/rootmulti/store.go (resolved)
@alexanderbez
Contributor Author

alexanderbez commented Jun 22, 2020

@AdityaSripal @erikgrinaker I've updated the PR a bit: it now flushes pruneHeights on Commit, and this is read back and, if non-empty, repopulated on loadVersion. This way, we don't miss any heights on restart.

The overhead of storing []int64 should be minimal. After this is backported, I would recommend using gob or proto for all metadata and removing Amino. A tiny gob illustration follows.
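
For illustration, a minimal, standalone gob round-trip for the pending heights (not the PR's code, which still uses Amino at this point):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

func main() {
	heights := []int64{501, 502, 503, 504, 505}

	// Encode the pending prune heights as metadata.
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(heights); err != nil {
		panic(err)
	}

	// Decode them back, as loadVersion would on restart.
	var decoded []int64
	if err := gob.NewDecoder(bytes.NewReader(buf.Bytes())).Decode(&decoded); err != nil {
		panic(err)
	}
	fmt.Println(decoded, "-", buf.Len(), "bytes on disk")
}
```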

@alexanderbez merged commit 4716260 into master on Jun 22, 2020
@alexanderbez deleted the bez/pruning-refactor branch on June 22, 2020 at 20:31
@ethanfrey
Contributor

Nice one @alexanderbez

As to backporting: the proper semantic versioning thing would be to release 0.38 + IAVL fixes as 0.39, which is an easy code upgrade from 0.38.

And then Stargate, "the release formerly known as 0.39", could be 0.40.

@ethanfrey
Contributor

AFAIK, there are only testnets on 0.38 and no one has brought it to production.

@zmanian
Member

zmanian commented Jun 22, 2020

Based on users of the latest TM KMS alpha, we have Band, Kava, Commercio and Desmos as mainnets using 0.38.

@Codegnosis

At Unification (FUND), we have also deployed v0.38 on our MainNet

@alexanderbez
Contributor Author

Mhhh, in that case you may as well wait for the full 0.39. I was under the impression 0.38.5 would just have these fixes and that's it. If we're going to cut 0.39 off of master after this PR, we may as well wait and not even bother backporting.

@zmanian
Member

zmanian commented Jun 23, 2020

Given that most of the protobuf conversion is on master, I think the 0.38 networks will want a backport.

@ethanfrey
Contributor

> If we're going to cut 0.39 off of master after this PR, we may as well wait and not even bother backporting.

Huh? There are tons of open issues and PRs in the 0.39 milestone: https://github.com/cosmos/cosmos-sdk/issues?q=is%3Aopen+is%3Aissue+milestone%3Av0.39

And if "IBC 1.0" is supposed to be part of that, even more issues: https://github.com/cosmos/cosmos-sdk/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22v1.0+LTS%22

I think there is no sense in cutting a release off master until all that is stabilized. Best to have a stable point to upgrade to.

@ethanfrey
Contributor

> Based on users of the latest TM KMS alpha, we have Band, Kava, Commercio and Desmos as mainnets using 0.38.

This is a good point; any 0.38.5 / Launchpad release should then be easy to upgrade to from existing 0.38.x networks. Ideally they just restart with a new binary. If it is trickier, we should test that and provide a path forward (via the upgrade module?), but ideally avoid any changes that break state on disk.

@alexanderbez
Contributor Author

alexanderbez commented Jun 23, 2020

> I think there is no sense in cutting a release off master until all that is stabilized. Best to have a stable point to upgrade to.

Right, I agree. Hence, I'm confused by your original statement:

> As to backporting: the proper semantic versioning thing would be to release 0.38 + IAVL fixes as 0.39, which is an easy code upgrade from 0.38.

EDIT: Nvm, I understand what you mean. This is an option, but one I'm not going to make the call on. I would love for others to weigh in here. Ideally, we either accept 0.38.5 with some (very minor) client API breaking changes, or we cut 0.39 and have to rework our milestones and projects :-/

@ethanfrey
Contributor

> Nvm, I understand what you mean. This is an option, but one I'm not going to make the call on.

Fair enough. I have no strong opinion against 0.38.5; I'm just pointing out the "proper way" to handle this. Everyone is looking forward to 0.39, so a renumbering would also be confusing.

@alexanderbez
Contributor Author

Totally understandable. Your suggestion is certainly the "correct" approach.

alexanderbez added a commit that referenced this pull request Jun 24, 2020
@clevinson mentioned this pull request Jul 6, 2020
larry0x pushed a commit to larry0x/cosmos-sdk that referenced this pull request May 22, 2023