Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add state sync support #823

Merged
merged 22 commits into from
May 4, 2022
Merged

Add state sync support #823

merged 22 commits into from
May 4, 2022

Conversation

ethanfrey
Copy link
Member

@ethanfrey ethanfrey commented Apr 27, 2022

This replaces #816 incorporating much of the feedback.

Main changes:

  • Iterates over codes in the iavl tree to get proper list
  • compresses wasm before sending over snapshot item
  • skips duplicates (multiple codes with same byte code)
  • uses wasmvm.Create() on restore, which will precompile the code
  • pins all proper codes in the cache, just like on normal startup
  • Play better with other extensions - done in 7a29351

TODO:

  • unit tests

As a side, I pull the compress logic (in cli/utils) and the uncompress logic (from keeper) into a common package. That could be it's own PR, if someone really cares, I guess I could cherry pick 71ff81e out

// we should return it an let the manager handle this one
payload := item.GetExtensionPayload()
if payload == nil {
return item, nil
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yihuang If I understand #816 (comment) correctly, then this is the correct behavior. Can you confirm or correct me?

Copy link
Contributor

@giansalex giansalex Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not a payload, it is because another module begins.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got you, I need to run finalise here as well.


func (ws *WasmSnapshotter) Snapshot(height uint64, protoWriter protoio.Writer) error {
var rerr error
ctx := sdk.NewContext(ws.cms, tmproto.Header{}, false, log.NewNopLogger())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

item := snapshot.SnapshotItem{}
err := protoReader.ReadMsg(&item)
if err == io.EOF {
return snapshot.SnapshotItem{}, finalize(ctx, ws.wasm)
Copy link
Contributor

@giansalex giansalex Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there may be other modules after wasm snapshotter, and this condition will not occur.

Copy link
Contributor

@assafmo assafmo Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then how would you exit the loop?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when SnaphostItem is not a payload

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay, so this condition is still needed and it was just a general comment.

Copy link
Contributor

@giansalex giansalex Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, is necessary, but finalize(ctx, ws.wasm) will not run in some scenarios

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I addressed this for the other case we end without error.
Please confirm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ethanfrey yes, ok

Copy link

@yihuang yihuang Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The multitstore snapshotter handle the finalization at the end of loop, and use break here and when payload == nil.

@ethanfrey
Copy link
Member Author

Thanks for the reviews, I made some updated

@codecov
Copy link

codecov bot commented Apr 27, 2022

Codecov Report

Merging #823 (1253c8e) into main (23c75b1) will increase coverage by 0.34%.
The diff coverage is 55.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #823      +/-   ##
==========================================
+ Coverage   59.25%   59.60%   +0.34%     
==========================================
  Files          51       52       +1     
  Lines        5898     5944      +46     
==========================================
+ Hits         3495     3543      +48     
+ Misses       2150     2139      -11     
- Partials      253      262       +9     
Impacted Files Coverage Δ
app/test_helpers.go 0.84% <0.00%> (+0.08%) ⬆️
x/wasm/ioutils/utils.go 71.42% <ø> (ø)
app/app.go 87.65% <33.33%> (-1.30%) ⬇️
x/wasm/client/cli/tx.go 25.00% <66.66%> (ø)
x/wasm/keeper/snapshotter.go 66.66% <66.66%> (ø)
x/wasm/ioutils/ioutil.go 100.00% <100.00%> (ø)
x/wasm/keeper/keeper.go 88.09% <100.00%> (+0.70%) ⬆️

@ethanfrey
Copy link
Member Author

I used this line from @giansalex to properly load from a historical height: https://github.com/disperze/wasmd/blob/state-sync-sdk-45.2/x/wasm/keeper/snapshot.go#L49

However, all the tests panic when I try to commit the state before snapshotting (so I have a height to read from). It keeps saying the version was already committed. I spent 30 minutes digging in the sdk, but remain lost. Maybe someone has insights.

See 7663d6f and error messages

@ethanfrey ethanfrey marked this pull request as ready for review April 28, 2022 14:45
@ethanfrey ethanfrey requested a review from alpe as a code owner April 28, 2022 14:45
@ethanfrey
Copy link
Member Author

I think this is about as good as I can do.

Happy for reviews. And if anyone wants to make a PR on top to address the historical height issue (or add app level tests), I would be happy to merge it in. Otherwise, I would revisit this next week Tues/Wed and try to get this in main.

Also, very happy for any manual testing on this. I definitely need that before tagging the release, but would merge before that if possible.

Copy link
Member

@webmaster128 webmaster128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

x/wasm/ioutils/utils.go Show resolved Hide resolved
x/wasm/keeper/snapshotter.go Outdated Show resolved Hide resolved
return sdkerrors.Wrap(types.ErrCreateFailed, err.Error())
}

// FIXME: check which codeIDs the checksum matches??
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which code ID is not important. But ensuring the checksum exists in the codes list at all makes sense. Otherwise a malicious node could send all kind of unrelated Wasm that gets stored and compiled, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, they can send us anything.
And at the point we have the checksum to compare it has been stored and compiled.

I guess this check would only allow one useless wasm.

I am more interested in asserting that all needed wasm has been synced. The biggest problem is a sync where some wasm file is missing and the node crashes later when that contract is called.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this check would only allow one useless wasm.

What prevents the other node from sending us thousands of Wasm blobs that we decompress, store and compile but that are not part of the blockchain at all?

I am more interested in asserting that all needed wasm has been synced. The biggest problem is a sync where some wasm file is missing and the node crashes later when that contract is called.

Right, the codes missing issue is mentioned in the finalize method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave this for a follow up PR, nice to have check. I would love to get a full-node integration test on current state

// Since codeIDs and wasm are immutible, it is never wrong to return new wasm data than the
// user requests
// ------
// cacheMS, err := ws.cms.CacheMultiStoreWithVersion(int64(height))
Copy link

@yihuang yihuang Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't fix the storage at the height, would different nodes generate different snapshots, because their current block height maybe different?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that might make result in 2 nodes creating a different snapshot for the same height, which will make a client reject the snapshot.

@yihuang
Copy link

yihuang commented Apr 29, 2022

FYI, I just proposed an API change to cosmos-sdk to make the snapshotter api safer to use: cosmos/cosmos-sdk#11825

Copy link
Contributor

@alpe alpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to make myself more familiar with the internals of the snapshot package. Beside some minor comments, your code looks very good.

)

/*
API to implement:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of in line doc. Personally, I don't think it adds more value than a code reference, like this:

var (
_ snapshot.Snapshotter = &WasmSnapshotter{}
 _ snapshot.ExtensionSnapshotter = &WasmSnapshotter{}
 )

Doc needs to be maintained, while the code reference gives you the latest with 1 click.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExtensionSnapshotter inherits from Snapshotter so that would be enough as a ref

*/

// Format 1 is just gzipped wasm byte code for each item payload. No protobuf envelope, no metadata.
const SnapshotFormat = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good description. To render nice Godoc, it should start with the element name, In this case SnapshotFormat

func (ws *WasmSnapshotter) Restore(
height uint64, format uint32, protoReader protoio.Reader,
) (snapshot.SnapshotItem, error) {
if format == 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if format == 1 {
if format == SnapshotFormat {

}
initMsgBz, err := json.Marshal(initMsg)
require.NoError(t, err)
contractAddr, _, err := keepers.ContractKeeper.Instantiate(ctx, codeID, creator, nil, initMsgBz, "demo contract 1", deposit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to instantiate or query here


recovery := NewWasmSnapshotter(newKeepers.MultiStore, newKeepers.WasmKeeper)
_, err = recovery.Restore(100, 1, protoio.NewFullReader(&buf, buf.Len()))
require.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the restored wasm file should to be confirmed. Can you call k.wasmVM.GetCode(hash) with the original hash?

return true
}

compressedWasm, err := ioutils.GzipIt(wasmBytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just found:

The current version 1 snapshot format is a zlib-compressed, length-prefixed Protobuf stream of cosmos.base.store.v1beta1.SnapshotItem messages, split into chunks at exact 10 MB byte boundaries.

https://github.com/cosmos/cosmos-sdk/tree/main/snapshots#snapshot-format

@ethanfrey ethanfrey merged commit f2c2a58 into main May 4, 2022
@ethanfrey ethanfrey deleted the add-state-sync branch May 4, 2022 08:30
@ethanfrey ethanfrey mentioned this pull request May 4, 2022
@assafmo
Copy link
Contributor

assafmo commented May 4, 2022

Glad to see this merged! Did you find what caused ws.cms.CacheMultiStoreWithVersion(int64(height)) not to work at first?

@ethanfrey
Copy link
Member Author

It worked. Just not with our test code.

Alex used a different test setup (the full app setup) and it works now

Cashmaney added a commit to scrtlabs/SecretNetwork that referenced this pull request May 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants