-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add state sync support #823
Conversation
x/wasm/keeper/snapshotter.go
Outdated
// we should return it an let the manager handle this one | ||
payload := item.GetExtensionPayload() | ||
if payload == nil { | ||
return item, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yihuang If I understand #816 (comment) correctly, then this is the correct behavior. Can you confirm or correct me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not a payload, it is because another module begins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got you, I need to run finalise here as well.
x/wasm/keeper/snapshotter.go
Outdated
|
||
func (ws *WasmSnapshotter) Snapshot(height uint64, protoWriter protoio.Writer) error { | ||
var rerr error | ||
ctx := sdk.NewContext(ws.cms, tmproto.Header{}, false, log.NewNopLogger()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
x/wasm/keeper/snapshotter.go
Outdated
item := snapshot.SnapshotItem{} | ||
err := protoReader.ReadMsg(&item) | ||
if err == io.EOF { | ||
return snapshot.SnapshotItem{}, finalize(ctx, ws.wasm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there may be other modules after wasm snapshotter, and this condition will not occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then how would you exit the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when SnaphostItem is not a payload
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, okay, so this condition is still needed and it was just a general comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, is necessary, but finalize(ctx, ws.wasm)
will not run in some scenarios
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I addressed this for the other case we end without error.
Please confirm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ethanfrey yes, ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The multitstore snapshotter handle the finalization at the end of loop, and use break here and when payload == nil
.
This helps state sync to track the WASM files too, as they're not in IAVL
7a29351
to
d2b0fb5
Compare
Thanks for the reviews, I made some updated |
Codecov Report
@@ Coverage Diff @@
## main #823 +/- ##
==========================================
+ Coverage 59.25% 59.60% +0.34%
==========================================
Files 51 52 +1
Lines 5898 5944 +46
==========================================
+ Hits 3495 3543 +48
+ Misses 2150 2139 -11
- Partials 253 262 +9
|
I used this line from @giansalex to properly load from a historical height: https://github.com/disperze/wasmd/blob/state-sync-sdk-45.2/x/wasm/keeper/snapshot.go#L49 However, all the tests panic when I try to commit the state before snapshotting (so I have a height to read from). It keeps saying the version was already committed. I spent 30 minutes digging in the sdk, but remain lost. Maybe someone has insights. See 7663d6f and error messages |
I think this is about as good as I can do. Happy for reviews. And if anyone wants to make a PR on top to address the historical height issue (or add app level tests), I would be happy to merge it in. Otherwise, I would revisit this next week Tues/Wed and try to get this in main. Also, very happy for any manual testing on this. I definitely need that before tagging the release, but would merge before that if possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
return sdkerrors.Wrap(types.ErrCreateFailed, err.Error()) | ||
} | ||
|
||
// FIXME: check which codeIDs the checksum matches?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which code ID is not important. But ensuring the checksum exists in the codes list at all makes sense. Otherwise a malicious node could send all kind of unrelated Wasm that gets stored and compiled, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, they can send us anything.
And at the point we have the checksum to compare it has been stored and compiled.
I guess this check would only allow one useless wasm.
I am more interested in asserting that all needed wasm has been synced. The biggest problem is a sync where some wasm file is missing and the node crashes later when that contract is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this check would only allow one useless wasm.
What prevents the other node from sending us thousands of Wasm blobs that we decompress, store and compile but that are not part of the blockchain at all?
I am more interested in asserting that all needed wasm has been synced. The biggest problem is a sync where some wasm file is missing and the node crashes later when that contract is called.
Right, the codes missing issue is mentioned in the finalize method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave this for a follow up PR, nice to have check. I would love to get a full-node integration test on current state
3e512fd
to
cdcb18d
Compare
x/wasm/keeper/snapshotter.go
Outdated
// Since codeIDs and wasm are immutible, it is never wrong to return new wasm data than the | ||
// user requests | ||
// ------ | ||
// cacheMS, err := ws.cms.CacheMultiStoreWithVersion(int64(height)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't fix the storage at the height, would different nodes generate different snapshots, because their current block height maybe different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that might make result in 2 nodes creating a different snapshot for the same height, which will make a client reject the snapshot.
FYI, I just proposed an API change to cosmos-sdk to make the snapshotter api safer to use: cosmos/cosmos-sdk#11825 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to make myself more familiar with the internals of the snapshot package. Beside some minor comments, your code looks very good.
x/wasm/keeper/snapshotter.go
Outdated
) | ||
|
||
/* | ||
API to implement: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of in line doc. Personally, I don't think it adds more value than a code reference, like this:
var (
_ snapshot.Snapshotter = &WasmSnapshotter{}
_ snapshot.ExtensionSnapshotter = &WasmSnapshotter{}
)
Doc needs to be maintained, while the code reference gives you the latest with 1 click.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExtensionSnapshotter
inherits from Snapshotter
so that would be enough as a ref
*/ | ||
|
||
// Format 1 is just gzipped wasm byte code for each item payload. No protobuf envelope, no metadata. | ||
const SnapshotFormat = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 good description. To render nice Godoc, it should start with the element name, In this case SnapshotFormat
x/wasm/keeper/snapshotter.go
Outdated
func (ws *WasmSnapshotter) Restore( | ||
height uint64, format uint32, protoReader protoio.Reader, | ||
) (snapshot.SnapshotItem, error) { | ||
if format == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if format == 1 { | |
if format == SnapshotFormat { |
x/wasm/keeper/snapshotter_test.go
Outdated
} | ||
initMsgBz, err := json.Marshal(initMsg) | ||
require.NoError(t, err) | ||
contractAddr, _, err := keepers.ContractKeeper.Instantiate(ctx, codeID, creator, nil, initMsgBz, "demo contract 1", deposit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to instantiate or query here
x/wasm/keeper/snapshotter_test.go
Outdated
|
||
recovery := NewWasmSnapshotter(newKeepers.MultiStore, newKeepers.WasmKeeper) | ||
_, err = recovery.Restore(100, 1, protoio.NewFullReader(&buf, buf.Len())) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the restored wasm file should to be confirmed. Can you call k.wasmVM.GetCode(hash)
with the original hash?
return true | ||
} | ||
|
||
compressedWasm, err := ioutils.GzipIt(wasmBytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just found:
The current version 1 snapshot format is a zlib-compressed, length-prefixed Protobuf stream of cosmos.base.store.v1beta1.SnapshotItem messages, split into chunks at exact 10 MB byte boundaries.
https://github.com/cosmos/cosmos-sdk/tree/main/snapshots#snapshot-format
Add snapshotter integration tests and minor cleanup
Glad to see this merged! Did you find what caused |
It worked. Just not with our test code. Alex used a different test setup (the full app setup) and it works now |
This replaces #816 incorporating much of the feedback.
Main changes:
TODO:
As a side, I pull the compress logic (in cli/utils) and the uncompress logic (from keeper) into a common package. That could be it's own PR, if someone really cares, I guess I could cherry pick 71ff81e out