-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor snapshotter #8072
refactor snapshotter #8072
Conversation
I followed up with a lot of code comment clarifications / godoc fixes, so it may be easier to review if I'm allowed to rebase all the fixup commits. Any objections from reviewers? |
a13849d
to
47f41a5
Compare
payloadWriter snapshots.ExtensionPayloadWriter | ||
} | ||
|
||
// SwingsetSnapshotter manages Swingset snapshots, ensuring insensitivity to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a good place to document and/or reference external documentation of how a Swingset snapshot is structured and/or the general process for constructing one?
// InitiateSnapshot initiates a SwingStore snapshot for the given | ||
// height. If a snapshot is already in progress, this will fail. The snapshot | ||
// processing is delegated to the provided `snapshotTaker`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// InitiateSnapshot initiates a SwingStore snapshot for the given | |
// height. If a snapshot is already in progress, this will fail. The snapshot | |
// processing is delegated to the provided `snapshotTaker`. | |
// InitiateSnapshot synchronously verifies that there is not already a snapshot | |
// in progress and records the start of a new one for the provided block height, | |
// then launches a goroutine to asynchronously coordinate with JS for initiating | |
// and retrieving a snapshot, sending it to the provided `snapshotTaker` for processing, | |
// and ultimately releasing it. |
That the behavior includes retrieval and processing also suggests to me that "initiate" is not the best name, although a similar argument would probably apply to using the name "take" for a function that synchronously starts a process which finishes asynchronously. It might be best to just reify the separation such that e.g. call sites look like
if err := snapshotter.InitiateSnapshot(height); err != nil {
…
}
go snapshotter.FinishSnapshot(taker, exportOptions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That shape is not possible. Let's chat
47f41a5
to
0583b45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in real time, I'd like to see a refactor in which the swing-store snapshotter communicates completion of its synchronous initiation to the main thread rather than directly driving the cosmos snapshot itself, which seems like an inversion of responsibility. But to reiterate, this need not be considered a blocking concern.
// This module abstract the handling of swing-store snapshots, also known as | ||
// swing-store imports/exports, and the necessary communication with the | ||
// JS side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have anywhere that describes how swing-store exports fit into the overall structure of cosmos state-sync snapshots? My recollection from our call is that there is some hook by which a module registers its integration(s) for cosmos to call when creating/loading/etc. a snapshot, and that the responsibility for creation is to produce an arbitrary number of protobuf "extension" entries with a common format version and for loading is to validate the format version of each extension entry and then consume it as appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving up on the commit-by-commit review halfway through the 5th commit - too much back-and-forth - and publishing the comments so far. I'm going to update to see recent commits and review the PR in its entirety.
caacb17
to
90e0bfd
Compare
@michaelfig @JimLarson I believe I've addressed all feedback. It ended up being extensive changes, and introduces a couple conflicts with master, so right now I squashed everything and will rebase / relayer everything. I don't expect any code to change until your new review. |
The rebase landed while I was in mid-review. I've submitted what was in-flight. Please address those comments as much as possible then please confirm that I have a stable target for review. |
I have added a sequence diagram of the snapshot creation process: https://github.com/Agoric/agoric-sdk/blob/mhofman/refactor-snapshotter/docs/architecture/state-sync.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skimmed changes since last review.
a9a77e8
to
2d6fbfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've addressed all new feedback and then some, PTAL @JimLarson
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM at last! Thanks for applying all the previous feedback - I've verified that previous issues have been addressed to my satisfaction. Approval is conditional on applying the two comment suggestions here. One is trivial, the other explains why the dual-channel race (which is still present) is harmless.
cf97e7d
to
e08e8e0
Compare
Refactor block/snapshot synchronization
56a38b4
to
ead6730
Compare
refs: #6527
refs: #8025
refs: #8031
Best reviewed commit-by-commit
Description
In order to support the migrations needed for #8025 #8031, and to implement #6527, we need to enable using the snapshot pathways between golang and JS outside of the cosmos state-sync mechanism.
This PR splits the snapshotter into a
SwingStoreExportsHandler
module responsible for the communication and synchronization with JS, and a simplified cosmosExtensionSnapshotter
that uses the new SwingStoreExportsHandler hide the JS interactions.Given the goroutine synchronization and multistep process of performing a SwingStore snapshot, there is a decent amount of interleaving needed, which is now expressed as facets provided as arguments to methods representing each step:
SwingStoreExportsHandler
'sInitiateExport()
takes aSwingStoreExportEventHandler
, which has aExportInitiated()
method called from the goroutine it starts.ExportInitiated()
is provided aretrieveExport
callback. The event handler implemented by theExtensionSnapshotter
stores that callback and invokes it in itsSnapshotExtension()
method (internally invoked by the cosmos snapshot manager)retrieveExport()
processes the data received from JS, and creates aSwingStoreExportProvider
for it, which is provided as an argument to theExportRetrieved()
method of theSwingStoreExportEventHandler
ExportRetrieved()
is also implemented by theExtensionsSnapshotter
and uses thatprovider
to recover the snapshot artifacts and write them out as state-sync payloads.The new layering allows the
SwingStoreExportsHandler
to more easily notice errors happening during the retrieval phase, which are currently swallowed by cosmos's snapshot manager. It also introduces a way to explicitly await for the completion of a snapshot in progress, which will be needed for genesis export and the SwingStore shadow copy migration.Finally the swing-store import/export options are threaded through all the way into the
SwingStoreExportsHandler
interface, allowing different kind of snapshot artifacts to be handled, as needed for the future use cases above.Security Considerations
None
Scaling Considerations
None
Documentation Considerations
None
Testing Considerations
This is a refactor splitting an existing module. The tests covering that module were similarly split, and sometimes duplicated where appropriate.
With the split it may be possible to cover some more granular steps which could only be covered in aggregate before.
Upgrade Considerations
This PR does not change any behavior or stored data, as such it does not have any upgrade impact.