-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: slightly better marshal identification for diagnostics #9153
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This was referenced Mar 27, 2024
erights
commented
Mar 27, 2024
@@ -74,11 +74,11 @@ export function buildSerializationTools(syscall, deviceName) { | |||
} | |||
|
|||
const m = makeMarshal(convertValToSlot, convertSlotToVal, { | |||
marshalName: `device:${deviceName}`, | |||
marshalName: `deviceTools:${deviceName}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prefix device:
is also used by deviceSlots.js
, so changing this one to be distinct.
erights
force-pushed
the
markm-marshal-marking
branch
from
March 27, 2024 20:35
4ccbdb0
to
5445ec8
Compare
erights
force-pushed
the
markm-marshal-marking
branch
from
March 28, 2024 00:08
5445ec8
to
975d67c
Compare
michaelfig
approved these changes
Mar 28, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easy-peasy. Approved!
erights
force-pushed
the
markm-marshal-marking
branch
from
March 28, 2024 05:37
975d67c
to
f42e7ff
Compare
mergify bot
pushed a commit
that referenced
this pull request
May 19, 2024
closes: #9302 refs: #9125, #9126 #9153 #9154, #9280 ## Description Upgrade while suspended at `await` points! Uses membrane to log and replay everything that happened before each upgrade. In the first incarnation, somewhere, using a ***closed*** async function argument ```js const wrapperFunc = asyncFlow( zone, 'funcName`, async (...) => {... await ...; ...}, ); ``` then elsewhere, as often as you'd like ```js const outcomeVow = wrapperFunc(...); ``` For all these `asyncFlow` calls that happened in the first incarnation, in the first crank of all later incarnations ```js asyncFlow( zone, 'funcName`, async (...) => {... await ...; ...}, ); ``` with async functions that reproduce the original's logged behavior. In these later incarnations, you only need to capture the returned `wrapperFunc` if you want to create new activations. Regardless, the old activations continue. #### Future Growth I designed this PR so it could grow in the following ways: - TODO: The membrane should use the `HandledPromise` API to make proper remote presences and handled promises, so that the guest function can use `E` on objects or promises it receives from the host as expected. I commented out the additional ops needed for these: `doSend` and `checkSend`. - TODO: Currently, I assume that the host side only presents vows, not promises. However, imported remote promises can be stored durably, and be durably watched, so the membrane should be extended to handle those. - TODO: We currently impose the restriction that the guest cannot export to the host guest-created remotables or promises. (It can pass back to the host remotables or promises it received from the host.) I commented out the additional ops needed for these: `doCall`, `checkReturn` and `checkThrow`. I wrote the `bijection` and `equate` subsystems so that old durable host wrappers can be hooked back up on replay to the corresponding new guest remotables and promises. ### Security Considerations Nothing enforces that the argument async function is closed, i.e., does not capture (lexically "close over") any direct access to mutable state or ability to cause effects. If it does, it still cannot harm anything but itself. But it -- or its replayings -- may be more confusable, and so more vulnerable to confusion attacks. Since this entire framework itself is written as a helper (well, a huge helper) with no special privilege, it cannot be used to do anything that could not have otherwise been done. It is not a source of new authority. See caveats in Upgrade Considerations about isolation of effects following a replay failure. ### Scaling Considerations We assume that the total number of async functions, i.e., calls to `asyncFlow`, are low cardinality. This is essential to the design, in exactly the same sense as our assumption that exoClasses are low cardinality. The RAM cost is proportional to the number of these. The current implementation further assumes that the total number of activations of these replayable async functions are low cardinality. This will likely be a scaling problem at some point, but is unlikely to be painful for the initial expected use cases. The current implementation imposes two simplifying restrictions that allow us to relieve some of this memory pressure: Currently, the async function argument cannot make and export new remotables or promises to the host. Thus, once its outcomeVow is settled, its job is done. There is very little more it can do that is observable. Thus, once this outcome is settled, the activation is shut down and most of the memory it held onto is dropped. Of the activations not shut down, they must replay from the beginning in each incarnation. If a given activation has a long history of past activity, this can become expensive. How do we verify in CI that when an asyncFlow is in use & when it has completed, resource usage in RAM & on disk meet our expectations? The PR assumes `low cardinality` of asyncFlows. what scale is `low cardinality` - Is 10^3, 10&5? What is the risk if cardinality is too high? ### Documentation Considerations For the normal developer documentation, `asyncFlow` should make things simpler and more like "just JavaScript". The membrane translates between host vows and guest promises, so the async function can simply `await` on promises without needing the `when` from `@agoric/vow`. ### Testing Considerations This PR is structured as a tower of building blocks, where I unit tested each as I went, in bottom up order, in order to build with confidence. Currently, each of these building blocks is also very paranoid about internal consistency checking, so I'd get early indications if I made a mistake. Probably some of this internal consistency checking can be reduced over time, as we gain more static confidence. This PR is currently using the fake upgrade testing framework from the `@agoric/zone` package. This caused bug #9126. Instead, we need to redo all these tests with a real upgrade testing framework, like the one in bootstrapTests. See https://github.com/Agoric/agoric-sdk/blob/master/packages/boot/test/bootstrapTests/test-zcf-upgrade.ts ### Upgrade Considerations The point. In an reviving incarnation, if the async function argument of ```js asyncFlow( zone, 'funcName`, async (...) => {... await ...; ...}, ); ``` fails to recapitulate the logs of each of its activations, those activations will not have done any damage. They will simply be stuck, with a diagnostic available via ```js adminAsyncFlow.getFailures(), ``` To unstick these, so those stuck activations can continue to make progress, upgrade again using an async function argument that does reproduce the logged behavior of each of its activations. #### Caveat: Imperfect isolation of effects following a replay failure Once a replay failure is detected, we attempt to isolate the replaying activation from its outside world, and to also shut down its further execution as soon as possible. But it may be in the midst of activity with a non-empty stack. Following a replay failure, we have no way to preemptively terminate it without any further execution of the activation. This further execution may therefore be arbitrarily confused. We simply try to isolate it as much as possible, immediately revoking all access it had through the membrane to all authority to cause observable effects. However, - We do not consider `console` logging activity to be an observable effect. These might be caused by diagnostics emitted by this framework in response to its "isolated" confused behavior. - Because we do not consider `console` logging to be an observable effect, we also allow this as an exception to our closed function rule. Messages it sends directly to the console are not logged, and can differ without causing replay failure. During its post-replay-failure confused execution, it can still directly log to the console. - It is not resource limited, so its post-replay confused execution can accidentally engage in resource exhaustion attacks, including infinite loops. However, the vat as a whole is resource limited. An infinite loop will eventually crash the vat, which can then be recovered with yet another upgrade. - Because of metering, an activation that executed successfully in a previous incarnation might not replay correctly, even if it doesn't cause any explicit side-effects. That's because metering is a hidden side-effect of any execution.
mhofman
added a commit
that referenced
this pull request
Jun 14, 2024
This partially reverts commit f8ea4a8.
mergify bot
pushed a commit
that referenced
this pull request
Jun 14, 2024
refs: #9153 ## Description This partially reverts the behavioral change from `deviceTools` from #9153. ### Security Considerations None ### Scaling Considerations None ### Documentation Considerations None ### Testing Considerations None ### Upgrade Considerations DeviceTools are bundled into device bundles, so we want to minimize difference between code executing and code on disk.
mhofman
added a commit
that referenced
this pull request
Jun 20, 2024
refs: #9153 ## Description This partially reverts the behavioral change from `deviceTools` from #9153. ### Security Considerations None ### Scaling Considerations None ### Documentation Considerations None ### Testing Considerations None ### Upgrade Considerations DeviceTools are bundled into device bundles, so we want to minimize difference between code executing and code on disk.
mhofman
added a commit
that referenced
this pull request
Jun 22, 2024
refs: #9153 ## Description This partially reverts the behavioral change from `deviceTools` from #9153. ### Security Considerations None ### Scaling Considerations None ### Documentation Considerations None ### Testing Considerations None ### Upgrade Considerations DeviceTools are bundled into device bundles, so we want to minimize difference between code executing and code on disk.
mhofman
added a commit
that referenced
this pull request
Jun 26, 2024
Rebase todo: ``` # Branch fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487- label base-fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487- pick bcecf52 test(vow): add test of more vow upgrade scenarios pick d7135b2 test: switch vow test to run under xs for metering pick 99fccca test(vow): add test for resolving vow to external promise pick 6d3f88c test(vow): add test for vow based infinite vat ping pong pick c78ff0e test(vow): check vow consumers for busy loops or hangs pick 3c63cba fix(vow): prevent loops and hangs from watching promises pick 188c810 chore(vat-data): remove the deprecated `@agoric/vat-data/vow.js` pick 44a6d16 fix(vow): allow resolving vow to external promise label fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487- reset base-fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487- merge -C 4fca040 fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487- # fix(vow): make watch/when more robust against loops and hangs (#9487) # Branch ci-mergify-strip-merge-commit-HTML-comments-9499- label base-ci-mergify-strip-merge-commit-HTML-comments-9499- pick 63e21ab ci(mergify): strip merge commit HTML comments label ci-mergify-strip-merge-commit-HTML-comments-9499- reset base-ci-mergify-strip-merge-commit-HTML-comments-9499- merge -C 7b93671 ci-mergify-strip-merge-commit-HTML-comments-9499- # ci(mergify): strip merge commit HTML comments (#9499) # Pull request #9506 pick 249a5d4 fix(SwingSet): Undo deviceTools behavioral change from #9153 (#9506) # Pull request #9507 pick a19a964 fix(liveslots): promise watcher to cause unhandled rejection if no handler (#9507) # Branch feat-make-vat-localchain-resumable-9488- label base-feat-make-vat-localchain-resumable-9488- pick 76c17c6 feat: make vat-localchain resumable pick 40ccba1 fix(vow): correct the typing of `unwrap` pick 90e062c fix(localchain): work around TypeScript mapped tuple bug pick 3027adf fix(network): use new `ERef` and `FarRef` label feat-make-vat-localchain-resumable-9488- reset base-feat-make-vat-localchain-resumable-9488- merge -C 5856dc0 feat-make-vat-localchain-resumable-9488- # feat: make vat-localchain resumable (#9488) # Branch ci-mergify-clarify-queue-conditions-and-merge-conditions-9510- label base-ci-mergify-clarify-queue-conditions-and-merge-conditions-9510- pick 7247bd9 ci(mergify): clarify `queue_conditions` and `merge_conditions` label ci-mergify-clarify-queue-conditions-and-merge-conditions-9510- reset base-ci-mergify-clarify-queue-conditions-and-merge-conditions-9510- merge -C 30e56ae ci-mergify-clarify-queue-conditions-and-merge-conditions-9510- # ci(mergify): clarify `queue_conditions` and `merge_conditions` (#9510) # Branch fix-liveslots-cache-delete-does-not-return-a-useful-value-9509- label base-fix-liveslots-cache-delete-does-not-return-a-useful-value-9509- pick 42ea8a3 fix(liveslots): cache.delete() does not return a useful value label fix-liveslots-cache-delete-does-not-return-a-useful-value-9509- reset base-fix-liveslots-cache-delete-does-not-return-a-useful-value-9509- merge -C a2e54e1 fix-liveslots-cache-delete-does-not-return-a-useful-value-9509- # fix(liveslots): cache.delete() does not return a useful value (#9509) # Branch chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694- label base-chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694- pick 9930bd3 chore(swingset): re-enable test of unrecognizable orphan cleanup label chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694- reset base-chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694- merge -C bc53ef7 chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694- # chore(swingset): re-enable test of unrecognizable orphan cleanup (#8694) # Pull request #9508 pick 513adc9 refactor(internal): move async helpers using AggregateError to node (#9508) # Branch report-bundle-sizing-in-agoric-run-9503- label base-report-bundle-sizing-in-agoric-run-9503- pick 68af59c refactor: inline addRunOptions pick a0115ed feat: writeCoreEval returns plan pick bd0edcb feat: stat-bundle and stat-plan scripts pick 0405202 feat: agoric run --verbose pick 22b43da feat(stat-bundle): show CLI to explode the bundle label report-bundle-sizing-in-agoric-run-9503- reset base-report-bundle-sizing-in-agoric-run-9503- merge -C 7b30169 report-bundle-sizing-in-agoric-run-9503- # report bundle sizing in agoric run (#9503) # Branch ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511- label base-ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511- pick 5f3c1d1 test(boot): use a single bundle directory for all tests pick 50229bd ci(all-packages): split tests according to AVA recipe label ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511- reset base-ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511- merge -C 5375019 ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511- # ci(test-boot): split up test jobs via AVA recipe (#9511) # Pull request #9514 pick f908f89 fix: endow with original unstructured `assert` (#9514) # Pull request #9535 pick 989aa19 fix(swingset): log vat termination and upgrade better (#9535) # Branch types-zoe-setTestJig-param-type-optional-9533- label base-types-zoe-setTestJig-param-type-optional-9533- pick 426a3be types(zoe): setTestJig param type optional label types-zoe-setTestJig-param-type-optional-9533- reset base-types-zoe-setTestJig-param-type-optional-9533- merge -C bf9f03b types-zoe-setTestJig-param-type-optional-9533- # types(zoe): setTestJig param type optional (#9533) # Branch adopt-TypeScript-5-5-9476- label base-adopt-TypeScript-5-5-9476- pick 381b6a8 chore(deps): bump Typescript to 5.5 release label adopt-TypeScript-5-5-9476- reset base-adopt-TypeScript-5-5-9476- merge -C 0cfea88 adopt-TypeScript-5-5-9476- # adopt TypeScript 5.5 (#9476) # Branch retry-flaky-agoric-cli-integration-test-9550- label base-retry-flaky-agoric-cli-integration-test-9550- pick 2a68510 ci: retry agoric-cli integration test label retry-flaky-agoric-cli-integration-test-9550- reset base-retry-flaky-agoric-cli-integration-test-9550- merge -C c5c52ec retry-flaky-agoric-cli-integration-test-9550- # retry flaky agoric-cli integration test (#9550) # Pull Request #9556 pick 0af876f fix(vow): watcher args instead of context (#9556) # Pull Request #9561 pick a4f86eb fix(vow): handle resolution loops in vows (#9561) # Branch Restore-a3p-tests-9557- label base-Restore-a3p-tests-9557- pick d36382d chore(a3p): restore localchain test pick 5ff628e Revert "test: drop or clean-up old Tests" pick b5cf8bd fix(localchain): `callWhen`s return `PromiseVow` label Restore-a3p-tests-9557- reset base-Restore-a3p-tests-9557- merge -C c65915e Restore-a3p-tests-9557- # Restore a3p tests (#9557) # Pull Request #9559 pick 6073b2b fix(agoric): convey tx opts to `agoric wallet` and subcommands (#9559) # Branch explicit-heapVowTools-9548- label base-explicit-heapVowTools-9548- pick 100de68 feat!: export heapVowTools pick 8cb1ee7 refactor: use heapVowTools import pick 0ac6774 docs: vow vat utils pick 9128f27 feat: export heapVowE pick 3b0c8c1 refactor: use heapVowE pick 9b84bfa BREAKING CHANGE: drop V export pick 6623af5 chore(types): concessions to prepack label explicit-heapVowTools-9548- reset base-explicit-heapVowTools-9548- merge -C 4440ce1 explicit-heapVowTools-9548- # explicit heapVowTools (#9548) # Pull Request #9564 pick 44926a7 fix(bn-patch): fix bad html evasion (#9564) # Branch Fix-upgrade-behaviors-9526- label base-Fix-upgrade-behaviors-9526- pick ef1e0a2 feat: Upgrade Zoe pick e4cc97c Revert "fix(a3p-integration): workaround zoe issues" pick 84dd229 feat: repair KREAd contract on zoe upgrade pick cb77160 test: validate KREAd character purchase pick e1d961e test: move vault upgrade from test to use phase (#9536) label Fix-upgrade-behaviors-9526- reset base-Fix-upgrade-behaviors-9526- merge -C 8d05faf Fix-upgrade-behaviors-9526- # Fix upgrade behaviors (#9526) # Branch Support-for-snapshots-export-command-9563- label base-Support-for-snapshots-export-command-9563- pick 2a3976e refactor(cosmos): use DefaultBaseappOptions for newApp pick 84208e9 fix(cosmos): use dedicated dedicate app creator for non start commands pick 8c1a62d chore(cosmos): refactor cosmos command extension pick 4386f8e feat(cosmos): support snapshot export pick 2dabb52 test(a3p): add test for snapshots export and restore label Support-for-snapshots-export-command-9563- reset base-Support-for-snapshots-export-command-9563- merge -C 309c7e1 Support-for-snapshots-export-command-9563- # Support for `snapshots export` command (#9563) # Branch Swing-store-export-data-outside-of-genesis-file-9549- label base-Swing-store-export-data-outside-of-genesis-file-9549- pick f1eacbe fix(x/swingset): handle defer errors on export write pick 496a430 feat(cosmos): add hooking kv reader pick f476bd5 feat(cosmos): separate swing-store export data from genesis file pick 17a5374 test(a3p): add genesis fork acceptance test label Swing-store-export-data-outside-of-genesis-file-9549- reset base-Swing-store-export-data-outside-of-genesis-file-9549- merge -C 3aa5d66 Swing-store-export-data-outside-of-genesis-file-9549- # Swing-store export data outside of genesis file (#9549) # Branch Remove-scaled-price-authority-upgrade-9585- label base-Remove-scaled-price-authority-upgrade-9585- pick bce49e3 test: add test during upgradeVaults; vaults detect new prices pick 88f6500 test: repair test by dropping upgrade of scaledPriceAuthorities label Remove-scaled-price-authority-upgrade-9585- reset base-Remove-scaled-price-authority-upgrade-9585- merge -C 8376991 Remove-scaled-price-authority-upgrade-9585- # Remove scaled price authority upgrade (#9585) # Branch feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575- label base-feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575- pick 95174a2 feat(builders): non-ambient `strictPriceFeedProposalBuilder` in `priceFeedSupport.js` pick 5cc190d feat(app): establish mechanism for adding core proposals by `upgradePlan.name` pick 52f02b7 fix(builders): use proper `oracleBrand` subkey case pick ddc072d chore(cosmos): extract `app/upgrade.go` pick b3182a4 chore: fix error handling of upgrade vaults proposal pick ea568a2 fix: retry upgrade vaults price quote label feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575- reset base-feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575- merge -C cbe061c feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575- # feat: make software upgrade `coreProposals` conditional on upgrade plan name (#9575) ``` This is followed by a commit to remove the `orchestration` and `async-flow` packages from the release, as these are not baked in yet and are not deployed anyway.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
closes: #XXXX
refs: #9097 #9154
Description
Staged on #9154 to enable #9097 to be easily restaged on both.
Originally extracted from #9097. Motivated by the lack of self-identification of the marshal in
fakeVirtualSupport.js
. But as long as I was looking at that, made each of our existing marshal self-identifications a bit clearer. Gave each a unique prefix name and a distinct starting number.Security Considerations
none
Scaling Considerations
none
Documentation Considerations
none
Testing Considerations
nothing for testing per se.
Helps slightly with debugging.
Upgrade Considerations
these identifications do not show up in any durable data, so none.