-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add a helper for publishing iteration results to chain storage #5432
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broad question: is packages/vats/src/lib-chainPublish.js the right place for this code, or should it be in some other package?
* @param {ReturnType<typeof import('./lib-chainStorage.js').makeChainStorageRoot>} chainStorageNode | ||
* @param {{ timerService: ERef<TimerService>, serialize?: (obj: any) => string }} powers | ||
*/ | ||
export async function publishToChainNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was originally envisioned as a function that accepts a chainStorage node and returns a subscription kit, but upon digging in it seemed more useful to me for it to consume from an existing iterable and return nothing. Still another option would be to make it effectively a tee, but I think it makes more sense to have the caller compose e.g. getKey()
and the base iterable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right abstraction. It crosses layers and combines concerns in a muddy way. What are your trying to improve wrt the sketch in #5400?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What layers do you think are being crossed and what concerns are being combined? Transmitting arbitrary data into a chainStorage node seems very respectful of layers and independent concerns. As for #5400, I looked it over carefully but didn't see anything to address the specific requirements of publishing to chain storage... am I just missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What layers do you think are being crossed and what concerns are being combined?
I guess what jumps out at me looking at the helper is it is trying to combine the iterator observation with the block batching. I think that the block batching belongs in the cosmic-swingset layer, and the iteration client is handled by #5400.
Transmitting arbitrary data into a chainStorage node seems very respectful of layers and independent concerns.
You're right that it needs to be done at a lower layer than #5400, but it should be even lower than JS entirely. I'm trying hard to "put the code where the data is". Since it is a cosmic-swingset concern (touching block height, event log, and merkle tree), it belongs outside of vats. I appreciate you're trying to respect peoples' time, but I would really like it if you would reach out if something about the layering doesn't make sense or seems unnecessarily weird (e.g. the chainTimer hack).
As for #5400, I looked it over carefully but didn't see anything to address the specific requirements of publishing to chain storage... am I just missing something?
Here is an adapter that explains how #5400 accomplishes these platform concerns, presuming that the batching is solved at the cosmic-swingset layer:
// adapt chainStorageNode to makeMarshalSubscriber
const publisher = Far('chainPublisherNode', {
publish: jsonable => E(chainStorageNode).setValue(JSON.stringify(jsonable)),
getPublisherDetails: () => { chainStorageKey: E(chainStorageNode).getKey() },
});
// Then something like:
makeMarshalSubscriber(observeEach(subscription), publisher);
Is that clearer now? I'll be working on splitting x/pubstore
Golang module from x/swingset
. I want to implement cosmos-specific publishing concerns there, such as batching, history, and event logging.
With some polish and moving the code around until we have minimal interfaces, I think this could be quite good.
E(timerService) | ||
.delay(1n) | ||
.then(() => { | ||
isNewBlock = true; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right/best way to detect a new block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's precedent for it. I'm not sure that it's guaranteed to detect a new block, but I think it's pretty close in practice. Does anything depend on this happening exactly once per block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detecting a new block should not be done in js, rather this functionality belongs in Golang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anything depend on this happening exactly once per block?
@dckc No, but it would be a problem for this to happen more than once per block because then results produced earlier in the block could be lost.
this functionality belongs in Golang
@michaelfig How exactly would that work, given that many vats will have chain publishers that need to purge old results? I guess we could create a new vat that somehow gets end-of-block messages and is the sole source of subscriptions that publish to chain storage, but a) I'd need help with the first part, and b) I'm not sure using such a vat would actually be an improvement over using timerService
(although I'm willing to believe it might be—it would really help to see the code that actually needs chainStorage-backed subscriptions rather than guessing from the bottom up like this).
// To avoid loss when detecting the new block *after* already consuming | ||
// results produced within it, we associate each result with an index and | ||
// include "old" results in the batch. | ||
// Downstream consumers are expected to deduplicate by index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see a way around this, because there's no way to know when the timerService delay resolves (or at any other time, for that matter) which results have actually been published in a block vs. which are still pending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have control of the chain behaviour. We should solve it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on how?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm used to new features coming with tests. Is there a reason to skip it here? Is it cost-prohibitive?
No, I just wanted to get this up for review ASAP. I'm adding tests now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dckc Basic tests added, working on advanced tests now.
}, | ||
}; | ||
|
||
// Create an async iterable with a function for supplying results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any existing code for creating test doubles, particularly timerService and/or an async iterable that gets fed values asynchronously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a manual timer in packages/zoe
somewhere. I'm not familiar with "test doubles" so I'm not sure how relevant that is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Test double" is the general term encompassing mocks/spies/stubs/etc.—implementations used as stand-ins to facilitate testing: https://en.wikipedia.org/wiki/Test_double
import { publishToChainNode } from '../src/lib-chainPublish.js'; | ||
|
||
test('publishToChainNode', async t => { | ||
// eslint-disable-next-line no-use-before-define |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any particular reason to break from convention here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find having to slog through a test-double library before getting to the actual tests makes the file much less readable. And the point will be moot if the functionality of getMockInputs
can be moved into a test-helpers module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find having to slog through a test-double library before getting to the actual tests makes the file much less readable.
That sounds like you disagree with the convention, not like there is some reason to break from it in this case. I appreciate the appeal of successive elaboration (I wrote python that way for decades...) but no-use-before-define
is not only something we inherited from airbnb style; it's a considered opinion by @erights and co that goes back at least as far as 2004: The Power of Irrelevance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I disagree with the convention in general, and here in particular strongly enough to explicitly deviate from it. But I'll change this code to conform with it if you think I should, and will read the essay regardless.
publishToChainNode(customAsyncIterable, storageNode, { timerService }); | ||
|
||
supplyIterationResult('foo'); | ||
await E(Promise).resolve(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I needed a sufficient delay to match the code under test and didn't like the approach of appending .then()
s to something like Promise.resolve()
until I observed progress.
E(timerService) | ||
.delay(1n) | ||
.then(() => { | ||
isNewBlock = true; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detecting a new block should not be done in js, rather this functionality belongs in Golang.
* @param {ReturnType<typeof import('./lib-chainStorage.js').makeChainStorageRoot>} chainStorageNode | ||
* @param {{ timerService: ERef<TimerService>, serialize?: (obj: any) => string }} powers | ||
*/ | ||
export async function publishToChainNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right abstraction. It crosses layers and combines concerns in a muddy way. What are your trying to improve wrt the sketch in #5400?
// To avoid loss when detecting the new block *after* already consuming | ||
// results produced within it, we associate each result with an index and | ||
// include "old" results in the batch. | ||
// Downstream consumers are expected to deduplicate by index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have control of the chain behaviour. We should solve it there.
}, | ||
async delete() { | ||
assert(key !== rootKey); | ||
// A 'set' with no value deletes a key if it has no children, but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue has been resolved in Golang. Please remove the race and make this a direct wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in #5385 (comment) the problem is solved and merged. I think you and @warner may be misunderstanding the intended semantics (implemented by #5394):
- nodes with empty data are not stored in the tree, and unreferenced path linkage is cleared when emptying a node
- path linkage of nodes with nonempty data is always present, and when an empty node is filled any missing path linkage is built
The linkage information is just used to enumerate children. To demonstrate the separation between linkage and data, that linkage information is kept in swingset/path
Cosmos IAVL key prefix, unlike the swingset/data
key prefix.
Please remove this racy workaround. It is unnecessary, and the only reason I'm requesting changes.
|
||
function makeChainStorageNode(key) { | ||
const node = { | ||
getKey() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good! The composer in #5400 can pass this method through directly.
The middleware belongs here, but the composer used by contracts belongs in |
a34f926
to
dd1d898
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think lib-chainPublish.js
should be done in Golang, but otherwise this PR looks good.
... except I am requesting changes to remove the non-robust and unnecessary child deletion test.
}, | ||
async delete() { | ||
assert(key !== rootKey); | ||
// A 'set' with no value deletes a key if it has no children, but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in #5385 (comment) the problem is solved and merged. I think you and @warner may be misunderstanding the intended semantics (implemented by #5394):
- nodes with empty data are not stored in the tree, and unreferenced path linkage is cleared when emptying a node
- path linkage of nodes with nonempty data is always present, and when an empty node is filled any missing path linkage is built
The linkage information is just used to enumerate children. To demonstrate the separation between linkage and data, that linkage information is kept in swingset/path
Cosmos IAVL key prefix, unlike the swingset/data
key prefix.
Please remove this racy workaround. It is unnecessary, and the only reason I'm requesting changes.
* @param {ReturnType<typeof import('./lib-chainStorage.js').makeChainStorageRoot>} chainStorageNode | ||
* @param {{ timerService: ERef<TimerService>, serialize?: (obj: any) => string }} powers | ||
*/ | ||
export async function publishToChainNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What layers do you think are being crossed and what concerns are being combined?
I guess what jumps out at me looking at the helper is it is trying to combine the iterator observation with the block batching. I think that the block batching belongs in the cosmic-swingset layer, and the iteration client is handled by #5400.
Transmitting arbitrary data into a chainStorage node seems very respectful of layers and independent concerns.
You're right that it needs to be done at a lower layer than #5400, but it should be even lower than JS entirely. I'm trying hard to "put the code where the data is". Since it is a cosmic-swingset concern (touching block height, event log, and merkle tree), it belongs outside of vats. I appreciate you're trying to respect peoples' time, but I would really like it if you would reach out if something about the layering doesn't make sense or seems unnecessarily weird (e.g. the chainTimer hack).
As for #5400, I looked it over carefully but didn't see anything to address the specific requirements of publishing to chain storage... am I just missing something?
Here is an adapter that explains how #5400 accomplishes these platform concerns, presuming that the batching is solved at the cosmic-swingset layer:
// adapt chainStorageNode to makeMarshalSubscriber
const publisher = Far('chainPublisherNode', {
publish: jsonable => E(chainStorageNode).setValue(JSON.stringify(jsonable)),
getPublisherDetails: () => { chainStorageKey: E(chainStorageNode).getKey() },
});
// Then something like:
makeMarshalSubscriber(observeEach(subscription), publisher);
Is that clearer now? I'll be working on splitting x/pubstore
Golang module from x/swingset
. I want to implement cosmos-specific publishing concerns there, such as batching, history, and event logging.
With some polish and moving the code around until we have minimal interfaces, I think this could be quite good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with @gibson042, I'm happy with the direction we settled on. I'm removing my request for changes.
This is a draft pending completion of makePublishKit. Once done, it will be updated to export a function from the notifier package that accepts a publish-kit subscriber and a [chain] storage node and returns an object exposing both the subscriber interface (for getting iteration results with configurable lossiness) and the node's storage key (for supporting off-chain consumption). |
Hi @gibson042. @dckc and I plumbed this feature through using #5400. I'd appreciate it if you had a look at that PR, and if we're missing any features from |
closes: #5353
refs: #5385 (diff)
Description
Adds a
publishToChainNode
helper for consuming iteration results and losslessly publishing them to chain storage.Security Considerations
We should be mindful about publishing too much data, but some redundancy seems necessary to avoid loss.
Documentation Considerations
Nothing specific.
Testing Considerations
TypeScript gave me some trouble with mocks. 😕