Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add a helper for publishing iteration results to chain storage #5432

Closed
wants to merge 18 commits into from

Conversation

gibson042
Copy link
Member

@gibson042 gibson042 commented May 25, 2022

closes: #5353
refs: #5385 (diff)

Description

Adds a publishToChainNode helper for consuming iteration results and losslessly publishing them to chain storage.

Security Considerations

We should be mindful about publishing too much data, but some redundancy seems necessary to avoid loss.

Documentation Considerations

Nothing specific.

Testing Considerations

TypeScript gave me some trouble with mocks. 😕

@gibson042 gibson042 added enhancement New feature or request SwingSet package: SwingSet labels May 25, 2022
@gibson042 gibson042 requested review from dckc and michaelfig May 25, 2022 20:07
@gibson042 gibson042 self-assigned this May 25, 2022
Copy link
Member Author

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broad question: is packages/vats/src/lib-chainPublish.js the right place for this code, or should it be in some other package?

* @param {ReturnType<typeof import('./lib-chainStorage.js').makeChainStorageRoot>} chainStorageNode
* @param {{ timerService: ERef<TimerService>, serialize?: (obj: any) => string }} powers
*/
export async function publishToChainNode(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally envisioned as a function that accepts a chainStorage node and returns a subscription kit, but upon digging in it seemed more useful to me for it to consume from an existing iterable and return nothing. Still another option would be to make it effectively a tee, but I think it makes more sense to have the caller compose e.g. getKey() and the base iterable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right abstraction. It crosses layers and combines concerns in a muddy way. What are your trying to improve wrt the sketch in #5400?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What layers do you think are being crossed and what concerns are being combined? Transmitting arbitrary data into a chainStorage node seems very respectful of layers and independent concerns. As for #5400, I looked it over carefully but didn't see anything to address the specific requirements of publishing to chain storage... am I just missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What layers do you think are being crossed and what concerns are being combined?

I guess what jumps out at me looking at the helper is it is trying to combine the iterator observation with the block batching. I think that the block batching belongs in the cosmic-swingset layer, and the iteration client is handled by #5400.

Transmitting arbitrary data into a chainStorage node seems very respectful of layers and independent concerns.

You're right that it needs to be done at a lower layer than #5400, but it should be even lower than JS entirely. I'm trying hard to "put the code where the data is". Since it is a cosmic-swingset concern (touching block height, event log, and merkle tree), it belongs outside of vats. I appreciate you're trying to respect peoples' time, but I would really like it if you would reach out if something about the layering doesn't make sense or seems unnecessarily weird (e.g. the chainTimer hack).

As for #5400, I looked it over carefully but didn't see anything to address the specific requirements of publishing to chain storage... am I just missing something?

Here is an adapter that explains how #5400 accomplishes these platform concerns, presuming that the batching is solved at the cosmic-swingset layer:

// adapt chainStorageNode to makeMarshalSubscriber
const publisher = Far('chainPublisherNode', {
  publish: jsonable => E(chainStorageNode).setValue(JSON.stringify(jsonable)),
  getPublisherDetails: () => { chainStorageKey: E(chainStorageNode).getKey() },
});

// Then something like:
makeMarshalSubscriber(observeEach(subscription), publisher);

Is that clearer now? I'll be working on splitting x/pubstore Golang module from x/swingset. I want to implement cosmos-specific publishing concerns there, such as batching, history, and event logging.

With some polish and moving the code around until we have minimal interfaces, I think this could be quite good.

Comment on lines 28 to 32
E(timerService)
.delay(1n)
.then(() => {
isNewBlock = true;
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right/best way to detect a new block?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's precedent for it. I'm not sure that it's guaranteed to detect a new block, but I think it's pretty close in practice. Does anything depend on this happening exactly once per block?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detecting a new block should not be done in js, rather this functionality belongs in Golang.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anything depend on this happening exactly once per block?

@dckc No, but it would be a problem for this to happen more than once per block because then results produced earlier in the block could be lost.

this functionality belongs in Golang

@michaelfig How exactly would that work, given that many vats will have chain publishers that need to purge old results? I guess we could create a new vat that somehow gets end-of-block messages and is the sole source of subscriptions that publish to chain storage, but a) I'd need help with the first part, and b) I'm not sure using such a vat would actually be an improvement over using timerService (although I'm willing to believe it might be—it would really help to see the code that actually needs chainStorage-backed subscriptions rather than guessing from the bottom up like this).

Comment on lines 34 to 37
// To avoid loss when detecting the new block *after* already consuming
// results produced within it, we associate each result with an index and
// include "old" results in the batch.
// Downstream consumers are expected to deduplicate by index.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see a way around this, because there's no way to know when the timerService delay resolves (or at any other time, for that matter) which results have actually been published in a block vs. which are still pending.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have control of the chain behaviour. We should solve it there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on how?

Copy link
Member

@dckc dckc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm used to new features coming with tests. Is there a reason to skip it here? Is it cost-prohibitive?

@gibson042
Copy link
Member Author

No, I just wanted to get this up for review ASAP. I'm adding tests now.

Copy link
Member Author

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dckc Basic tests added, working on advanced tests now.

},
};

// Create an async iterable with a function for supplying results.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any existing code for creating test doubles, particularly timerService and/or an async iterable that gets fed values asynchronously?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a manual timer in packages/zoe somewhere. I'm not familiar with "test doubles" so I'm not sure how relevant that is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Test double" is the general term encompassing mocks/spies/stubs/etc.—implementations used as stand-ins to facilitate testing: https://en.wikipedia.org/wiki/Test_double

import { publishToChainNode } from '../src/lib-chainPublish.js';

test('publishToChainNode', async t => {
// eslint-disable-next-line no-use-before-define
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason to break from convention here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find having to slog through a test-double library before getting to the actual tests makes the file much less readable. And the point will be moot if the functionality of getMockInputs can be moved into a test-helpers module.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find having to slog through a test-double library before getting to the actual tests makes the file much less readable.

That sounds like you disagree with the convention, not like there is some reason to break from it in this case. I appreciate the appeal of successive elaboration (I wrote python that way for decades...) but no-use-before-define is not only something we inherited from airbnb style; it's a considered opinion by @erights and co that goes back at least as far as 2004: The Power of Irrelevance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I disagree with the convention in general, and here in particular strongly enough to explicitly deviate from it. But I'll change this code to conform with it if you think I should, and will read the essay regardless.

publishToChainNode(customAsyncIterable, storageNode, { timerService });

supplyIterationResult('foo');
await E(Promise).resolve();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I needed a sufficient delay to match the code under test and didn't like the approach of appending .then()s to something like Promise.resolve() until I observed progress.

Comment on lines 28 to 32
E(timerService)
.delay(1n)
.then(() => {
isNewBlock = true;
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detecting a new block should not be done in js, rather this functionality belongs in Golang.

* @param {ReturnType<typeof import('./lib-chainStorage.js').makeChainStorageRoot>} chainStorageNode
* @param {{ timerService: ERef<TimerService>, serialize?: (obj: any) => string }} powers
*/
export async function publishToChainNode(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right abstraction. It crosses layers and combines concerns in a muddy way. What are your trying to improve wrt the sketch in #5400?

Comment on lines 34 to 37
// To avoid loss when detecting the new block *after* already consuming
// results produced within it, we associate each result with an index and
// include "old" results in the batch.
// Downstream consumers are expected to deduplicate by index.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have control of the chain behaviour. We should solve it there.

},
async delete() {
assert(key !== rootKey);
// A 'set' with no value deletes a key if it has no children, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue has been resolved in Golang. Please remove the race and make this a direct wrapper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to allow deleting a non-leaf node—IIUC, @warner and I agree that silent failure (by means of replacement with "set data to empty") seems like bad behavior that could leave orphans. But at any rate, this is a comment for #5385 rather than here.

Copy link
Member

@michaelfig michaelfig May 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in #5385 (comment) the problem is solved and merged. I think you and @warner may be misunderstanding the intended semantics (implemented by #5394):

  • nodes with empty data are not stored in the tree, and unreferenced path linkage is cleared when emptying a node
  • path linkage of nodes with nonempty data is always present, and when an empty node is filled any missing path linkage is built

The linkage information is just used to enumerate children. To demonstrate the separation between linkage and data, that linkage information is kept in swingset/path Cosmos IAVL key prefix, unlike the swingset/data key prefix.

Please remove this racy workaround. It is unnecessary, and the only reason I'm requesting changes.


function makeChainStorageNode(key) {
const node = {
getKey() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good! The composer in #5400 can pass this method through directly.

@michaelfig
Copy link
Member

Broad question: is packages/vats/src/lib-chainPublish.js the right place for this code, or should it be in some other package?

The middleware belongs here, but the composer used by contracts belongs in packages/notifier.

@gibson042 gibson042 force-pushed the gibson-5353-publish-to-chainStorage branch from a34f926 to dd1d898 Compare May 26, 2022 07:32
Copy link
Member

@michaelfig michaelfig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lib-chainPublish.js should be done in Golang, but otherwise this PR looks good.

... except I am requesting changes to remove the non-robust and unnecessary child deletion test.

},
async delete() {
assert(key !== rootKey);
// A 'set' with no value deletes a key if it has no children, but
Copy link
Member

@michaelfig michaelfig May 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in #5385 (comment) the problem is solved and merged. I think you and @warner may be misunderstanding the intended semantics (implemented by #5394):

  • nodes with empty data are not stored in the tree, and unreferenced path linkage is cleared when emptying a node
  • path linkage of nodes with nonempty data is always present, and when an empty node is filled any missing path linkage is built

The linkage information is just used to enumerate children. To demonstrate the separation between linkage and data, that linkage information is kept in swingset/path Cosmos IAVL key prefix, unlike the swingset/data key prefix.

Please remove this racy workaround. It is unnecessary, and the only reason I'm requesting changes.

* @param {ReturnType<typeof import('./lib-chainStorage.js').makeChainStorageRoot>} chainStorageNode
* @param {{ timerService: ERef<TimerService>, serialize?: (obj: any) => string }} powers
*/
export async function publishToChainNode(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What layers do you think are being crossed and what concerns are being combined?

I guess what jumps out at me looking at the helper is it is trying to combine the iterator observation with the block batching. I think that the block batching belongs in the cosmic-swingset layer, and the iteration client is handled by #5400.

Transmitting arbitrary data into a chainStorage node seems very respectful of layers and independent concerns.

You're right that it needs to be done at a lower layer than #5400, but it should be even lower than JS entirely. I'm trying hard to "put the code where the data is". Since it is a cosmic-swingset concern (touching block height, event log, and merkle tree), it belongs outside of vats. I appreciate you're trying to respect peoples' time, but I would really like it if you would reach out if something about the layering doesn't make sense or seems unnecessarily weird (e.g. the chainTimer hack).

As for #5400, I looked it over carefully but didn't see anything to address the specific requirements of publishing to chain storage... am I just missing something?

Here is an adapter that explains how #5400 accomplishes these platform concerns, presuming that the batching is solved at the cosmic-swingset layer:

// adapt chainStorageNode to makeMarshalSubscriber
const publisher = Far('chainPublisherNode', {
  publish: jsonable => E(chainStorageNode).setValue(JSON.stringify(jsonable)),
  getPublisherDetails: () => { chainStorageKey: E(chainStorageNode).getKey() },
});

// Then something like:
makeMarshalSubscriber(observeEach(subscription), publisher);

Is that clearer now? I'll be working on splitting x/pubstore Golang module from x/swingset. I want to implement cosmos-specific publishing concerns there, such as batching, history, and event logging.

With some polish and moving the code around until we have minimal interfaces, I think this could be quite good.

Copy link
Member

@michaelfig michaelfig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @gibson042, I'm happy with the direction we settled on. I'm removing my request for changes.

@gibson042 gibson042 marked this pull request as draft May 31, 2022 20:09
@michaelfig michaelfig self-requested a review May 31, 2022 20:10
@gibson042
Copy link
Member Author

gibson042 commented May 31, 2022

This is a draft pending completion of makePublishKit. Once done, it will be updated to export a function from the notifier package that accepts a publish-kit subscriber and a [chain] storage node and returns an object exposing both the subscriber interface (for getting iteration results with configurable lossiness) and the node's storage key (for supporting off-chain consumption).

@michaelfig
Copy link
Member

Hi @gibson042. @dckc and I plumbed this feature through using #5400.

I'd appreciate it if you had a look at that PR, and if we're missing any features from makeStoredSubscription, to focus your efforts there.

@gibson042 gibson042 closed this Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support publishing to chainStorage-backed subscriptions
3 participants