archive/storage: Backport API from chainHead #94

lexnv · 2023-09-12T15:41:22Z

This PR backports the chainHead_storage method to archive_storage to bring it up to date.

The main changes include:

support for batch requests via the introduced items
removal of includeDescendants bool parameter in favor of type of the query
support for fetching value, hash, closest merkle value, as well for iterating via descedantsValues and descendantsHashes

Differences between chainHead_storage and archive_storage include:

archive_storage is a method that provides the responses directly, without any concepts of operationIds
introduced an optional paginationStartKey key for descendants queries to resume pagination

cc @tomaka @jsdw @josepot @paritytech/subxt-team

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

tomaka · 2023-09-14T08:09:46Z

My opinion of archive_storage is actually to have it more similar to the legacy JSON-RPC API, and make it closer to returning just the list of values in its return value. Same for all the archive-prefixed functions in general.

The reasoning is that, compared to the chainHead functions, there's no "real time" component to archive functions.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

lexnv · 2023-09-15T12:19:55Z

I've updated the archive/storage to be a plain method, while keeping parity with the chainHead/storage interface.

One side effect of this being a method is the introduction of the paginationStartKey to avoid repeating a result. Would love to get your thoughts on this 🙏

src/api/archive_unstable_storage.md

Co-authored-by: James Wilson <james@jsdw.me>

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

lexnv · 2023-09-25T11:49:33Z

@tomaka would love to get your feedback again on this 🙏

src/api/archive_unstable_storage.md

Co-authored-by: James Wilson <james@jsdw.me>

src/api/archive_unstable_storage.md

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

jsdw

Looking good to me now, nice one!

tomaka · 2023-10-12T07:04:57Z

I'm a bit skeptical, because I feel like this is designed without a target use case in mind.
Who for example would need the closestDescendantMerkleValue of an old block? I really don't see why you'd want that information.

I have the impression that scanning the archive requires functions such as obtaining the list of blocks where a certain change to a storage value happened, and that kind of things, and I think that the archive-prefixed functions in general should be re-designed from the grounds up.

jsdw · 2023-10-12T10:17:15Z

Who for example would need the closestDescendantMerkleValue of an old block? I really don't see why you'd want that information.

One thing I have in mind is that I can fall back to using these functions in subxt in case blocks are unpinned or whatever via the chainHead calls. So basically, if archive fns are available, people can "hold onto" blocks for much longer than if only chainHead is available.

Hard to see how often that'll end up being used, but there are things (eg network issues) that can cause recent blocks to be unpinned, so nice to have a fallback where applicable

tomaka · 2023-10-12T13:21:32Z

if archive fns are available, people can "hold onto" blocks for much longer than if only chainHead is available.

The entire point of splitting between archive and chainHead is to specialize the ways the chain is accessed: either you're watching the chain live, or you're watching a replay.
Trying to create an abstraction that works transparently over both is wrong.

If you want the guarantee to be able to read any block even in case of networking issues and such, then just use archive all the time and not chainHead.

jsdw · 2023-10-13T08:56:24Z

Trying to create an abstraction that works transparently over both is wrong.

I don't agree; I think it's very reasonable for Subxt to use chainHead to follow and work with current blocks (by far the primary way people will use it), and if we happen to be talking to a node providing archive methods then we can do a little better and still attempt to return valid details for blocks that are no longer pinned via those, if the user has held on to them for longer for some reason.

This also means that when the user asks for some details at any block hash, subxt will use a pinned block if one is available, or fall back to using an archive call if that is available, or fail if neither is possible. It makes the experience simple and does a "best effort" to retrieve the requested details.

Subxt doesn't try to guarantee anything; instead it simply aims to keep the details for any block that the user is interested in available for as long as possible.

tomaka · 2023-10-15T16:00:20Z

It makes the experience simple and does a "best effort" to retrieve the requested details.

I really disagree with that.

If SubXT can "sometimes" return information about a block and "sometimes" can't, it doesn't make it simple. Users needs to know whether their code is going to work or not, and shrugging at them isn't a very satisfactory thing to do.

Telling the user that you're going to try give them some information but without any guarantee ultimately leads either to a bad user experience or to the user having to punch through the abstraction layers in order to understand how things work under the hood.

This kind of digging under the hood is precisely why Substrate/Polkadot is so complex nowadays. Everything "sometimes" work, and in order to code anything you have to understand how everything works under the hood.
If you try for example to work on the Substrate networking code, you'll realize for example that the client sometimes is able to provide you with a block, but doesn't offer any strong guarantee when, so you have to read the code of the client in order to know when it can and when it can't. Then you'll realize that the Wasm executor sometimes returns an error, but doesn't offer any strong guarantee when, so you have to read the code of the executor. And so on.

The objective of the new JSON-RPC API is to do what the legacy API didn't do, which is to put a clear barrier here. It is clear and precise enough that users of the API don't have to look under the hood in order to understand whether their request is going to work.

jsdw · 2023-10-16T14:28:23Z

If SubXT can "sometimes" return information about a block and "sometimes" can't, it doesn't make it simple. Users needs to know whether their code is going to work or not, and shrugging at them isn't a very satisfactory thing to do.

Subxt offers the best guarantees that I think it can here; it will keep blocks pinned for as long as it can while the user is interested in them, and so as long as you ask for information from a block within a reaosnable time frame, it will always successfully return it. The new API only guarantees that blocks will be pinned for some (currently unspecified and unknown) amount of time, and so if the user holds onto one for too long, I'll eventually have to say "no". What do you think would be a better approach here?

My thought then was that if the user does hold onto a block reference for too long, and they are specifically connected to an archive node (presumably this would be a deliberate choice, especially as light client connections will eventually be the default) then they can, though exactly the same Subxt APIs, continue access information about older blocks too, in a seamless sort of way.

It is clear and precise enough that users of the API don't have to look under the hood in order to understand whether their request is going to work.

A user has to be aware that a request will work for some period of time after a block hash is seen. Currently, they won't know how long that period is, but they will know when it's no longer possible because a "stop" event is emitted. Subxt offers the same promise; if you access details about some block within a reasonable time frame, the access will succeed. We don't currently allow the user to be notified when some given block ref is no longer pinned/accessible, but that's definitely something we can add.

The entire point of splitting between archive and chainHead is to specialize the ways the chain is accessed: either you're watching the chain live, or you're watching a replay.

It's not clear to me at least what should be different between archive and chainhead method; do you have any ideas on how the archive methods should differ?

I'd likely offer a separate set of APIs for accessing archived block information in Subxt if the methods did differ in a meaningful way, but it would add complexity, and the user would then have to learn two sets of APIs (one for recent blocks and one for old blocks), so I do think that there would have to be at least a slightly compelling reason for any significant differences, but I'm open to all thoughts here!

tomaka · 2023-10-16T15:58:24Z

The new API only guarantees that blocks will be pinned for some (currently unspecified and unknown) amount of time, and so if the user holds onto one for too long, I'll eventually have to say "no". What do you think would be a better approach here?

I think that the issue here is the "currently unspecified and unknown".
The reason why blocks need to be unpinned is because the node/JSON-RPC server needs to be able to liberate resources. If this node is later able to serve the same block through archive, that means that the resources haven't been liberated, and thus that this block didn't really "need" to be unpinned and could have remained pinned.

To put it otherwise, to me the solution is for example for archive nodes to allow an unlimited amount of pinned blocks, provided that these blocks are in the finalized canonical chain. In other words, the blocks that you would have been able to access through archive anyway can remain pinned.

Basically, I don't think that you should be using archive in order to overcome issues that can be solved by tweaking chainHead.
What matters when designing the API is the intent. Whenever you want to watch what happens at the head of the chain, chainHead should be able to provide whatever you need.

It's not clear to me at least what should be different between archive and chainhead method; do you have any ideas on how the archive methods should differ?

I'd likely offer a separate set of APIs for accessing archived block information in Subxt if the methods did differ in a meaningful way, but it would add complexity, and the user would then have to learn two sets of APIs (one for recent blocks and one for old blocks), so I do think that there would have to be at least a slightly compelling reason for any significant differences, but I'm open to all thoughts here!

It's not clear to me either what the differences are.

I've designed chainHead while having in mind how a full node syncs the head of the chain and how a light client syncs the head of the chain, and how a dApp might want to access the live chain. The API was designed knowing precisely what implementers are capable of and what JSON-RPC clients might want to request.

For archive it's unfortunately way less clear to me how to design it. The users of this namespace would probably be people who use indexers for example, but I don't know precisely what these people want. For example #108 is a legitimate wish, and I don't think I have enough knowledge to anticipate all wishes.

Furthermore there are many uncertainties w.r.t. how to manage the old chain. For example we might want to deprecate things in the future, which means maybe breaking old blocks. Is that acceptable? Is that something to bake in the API? I have no idea.

jsdw · 2023-10-17T09:26:45Z

To put it otherwise, to me the solution is for example for archive nodes to allow an unlimited amount of pinned blocks, provided that these blocks are in the finalized canonical chain. In other words, the blocks that you would have been able to access through archive anyway can remain pinned.

That makes a lot of sense to me!

I still think that Subxt could provide a single interface to allow fetching of current or old blocks, but could imagine that a user might explicitly pick which of these they are doing and select the appropriate backend for the job (in other words; instruct Subxt to use the archive methods or to use the chainHead methods to drive things). I guess it depends on how different the chainHead_ vs archive_ interfaces ended up being to how I'd design this stuff.

For archive it's unfortunately way less clear to me how to design it. The users of this namespace would probably be people who use indexers for example, but I don't know precisely what these people want. For example #108 is a legitimate wish, and I don't think I have enough knowledge to anticipate all wishes.

In the absence of more information, I think that it'd be good to merge this PR and start with the archive and chainHead methods being reasonably aligned (aside from the subscription vs request sort of approach, which makes sense to differ as this PR has done).

Then, we can iterate on the design from there.

I can imagine that a bunch of things (like #108) may want to exist in chainHead and archive namespaces if possible (albeit with perhaps a slightly different approach to cater for each API; chainHead maybe giving a "live" diff each block for instance, vs archive just being given the block numbers to diff between up front).

lexnv · 2023-10-17T11:45:56Z

In the absence of more information, I think that it'd be good to merge this PR and start with the archive and chainHead methods being reasonably aligned ..
Then, we can iterate on the design from there.

That sounds like a plan to me, and follow up with use cases for the archive class to make informed decisions about what developers need. @tomaka what do you think?

lexnv · 2023-10-27T10:41:06Z

@tomaka @josepot would love to get your thoughts on this PR

As a first step having the archive slightly aligned with the chainHead would allow us to iterate a bit faster on the design; and would also unblock the substrate side implementation (paritytech/polkadot-sdk#1846). Then we could gather more data about API usage and add/extend storage differences for example (#108).

tomaka · 2023-10-27T11:57:44Z

To me what you're suggesting is to change from a bad design that shouldn't be used to a different bad design that shouldn't be used. I don't have an opinion on this.

jsdw · 2023-11-09T10:34:29Z

To me, this PR improves on the state of the archive_ methods by making them more consistent with the chainHead ones. I think that this is a good place to aim for in the absense of any data supporting doing something different.

So let's merge this by the end of the week unless there are any remaining objections that we can solve, and when we get more data we can make the necessary changes.

tomaka · 2023-11-09T11:47:55Z

Again I'm neutral here, but to me consistency between archive and chainHead is not an objective.
They're two completely different APIs for two completely different usages.

jsdw · 2023-11-09T11:54:57Z

consistency between archive and chainHead is not an objective. They're two completely different APIs for two completely different usages.

Agreed; not an objective, but a good default when we don't know better :)

niklasad1 · 2023-11-15T09:13:01Z

src/api/archive_unstable_storage.md

-Yields an item that was found in the storage.
+The JSON object corresponds to one of the requested items whose `type` was `"value"` or `"descendantsValues"`.
+
+If the `key` is not associated with a storage value in the trie, then no response is generated.


I guess this is fine but it would be easier for clients if the server would just send back null to indicate that key was not found.

For this API, a client probably needs to have a timeout for these request and after for instance 1 min the call is regarded as failed.

Will modify the phrasing of the spec. What I meant by this, given archive/storage is a plain method, no response will be generated for this "key" in the "result" array. Users will still get back once all queries are processed the responses back

niklasad1 · 2023-11-15T09:14:18Z

src/api/archive_unstable_storage.md

-The `waiting-for-continue` event is generated after at least one `"item"` event has been generated, and indicates that the JSON-RPC client must call `archive_unstable_storageContinue` before more events are generated.
+The JSON object corresponds to one of the requested items whose `type` was `"hash"` or `"descendantsHashes"`.
+
+If the `key` is not associated with a storage value in the trie, then no response is generated.


same here 👆

src/api/archive_unstable_storage.md

niklasad1 · 2023-11-15T10:12:16Z

For archive it's unfortunately way less clear to me how to design it. The users of this namespace would probably be people who use indexers for example, but I don't know precisely what these people want. For example #108 is a legitimate wish, and I don't think I have enough knowledge to anticipate all wishes.

Furthermore there are many uncertainties w.r.t. how to manage the old chain. For example we might want to deprecate things in the future, which means maybe breaking old blocks. Is that acceptable? Is that something to bake in the API? I have no idea.

Yeah, I agree with ☝️ there lots of unknowns for sure but on the hand we have to start somewhere to know what issues that comes up to "fetch old blocks" with this archive API.

It doesn't have to be perfect but as long we can get folks to move away from the current RPC API "to fetch old storage" and get feedback for users what doesn't work etc.. to iterate further on it.

Co-authored-by: Niklas Adolfsson <niklasadolfsson1@gmail.com>

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

This PR implements the `archive_unstable_storage` method that offers support for: - fetching values - fetching hashes - iterating over keys and values - iterating over keys and hashes - fetching merkle values from the trie-db A common component dedicated to RPC-V2 storage queries is created to bridge the gap between `chainHead/storage` and `archive/storage`. Query pagination is supported by `paginationStartKey`, similar to the old APIs. Similarly to the `chainHead/storage`, the `archive/storage` method accepts a maximum number of queried items. The design builds upon: paritytech/json-rpc-interface-spec#94. Closes #1512. cc @paritytech/subxt-team --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io> Co-authored-by: Niklas Adolfsson <niklasadolfsson1@gmail.com>

This PR implements the `archive_unstable_storage` method that offers support for: - fetching values - fetching hashes - iterating over keys and values - iterating over keys and hashes - fetching merkle values from the trie-db A common component dedicated to RPC-V2 storage queries is created to bridge the gap between `chainHead/storage` and `archive/storage`. Query pagination is supported by `paginationStartKey`, similar to the old APIs. Similarly to the `chainHead/storage`, the `archive/storage` method accepts a maximum number of queried items. The design builds upon: paritytech/json-rpc-interface-spec#94. Closes paritytech#1512. cc @paritytech/subxt-team --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io> Co-authored-by: Niklas Adolfsson <niklasadolfsson1@gmail.com>

lexnv added 3 commits September 12, 2023 18:28

archive/storage: Backport API from chainHead

6e7d735

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

archive/storage: Rename result to event for started

5ebe0c2

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

archive/storage: Remove operationId

e866731

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

lexnv self-assigned this Sep 12, 2023

lexnv mentioned this pull request Sep 12, 2023

archive/storage: Use a similar API with chainHead #90

Closed

lexnv added 3 commits September 14, 2023 15:31

archive/storage: Use method response instead of subscription

4dde069

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

archive/storage: Remove the continue function

7c5d5b5

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

Merge remote-tracking branch 'origin/main' into lexnv/archive_backport

c76e13c

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

jsdw reviewed Sep 21, 2023

View reviewed changes

src/api/archive_unstable_storage.md Outdated Show resolved Hide resolved

jsdw reviewed Sep 21, 2023

View reviewed changes

src/api/archive_unstable_storage.md Outdated Show resolved Hide resolved

jsdw reviewed Sep 21, 2023

View reviewed changes

src/api/archive_unstable_storage.md Outdated Show resolved Hide resolved

jsdw reviewed Sep 21, 2023

View reviewed changes

src/api/archive_unstable_storage.md Outdated Show resolved Hide resolved

jsdw reviewed Sep 21, 2023

View reviewed changes

src/api/archive_unstable_storage.md Show resolved Hide resolved

jsdw reviewed Sep 21, 2023

View reviewed changes

src/api/archive_unstable_storage.md Outdated Show resolved Hide resolved

lexnv and others added 6 commits September 21, 2023 20:50

Update src/api/archive_unstable_storage.md

a9db9f4

Co-authored-by: James Wilson <james@jsdw.me>

Update src/api/archive_unstable_storage.md

aff40ed

Co-authored-by: James Wilson <james@jsdw.me>

archive/storage: Adjust notes

0fb1932

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

archive/storage: Add missing dots and unify params

bfd5dd4

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

archive/storage: Remove the overview section

5a58b9a

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

Merge remote-tracking branch 'origin/main' into lexnv/archive_backport

baeddbf

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

lexnv mentioned this pull request Sep 22, 2023

archive/call: Synchronize the return type with sudo/sessionsKeys #104

Merged

lexnv mentioned this pull request Oct 11, 2023

archive: Implement archive_unstable_storage paritytech/polkadot-sdk#1846

Merged

jsdw reviewed Oct 11, 2023

View reviewed changes

src/api/archive_unstable_storage.md Outdated Show resolved Hide resolved

Update src/api/archive_unstable_storage.md

2e955ee

Co-authored-by: James Wilson <james@jsdw.me>

jsdw reviewed Oct 11, 2023

View reviewed changes

src/api/archive_unstable_storage.md Outdated Show resolved Hide resolved

jsdw reviewed Oct 11, 2023

View reviewed changes

src/api/archive_unstable_storage.md Outdated Show resolved Hide resolved

archive: Move note section above possible errors

22f1a15

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

archive: Remove storageContinue method

d26e3ce

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

jsdw previously approved these changes Oct 11, 2023

View reviewed changes

niklasad1 reviewed Nov 15, 2023

View reviewed changes

src/api/archive_unstable_storage.md Outdated Show resolved Hide resolved

niklasad1 reviewed Nov 15, 2023

View reviewed changes

src/api/archive_unstable_storage.md Show resolved Hide resolved

niklasad1 previously approved these changes Nov 15, 2023

View reviewed changes

Update src/api/archive_unstable_storage.md

00d6825

Co-authored-by: Niklas Adolfsson <niklasadolfsson1@gmail.com>

lexnv dismissed stale reviews from niklasad1 and jsdw via 00d6825 November 15, 2023 10:56

archive/storage: Clarify empty responses

e894512

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

niklasad1 approved these changes Nov 15, 2023

View reviewed changes

lexnv merged commit 6daf453 into main Nov 15, 2023
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

archive/storage: Backport API from chainHead #94

archive/storage: Backport API from chainHead #94

lexnv commented Sep 12, 2023 •

edited

Loading

tomaka commented Sep 14, 2023 •

edited

Loading

lexnv commented Sep 15, 2023

lexnv commented Sep 25, 2023

jsdw left a comment

tomaka commented Oct 12, 2023

jsdw commented Oct 12, 2023

tomaka commented Oct 12, 2023

jsdw commented Oct 13, 2023

tomaka commented Oct 15, 2023 •

edited

Loading

jsdw commented Oct 16, 2023

tomaka commented Oct 16, 2023 •

edited

Loading

jsdw commented Oct 17, 2023

lexnv commented Oct 17, 2023

lexnv commented Oct 27, 2023

tomaka commented Oct 27, 2023

jsdw commented Nov 9, 2023

tomaka commented Nov 9, 2023 •

edited

Loading

jsdw commented Nov 9, 2023

niklasad1 Nov 15, 2023 •

edited

Loading

lexnv Nov 15, 2023

niklasad1 Nov 15, 2023

niklasad1 commented Nov 15, 2023

archive/storage: Backport API from chainHead #94

archive/storage: Backport API from chainHead #94

Conversation

lexnv commented Sep 12, 2023 • edited Loading

tomaka commented Sep 14, 2023 • edited Loading

lexnv commented Sep 15, 2023

lexnv commented Sep 25, 2023

jsdw left a comment

Choose a reason for hiding this comment

tomaka commented Oct 12, 2023

jsdw commented Oct 12, 2023

tomaka commented Oct 12, 2023

jsdw commented Oct 13, 2023

tomaka commented Oct 15, 2023 • edited Loading

jsdw commented Oct 16, 2023

tomaka commented Oct 16, 2023 • edited Loading

jsdw commented Oct 17, 2023

lexnv commented Oct 17, 2023

lexnv commented Oct 27, 2023

tomaka commented Oct 27, 2023

jsdw commented Nov 9, 2023

tomaka commented Nov 9, 2023 • edited Loading

jsdw commented Nov 9, 2023

niklasad1 Nov 15, 2023 • edited Loading

Choose a reason for hiding this comment

lexnv Nov 15, 2023

Choose a reason for hiding this comment

niklasad1 Nov 15, 2023

Choose a reason for hiding this comment

niklasad1 commented Nov 15, 2023

lexnv commented Sep 12, 2023 •

edited

Loading

tomaka commented Sep 14, 2023 •

edited

Loading

tomaka commented Oct 15, 2023 •

edited

Loading

tomaka commented Oct 16, 2023 •

edited

Loading

tomaka commented Nov 9, 2023 •

edited

Loading

niklasad1 Nov 15, 2023 •

edited

Loading