JSON-RPC messages arrive out of order #2992
Comments
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/papi-is-coming-showcasing-the-validator-selector-tool-dapp/5756/1
All right, this seems to be caused by To give an example of what is happening, the following is extracted from the logs:
Thus, if I understand this correctly, we should really wait to emit events on the subscription with operationID 373 until the unstable call has been answered?! Relevant code: https://github.com/paritytech/polkadot-sdk/blob/master/substrate/client/rpc-spec-v2/src/chain_head/chain_head.rs#L361-#L370
Thanks Josep for raising this and providing the repro case 🙏 As Niklas pointed out, this is the culprit of the behavior: polkadot-sdk/substrate/client/rpc-spec-v2/src/chain_head/chain_head.rs Lines 361 to 371 in 9db9211
The race is happening because:
It is possible that before the user receives the
To mitigate this:
I would opt for option 2, since we already have a precedent in the rpc-v2 spec for the stopOperation. Would love to get your thoughts on this!
My gut feeling is that 2 is the right approach; we probably can't guarantee that nothing will return on the subscription until after the user has received a response on the separate call. Even if we make sure that messages are sent in the right order, they may not be received in the right order. So the right approach to working with this API is probably to start buffering messages from the subscription before making the call, and then once you get a response from the call with the operation ID, you can look for it in all of the subscription messages, including those that you buffered. (I checked what we do in Subxt here and that's essentially it.)
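For readers following along, the buffering approach described in this comment could look roughly like the sketch below. It is illustrative only and is not Subxt's actual code: `FollowEvent`, `OperationBuffer`, and its methods are hypothetical names, and the notification is reduced to an `operationId` plus an `event` string.

```typescript
// Hypothetical, simplified shape of a follow-subscription notification.
interface FollowEvent {
  operationId: string;
  event: string;
}

type Handler = (ev: FollowEvent) => void;

// Keeps events whose operation ID is not known yet and replays them once
// the caller learns the ID from the method response.
class OperationBuffer {
  private handlers: Record<string, Handler> = {};
  private pending: Record<string, FollowEvent[]> = {};

  // Call this for every notification on the follow subscription.
  onFollowEvent(ev: FollowEvent): void {
    const handler = this.handlers[ev.operationId];
    if (handler) {
      handler(ev);
      return;
    }
    const queue = this.pending[ev.operationId] || [];
    queue.push(ev);
    this.pending[ev.operationId] = queue;
  }

  // Call this once the response carrying the operation ID has arrived.
  watch(operationId: string, handler: Handler): void {
    this.handlers[operationId] = handler;
    const queued = this.pending[operationId] || [];
    delete this.pending[operationId];
    queued.forEach((ev) => handler(ev));
  }
}

// Simulated race: one event arrives before the response announcing "373".
const buffer = new OperationBuffer();
const seen: string[] = [];
buffer.onFollowEvent({ operationId: "373", event: "operationStorageItems" });
buffer.watch("373", (ev) => { seen.push(ev.event); });
buffer.onFollowEvent({ operationId: "373", event: "operationStorageDone" });
console.log(seen);
```

Nothing in this sketch bounds how long unmatched events are kept, so a real client would also need some eviction policy.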
This is IMO absolutely unacceptable.
WebSocket connections work over TCP, so this statement is false. Messages sent in the right order are guaranteed to be received in the right order over a WebSocket connection. The whole point of having a JSON-RPC API is to be able to guarantee things that can't be guaranteed over lower level protocols like UDP. One of those things is the order in which the messages arrive. It is unreasonable to force the client to keep around (for how long?) some notifications that the client doesn't know to which subscription they belong yet.
Yeah, I'm in line with @josepot here. From your problem description it reads like the problem is in the server implementation that doesn't wait for the
The JSON-RPC spec defines how notifications work. It is IMO completely wrong to keep around all the non-identified notifications, because there is no guarantee that a "correlation id" will be made public at a later stage.
I suppose that since all of this traffic goes down a single websocket connection now, my concern about a race in what message the client receives first doesn't make so much sense. My concern in any case was that if you have (pseudo) code like:

    // Perform some operation and get back ID
    let operationId = await chainHeadCall(..);
    // Then, get a handle to your follow subscription events and filter the ones you want
    let operationEvents = chainHeadFollowSub.events().filterById(operationId);

Then it's possible for an event to arrive between getting the operation ID and starting to watch the follow subscription (because when you receive the operationId, it may already be in flight). I suspect that in JS this isn't an issue, since that task wouldn't yield control to anything else from the moment the operation ID comes back until after it's monitoring the subscription. With my head in Rust-land, where it's possible for events with that operation to be received on a separate thread between those two lines, I still need to care about potential races here, I think. In any case, let's see what we can do to enforce that the messages are sent down the WS connection in the right order!
Also FWIW the spec doesn't say anything about which order one can expect things in, only saying about the response containing an operationId:
If we want to guarantee that this returns before any corresponding events on the follow subscription, let's make it clear in the spec too.
This makes absolutely no sense... Not only because that pseudocode makes no sense (what even is the type of the variable In JS every piece of code that runs, runs as a consequence of a callback triggered by an I/O event-queue. If an event arrives "in between", then the event still gets added to the I/O event-queue. So, as long as there is a listener interested, then the event that came "in between" will trigger the callback of the listener. If your argument is that it's technically possible to attach the listener "too late", well then: sure, it's possible to write buggy code. However, in JS you will have to try very hard to shoot yourself in the foot in order to attach the listener too late. I mean, sure, it's well established that developers can mess up and write buggy code, but the whole idea that in JS you may miss events because you add their listener while the event is coming is absolutely nonsensical, especially in this case where the listener is a "child" listener of an already attached event-listener.
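The "child listener" argument above can be illustrated with a small sketch. It assumes a single entry point standing in for the WebSocket `onmessage` callback; the `Incoming` type, `onMessage`, and the message shapes are hypothetical and much simpler than the real JSON-RPC frames.

```typescript
// Hypothetical message shapes: the response to a method call, and a
// notification on the follow subscription.
type Incoming =
  | { kind: "response"; requestId: number; operationId: string }
  | { kind: "notification"; operationId: string; event: string };

const listeners: Record<string, (event: string) => void> = {};
const delivered: string[] = [];
const dropped: string[] = [];

// Each message is handled to completion before the next one is looked at,
// which is how callbacks run on the JS event loop.
function onMessage(msg: Incoming): void {
  if (msg.kind === "response") {
    // The child listener is attached synchronously, inside the same callback.
    listeners[msg.operationId] = (event) => { delivered.push(event); };
    return;
  }
  const listener = listeners[msg.operationId];
  if (listener) listener(msg.event);
  else dropped.push(msg.event);
}

// Messages in the order the server is expected to send them.
const inOrder: Incoming[] = [
  { kind: "response", requestId: 1, operationId: "373" },
  { kind: "notification", operationId: "373", event: "operationStorageItems" },
  { kind: "notification", operationId: "373", event: "operationStorageDone" },
];
inOrder.forEach((msg) => onMessage(msg));
console.log(delivered, dropped);
```

With in-order delivery nothing ends up in `dropped`; events are only lost if the server sends a notification ahead of the response that announces its operation ID.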
I mean, the spec does say things like:
and
So, sure... Perhaps it could be even more explicit, but IMO it's already mentioning that the order must be preserved. On top of that, the documentation of the event clearly states that:
Ergo, the |
Is there anything stopping the user from having separate WebSocket connections?
In this case, we can only guarantee the |
Different WebSocket connections implies different connections, which implies different subscriptions, which means: how is this even relevant to this discussion?
This makes absolutely no sense. You can not mix the |
It's technically possible to create a subscription on a different connection than the Let's get back to the main point: for a single connection we should enforce the order of the messages, period. We would need a lower-level API in jsonrpsee because the response is sent around when the RPC method returns, and we emit these events at the same time, so race conditions can occur.
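As a language-agnostic illustration of what "enforcing the order" on a single connection means, here is a minimal TypeScript sketch. It is not jsonrpsee's API and makes no claim about how the eventual fix works: `OperationGate` and its methods are hypothetical, and frames are plain strings.

```typescript
// Hypothetical per-operation gate: notifications are held back until the
// response that announces the operation ID has been written to the socket.
class OperationGate {
  private responseSent = false;
  private held: string[] = [];
  private send: (frame: string) => void;

  constructor(send: (frame: string) => void) {
    this.send = send;
  }

  // Called by the task that produces operation events.
  notify(frame: string): void {
    if (this.responseSent) this.send(frame);
    else this.held.push(frame);
  }

  // Called when the RPC method returns; flushes everything that was held.
  respond(frame: string): void {
    this.send(frame);
    this.responseSent = true;
    this.held.forEach((f) => this.send(f));
    this.held = [];
  }
}

// Simulated race: the event is produced before the method has returned.
const wire: string[] = [];
const gate = new OperationGate((frame) => { wire.push(frame); });
gate.notify("event for operation 373");
gate.respond("response announcing operation 373");
console.log(wire);
```

Even though the event is produced first, the response reaches the wire first, which is the ordering the client relies on.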
The Users then can make a
There is nothing in the server implementation or the spec that would stop or restrict this behavior.
To make the call to the
The server doesn't look at the socket connection or the IP of the user etc:
Any change to the spec should take into account this information. Or restrict / clarify user connections. And for the time being, we should be looking at fixing the "single connection" use-case since we have more control there 👍
Is it possible to create different subscriptions on different connections? Sure! Can a user use the Obviously, the answer is: no, it can not, because WebSocket connections -unlike REST APIs- are stateful. So, this is a moot point. |
Just to echo Niklas's reply on paritytech/json-rpc-interface-spec#125: The following PR in jsonrpsee should be merged soon, which will give When that is merged, @niklasad1 will cherry pick that into a small jsonrpsee release and open a PR here to address this issue! We are aiming to have this PR ready to merge this week. |
Kudos to @niklasad1 for adding a nice API to handle the single connection race in: paritytech/jsonrpsee#1281. From the substrate perspective, I have created this issue to allow At the time being, we can use the |
Close #2992

Breaking changes:
- rpc server grafana metric `substrate_rpc_requests_started` is removed (not possible to implement anymore)
- rpc server grafana metric `substrate_rpc_requests_finished` is removed (not possible to implement anymore)
- rpc server ws ping/pong not ACK:ed within 30 seconds more than three times then the connection will be closed

Added:
- rpc server grafana metric `substrate_rpc_sessions_time` is added to get the duration for each websocket session
…tech#3230) Close paritytech#2992
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/polkadot-api-2024-q1-update/7462/1
Description of bug
Very often, when using the `chainHead` group of functions of the new JSON-RPC API against a Polkadot-SDK node, there are "subscription" messages that come out of order. E.g.:
What happens is that the message that announces the "operationId" that the client should be paying attention to comes after the notifications for that "operationId". Since the client does not recognize those messages, they get ignored.
I first experienced this against different public endpoints, and then I tried it against a local development node and experienced the same issue. In order to further confirm that this was a problem with the implementation on the node, I ran the same requests against smoldot while using a chainspec that had my local development node as the only bootnode. Through smoldot, the same requests work flawlessly.
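The failure mode described above can be checked mechanically over a list of received messages. The following is a minimal sketch and not the code of the reproduction repo: `LogEntry` and `findOutOfOrder` are hypothetical names, and each entry is reduced to its kind and the operation ID it refers to.

```typescript
// Hypothetical, simplified wire-log entry: either the response announcing an
// operation ID or a follow notification that refers to one.
type LogEntry =
  | { kind: "response"; operationId: string }
  | { kind: "notification"; operationId: string };

// Returns the operation IDs that were referenced by a notification before
// the response announcing them was seen.
function findOutOfOrder(log: LogEntry[]): string[] {
  const announced: Record<string, boolean> = {};
  const offenders: string[] = [];
  for (const entry of log) {
    if (entry.kind === "response") {
      announced[entry.operationId] = true;
    } else if (
      !announced[entry.operationId] &&
      offenders.indexOf(entry.operationId) === -1
    ) {
      offenders.push(entry.operationId);
    }
  }
  return offenders;
}

// Operation "373" is notified before it is announced; "374" is in order.
const sample: LogEntry[] = [
  { kind: "notification", operationId: "373" },
  { kind: "response", operationId: "373" },
  { kind: "response", operationId: "374" },
  { kind: "notification", operationId: "374" },
];
console.log(findOutOfOrder(sample));
```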
Steps to reproduce
I have set up the following Node.js repo to reproduce the issue. I recommend cloning the repo and trying it out in your local environment. However, you can also try it out using StackBlitz.
In order to reproduce/debug what's happening:
pnpm i
pnpm start
Every time that the script detects a message out of order it will print these kinds of messages on the terminal:
The script also generates a file named `wire-logs.txt`, where you will find all the sent/received messages that were seen over the WebSocket connection in the order that they happened:
If you want to try against a different endpoint, then just edit the following line.
I hope that this repo helps to investigate the issue, because it's taken me a while to set it up. 🤞