-
Notifications
You must be signed in to change notification settings - Fork 747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed RPC subscriptions leak when subscription stream is finished #4533
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
acatangiu
approved these changes
May 21, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! 💯
serban300
approved these changes
May 21, 2024
@svyatonik nice digging and you got it right. I guess we could maybe remove this subscription limit since the server is backpressured these days |
hitchhooker
pushed a commit
to ibp-network/polkadot-sdk
that referenced
this pull request
Jun 5, 2024
…ritytech#4533) closes paritytech/parity-bridges-common#3000 Recently we've changed our bridge configuration for Rococo <> Westend and our new relayer has started to submit transactions every ~ `30` seconds. Eventually, it switches itself into limbo state, where it can't submit more transactions - all `author_submitAndWatchExtrinsic` calls are failing with the following error: `ERROR bridge Failed to send transaction to BridgeHubRococo node: Call(ErrorObject { code: ServerError(-32006), message: "Too many subscriptions on the connection", data: Some(RawValue("Exceeded max limit of 1024")) })`. Some links for those who want to explore: - server side (node) has a strict limit on a number of active subscriptions. It fails to open a new subscription if this limit is hit: https://github.com/paritytech/jsonrpsee/blob/a4533966b997e83632509ad97eea010fc7c3efc0/server/src/middleware/rpc/layer/rpc_service.rs#L122-L132. The limit is set to `1024` by default; - internally this limit is a semaphore with `limit` permits: https://github.com/paritytech/jsonrpsee/blob/a4533966b997e83632509ad97eea010fc7c3efc0/core/src/server/subscription.rs#L461-L485; - semaphore permit is acquired in the first link; - the permit is "returned" when the `SubscriptionSink` is dropped: https://github.com/paritytech/jsonrpsee/blob/a4533966b997e83632509ad97eea010fc7c3efc0/core/src/server/subscription.rs#L310-L325; - the `SubscriptionSink` is dropped when [this `polkadot-sdk` function](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L58-L94) returns. In other words - when the connection is closed, the stream is finished or internal subscription buffer limit is hit; - the subscription has the internal buffer, so sending an item contains of two steps: [reading an item from the underlying stream](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L125-L141) and [sending it over the connection](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L111-L116); - when the underlying stream is finished, the `inner_pipe_from_stream` wants to ensure that all items are sent to the subscriber. So it: [waits until the current send operation completes](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L146-L148) and then [send all remaining items from the internal buffer](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L150-L155). Once it is done, the function returns, the `SubscriptionSink` is dropped, semaphore permit is dropped and we are ready to accept new subscriptions; - unfortunately, the code just calls the `pending_fut.await.is_err()` to ensure that [the current send operation completes](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L146-L148). But if there are no current send operation (which is normal), then the `pending_fut` is set to terminated future and the `await` never completes. Hence, no return from the function, no drop of `SubscriptionSink`, no drop of semaphore permit, no new subscriptions allowed (once number of susbcriptions hits the limit. I've illustrated the issue with small test - you may ensure that if e.g. the stream is initially empty, the `subscription_is_dropped_when_stream_is_empty` will hang because `pipe_from_stream` never exits.
TarekkMA
pushed a commit
to moonbeam-foundation/polkadot-sdk
that referenced
this pull request
Aug 2, 2024
…ritytech#4533) closes paritytech/parity-bridges-common#3000 Recently we've changed our bridge configuration for Rococo <> Westend and our new relayer has started to submit transactions every ~ `30` seconds. Eventually, it switches itself into limbo state, where it can't submit more transactions - all `author_submitAndWatchExtrinsic` calls are failing with the following error: `ERROR bridge Failed to send transaction to BridgeHubRococo node: Call(ErrorObject { code: ServerError(-32006), message: "Too many subscriptions on the connection", data: Some(RawValue("Exceeded max limit of 1024")) })`. Some links for those who want to explore: - server side (node) has a strict limit on a number of active subscriptions. It fails to open a new subscription if this limit is hit: https://github.com/paritytech/jsonrpsee/blob/a4533966b997e83632509ad97eea010fc7c3efc0/server/src/middleware/rpc/layer/rpc_service.rs#L122-L132. The limit is set to `1024` by default; - internally this limit is a semaphore with `limit` permits: https://github.com/paritytech/jsonrpsee/blob/a4533966b997e83632509ad97eea010fc7c3efc0/core/src/server/subscription.rs#L461-L485; - semaphore permit is acquired in the first link; - the permit is "returned" when the `SubscriptionSink` is dropped: https://github.com/paritytech/jsonrpsee/blob/a4533966b997e83632509ad97eea010fc7c3efc0/core/src/server/subscription.rs#L310-L325; - the `SubscriptionSink` is dropped when [this `polkadot-sdk` function](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L58-L94) returns. In other words - when the connection is closed, the stream is finished or internal subscription buffer limit is hit; - the subscription has the internal buffer, so sending an item contains of two steps: [reading an item from the underlying stream](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L125-L141) and [sending it over the connection](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L111-L116); - when the underlying stream is finished, the `inner_pipe_from_stream` wants to ensure that all items are sent to the subscriber. So it: [waits until the current send operation completes](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L146-L148) and then [send all remaining items from the internal buffer](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L150-L155). Once it is done, the function returns, the `SubscriptionSink` is dropped, semaphore permit is dropped and we are ready to accept new subscriptions; - unfortunately, the code just calls the `pending_fut.await.is_err()` to ensure that [the current send operation completes](https://github.com/paritytech/polkadot-sdk/blob/278486f9bf7db06c174203f098eec2f91839757a/substrate/client/rpc/src/utils.rs#L146-L148). But if there are no current send operation (which is normal), then the `pending_fut` is set to terminated future and the `await` never completes. Hence, no return from the function, no drop of `SubscriptionSink`, no drop of semaphore permit, no new subscriptions allowed (once number of susbcriptions hits the limit. I've illustrated the issue with small test - you may ensure that if e.g. the stream is initially empty, the `subscription_is_dropped_when_stream_is_empty` will hang because `pipe_from_stream` never exits.
This was referenced Aug 21, 2024
This was referenced Oct 4, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
closes paritytech/parity-bridges-common#3000
Recently we've changed our bridge configuration for Rococo <> Westend and our new relayer has started to submit transactions every ~
30
seconds. Eventually, it switches itself into limbo state, where it can't submit more transactions - allauthor_submitAndWatchExtrinsic
calls are failing with the following error:ERROR bridge Failed to send transaction to BridgeHubRococo node: Call(ErrorObject { code: ServerError(-32006), message: "Too many subscriptions on the connection", data: Some(RawValue("Exceeded max limit of 1024")) })
.Some links for those who want to explore:
1024
by default;limit
permits: https://github.com/paritytech/jsonrpsee/blob/a4533966b997e83632509ad97eea010fc7c3efc0/core/src/server/subscription.rs#L461-L485;SubscriptionSink
is dropped: https://github.com/paritytech/jsonrpsee/blob/a4533966b997e83632509ad97eea010fc7c3efc0/core/src/server/subscription.rs#L310-L325;SubscriptionSink
is dropped when thispolkadot-sdk
function returns. In other words - when the connection is closed, the stream is finished or internal subscription buffer limit is hit;inner_pipe_from_stream
wants to ensure that all items are sent to the subscriber. So it: waits until the current send operation completes and then send all remaining items from the internal buffer. Once it is done, the function returns, theSubscriptionSink
is dropped, semaphore permit is dropped and we are ready to accept new subscriptions;pending_fut.await.is_err()
to ensure that the current send operation completes. But if there are no current send operation (which is normal), then thepending_fut
is set to terminated future and theawait
never completes. Hence, no return from the function, no drop ofSubscriptionSink
, no drop of semaphore permit, no new subscriptions allowed (once number of susbcriptions hits the limit.I've illustrated the issue with small test - you may ensure that if e.g. the stream is initially empty, the
subscription_is_dropped_when_stream_is_empty
will hang becausepipe_from_stream
never exits.