Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate JSON-RPC errors through the Rust subscription #929

Merged
merged 11 commits into from
Jul 19, 2021

Conversation

romac
Copy link
Member

@romac romac commented Jul 16, 2021

Close: #932

Emit JSON-RPC errors sent through the WS subscription through the Subscription channel, so that the Hermes event monitor can detect those (eg. when the WS subscription is closed by the Tendermint node) and attempt to resubscribe.

  • Referenced an issue explaining the need for the change
  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Added entry in .changelog/

@codecov-commenter
Copy link

codecov-commenter commented Jul 16, 2021

Codecov Report

Merging #929 (aaa9df4) into master (a93a215) will increase coverage by 0.0%.
The diff coverage is 41.3%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master    #929   +/-   ##
======================================
  Coverage    70.4%   70.5%           
======================================
  Files         203     203           
  Lines       16285   16285           
======================================
+ Hits        11479   11483    +4     
+ Misses       4806    4802    -4     
Impacted Files Coverage Δ
rpc/src/client/transport/websocket.rs 62.6% <26.3%> (-0.8%) ⬇️
rpc/src/client/transport/router.rs 71.8% <40.9%> (-8.0%) ⬇️
rpc/src/client/transport/mock.rs 87.0% <100.0%> (ø)
rpc/src/response.rs 63.2% <100.0%> (+0.5%) ⬆️
rpc/src/endpoint/validators.rs 9.0% <0.0%> (-41.0%) ⬇️
rpc/src/version.rs 52.3% <0.0%> (-21.0%) ⬇️
tendermint/src/block/commit_sig.rs 77.2% <0.0%> (-2.3%) ⬇️
rpc/src/error.rs 50.4% <0.0%> (-1.5%) ⬇️
testgen/src/vote.rs 83.7% <0.0%> (-0.9%) ⬇️
testgen/src/header.rs 78.1% <0.0%> (-0.7%) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a93a215...aaa9df4. Read the comment docs.

@romac romac marked this pull request as ready for review July 16, 2021 12:22
// We ignore incoming messages whose ID we don't recognize (could be
// relating to a fire-and-forget unsubscribe request - see the
// publish_event() method below).
Ok(())
}

async fn publish_error(&mut self, id: &str, err: Error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bit of duplicated code between publish_error and publish_event, we should probably clean this up

rpc/src/response.rs Outdated Show resolved Hide resolved
romac and others added 6 commits July 16, 2021 18:44
…rs.md

Co-authored-by: Thane Thomson <thane@informal.systems>
Co-authored-by: Thane Thomson <thane@informal.systems>
* Specify default value for `tendermint::block::Size`

* Add .changelog entry

* Add test for `time_iota_ms` default (#933)

* Bump supported rpc-probe version of Tendermint to v0.34.9
* Regenerate kvstore fixtures using Tendermint v0.34.9
* Update kvstore tests to test #931
* Expose Size::default_time_iota_ms to access from integration test
* Implement test for #931

Co-authored-by: Thane Thomson <thane@informal.systems>
Copy link
Contributor

@thanethomson thanethomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final minor nit, then I'd say we're good to go here!

I think there should be some follow-up work, which I can look into: to avoid all the allow(dead_code) directives, we need to fix the imports according to the features enabled. We should probably find a way to avoid importing the subscription router unless a subscription-enabled transport is required (e.g. the websocket feature).

@thanethomson thanethomson merged commit 370dc02 into master Jul 19, 2021
@thanethomson thanethomson deleted the romac/ws-errors branch July 19, 2021 15:05
romac added a commit to informalsystems/hermes that referenced this pull request Jul 21, 2021
… by Tendermint (#1205)

After some investigation, the culprit for #1196 seems to be that Tendermint is closing the WebSocket connection over which we listen for IBC events whenever more than 100 txs are included in a single block [0], as we are not able to pull the events fast enough over the WebSocket connection to avoid completely filling the event buffer in Tendermint (which currently has a hard-coded capacity of 100 events, hence the issue).

We never noticed this previously since this problem only appears in practice with a high-enough commit/propose timeout (to allow enough txs to be included in a single block), and we were testing with a lower value for the timeouts.

Now that we landed some changes in tendermint-rs [1] which allow us to notice the connection being closed, this PR makes use of this to resubscribe to the events and trigger a packet clear whenever we notice the connection being closed under our feet.

[0] tendermint/tendermint#6729
[1] informalsystems/tendermint-rs#929

---

* Propagate JSON-RPC errors through the Rust subscription

* Use tendermint-rs branch with both fixes

* Fix compilation issue in tests

* Clear pending packets when event subscription is cancelled

* Temp: Update one-chain script to use 10s commit timeout

* Use tendermint-rs master

* Update Cargo.lock

* Update changelog

* Update lockfile

* Increase delay before checking for relaying result in e2e tests

* Add comment explaining who the RPC error is propagated to

* Improve event monitor logs

* Reset `timeout_commit` and `timeout_propose` to 1s
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this pull request Sep 13, 2022
… by Tendermint (informalsystems#1205)

After some investigation, the culprit for informalsystems#1196 seems to be that Tendermint is closing the WebSocket connection over which we listen for IBC events whenever more than 100 txs are included in a single block [0], as we are not able to pull the events fast enough over the WebSocket connection to avoid completely filling the event buffer in Tendermint (which currently has a hard-coded capacity of 100 events, hence the issue).

We never noticed this previously since this problem only appears in practice with a high-enough commit/propose timeout (to allow enough txs to be included in a single block), and we were testing with a lower value for the timeouts.

Now that we landed some changes in tendermint-rs [1] which allow us to notice the connection being closed, this PR makes use of this to resubscribe to the events and trigger a packet clear whenever we notice the connection being closed under our feet.

[0] tendermint/tendermint#6729
[1] informalsystems/tendermint-rs#929

---

* Propagate JSON-RPC errors through the Rust subscription

* Use tendermint-rs branch with both fixes

* Fix compilation issue in tests

* Clear pending packets when event subscription is cancelled

* Temp: Update one-chain script to use 10s commit timeout

* Use tendermint-rs master

* Update Cargo.lock

* Update changelog

* Update lockfile

* Increase delay before checking for relaying result in e2e tests

* Add comment explaining who the RPC error is propagated to

* Improve event monitor logs

* Reset `timeout_commit` and `timeout_propose` to 1s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rpc: Propagate JSON-RPC errors through the Rust subscription
3 participants