Hermes issues found by Dex scalability tests on v0.6.0 #1196
This was referenced Jul 14, 2021
Re-opening, as the missing RPC events still need to be solved.
romac changed the title from "hermes issues found by Dex scalability tests on v0.6.0" to "Hermes issues found by Dex scalability tests on v0.6.0" on Jul 20, 2021.
romac added a commit that referenced this issue on Jul 21, 2021:

… by Tendermint (#1205)

After some investigation, the culprit for #1196 seems to be that Tendermint is closing the WebSocket connection over which we listen for IBC events whenever more than 100 txs are included in a single block [0], as we are not able to pull the events fast enough over the WebSocket connection to avoid completely filling the event buffer in Tendermint (which currently has a hard-coded capacity of 100 events, hence the issue).

We never noticed this previously since this problem only appears in practice with a high-enough commit/propose timeout (to allow enough txs to be included in a single block), and we were testing with a lower value for the timeouts.

Now that we landed some changes in tendermint-rs [1] which allow us to notice the connection being closed, this PR makes use of this to resubscribe to the events and trigger a packet clear whenever we notice the connection being closed under our feet.

[0] tendermint/tendermint#6729
[1] informalsystems/tendermint-rs#929

---

* Propagate JSON-RPC errors through the Rust subscription
* Use tendermint-rs branch with both fixes
* Fix compilation issue in tests
* Clear pending packets when event subscription is cancelled
* Temp: Update one-chain script to use 10s commit timeout
* Use tendermint-rs master
* Update Cargo.lock
* Update changelog
* Update lockfile
* Increase delay before checking for relaying result in e2e tests
* Add comment explaining who the RPC error is propagated to
* Improve event monitor logs
* Reset `timeout_commit` and `timeout_propose` to 1s
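The recovery strategy described in the commit message (resubscribe, then trigger a packet clear, whenever the subscription is closed) can be sketched as below. This is a minimal std-only illustration, not Hermes' event monitor and not the tendermint-rs subscription API; `MonitorItem`, `Action`, `run_monitor`, and the `max_resubscriptions` cap are hypothetical names chosen for the example.

```rust
/// What the monitor observes on its subscription (hypothetical model).
pub enum MonitorItem {
    /// An IBC event, identified here only by a number.
    Event(u64),
    /// The node closed the connection / cancelled the subscription.
    Closed,
}

/// What the monitor asks the rest of the relayer to do (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Action {
    Relay(u64),
    Resubscribe,
    ClearPendingPackets,
}

/// Consumes events from `subscribe()`; when the subscription is closed,
/// opens a new one and requests a packet clear, up to `max_resubscriptions`
/// times. Stops when a stream ends or the resubscription budget is spent.
pub fn run_monitor<S, F>(mut subscribe: F, max_resubscriptions: usize) -> Vec<Action>
where
    F: FnMut() -> S,
    S: Iterator<Item = MonitorItem>,
{
    let mut actions = Vec::new();
    let mut resubscriptions = 0;
    let mut stream = subscribe();
    loop {
        match stream.next() {
            Some(MonitorItem::Event(id)) => actions.push(Action::Relay(id)),
            Some(MonitorItem::Closed) if resubscriptions < max_resubscriptions => {
                resubscriptions += 1;
                stream = subscribe();
                actions.push(Action::Resubscribe);
                // A fresh subscription only delivers new events, so anything
                // dropped while disconnected must be recovered by querying
                // the chain for pending packets.
                actions.push(Action::ClearPendingPackets);
            }
            // Stream exhausted, or closed with no resubscriptions left.
            Some(MonitorItem::Closed) | None => break,
        }
    }
    actions
}
```

Requesting the packet clear right after resubscribing is the important part of the design: resubscribing alone would restore the event flow but would not recover the events that were lost when the buffer overflowed.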
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this issue on Sep 13, 2022.
Crate
relayer
Summary of Bug
Following up on the reported Dex scalability test issue, @nodebreaker0-0 gave me access to his setup and walked me through his configuration and the commands he was running.
I was able to reproduce this locally and, after debugging, found a few issues in v0.6.0:
Version
hermes v0.6.0, v0.5.0
Tested with both gaia v4.2.1 and v5.0.0; the issue is reproducible with both.
Steps to Reproduce
`one-chain` script (see `anca/dex_rpc_event`)
`dev-env` script: `./scripts/dev-env ~/.hermes/config.toml ibc-0 ibc-1`
hermes create channel ibc-0 ibc-1 --port-a transfer --port-b transfer
hermes -c hermes_config.toml start
hermes -c ft_transfer_config.toml tx raw ft-transfer ibc-1 ibc-0 transfer channel-0 9999 -o 1000 -n 150 -k user2
hermes -c config_1.toml tx raw ft-transfer ibc-1 ibc-0 transfer channel-0 9999 -o 1000 -n 1 -k user2
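The `-n 150` step above submits 150 transfers that can end up in a single block. The toy model below illustrates the failure mode later identified in the commit message referencing this issue: the node pushes every event of a committed block into a bounded per-subscription buffer (capacity 100) and cancels the subscription once that buffer is full. It is not Tendermint's or Hermes' code; `Subscription`, `simulate`, and the simplification of one event per transfer are assumptions made for illustration.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};

/// Assumed capacity of the node-side event buffer (100, per the commit message).
const BUFFER_CAPACITY: usize = 100;

/// Toy node-side subscription: events go into a bounded buffer, and the
/// subscription is cancelled as soon as the buffer is full.
struct Subscription {
    tx: Option<SyncSender<u64>>,
}

impl Subscription {
    fn publish(&mut self, event: u64) {
        let full = match &self.tx {
            Some(tx) => tx.try_send(event).is_err(),
            // Already cancelled: the event is silently lost.
            None => return,
        };
        if full {
            // Dropping the sender closes the subscription for good.
            self.tx = None;
        }
    }
}

/// Publishes every event of every block before the relayer reads anything,
/// then drains the relayer's side. Returns whether the subscription is still
/// open and how many events the relayer actually received.
fn simulate(events_per_block: &[u64]) -> (bool, usize) {
    let (tx, rx) = sync_channel(BUFFER_CAPACITY);
    let mut sub = Subscription { tx: Some(tx) };
    let mut next_event = 0;
    for &n in events_per_block {
        for _ in 0..n {
            sub.publish(next_event);
            next_event += 1;
        }
    }
    let still_open = sub.tx.is_some();
    // Drop the sender so draining terminates even if the subscription survived.
    drop(sub);
    (still_open, rx.iter().count())
}
```

With `simulate(&[150, 1])` the subscription is cancelled, the relayer receives only the first 100 events, and the single transfer in the following block never arrives, which matches events stopping altogether rather than merely arriving late.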
`anca/dex_rpc_event` for clearing the packets, and that part seems OK now.
For the missing RPC events, I traced the problem back to `stream_batches()`: https://github.com/informalsystems/ibc-rs/blob/3f0d4bc38a3d737cbd289c2087d09228bdb7b07f/relayer/src/event/monitor.rs#L393
`collect_events()` is only called 12-15 times; after that, no other event gets through. I also tried v0.5.0, and it seems to have the same issue. I'm not sure whether the problem is in `ibc-rs` or `tendermint-rs`. The block includes all events (you can check with `curl localhost:26657/block_results?height=533`).
Acceptance Criteria
For Admin Use