Regression bug: Hermes is unable to re-establish monitor connection after node goes down #1026
Closed
A few thoughts (writing them here but we'll need separate issues)
Testing
Other issues I noticed while debugging this:
Further work on this is tracked in #1035.
romac added a commit that referenced this issue Jun 3, 2021
* Added details about the help command in the guide
* Bump version to 0.4.0
* Update guide to account for `start-multi` being promoted to `start`
* Fix changelog
* Document telemetry section of config file
* Fixup documentation for global section of configuration file
* Document type of each config option
* Remove unsused config default method
* Guide update for the query clients method.
* Typo fix
* Re-add Cargo.lock for proto-compiler crate
* Document addition of `host` param to telemetry config
* Document telemetry service
* Update changelog with telemetry
* Add changelog entry for #1026
* Channel worker updates
* Add missing files
* Update feature matrix
* Update mdbook to v0.4.7
* Update mdbook to v0.4.9
* Add cosmos-sdk versions supported
* Higlight compat info
* Write summary of 0.4.0 release

Co-authored-by: Romain Ruetschi <romain@informal.systems>
Co-authored-by: Anca Zamfir <zamfiranca@gmail.com>
This was referenced Sep 10, 2021
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this issue Sep 13, 2022
Crate
`ibc-relayer`, primarily
Summary of Bug
The fault-tolerance mechanism introduced in #895 (and the follow-up PR #903) ensures that
Hermes can cope with a full node's websocket endpoint becoming unreachable, and continues
to function unaffected once the endpoint is reachable again.
We have since introduced a regression (possibly with the batching
feature, as Romain suggested): this mechanism no longer works.
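For illustration only, the kind of behavior this mechanism is meant to provide can be sketched as a retry loop with exponential backoff. This is not the code from #895 or #903: `connect_with_retry`, `backoff_delay`, and all of their parameters are hypothetical names chosen for this sketch, and the `connect` closure stands in for whatever re-establishes the websocket connection.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
/// Hypothetical helper, not part of Hermes.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // On overflow fall back to the cap; otherwise clamp to the cap.
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Calls `connect` until it succeeds or `max_attempts` attempts have failed,
/// sleeping with exponential backoff between failed attempts.
/// At least one attempt is always made; the last error is returned on failure.
fn connect_with_retry<T, E>(
    mut connect: impl FnMut() -> Result<T, E>,
    max_attempts: u32,
    base: Duration,
    max: Duration,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match connect() {
            Ok(conn) => return Ok(conn),
            // Give up once the attempt budget is spent.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

Under these assumptions, a caller would pass a closure that opens the connection and, on success, would also need to set up its event subscriptions again, since a fresh connection does not carry over the old ones.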
Version
e4a6543
Steps to Reproduce
1. Run `scripts/dev-env` with two chains, e.g., ibc-0 and ibc-1
2. Run `hermes create channel ibc-0 ibc-1 --port-a transfer --port-b transfer` and wait for it to finish
3. Run `hermes start` in one terminal, make sure your logging level is at least `debug`
4. Identify the `gaia` instances and watch the `hermes` output
5. Kill each instance with `kill -9 GAIA_PID`
6. Run `scripts/dev-env` again with the same configuration as before
7. Run `hermes create channel ibc-0 ibc-1 --port-a transfer --port-b transfer` again
8. Run `hermes tx raw ft-transfer ibc-1 ibc-0 transfer channel-0 9999 -o 1000`
The problem is that Hermes (the one we started at step 3) should pick up the connection to the two gaia instances and relay the packet. But this does not happen. Instead, Hermes does connect via websocket to the chains, but it does not receive any events from either of the two chains.
What should happen
If using version 20d8fff of Hermes, and running the same recipe as above, at steps 7 and 8 we would see Hermes workers starting and doing active work (relaying). For example, the output should be:
Note that 20d8fff requires that we use the command `hermes start-multi` at step 3 (instead of `hermes start`), and the configuration .toml file should have `strategy = 'naive'`.
Acceptance Criteria