Hermes 0.3.2 issues after testing with Gaia manager #972

Closed
7 tasks done
adizere opened this issue May 21, 2021 · 3 comments · Fixed by #974
Labels: A: bug (Admin: something isn't working)

Comments

@adizere
Member

adizere commented May 21, 2021

Crate

mostly ibc-relayer and ibc-relayer-cli

Summary of Bug

This is a collection of issues uncovered with @brapse while getting to understand gm (ref: #928).

  • start-multi panics if all chains are unreachable
  • BUG: client worker has incorrect (reversed) parameters

Version

0.3.2

Steps to Reproduce

start-multi panics if all chains are unreachable

hermes start-multi dies if no chain is running, with the following panic:

May 21 11:56:37.874 ERROR ibc_relayer::supervisor: failed to spawn chain runtime for network1: RPC error to endpoint http://localhost:27000/: error trying to connect: tcp connect error: Connection refused (os error 61) (code: 0)
...
May 21 11:56:37.896 ERROR ibc_relayer::supervisor: skipping workers for chain id network1. reason: failed to spawn chain runtime with error: RPC error to endpoint http://localhost:27000/: error trying to connect: tcp connect error: Connection refused (os error 61) (code: 0)
...
The application panicked (crashed).
Message:  no operations have been added to `Select`
Location: /Users/adi/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/crossbeam-channel-0.5.1/src/select.rs:466

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

A more helpful way to exit would be to signal the error:

...
Error: supervisor was not able to connect to any chain
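
One way to get this behavior is to check, before entering the event-selection loop, that at least one chain runtime was spawned. The following is only a minimal sketch under stated assumptions: `SupervisorError`, `spawn_runtime`, and `collect_runtimes` are hypothetical names and not part of the Hermes code base, and chain runtimes are modelled as plain strings.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type, for illustration only (not the Hermes API).
#[derive(Debug, PartialEq)]
enum SupervisorError {
    NoReachableChain,
}

impl fmt::Display for SupervisorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SupervisorError::NoReachableChain => {
                write!(f, "supervisor was not able to connect to any chain")
            }
        }
    }
}

/// Hypothetical stand-in for spawning a chain runtime: it succeeds only
/// if the chain id appears in `reachable`.
fn spawn_runtime(chain_id: &str, reachable: &[&str]) -> Result<String, String> {
    if reachable.contains(&chain_id) {
        Ok(format!("runtime-{}", chain_id))
    } else {
        Err(format!("failed to spawn chain runtime for {}", chain_id))
    }
}

/// Try every configured chain, skip the unreachable ones, and return an
/// error if none is left, so that no empty selection is ever attempted.
fn collect_runtimes(
    configured: &[&str],
    reachable: &[&str],
) -> Result<HashMap<String, String>, SupervisorError> {
    let mut runtimes = HashMap::new();
    for &chain_id in configured {
        match spawn_runtime(chain_id, reachable) {
            Ok(runtime) => {
                runtimes.insert(chain_id.to_string(), runtime);
            }
            Err(e) => eprintln!("skipping workers for chain id {}. reason: {}", chain_id, e),
        }
    }
    if runtimes.is_empty() {
        return Err(SupervisorError::NoReachableChain);
    }
    Ok(runtimes)
}
```

With this shape, a caller can print the error and exit with a non-zero status instead of reaching the `Select` call with no operations registered.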

BUG: client worker has incorrect (reversed) parameters

The client worker that Hermes creates upon observing the Channel Open Ack or Confirm event has the wrong parameters.

To reproduce:

  • start three chains with no prior state; assume chain identifiers are network1, network2, and network3.
  • create three clients hosted on network1
    • hermes create client network1 network3
    • hermes create client network1 network3
    • hermes create client network1 network3
  • now create two clients hosted on network2
    • hermes create client network2 network3
    • hermes create client network2 network3

This finishes the setup phase. Setting up these multiple clients will help clarify which client belongs to which network in the following steps.

  • now run hermes start-multi from one terminal
  • from a separate terminal, we will create a channel (with new clients and connection) linking network1 <> network2
    • hermes create channel network1 network2 --port-a transfer --port-b transfer
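
To see why the extra clients make the identifiers unambiguous, the setup can be replayed with a small model. This sketch assumes that each chain assigns Tendermint client identifiers sequentially, starting at 0, with an independent counter per host chain; `ClientCounters` and `replay_setup` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical model: one client sequence counter per host chain.
#[derive(Default)]
struct ClientCounters {
    next: HashMap<String, u64>,
}

impl ClientCounters {
    /// Allocate the next client identifier on `host` (assumed sequential).
    fn create_client(&mut self, host: &str) -> String {
        let seq = self.next.entry(host.to_string()).or_insert(0);
        let id = format!("07-tendermint-{}", *seq);
        *seq += 1;
        id
    }
}

/// Replay the reproduction steps and return the identifiers of the two
/// clients created by the `create channel` command.
fn replay_setup() -> (String, String) {
    let mut counters = ClientCounters::default();
    // Three clients hosted on network1.
    for _ in 0..3 {
        counters.create_client("network1");
    }
    // Two clients hosted on network2.
    for _ in 0..2 {
        counters.create_client("network2");
    }
    // `create channel` creates one new client on each side.
    let on_network1 = counters.create_client("network1");
    let on_network2 = counters.create_client("network2");
    (on_network1, on_network2)
}
```

Under these assumptions the channel's client on network1 is 07-tendermint-3 and the one on network2 is 07-tendermint-2, so an identifier alone is enough to tell which chain hosts the client.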

At this point, we go back to the first terminal (where start-multi is running) and observe the output. Hermes should eventually pick up the OpenAckChannelEv event, which network1 emits, as follows:

May 21 15:09:32.433 DEBUG ibc_relayer::supervisor: chain network1 sent events:
OpenAckChannelEv(OpenAck(Attributes { height: Height { revision: 0, height: 39 }, port_id: PortId("transfer"), channel_id: Some(ChannelId("channel-0")), connection_id: ConnectionId("connection-0"), counterparty_port_id: PortId("transfer"), counterparty_channel_id: Some(ChannelId("channel-0")) }))
for object Client(Client { dst_chain_id: ChainId { id: "network1", version: 0 }, dst_client_id: ClientId("07-tendermint-3"), src_chain_id: ChainId { id: "network2", version: 0 } })

This log line says that a client worker will be spawned to handle updating the client with id 07-tendermint-3 hosted on chain network1 with headers from network2. We should be able to observe the bug soon after this, which manifests in the following weird log output:

May 21 15:09:32.658 WARN worker loop{worker=network2->network1:07-tendermint-3}: ibc_relayer::foreign_client: [network1 -> network2:07-tendermint-3] misbehaviour checking result Misbehaviour("failed querying client state on dst chain 07-tendermint-3 with error: Query error occurred (failed to query for client state): error converting message type into domain type: the client state was not found")

The network1 -> network2:07-tendermint-3 bit suggests that the client worker has been instantiated with a ForeignClient that is hosted on chain network2 (this is called the destination chain), has identifier 07-tendermint-3, and targets the source chain network1. But client 07-tendermint-3 has the reverse parameters: it is hosted on chain network1 and targets network2.

The same bug manifests for the client worker in the direction network2 -> network1. In this case, the wrong client is being used again.

May 21 15:09:39.399 DEBUG worker loop{worker=network1->network2:07-tendermint-2}: ibc_relayer::foreign_client: [network2 -> network1:07-tendermint-2] checking misbehaviour at 0-0, number of consensus states 1
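
The mismatch between the worker name and the ForeignClient label can be illustrated with a small model. This is only a sketch: `ClientObject` mirrors the fields printed in the supervisor log above, while `ForeignClientSketch`, `restore_flipped`, and `restore_correct` are hypothetical stand-ins and do not reproduce the actual Hermes types or constructors.

```rust
/// Mirrors the fields of the `Client` object shown in the supervisor log.
#[derive(Debug, Clone, PartialEq)]
struct ClientObject {
    dst_chain_id: String,  // chain hosting the client
    dst_client_id: String, // identifier of the client on the host chain
    src_chain_id: String,  // chain whose headers update the client
}

/// Hypothetical stand-in for a foreign client handle.
#[derive(Debug, PartialEq)]
struct ForeignClientSketch {
    id: String,
    dst_chain: String,
    src_chain: String,
}

impl ForeignClientSketch {
    fn restore(id: &str, dst_chain: &str, src_chain: &str) -> Self {
        ForeignClientSketch {
            id: id.to_string(),
            dst_chain: dst_chain.to_string(),
            src_chain: src_chain.to_string(),
        }
    }

    /// Label in the `[src -> dst:client_id]` shape seen in the logs.
    fn label(&self) -> String {
        format!("[{} -> {}:{}]", self.src_chain, self.dst_chain, self.id)
    }
}

/// Buggy variant: the source chain is passed where the destination
/// (host) chain is expected, and vice versa.
fn restore_flipped(obj: &ClientObject) -> ForeignClientSketch {
    ForeignClientSketch::restore(&obj.dst_client_id, &obj.src_chain_id, &obj.dst_chain_id)
}

/// Intended variant: the host chain stays the destination.
fn restore_correct(obj: &ClientObject) -> ForeignClientSketch {
    ForeignClientSketch::restore(&obj.dst_client_id, &obj.dst_chain_id, &obj.src_chain_id)
}
```

For the object in the first log (client 07-tendermint-3 hosted on network1, source network2), the flipped variant yields the label from the warning, while the correct variant yields a label that agrees with the worker name network2->network1:07-tendermint-3.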

Acceptance Criteria


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned

adizere added the A: bug (Admin: something isn't working) label May 21, 2021
adizere added this to the 05.2021 milestone May 21, 2021

@ancazamfir
Collaborator

uhhh! nice catch with the client worker! The issue is that we restore the ForeignClient with the chains flipped.

@ancazamfir
Collaborator

start-multi panics if all chains are unreachable

could you give more detail? In my case when I start hermes without any chain running I see the error messages but no panic. Not sure how to reproduce this (trying to review/ test #974)

@adizere
Member Author

adizere commented May 25, 2021

> start-multi panics if all chains are unreachable
>
> could you give more detail? In my case when I start hermes without any chain running I see the error messages but no panic. Not sure how to reproduce this (trying to review/ test #974)

This is weird. I am actually no longer able to reproduce the panic. If I do hermes start with current master (1f498e3), then the executable hangs, with the last error message being:

May 25 10:47:55.820 ERROR ibc_relayer::supervisor: skipping workers for chain id network5. reason: failed to spawn chain runtime with error: RPC error to endpoint http://localhost:27040/: error trying to connect: tcp connect error: Connection refused (os error 61) (code: 0)

It's important that no gaia is running prior to invoking hermes start.

I think this should still be fixed: the executable should quit (instead of hanging forever) when no chain is reachable.

3 participants