gRPC connection query at latest blockchain height may fail #1970

Closed
1 of 5 tasks
ancazamfir opened this issue Mar 16, 2022 · 2 comments · Fixed by #2021
Labels
A: bug Admin: something isn't working I: logic Internal: related to the relaying logic
@ancazamfir

Summary of Bug

@mircea-c reported this problem:
Hi team! I'm running into a strange issue. Trying to run Hermes between gravity bridge and cosmoshub and Hermes v0.12.0 shows:

2022-03-03T19:40:34.805800Z ERROR ThreadId(01) skipped workers for connection connection-20 on chain gravity-bridge-3, reason: relayer error: connection not found: connection-464
2022-03-03T19:40:34.805834Z  INFO ThreadId(01) channel channel-17 on chain gravity-bridge-3 is: OPEN; state on dest. chain (cosmoshub-4) is: UNINITIALIZED

When I run it manually I get the same issue, but if I run it again immediately after then it seems to work:

mircea@sysmon:/workbench/cephalopodequipment/config/hermes$ /workbench/current/hermes/v0.12.0/hermes -c manual.toml query channel ends gravity-bridge-3 transfer channel-17
2022-03-03T19:40:15.461529Z DEBUG ThreadId(01) Options: QueryChannelEndsCmd { chain_id: ChainId { id: "gravity-bridge-3", version: 3 }, port_id: PortId("transfer"), channel_id: ChannelId("channel-17"), height: None, verbose: false }
Error: connection not found: connection-464
mircea@sysmon:/workbench/cephalopodequipment/config/hermes$ /workbench/current/hermes/v0.12.0/hermes -c manual.toml query channel ends gravity-bridge-3 transfer channel-17
2022-03-03T19:40:17.417384Z DEBUG ThreadId(01) Options: QueryChannelEndsCmd { chain_id: ChainId { id: "gravity-bridge-3", version: 3 }, port_id: PortId("transfer"), channel_id: ChannelId("channel-17"), height: None, verbose: false }
Success: ChannelEndsSummary {
    chain_id: ChainId {
        id: "gravity-bridge-3",
        version: 3,
    },
    client_id: ClientId(
        "07-tendermint-21",
    ),
    connection_id: ConnectionId(
        "connection-20",
    ),
    channel_id: ChannelId(
        "channel-17",
    ),
    port_id: PortId(
        "transfer",
    ),
    counterparty_chain_id: ChainId {
        id: "cosmoshub-4",
        version: 4,
    },
    counterparty_client_id: ClientId(
        "07-tendermint-582",
    ),
    counterparty_connection_id: ConnectionId(
        "connection-464",
    ),
    counterparty_channel_id: ChannelId(
        "channel-281",
    ),
    counterparty_port_id: PortId(
        "transfer",
    ),
}

Version

Reported for v0.12.0; also seen in later versions and on master.

Steps to Reproduce

With the following ./query_conn.sh :

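#!/bin/bash
# Usage: ./query_conn.sh <rpc-addr> <chain-id> <connection-id>

addr=$1
chain=$2
conn=$3
for i in {1..100}
do
    # Latest block height reported by the node.
    read -r height < <(
        curl -s "$addr"/status | jq -r '.result.sync_info.latest_block_height')
    # Latest block height reported by the application.
    read -r abci_height < <(
        curl -s "$addr"/abci_info | jq -r '.result.response.last_block_height')
    echo "Query Chain ID: ${chain}, Conn ID: ${conn}, Height: ${height}, ABCI_Height: ${abci_height}"
    # Query the connection end at the /status height.
    ./target/debug/hermes query connection end "$chain" "$conn" -H "$height"
done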

run:
./query_conn.sh <rpc-addr> <chain-id> <connection-id>

Acceptance Criteria

Queries at the latest height should not fail.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@ancazamfir

ancazamfir commented Mar 16, 2022

The root cause is that we get the latest chain height from the RPC /status endpoint and then query the application at that height. If the query arrives after the block has been created but before it has been executed and the application state updated, it fails with "connection not found".
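To make the race window concrete, here is a minimal sketch of picking a height the application has already caught up to. The function name queryable_height is hypothetical and not part of Hermes; it only assumes the two heights have already been read from /status and /abci_info, as in the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Hermes): choose a height that is safe to query.
# $1 = latest_block_height from /status
# $2 = last_block_height from /abci_info
queryable_height() {
    if [ "$2" -lt "$1" ]; then
        # The block exists but the application state lags behind it:
        # fall back to the height reported by the application.
        echo "$2"
    else
        echo "$1"
    fi
}
```

For the heights in the failing sample lines (Height: 1093727, ABCI_Height: 1093726), this would select 1093726, the height at which the same query succeeded.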

Here is a sample output from the script:

Query Chain ID: gravity-bridge-3, Conn ID: connection-20, Height: 1093726, ABCI_Height: 1093726
Success: ConnectionEnd {
    state: Open,
    client_id: ClientId(
        "07-tendermint-21",
    ),
    counterparty: Counterparty {
        client_id: ClientId(
            "07-tendermint-582",
        ),
        connection_id: Some(
            ConnectionId(
                "connection-464",
            ),
        ),
        prefix: ibc,
    },
    versions: [
        Version {
            identifier: "1",
            features: [
                "ORDER_ORDERED",
                "ORDER_UNORDERED",
            ],
        },
    ],
    delay_period: 0ns,
}
Query Chain ID: gravity-bridge-3, Conn ID: connection-20, Height: 1093727, ABCI_Height: 1093726
Error: connection not found: connection-20
Query Chain ID: gravity-bridge-3, Conn ID: connection-20, Height: 1093727, ABCI_Height: 1093726
Error: connection not found: connection-20
Query Chain ID: gravity-bridge-3, Conn ID: connection-20, Height: 1093727, ABCI_Height: 1093727
Success: ConnectionEnd {
    state: Open,
    client_id: ClientId(
        "07-tendermint-21",
    ),
    counterparty: Counterparty {
        client_id: ClientId(
            "07-tendermint-582",
        ),
        connection_id: Some(
            ConnectionId(
                "connection-464",
            ),
        ),
        prefix: ibc,
    },
    versions: [
        Version {
            identifier: "1",
            features: [
                "ORDER_ORDERED",
                "ORDER_UNORDERED",
            ],
        },
    ],

If we change the Cosmos query_latest_height() to take the height from the RPC /abci_info endpoint (rpc_client.abci_info()), the issue no longer occurs. See branch anca/app_latest_height.
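The same idea can be tried from the shell without touching Hermes. The sketch below is an illustration only, not the change in the branch: abci_last_height is a hypothetical helper that reads an /abci_info response body on stdin and prints last_block_height. It uses sed instead of jq so that it has no extra dependency, and it assumes the height is encoded as a quoted string of digits, as in the responses the repro script parses.

```shell
#!/bin/bash
# Hypothetical helper: print last_block_height from an /abci_info JSON body on stdin.
# Runs in a subshell (parentheses) so the variable h does not leak to the caller.
# Returns non-zero if no height is found.
abci_last_height() (
    h=$(sed -n 's/.*"last_block_height"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)".*/\1/p')
    [ -n "$h" ] && echo "$h"
)
```

In the repro script, the height passed to -H could then be obtained with height=$(curl -s "$addr"/abci_info | abci_last_height) instead of reading it from /status.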

@ancazamfir

There might be other ways to reproduce this, but I was able to show it locally by starting gaia chains with longer block times and patching tendermint to simulate the delay caused by multiple transactions (longer applyBlock times):

diff --git a/state/execution.go b/state/execution.go
index 2cc3db6e7..fae993062 100644
--- a/state/execution.go
+++ b/state/execution.go
@@ -323,6 +323,7 @@ func execBlockOnProxyApp(
                }
        }

+       time.Sleep(time.Millisecond * 500)
        // End block.
        abciResponses.EndBlock, err = proxyAppConn.EndBlockSync(abci.RequestEndBlock{Height: block.Height})
        if err != nil {

(use replace github.com/tendermint/tendermint => ... in gaia's go.mod).

@adizere adizere added this to the v0.14.0 milestone Mar 16, 2022
@adizere adizere added A: bug Admin: something isn't working I: logic Internal: related to the relaying logic P-medium labels Mar 16, 2022
@adizere adizere modified the milestones: v0.14.0, v0.15.0 Apr 27, 2022