Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test packet relaying on ordered channels #1835

Closed
5 tasks done
adizere opened this issue Feb 2, 2022 · 5 comments · Fixed by #1887
Closed
5 tasks done

Test packet relaying on ordered channels #1835

adizere opened this issue Feb 2, 2022 · 5 comments · Fixed by #1887
Assignees
Labels
A: question Admin: further information is requested I: dependencies Internal: related to dependencies I: infrastructure Internal: related to Infrastructure (testing, deployment, etc)
Milestone

Comments

@adizere
Copy link
Member

adizere commented Feb 2, 2022

Summary

In preparation for releasing ICA on the hub, we would like to do some testing that channel ordering works as expected in Hermes.

Proposal

  • set up ordered channels between 2 chains
  • provoke out-of-order packet relaying by either manually instrumenting the relayer code to drop the first event out of a batch, or by evicting transactions from thee node's mempool before they commit to the blockchain

Stretch goal:

  • benchmark the scalability properties of Hermes by deploying it on a server similar to real-world operator use, and assess how many channels can Hermes relay gracefully on such a machine
    • the bottleneck will likely be the chain software, not Hermes: confirm that by isolating the networks on machines separate from the one where Hermes is running.

Acceptance Criteria

  • capture a packet ordering corner-case in an automated test using the Rust framework
    • ensure that Hermes enforces ordering by looking at packet sequence numbers on an ordered channel

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@adizere adizere added A: question Admin: further information is requested I: dependencies Internal: related to dependencies I: infrastructure Internal: related to Infrastructure (testing, deployment, etc) labels Feb 2, 2022
@adizere adizere added this to the v0.11.1 milestone Feb 2, 2022
@adizere adizere added the P-high label Feb 2, 2022
@romac romac modified the milestones: v0.11.1, v0.12.0 Feb 2, 2022
@romac romac added P-critical and removed P-high labels Feb 8, 2022
@mzabaluev
Copy link
Contributor

Testing this in a real setup appears to require a modified Gaia with the transfer module hacked to use ordered channels.

I guess we should go for a test case simulating out-or-order packet relaying in the test framework.

@soareschen
Copy link
Contributor

I have setup a Gaia branch that support ordered transfer channel at https://github.com/informalsystems/gaia/tree/v6.0.1-ordered.

Currently the Nix build at Cosmos.nix is not working due to the replace directive being ignored by the Nix Go builder: informalsystems/cosmos.nix#63

You can still build the Go project manually, and expose the custom gaiad in $PATH. Then replace the bootstrap channel code from Order::Unordered to Order::Ordered.

After this you should see the some tests like test_ibc_transfer still passing, but tests like test_clear_packet failing. In particular, test_clear_packet would keep retrying because worker_stubborn_strategy is being used here.

@adizere
Copy link
Member Author

adizere commented Feb 17, 2022

Great progress!

After this you should see the some tests like test_ibc_transfer still passing, but tests like test_clear_packet failing. In particular, test_clear_packet would keep retrying because worker_stubborn_strategy is being used here.

Is test_clear_packet failing because (as we expected) Hermes submits out-of-order packets on the ordered channel? If so, then I would consider the first part of this issue done:

  • find a way to reliably test ordered channels with Hermes
    • does Hermes relay correctly on ordered channels? Answer seems to be no.

One proposal for next steps that would be sensible is:

  • make Hermes aware of ordered channel errors, specifically:
    • Hermes relaying logic should be aware of channel ordering, namely, it should be aware when a packet relaying fails due to out-of-order submission
    • this can probably be achieved by having a specific error-case matching as we do for account sequence mismatch errors
  • make Hermes avoid retrying upon (out-of-order packet submission) failure occurs,
    • in other words, Hermes should not retry stubbornly
  • instead of stubborn retry, upon encountering the failure, the relayer packet worker should:
    • (a) trigger packet clearing immediately if clear_interval = 0 i.e., if periodic packet clearing is turned off, or
    • (b) drop the packet events and simply wait if clear_interval > 0, until periodic packet clearing kicks in
  • add a test that asserts that Hermes can correctly recover from out-of-order packet submission; this test would be similar to test_clear_packet except both packets that are sent should be relayed.

Any thoughts?

@adizere
Copy link
Member Author

adizere commented Feb 17, 2022

Notes on testing that relaying packets is done in order with @seanchen1991.

  • git clone git@github.com:informalsystems/gaia.git
  • git checkout v6.0.1-ordered
  • make install to install the patched gaiad binary
    • $ gaiad version
      v6.0.1-ordered-34e6f66a3d729a36f51c9a7f5ffd211bcd1cd90c

  • switch to ibc-rs@master
    • using the patch below, all the assertions should pass in the following test
    • cargo test -- test_clear_packet
$ git diff
diff --git a/tools/integration-test/src/bootstrap/binary/channel.rs b/tools/integration-test/src/bootstrap/binary/channel.rs
index a07f9592..79886c79 100644
--- a/tools/integration-test/src/bootstrap/binary/channel.rs
+++ b/tools/integration-test/src/bootstrap/binary/channel.rs
@@ -82,7 +82,7 @@ pub fn bootstrap_channel_with_connection<ChainA: ChainHandle, ChainB: ChainHandl

     let channel = Channel::new(
         connection.connection.clone(),
-        Order::Unordered,
+        Order::Ordered,
         port_a.0.clone(),
         port_b.0.clone(),
         None,
diff --git a/tools/integration-test/src/tests/clear_packet.rs b/tools/integration-test/src/tests/clear_packet.rs
index ec9d0331..a9ff109e 100644
--- a/tools/integration-test/src/tests/clear_packet.rs
+++ b/tools/integration-test/src/tests/clear_packet.rs
@@ -14,7 +14,7 @@ impl TestOverrides for ClearPacketTest {
     fn modify_relayer_config(&self, config: &mut Config) {
         // Disabling clear_on_start should make the relayer not
         // relay any packet it missed before starting.
-        config.mode.packets.clear_on_start = false;
+        config.mode.packets.clear_on_start = true;
         config.mode.packets.clear_interval = 0;
     }

@@ -110,7 +110,7 @@ impl BinaryChannelTest for ClearPacketTest {
         // Wallet on chain B should only receive the second IBC transfer
         chains.node_b.chain_driver().assert_eventual_wallet_amount(
             &wallet_b.as_ref(),
-            amount2,
+            amount2 + amount1,
             &denom_b.as_ref(),
         )?;

@soareschen
Copy link
Contributor

I have created #1887 to test the described steps.

I leave test_clear_packet to fail intentionally, because we want to test the behavior of the relayer when it encounter ordered channel errors. I modified the retry strategy to abort after 5 tries, and that allows the relayer to at least fail gracefully rather than retrying indefinitely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: question Admin: further information is requested I: dependencies Internal: related to dependencies I: infrastructure Internal: related to Infrastructure (testing, deployment, etc)
Projects
No open projects
Status: Closed
Development

Successfully merging a pull request may close this issue.

4 participants