Connection worker #1019

cezarad · 2021-05-31T14:27:56Z

worker for connection

Closes: #821

Description

added a worker for connections
added query_connections to the chain handle and runtime
allow entering the destination connection id in hermes tx raw conn-try... and make dst_connection_id an Option (it used to be ConnectionId::default() when not specified, but that is a valid connection id)
added tests to e2e
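
The motivation for making dst_connection_id an Option can be illustrated with a minimal sketch. The ConnectionId type below is a hypothetical stand-in, not the type from the relayer crates, and it assumes the default identifier renders as connection-0, which a real connection can also have.

```rust
/// Hypothetical stand-in for the relayer's connection identifier type.
#[derive(Clone, Debug, PartialEq, Eq)]
struct ConnectionId(String);

impl Default for ConnectionId {
    // Assumption: the default identifier is "connection-0", which cannot be
    // told apart from the first connection actually created on a chain.
    fn default() -> Self {
        ConnectionId("connection-0".to_string())
    }
}

/// With a sentinel default, "not specified" and "connection-0" collapse
/// into the same value.
fn describe_with_default(dst: ConnectionId) -> String {
    format!("destination = {}", dst.0)
}

/// With an Option, the "not specified" case is explicit.
fn describe_with_option(dst: Option<&ConnectionId>) -> String {
    match dst {
        Some(id) => format!("destination = {}", id.0),
        None => "destination not specified".to_string(),
    }
}

fn main() {
    let explicit = ConnectionId("connection-0".to_string());
    // Ambiguous: both calls produce the same text.
    println!("{}", describe_with_default(ConnectionId::default()));
    println!("{}", describe_with_default(explicit.clone()));
    // Unambiguous: the two cases are distinguishable.
    println!("{}", describe_with_option(None));
    println!("{}", describe_with_option(Some(&explicit)));
}
```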

To set up for the tests:

change strategy in config to all (relay from all events, including connection handshake ones)
start the chains using ./scripts/dev-env ~/.hermes/config.toml ibc-0 ibc-1, and then
create clients on ibc-0 and ibc-1 with hermes create client ibc-0 ibc-1 and hermes create client ibc-1 ibc-0

Case A:

start the relayer hermes start
send hermes tx raw conn-init ibc-0 ibc-1 07-tendermint-0 07-tendermint-1

Case B:

send hermes tx raw conn-init ibc-0 ibc-1 07-tendermint-0 07-tendermint-1
start the relayer hermes start

Case C:

send hermes tx raw conn-init ibc-0 ibc-1 07-tendermint-0 07-tendermint-1
send hermes tx raw conn-try ibc-1 ibc-0 07-tendermint-1 07-tendermint-0 -s connection-x, with connection-x being the ID of the connection returned in step 1 above
start the relayer hermes start

Case D:

send hermes tx raw conn-init ibc-0 ibc-1 07-tendermint-0 07-tendermint-1
send hermes tx raw conn-try ibc-1 ibc-0 07-tendermint-1 07-tendermint-0 -s connection-x
send hermes tx raw conn-ack ibc-0 ibc-1 07-tendermint-0 07-tendermint-1 -d connection-x -s connection-y, with connection-x being the ID of the connection returned in step 1 and connection-y the ID of the connection returned in step 2 above
start the relayer hermes start

After the last step, wait a couple of seconds for hermes to finish the handshake, then verify the connection state on both chains using hermes query connection end ibc-0 connection-x and hermes query connection end ibc-1 connection-y. Both should be in the Open state.

TODO:

double-check the e2e tests
guide update (done in the release PR Release Hermes v0.5.0 #1098)

For contributor use:

Updated the Unreleased section of CHANGELOG.md with the issue.
If applicable: Unit tests written, added test to CI.
Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
Updated relevant documentation (docs/) and code comments.
Re-reviewed Files changed in the Github PR explorer.

ancazamfir

Looks great @cezarad ! Thanks!!!

soareschen · 2021-06-22T09:12:59Z

relayer/src/worker/connection.rs

+                    // Resume handshake on next iteration.
+                    resume_handshake = true;
+                } else {
+                    resume_handshake = false;


From what I can see, in the smooth case where no unexpected errors happen, once resume_handshake is set to false it is never set to true again. So does this mean that the branch for WorkerCmd::NewBlock is always going to be skipped? Otherwise, are we relying on some subtle error variants to trigger the Err branch and set resume_handshake to true again?

> So does this mean that the branch for WorkerCmd::NewBlock is always going to be skipped?

It's going to be executed if step_event() or step_state() have not advanced the handshake after a number of tries.
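
One way to read this reply is as a bounded retry whose failure arms the NewBlock path. The sketch below illustrates that reading only and is not the worker's actual code: retry_step, MAX_TRIES and the closure standing in for step_event() / step_state() are hypothetical.

```rust
/// Hypothetical retry budget; the real worker's value may differ.
const MAX_TRIES: usize = 3;

/// Runs `step` up to `MAX_TRIES` times.
/// Returns `true` if the handshake did not advance and should therefore be
/// resumed on the next block, `false` if one attempt succeeded.
fn retry_step<F>(mut step: F) -> bool
where
    F: FnMut() -> Result<(), String>,
{
    for attempt in 1..=MAX_TRIES {
        match step() {
            Ok(()) => return false,
            Err(e) => eprintln!("attempt {} failed: {}", attempt, e),
        }
    }
    true
}

fn main() {
    // A step that succeeds on the second try: no resume needed.
    let mut calls = 0;
    let resume_handshake = retry_step(|| {
        calls += 1;
        if calls < 2 {
            Err("not ready".to_string())
        } else {
            Ok(())
        }
    });
    println!("resume_handshake = {}", resume_handshake);

    // A step that never succeeds: the NewBlock branch would pick it up.
    let resume_handshake = retry_step(|| Err("chain unreachable".to_string()));
    println!("resume_handshake = {}", resume_handshake);
}
```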

soareschen · 2021-06-22T09:13:39Z

relayer/src/worker/connection.rs

+                                    a_chain.clone(),
+                                    b_chain.clone(),
+                                    event.clone(),
+                                )?;


Doing a ? here short circuits the entire function and returns from the loop prematurely.
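
A small self-contained sketch of the difference, using a hypothetical fallible step function rather than the worker's real calls: with ?, the first error ends the whole function, while an explicit match lets the loop log the error and carry on with the next item.

```rust
/// Hypothetical fallible step: fails on odd inputs.
fn step(n: u32) -> Result<u32, String> {
    if n % 2 == 0 {
        Ok(n)
    } else {
        Err(format!("step {} failed", n))
    }
}

/// `?` propagates the first error out of the function, so later items
/// are never processed.
fn run_with_question_mark(items: &[u32]) -> Result<Vec<u32>, String> {
    let mut done = Vec::new();
    for &n in items {
        done.push(step(n)?);
    }
    Ok(done)
}

/// Matching on the result keeps the loop alive after an error.
fn run_with_match(items: &[u32]) -> Vec<u32> {
    let mut done = Vec::new();
    for &n in items {
        match step(n) {
            Ok(v) => done.push(v),
            Err(e) => eprintln!("error: {}. continuing", e),
        }
    }
    done
}

fn main() {
    let items = [0, 1, 2];
    // The error on item 1 hides the fact that item 2 would have succeeded.
    println!("{:?}", run_with_question_mark(&items));
    // Item 2 is still processed.
    println!("{:?}", run_with_match(&items));
}
```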

soareschen · 2021-06-22T09:13:46Z

relayer/src/worker/connection.rs

+                                b_chain.clone(),
+                                self.connection.clone(),
+                                height,
+                            )?;


Doing a ? here short circuits the entire function and returns from the loop prematurely.

soareschen · 2021-06-22T09:19:55Z

relayer/src/worker/connection.rs

+        // Set on start or when event based handshake fails.
+        let mut resume_handshake = true;
+
+        loop {


What happens after the full connection handshake has been established? Is the loop supposed to continue forever?

I always find complicated control flow inside loops difficult to follow. I understand you are just following the existing convention. It would be great if we could use functions to structure the control flow instead; that would make it much easier to understand. For example, something like the following:

    pub(crate) fn run(self) -> Result<(), BoxError> {
        enum Action {
            Continue,
            End,
        }

        fn do_run(worker: &ConnectionWorker) -> Result<Action, BoxError> {
            ...
        }

        loop {
            match do_run(&self) {
                Ok(Action::End) => return Ok(()),
                Err(e) => {
                    error!("error: {}. retrying", e);
                    // or return Err if unrecoverable
                }
                _ => {}
            }
        }
    }

> What happens after the full connection handshake has been established? Is the loop supposed to continue forever?

No, see #1115.

> I always find complicated control flow inside loops difficult to follow. I understand you are just following the convention before. It would be great if we can use functions to structure the control flows instead.
> ...

Please open an issue for this and we will address it in a separate PR.

soareschen · 2021-06-22T09:21:46Z

relayer/src/worker/connection.rs

+        loop {
+            thread::sleep(Duration::from_millis(200));
+
+            if let Ok(cmd) = self.cmd_rx.try_recv() {


If try_recv() returns TryRecvError::Disconnected, the sender side of the channel has been dropped and the loop is going to keep failing forever.

Yes, this is true. All workers follow the same pattern. The supervisor should not disconnect unless it has crashed, in which case not much can be done. But we could indeed check for this error and exit. I think we discussed with @romac having a way for the supervisor to gracefully close the worker; I'm not sure this is done. I think the worker has a way to gracefully exit so the supervisor can clean up its worker info.
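
A minimal sketch of the check discussed here. It assumes a std::sync::mpsc channel to stay self-contained (the worker's actual channel type may differ) and a hypothetical WorkerCmd enum. try_recv distinguishes an empty channel from a disconnected one, so the loop can exit once the sender is gone.

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Hypothetical command type standing in for the worker's commands.
#[derive(Debug)]
enum WorkerCmd {
    NewBlock(u64),
}

/// Polls the channel until the sender side is dropped.
/// Returns the number of commands handled before disconnection.
fn run(cmd_rx: Receiver<WorkerCmd>) -> usize {
    let mut handled = 0;
    loop {
        thread::sleep(Duration::from_millis(10));
        match cmd_rx.try_recv() {
            Ok(cmd) => {
                println!("handling {:?}", cmd);
                handled += 1;
            }
            // Nothing queued yet: keep polling.
            Err(TryRecvError::Empty) => continue,
            // Sender dropped: no command can ever arrive, so stop.
            Err(TryRecvError::Disconnected) => return handled,
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    tx.send(WorkerCmd::NewBlock(1)).unwrap();
    tx.send(WorkerCmd::NewBlock(2)).unwrap();
    drop(tx); // simulates the supervisor going away
    println!("handled {} commands", run(rx));
}
```

Commands that were already queued are still delivered before Disconnected is reported, so exiting on that variant does not drop pending work.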

soareschen · 2021-06-22T09:32:37Z

relayer/src/worker/connection.rs

+    }
+
+    /// Run the event loop for events associated with a [`Connection`].
+    pub(crate) fn run(self) -> Result<(), BoxError> {


Can you add some documentation explaining the expected overall execution flow of the function? My understanding is as follows:

Wait for incoming block events for the connection state on the source chain a_chain to be updated.

Match on the new state and advance to the next connection state.

Only the direction from a_chain to b_chain is handled, i.e. (State::Init, State::TryOpen) is not handled.

If the connection state has reached (State::Open, State::Open), loop forever (this should have been a return, right?).

If any error happens, retry until the connection succeeds?

The retry is in two stages: retry a number of times for each block event update, and retry again on the next block update.

Comments have been captured in #1118 and will be addressed in a follow-up PR
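
The state matching described in this comment could be sketched as below. It illustrates one-directional handshake logic following the standard IBC connection handshake order (Init, TryOpen, Open) and is not the code from this PR; the State and Step enums and next_step are hypothetical.

```rust
/// Hypothetical connection states, mirroring the handshake phases.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Uninitialized,
    Init,
    TryOpen,
    Open,
}

/// Hypothetical outcome of inspecting the two connection ends.
#[derive(Debug, PartialEq, Eq)]
enum Step {
    /// Send the named handshake message to `b_chain`.
    SendToB(&'static str),
    /// Both ends are open: the handshake is complete.
    Done,
    /// Nothing to do in the a -> b direction.
    NotHandled,
}

/// Picks the next handshake message for the a -> b direction only.
fn next_step(a_state: State, b_state: State) -> Step {
    match (a_state, b_state) {
        (State::Init, State::Uninitialized) => Step::SendToB("ConnOpenTry"),
        (State::TryOpen, State::Init) => Step::SendToB("ConnOpenAck"),
        (State::Open, State::TryOpen) => Step::SendToB("ConnOpenConfirm"),
        (State::Open, State::Open) => Step::Done,
        // For example (Init, TryOpen): the next message belongs on a_chain,
        // so a worker for the opposite direction would have to send it.
        _ => Step::NotHandled,
    }
}

fn main() {
    let pairs = [
        (State::Init, State::Uninitialized),
        (State::Init, State::TryOpen),
        (State::Open, State::TryOpen),
        (State::Open, State::Open),
    ];
    for (a, b) in pairs {
        println!("({:?}, {:?}) -> {:?}", a, b, next_step(a, b));
    }
}
```

Returning Step::Done as a distinct value gives the caller an explicit signal it can use to leave its loop once both ends are open.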

* under construction
* under construction
* identified connection end
* compiles up to supervisor
* Update supervisor.rs
* compiles
* Update counterparty.rs
* without fmt and without clippy
* fmt and clippy
* option for counterparty in conn open try
* Update connection.rs
* Update connection.rs
* Update supervisor.rs
* added some telemetry to connection
* update supervisor with connection
* Update supervisor.rs
* on going
* conenction updates
* Update supervisor.rs
* e2e
* Update connection.py
* Update connection.py
* Update connection.py
* update e2e
* merge
* Review comments
* Add user2 to CI
* Added new files for gaia 4.2.0 to fix CI
* Added config path info to e2e script (informalsystems#1019)
* Fix output of hermes query connections
* Move binding down to declaration site
* Update relayer/src/chain/counterparty.rs
  Co-authored-by: Romain Ruetschi <romain@informal.systems>
* Added reporting for the underlying error in counterparty_state()
* Better import of IdentifiedConnectionEnd in cosmos.rs
* Masked tonic::Code::NotFound result for query_client_connections.
* Remove unwraps
* Nit: fix comment & log output
* Cleanup
* Update changelog
* Enable handshake completion on CI

Co-authored-by: Anca Zamfir <zamfiranca@gmail.com>
Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Romain Ruetschi <romain@informal.systems>
Co-authored-by: Adi Seredinschi <adi@informal.systems>

cezarad added 11 commits May 29, 2021 11:06

under construction

1edc4be

under construction

03519f8

identified connection end

52cf85a

compiles up to supervisor

54589e9

Update supervisor.rs

95e8ddf

compiles

c5a6e82

Update counterparty.rs

63f5ceb

without fmt and without clippy

9625858

fmt and clippy

a372d30

option for counterparty in conn open try

8b2e74f

Update connection.rs

c43fa63

romac changed the title ~~Connection821~~ Connection worker Jun 2, 2021

cezarad added 6 commits June 2, 2021 19:03

Update connection.rs

fed963f

Merge branch 'master' into connection821

cdf10de

Update supervisor.rs

451825b

added some telemetry to connection

b454071

update supervisor with connection

fd13690

Update supervisor.rs

553e60c

cezarad requested a review from ancazamfir June 7, 2021 09:17

cezarad added 4 commits June 9, 2021 17:11

on going

67eeaa8

conenction updates

192c9e7

Update supervisor.rs

af2ed55

e2e

b2dfa66

cezarad marked this pull request as ready for review June 9, 2021 18:42

cezarad requested review from adizere and romac as code owners June 9, 2021 18:42

cezarad added 4 commits June 10, 2021 13:13

Update connection.py

966a1bd

Update connection.py

00f9a34

Update connection.py

4b89134

update e2e

465c0cf

Merge branch 'master' into connection821

0c48215

ancazamfir approved these changes Jun 17, 2021

romac added the needs guide update label Jun 18, 2021

romac reviewed Jun 18, 2021

relayer/src/supervisor.rs (outdated, resolved)

Move binding down to declaration site

127b530

romac reviewed Jun 18, 2021

relayer/src/chain/counterparty.rs (outdated, resolved)

Merge branch 'master' into connection821

ccbcd75

romac mentioned this pull request Jun 21, 2021

Release Hermes v0.5.0 #1098

Merged

8 tasks

adizere and others added 7 commits June 21, 2021 12:41

Update relayer/src/chain/counterparty.rs

e6d4e2a

Co-authored-by: Romain Ruetschi <romain@informal.systems>

Added reporting for the underlying error in counterparty_state()

7fc859b

Better import of IdentifiedConnectionEnd in cosmos.rs

97b4b7b

Masked tonic::Code::NotFound result for query_client_connections.

4115774

Remove unwraps

6ed3a65

Nit: fix comment & log output

fc6e38b

Merge branch 'master' into connection821

27d27c6

This was referenced Jun 22, 2021

Connection & Channel workers never quit #1115

Closed

User-facing Hermes CLIs to initiate channel & connection handshakes #1116

Closed

adizere requested a review from soareschen June 22, 2021 08:06

adizere reviewed Jun 22, 2021

e2e/e2e/connection.py (resolved)

soareschen previously requested changes Jun 22, 2021

adizere and others added 3 commits June 22, 2021 11:50

Cleanup

5c711ae

Update changelog

bd99f0a

Enable handshake completion on CI

b84772e

romac mentioned this pull request Jun 22, 2021

Follow-up: Connection worker #1118

Closed

11 tasks

romac approved these changes Jun 22, 2021

romac merged commit 9d3ffdb into master Jun 22, 2021

romac deleted the connection821 branch June 22, 2021 14:49

adizere removed the needs guide update label Jun 25, 2021
