
feat: Update light client within threshold #1008

Merged — 28 commits merged into main from dan/update_client2 on Oct 21, 2022
Conversation

@boojamya (Contributor) commented Sep 28, 2022

The goal of this PR is to ensure light clients don't expire on low-traffic paths where MsgUpdateClient isn't being sent regularly.

It utilizes the --time-threshold flag in the rly start command.

If the client is set to expire within the threshold, the relayer will update the client.

Before this PR, the flag was present but had no functionality.

The assembleAndSendMessages function gets called on every block that has available signals to process. This is where we check whether the light client is close to expiring.

We fetch the trusting period of the client when we create a new cosmos_chain_processor and cache it, so we do not need to make this query on every block.
I believe all of the client state info, including the trusting period, is updated in the relayer's cache every time it handles relevant transaction messages.

Thanks for the assistance @agouin

Closes: #853


Test coverage is included in this PR.

@boojamya boojamya marked this pull request as ready for review October 4, 2022 18:23
@boojamya boojamya self-assigned this Oct 4, 2022
@boojamya boojamya marked this pull request as draft October 13, 2022 00:25
@DavidNix DavidNix changed the title Update light client within threshold feat: Update light client within threshold Oct 18, 2022
@DavidNix DavidNix marked this pull request as ready for review October 18, 2022 15:13
@DavidNix (Contributor) commented:

Because @boojamya went on vacation, I pushed this over the line.

Unfortunately, I could not use a new method in ibctest, cosmos.PollForMessage, which requires ibc-go/v6. But v6 is still in alpha, so I did not feel comfortable upgrading all of the relayer to ibc-go/v6.

Due to an unfortunate design flaw in ibc-go, the same codebase cannot import multiple versions of ibc-go: global state conflicts cause a panic (related to registering codecs). Therefore, we would have to upgrade all of the relayer to ibc-go/v6 in order to use upstream ibctest.

For the record, I do not think ibctest (or any e2e test) is necessary for this client update behavior. I'd rather see this as a unit test or a feature test. Ideally we want to test that update client txs are submitted and accurate. We can trust ibc-go module will update clients appropriately given an accurate tx. We do not need e2e tests for those assertions. However, teasing apart the components to support a unit or feature test would be high effort, so I elected to keep the e2e test for now.


If you want a test to be included in CI automatically, name your test with prefix `TestScenario`. For these tests,
it's highly recommended to use a simple network of 1 validator and no full nodes.
Contributor: This convention may make it easier to automatically include new tests.

@DavidNix DavidNix requested a review from agouin October 18, 2022 18:26
@@ -75,8 +75,8 @@ ibctest-legacy:
ibctest-multiple:
cd ibctest && go test -race -v -run TestRelayerMultiplePathsSingleProcess .

ibctest-path-filter:
cd ibctest && go test -race -v -run TestPathFilter .
ibctest-scenario: ## Scenario tests are suitable for simple networks of 1 validator and no full nodes. They test specific functionality.
Member: Like this

@jtieri (Member) left a comment:

Great work @boojamya and @DavidNix! Really excited to finally get this functionality added back into the relayer.

I had a question or two and one nit, but otherwise this looks good. One thing I wanted to bring up is that #476 seems to hint at the possibility that an update to the light client may fail if the validator set has changed too much relative to the trust level set when the client was initialized (e.g. more than 1/3 of the validator set changes between the last update and the current block). It sounds like we may need to perform some check on the proposed client update and handle some edge cases.

Comment on lines +25 to +26
ProcessorEvents string = "events"
ProcessorLegacy = "legacy"
Member:

Perhaps outside the scope of this PR but should these be some sort of enum since we are only ever expecting events and legacy as valid inputs?

Completely outside the scope of this PR but something that's been on my mind, now that we've more or less deprecated the "legacy" relayer code will we eventually phase this out completely and remove the code relevant to the old legacy relaying logic? cc @jackzampolin @agouin

)
} else {
return nil
}
Member:

So it looks like we are checking whether the cached client state has a corresponding consensus time cached, and if not, we query for the block header at the same height as the cached client state to recover the consensus time of the last client update.

Afterwards we calculate the time since the last client update as time.Since(consensusHeightTime), i.e. time.Now() minus the last time the client was updated, and subtract that value from the trusting period to determine whether the client will expire within our update threshold time. If it will, we do some logging and signal that a MsgUpdateClient needs to be sent out.

Because this is a critical code path that users will be putting a lot of trust in, I just want to be sure I have a good mental model of how this works. It seems beneficial to document how the relayer uses the client update threshold time to determine when clients need updating. Without reading the code, I would have imagined that the relayer simply sends out a MsgUpdateClient once every update threshold period, regardless of when the last update occurred. In reality, we determine whether ANY relayer has updated the client, and we ensure the client is updated at most once within this update threshold period relative to the trusting period.

(sorry for the wall of text)

@DavidNix (Contributor) replied:

> Without reading the code I would first imagine that the relayer is just sending out the MsgUpdateClient once every update threshold period regardless of when the last update occurred

This was my intuition as well. But we're doing it the different way you pointed out.

Dan authored this part of the code, and it was my understanding he paired with Andrew on it. Therefore, I didn't refactor or question it. I only added tests that hopefully cover this feature.

You ask a great question. Given it's a critical path, do you think we should wait until Dan is back before merging? (I'm fine either way.)

Member:

I think the way Dan implemented this probably makes more sense, since you incur fewer fees by only updating the client when it's within some threshold of expiring vs. spamming the update msgs on a timer. We probably just want to document this somewhere for users, as well as for any maintainers that may come along further down the line!

Contributor:

I agree. Would you be OK with a quick follow-up PR for Dan to add documentation? He will be able to write that better than I can.

Member:

Sounds good to me! PR approved ✅

Member:

When @boojamya and I initially paired on this, I was unaware of the update threshold period flag, so we coded it to update the clients if 2/3 of the trusting period had passed since the last update, without any configuration ability.

The flag duration in combination with the time until client expiration seems a bit confusing, so my initial thought was that we should pick one or the other. But in today's standup, @jtieri brought up a good point: it is desirable to have both approaches, since some parties may want the clients updated at a steady frequency, while others may just want to ensure the clients don't expire while avoiding unnecessary fees, updating only when necessary.

With that taken into account, I think this would be easier to understand as:

  • Update client if time until client expiration is <= 1/3 of the trusting period.
  • Update client if time since last update is greater than the update threshold period flag duration (this could default to 0, in which case this check would be disabled).

This would enable automatic updating within the trusting period without any configuration, but would also enable more frequent client updates if desired. I think this makes the flag easier to understand as an operator because, for example, an update threshold period of 2h would update the clients every 2 hours if there is no other IBC traffic and 2 hours is less than 2/3 of the client trusting period. If the operator accidentally configured the update threshold period to be greater than the trusting period, there is a failsafe that wouldn't let the clients expire: they would still be updated once 2/3 of the trusting period had elapsed.

relayer/processor/path_processor_internal.go: outdated review thread, resolved
@DavidNix (Contributor) commented:

@jtieri Given #476, do you think that can be a followup PR or is it critical enough to include as part of this feature?

@jtieri (Member) commented Oct 19, 2022

> @jtieri Given #476, do you think that can be a followup PR or is it critical enough to include as part of this feature?

I think a follow-up PR probably makes sense. I only skimmed the issue I linked, so I don't fully understand the implications of what was being discussed there either.

relayer/strategies.go: outdated review thread, resolved
cmd/flags.go: outdated review thread, resolved
DavidNix and others added 4 commits October 20, 2022 13:18
Co-authored-by: Andrew Gouin <andrew@gouin.io>
Co-authored-by: Andrew Gouin <andrew@gouin.io>
Co-authored-by: Andrew Gouin <andrew@gouin.io>
@DavidNix DavidNix merged commit b06fceb into main Oct 21, 2022
@DavidNix DavidNix deleted the dan/update_client2 branch October 21, 2022 13:47
pratikbin added a commit to pratikbin/hermes that referenced this pull request Nov 4, 2022
- cosmos/relayer#1008

Signed-off-by: pratikbin <68642400+pratikbin@users.noreply.github.com>
seanchen1991 pushed a commit to informalsystems/hermes that referenced this pull request Nov 4, 2022
- cosmos/relayer#1008

Signed-off-by: pratikbin <68642400+pratikbin@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Add support to ensure clients stay alive
5 participants