
feat: Update light client within threshold #1008

Merged — 28 commits merged into main from dan/update_client2 on Oct 21, 2022
Conversation

@boojamya (Contributor) commented Sep 28, 2022

The goal of this PR is to ensure light clients don't expire on low-traffic paths where MsgUpdateClient isn't being sent regularly.

It utilizes the --time-threshold flag in the rly start command.

If the client is set to expire within the threshold, the relayer will update the client.

Before this PR, the flag was present but had no functionality.

The assembleAndSendMessages function gets called on every block that has available signals to process. This is where we check whether the light client is close to expiring.

We fetch the trusting period of the client when we create a new cosmos_chain_processor and cache it, so we do not need to make this query on every block.
I believe all of the client state info, including the trusting period, is updated in the relayer's cache every time it handles relevant transaction messages.

Thanks for the assistance @agouin

Closes: #853


Test coverage is included in this PR.

@boojamya boojamya marked this pull request as ready for review October 4, 2022 18:23
@boojamya boojamya self-assigned this Oct 4, 2022
@boojamya boojamya marked this pull request as draft October 13, 2022 00:25
@DavidNix DavidNix changed the title Update light client within threshold feat: Update light client within threshold Oct 18, 2022
@DavidNix DavidNix marked this pull request as ready for review October 18, 2022 15:13
@DavidNix (Contributor) commented:

Because @boojamya went on vacation, I pushed this over the line.

Unfortunately, I could not use a new method in ibctest, cosmos.PollForMessage, which requires ibc-go/v6. But v6 is still in alpha, so I did not feel comfortable upgrading all of the relayer to ibc-go/v6.

Due to an unfortunate design flaw in ibc-go, the same codebase cannot import multiple versions of ibc-go: global state conflicts cause a panic (related to registering codecs). Therefore, we would have to upgrade all of the relayer to ibc-go/v6 in order to use upstream ibctest.

For the record, I do not think ibctest (or any e2e test) is necessary for this client update behavior. I'd rather see this as a unit test or a feature test. Ideally we want to test that update client txs are submitted and accurate. We can trust ibc-go module will update clients appropriately given an accurate tx. We do not need e2e tests for those assertions. However, teasing apart the components to support a unit or feature test would be high effort, so I elected to keep the e2e test for now.


If you want a test to be included in CI automatically, name your test with prefix `TestScenario`. For these tests,
it's highly recommended to use a simple network of 1 validator and no full nodes.
Contributor: This convention may make it easier to automatically include new tests.

@DavidNix DavidNix requested a review from agouin October 18, 2022 18:26
@@ -75,8 +75,8 @@ ibctest-legacy:
ibctest-multiple:
cd ibctest && go test -race -v -run TestRelayerMultiplePathsSingleProcess .

ibctest-path-filter:
cd ibctest && go test -race -v -run TestPathFilter .
ibctest-scenario: ## Scenario tests are suitable for simple networks of 1 validator and no full nodes. They test specific functionality.
Member: Like this

@jtieri (Member) left a comment:

Great work @boojamya and @DavidNix! Really excited to finally get this functionality added back into the relayer.

I had a question or two and one nit, but otherwise this looks good. One thing I wanted to bring up is that #476 seems to hint at the possibility that an update to the light client may fail if the validator set has changed too much relative to the trust level set when the client was initialized (e.g. more than 1/3 of the validator set changes between the last update and the current block). It sounds like we may need to perform some check on the proposed client update and handle some edge cases.

Comment on lines +25 to +26
ProcessorEvents string = "events"
ProcessorLegacy = "legacy"
Member:

Perhaps outside the scope of this PR but should these be some sort of enum since we are only ever expecting events and legacy as valid inputs?

Completely outside the scope of this PR but something that's been on my mind, now that we've more or less deprecated the "legacy" relayer code will we eventually phase this out completely and remove the code relevant to the old legacy relaying logic? cc @jackzampolin @agouin

)
} else {
return nil
}
Member:

So it looks like we are checking whether the cached client state has a corresponding consensus time cached, and if not, we query for the block header at the same height as the cached client state to recover the consensus time of the last client update.

Afterwards we calculate the time since the last client update as time.Since(consensusHeightTime), i.e. time.Now() minus the last time the client was updated, and subtract that value from the trusting period to determine whether the client will expire within our update threshold time. If it will, we do some logging and signal that a MsgUpdateClient needs to be sent out.

Because this is a critical code path that users will be putting a lot of trust in, I just want to be sure I have a good mental model of how this works. It seems beneficial to document how the relayer uses the client update threshold time to determine when clients need updating. Without reading the code, I would have imagined that the relayer simply sends out a MsgUpdateClient once every update threshold period, regardless of when the last update occurred. In reality, we determine whether ANY relayer has updated the client, and we ensure the client is updated at most once within this update threshold period relative to the trusting period.

(sorry for the wall of text)

@DavidNix (Contributor) replied:

> Without reading the code I would first imagine that the relayer is just sending out the MsgUpdateClient once every update threshold period regardless of when the last update occurred

This was my intuition as well. But we're doing it the different way you pointed out.

Dan authored this part of the code, and it was my understanding he paired with Andrew on it. Therefore, I didn't refactor or question it. I only added tests that hopefully cover this feature.

You ask a great question. Given it's a critical path, do you think we should wait until Dan is back before merging? (I'm fine either way.)

Member:

I think the way Dan implemented this probably makes more sense, since you incur fewer fees by only updating the client when it's within some threshold of expiring vs. spamming the update msgs on a timer. We probably just want to document this somewhere for users, as well as for any maintainers that may come along further down the line!

Contributor:

I agree. Would you be OK with a quick follow-up PR for Dan to add documentation? He will be able to write that better than I can.

Member:

Sounds good to me! PR approved ✅

Member:

When @boojamya and I initially paired on this, I was unaware of the update threshold period flag, so we coded it to update the clients if 2/3 of the trusting period had passed since the last update, without any configuration ability.

The flag duration in combination with the time until client expiration seems a bit confusing, so my initial thought was that we should pick one or the other. But in today's standup, @jtieri brought up a good point: it is desirable to have both approaches, since some parties may want the clients updated at a steady frequency, while others may just want to ensure the clients don't expire while avoiding unnecessary fees, updating only when necessary.

With that taken into account, I think this would be easier to understand as:

  • Update client if time until client expiration is <= 1/3 of the trusting period.
  • Update client if time since last update is greater than the update threshold period flag duration (this could default to 0, in which case this check would be disabled).

This would enable automatic updating within the trusting period without any configuration, but would also enable more frequent client updates if desired. I think this makes the flag easier to understand as an operator because, for example, an update threshold period of 2h would update the clients every 2 hours if there is no other IBC traffic and 2 hours is less than 2/3 of the client trusting period. If the operator accidentally configured the update threshold period to be greater than the trusting period, there is a failsafe that wouldn't let the clients expire: they would still be updated once 2/3 of the trusting period had elapsed.

relayer/processor/path_processor_internal.go: outdated review thread, resolved
@DavidNix (Contributor) commented:

@jtieri Given #476, do you think that can be a followup PR or is it critical enough to include as part of this feature?

@jtieri (Member) commented Oct 19, 2022

> @jtieri Given #476, do you think that can be a followup PR or is it critical enough to include as part of this feature?

I think a follow-up PR probably makes sense. I only skimmed the issue I linked, so I don't fully understand the implications of what was being discussed there either.

relayer/strategies.go: outdated review thread, resolved
cmd/flags.go: outdated review thread, resolved
DavidNix and others added 4 commits October 20, 2022 13:18
Co-authored-by: Andrew Gouin <andrew@gouin.io>
Co-authored-by: Andrew Gouin <andrew@gouin.io>
Co-authored-by: Andrew Gouin <andrew@gouin.io>
@DavidNix DavidNix merged commit b06fceb into main Oct 21, 2022
@DavidNix DavidNix deleted the dan/update_client2 branch October 21, 2022 13:47
pratikbin added a commit to pratikbin/hermes that referenced this pull request Nov 4, 2022
- cosmos/relayer#1008

Signed-off-by: pratikbin <68642400+pratikbin@users.noreply.github.com>
seanchen1991 pushed a commit to informalsystems/hermes that referenced this pull request Nov 4, 2022
- cosmos/relayer#1008

Signed-off-by: pratikbin <68642400+pratikbin@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Add support to ensure clients stay alive
5 participants