-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Update light client within threshold #1008
Conversation
ed06a66
to
2effc24
Compare
Because @boojamya went on vacation, I pushed this over the line. Unfortunately, I could not use a new method in ibctest, Due to an unfortunate design flaw in ibc-go, the same codebase cannot import multiple versions of ibc-go. Global state conflicts cause a panic (related to registering codecs.) Therefore, we have to upgrade all of the relayer to ibc-go/v6 in order to use upstream ibctest. For the record, I do not think ibctest (or any e2e test) is necessary for this client update behavior. I'd rather see this as a unit test or a feature test. Ideally we want to test that update client txs are submitted and accurate. We can trust ibc-go module will update clients appropriately given an accurate tx. We do not need e2e tests for those assertions. However, teasing apart the components to support a unit or feature test would be high effort, so I elected to keep the e2e test for now. |
# Conflicts: # _test/relayer_chain_test.go
|
||
If you want a test to be included in CI automatically, name your test with prefix `TestScenario`. For these tests, | ||
it's highly recommended to use a simple network of 1 validator and no full nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This convention may make it easier to automatically include new tests.
@@ -75,8 +75,8 @@ ibctest-legacy: | |||
ibctest-multiple: | |||
cd ibctest && go test -race -v -run TestRelayerMultiplePathsSingleProcess . | |||
|
|||
ibctest-path-filter: | |||
cd ibctest && go test -race -v -run TestPathFilter . | |||
ibctest-scenario: ## Scenario tests are suitable for simple networks of 1 validator and no full nodes. They test specific functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @boojamya and @DavidNix! Really excited to finally get this functionality added back into the relayer.
I had a question or two and one nit but otherwise this looks good. One thing I wanted to bring up is that #476 seems to hint at the possibility that an update to the light client may fail if the validator set has changed too much within some trust level set when the client was initialized (e.g. 1/3 of the validator set changes since the last update and the current block). It almost sounds like we need to perform some check on the proposed client update and possibly handle some edge cases.
ProcessorEvents string = "events" | ||
ProcessorLegacy = "legacy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps outside the scope of this PR but should these be some sort of enum since we are only ever expecting events
and legacy
as valid inputs?
Completely outside the scope of this PR but something that's been on my mind, now that we've more or less deprecated the "legacy" relayer code will we eventually phase this out completely and remove the code relevant to the old legacy relaying logic? cc @jackzampolin @agouin
) | ||
} else { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like we are checking if the cached client state has a correlating consensus time cached and if not we query for the block header at the same height as the cached client state to have access to the consensus time for when this cached client state was updated last.
Afterwards we calculate the time since the last client update by taking time.Since(consensusHeightTime)
, which is saying time.Now() - the last time the client was updated
, and subtract that value from the trusting period to determine if the client is going to expire within our update threshold time. And if this value is within our update threshold time we do some logging and signal that a MsgUpdateClient
needs to be sent out.
Because this is a critical code path that users will be putting a lot of trust in, I just want to be sure I have a good mental model of how this works. It seems beneficial to document how the relayer is using the client update threshold time to determine when clients need updating. Without reading the code I would first imagine that the relayer is just sending out the MsgUpdateClient
once every update threshold period regardless of when the last update occurred, but in reality we are determining if ANY relayer has updated the client and we are ensuring that the client is updated at most once within this update threshold period in relation to the trusting period.
(sorry for the wall of text)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without reading the code I would first imagine that the relayer is just sending out the MsgUpdateClient once every update threshold period regardless of when the last update occurred
This was my intuition as well. But we're doing it this different way as you pointed out.
Dan authored this part of the code and it was my understanding he paired with Andrew on it. Therefore, I didn't refactor or question it. I only added tests that hopefully cover this feature.
You ask a great question. Given it's critical path, do you think we wait until Dan is back before merging? (I'm fine either way.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way Dan implemented this probably makes more sense since you occur less fees by only updating the client when it's within some threshold of expiring vs. spamming the update msgs on a timer. We probably just want to document this somewhere though for users as well as any maintainers that may come in further down the line!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Would you be ok with a quick followup PR for Dan to add documentation? He will be able to write that better than I.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me! PR approved ✅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When @boojamya and I initially paired on this, I was unaware of the update threshold period flag, so we coded it to update the clients if 2/3 of the trusting period had passed since the last update, without any configuration ability.
The flag duration in combination with the time until client expiration seems a bit confusing, so my initial thought was that we should pick one or the other. But, in today's standup, @jtieri brought up a good point that it is desirable to have both approaches since some parties may want the clients updated on a steady frequency while others may just want to ensure the clients don't expire but avoid spending unnecessary fees and only do it if necessary.
With that taken into account, I think this would be easier to understand as:
- Update client if time until client expiration is <= 1/3 of trusting period.
- Update client if time since last update is greater than update threshold period flag duration (this could default to 0 duration in which case this check would be disabled)
This would enable automatic updating within trusting period without any configuration, but would also enable more frequent client updates if desired. I think this make the flag easier to understand as an operator because, for example, an update threshold period of 2h
would update the clients every 2 hours if there is no other IBC traffic and 2 hours is less than 2/3 of the client trusting period. If the operator accidentally configured the update threshold period to be greater than the trusting period, it has the failsafe that wouldn't let the clients expire since it would still update them if not updated within 2/3 of the trusting period.
I think a follow up PR probably makes sense. I just kinda skimmed that issue I linked so I don't really fully understand the implications of what they were discussing there either. |
Co-authored-by: Andrew Gouin <andrew@gouin.io>
Co-authored-by: Andrew Gouin <andrew@gouin.io>
Co-authored-by: Andrew Gouin <andrew@gouin.io>
- cosmos/relayer#1008 Signed-off-by: pratikbin <68642400+pratikbin@users.noreply.github.com>
- cosmos/relayer#1008 Signed-off-by: pratikbin <68642400+pratikbin@users.noreply.github.com> Signed-off-by: pratikbin <68642400+pratikbin@users.noreply.github.com>
The goal of this PR is to ensure light clients don’t expire on low trafficked paths where
MsgUpdateClient
isn’t being sent regularly.The utilizes the
--time-threshold
flag in therly start
command.If the client is set to expire within the threshold, the relayer will update the client.
Before this PR, this flag was present but had no functionality.
The
assembleAndSendMessages
function gets called on every block that has available signals to process. This is where we put a check to see if the light client is close to being expired.We get the trusting period of the client when we create a new
cosmos_chain_processor
and cache this info so it is not need to make this call every block.I believe all of the client state info, including the trusting period, will be updated in the relayers cache every time it handles relevant transaction messages.
Thanks for the assistance @agouin
Closes: #853
Test coverage is covered in this PR.