-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ChainId implementation; this is API breaking change #698
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @mina86, this is a much needed cleanup of the type.
split.join("-") | ||
}; | ||
self.version = version; | ||
if self.version != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a slightly different check than previously. The previous implementation effectively checks if there's a -
in the id, whereas this checks if version is 0
(which can occur with e.g. chain id "mockChainNotEpochFormat"
).
I suggest we:
- store an
is_epoch_format
boolean in the type itself, so that we only need to parse the chain ID upon construction.
- Use this
is_epoch_format
instead ofself.version != 0
- Change this signature to
with_version(self, version: u64) -> Result<Self, IdentifierError>
- we needlessly require the caller to have a mutable reference to the object
- we should fail if the identifier is not in epoch format, and let the caller decide what to do then. Note that this function is not used anywhere, so this change of signature should be straightforward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.version == 0
implies that the id is not in epoch format so
checking it is just an optimisation¹. As far as I can tell the old
and new code behave the same way (module panic if version doesn’t fit
u64
). Neither the old nor the new code will change identifier such
as foo-bar
for example.
I can add is_epoch_format
if you prefer but it will increase the
size of the ChainId
structure which I feel like it could end up
hurting rather than helping.
Returning Result
certainly makes sense though note that changing
mut self
to self
doesn’t actually matter for the caller. self
is moved into the method. What possibly would make most sense is
with_version(self, u64) -> Result<Self, Self>
(or changing the
semantics such that it adds version if not already in epoch format).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.version == 0
implies that the id is not in epoch format so
checking it is just an optimisation¹.
Ah right, I didn't realize "foo-0"
is not considered epoch format. So basically we had it wrong before, and this is correct.
I can add is_epoch_format if you prefer
Nope, this is no longer necessary (see previous comment).
What possibly would make most sense is
with_version(self, u64) -> Result<Self, Self>
(or changing the semantics such that it adds version if not already in epoch format).
The reason I still prefer my suggestion is that when not in epoch format, the concept of version
in a chain ID doesn't exist. In practice, all chains now use the epoch format. I do not see a use case to "convert your chain ID to epoch format"; most likely, if someone called with_version()
on a chain ID that is not in epoch format, it's a bug and they should fix it, as opposed to intended behavior.
Note that this whole "encode the version number in the chain id" evolved as a hack; it wasn't in the original IBC standard, and is pretty messy. Non-epoch format chain IDs are pretty much non-existent.
impl<'a> From<&'a str> for ChainId { | ||
fn from(value: &'a str) -> Self { | ||
let version = Self::chain_version(value); | ||
Self { | ||
id: String::from(value), | ||
version, | ||
} | ||
} | ||
} | ||
|
||
impl From<String> for ChainId { | ||
fn from(value: String) -> Self { | ||
let version = Self::chain_version(&value); | ||
Self { id: value, version } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Panicking is not acceptable; then one could easily crash a chain by sending a ClientState
object with a very long version number (here).
However, our previous solution of storing 0
in that case is also not that great. I suggest we simply implement TryFrom<String>
and TryFrom<&'a str>
instead, and bubble the errors up instead of hiding them. In tests, we can just unwrap()
. Note that returning an error is consistent with ibc-go's implementation (where a panic is equivalent to failing to transaction).
Good catch though, I never noticed this behavior before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to clarify, the desired behaviour is:
foo
→ valid not in epoch format,foo-bar
→ valid not in epoch format,foo-0
→ valid not in epoch format (?),foo-1
→ valid in epoch format andfoo-18446744073709551616
→ invalid.
PS. Regarding panicking, I was under the impression that this code is
executed within the VM where panics result in transaction failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to clarify, the desired behaviour is:
That's correct. I use this golang playground (extracted from ibc-go's implementation) to figure out which strings should be considered epoch format.
Note that foo-18446744073709551616
is considered valid given this regexp, but they later fail when trying to parse the version number into a u64
.
PS. Regarding panicking, I was under the impression that this code is executed within the VM where panics result in transaction failure.
We make no such assumption. And as it turns out, most (if not all) of our users don't use a VM at all; everything is compiled directly in the binary. So a panic would make the whole node crash (unless of course they guard against it, but we don't assume they do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the output of ChainId::chain_version("foo-18446744073709551616")
?
What’s the output of ChainId::from(tendermint::chain::Id("foo-18446744073709551616"))
? That conversion is used for serde deserialisation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the output of ChainId::chain_version("foo-18446744073709551616")?
That should be an error; so the return type should be changed to Result<u64, IdentifierError>
What’s the output of ChainId::from(tendermint::chain::Id("foo-18446744073709551616"))?
We should implement TryFrom<tendermint::chain::Id> for
ChainIdinstead, and return an
IdentifierError`.
} | ||
let (car, cdr) = version.as_bytes().split_first()?; | ||
if b'1' <= *car && *car <= b'9' && cdr.iter().all(u8::is_ascii_digit) { | ||
Some((name, version.parse().unwrap())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in other comments, we'd need to return an IdentifierError
when the version parsing here fails
/// **Panics** if chain id is in epoch format but the version overflows | ||
/// `u64`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to remove this after removing the panicking behavior
b724ea9
to
d8ce548
Compare
ChainId::new allocates a new String so there’s no point in passing ownership of name argument. The constructor can accept it as a str slice without loss of functionality. This reduces allocations when creating ids since caller no longer has to have an owned string. Replace From implementations with TryFrom implementations. Conversion now fails if string is not in an epoch format. There’s still a way to construct an identifier without epoch format if it comes from tendermint::chain::Id (or by using ChainId::new with zero version; behaviour of that might change). And since there’s From implementation available, also get rid of ChainId::from_string. Optimise epoch format parsing. from_string used to check if argument was in epoch format and then call chain_version which did it again. To avoid duplicating work, introduce split_chain_id method which does the parsing and returns result which includes all the information callers might want. Similarly, calling split and collecting values into a vector is wasteful if all we need is the last token. Furthermore, regexes are quite a heavy machinery for the task of checking if string ends with a number. To address that, use split_last and parse the number to check if it’s valid.
ChainId
implementation
Is #721 meant to replace this PR? |
Not quite. Since this PR is growing I’ve moved some of the changes to separate PR. |
Given that #762 is being merged, this PR gets outdated. Feel free to open an issue, if still any of the above has not been addressed. |
ChainId::new allocates a new String so there’s no point in passing
ownership of name argument. The constructor can accept it as a str
slice without loss of functionality. This reduces allocations when
creating ids since caller no longer has to have an owned string.
Replace From implementations with TryFrom implementations. Conversion
now fails if string is not in an epoch format. There’s still a way to
construct an identifier without epoch format if it comes from
tendermint::chain::Id (or by using ChainId::new with zero version;
behaviour of that might change). And since there’s From
implementation available, also get rid of ChainId::from_string.
Optimise epoch format parsing. from_string used to check if argument
was in epoch format and then call chain_version which did it again.
To avoid duplicating work, introduce split_chain_id method which does
the parsing and returns result which includes all the information
callers might want.
Similarly, calling split and collecting values into a vector is
wasteful if all we need is the last token. Furthermore, regexes are
quite a heavy machinery for the task of checking if string ends with
a number. To address that, use split_last and parse the number to
check if it’s valid.
PR author checklist:
unclog
.docs/
).Reviewer checklist:
Files changed
in the GitHub PR explorer.