Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ChainId implementation; this is API breaking change #698

Closed
wants to merge 1 commit into from

Conversation

mina86
Copy link
Contributor

@mina86 mina86 commented Jun 1, 2023

ChainId::new allocates a new String so there’s no point in passing
ownership of name argument. The constructor can accept it as a str
slice without loss of functionality. This reduces allocations when
creating ids since caller no longer has to have an owned string.

Replace From implementations with TryFrom implementations. Conversion
now fails if string is not in an epoch format. There’s still a way to
construct an identifier without epoch format if it comes from
tendermint::chain::Id (or by using ChainId::new with zero version;
behaviour of that might change). And since there’s From
implementation available, also get rid of ChainId::from_string.

Optimise epoch format parsing. from_string used to check if argument
was in epoch format and then call chain_version which did it again.
To avoid duplicating work, introduce split_chain_id method which does
the parsing and returns result which includes all the information
callers might want.

Similarly, calling split and collecting values into a vector is
wasteful if all we need is the last token. Furthermore, regexes are
quite a heavy machinery for the task of checking if string ends with
a number. To address that, use split_last and parse the number to
check if it’s valid.

PR author checklist:

  • Added changelog entry, using unclog.
  • [n/a] Added tests.
  • [n/a] Linked to GitHub issue.
  • Updated code comments and documentation (e.g., docs/).
  • Tagged one reviewer who will be the one responsible for shepherding this PR.

Reviewer checklist:

  • Reviewed Files changed in the GitHub PR explorer.
  • Manually tested (in case integration/unit/mock tests are absent).

Copy link
Contributor

@plafer plafer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @mina86, this is a much needed cleanup of the type.

split.join("-")
};
self.version = version;
if self.version != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slightly different check than previously. The previous implementation effectively checks if there's a - in the id, whereas this checks if version is 0 (which can occur with e.g. chain id "mockChainNotEpochFormat").

I suggest we:

  1. store an is_epoch_format boolean in the type itself, so that we only need to parse the chain ID upon construction.
  • Use this is_epoch_format instead of self.version != 0
  1. Change this signature to with_version(self, version: u64) -> Result<Self, IdentifierError>
  • we needlessly require the caller to have a mutable reference to the object
  • we should fail if the identifier is not in epoch format, and let the caller decide what to do then. Note that this function is not used anywhere, so this change of signature should be straightforward

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.version == 0 implies that the id is not in epoch format so
checking it is just an optimisation¹. As far as I can tell the old
and new code behave the same way (module panic if version doesn’t fit
u64). Neither the old nor the new code will change identifier such
as foo-bar for example.

I can add is_epoch_format if you prefer but it will increase the
size of the ChainId structure which I feel like it could end up
hurting rather than helping.

Returning Result certainly makes sense though note that changing
mut self to self doesn’t actually matter for the caller. self
is moved into the method. What possibly would make most sense is
with_version(self, u64) -> Result<Self, Self> (or changing the
semantics such that it adds version if not already in epoch format).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.version == 0 implies that the id is not in epoch format so
checking it is just an optimisation¹.

Ah right, I didn't realize "foo-0" is not considered epoch format. So basically we had it wrong before, and this is correct.

I can add is_epoch_format if you prefer

Nope, this is no longer necessary (see previous comment).

What possibly would make most sense is with_version(self, u64) -> Result<Self, Self> (or changing the semantics such that it adds version if not already in epoch format).

The reason I still prefer my suggestion is that when not in epoch format, the concept of version in a chain ID doesn't exist. In practice, all chains now use the epoch format. I do not see a use case to "convert your chain ID to epoch format"; most likely, if someone called with_version() on a chain ID that is not in epoch format, it's a bug and they should fix it, as opposed to intended behavior.

Note that this whole "encode the version number in the chain id" evolved as a hack; it wasn't in the original IBC standard, and is pretty messy. Non-epoch format chain IDs are pretty much non-existent.

Comment on lines 164 to 209
impl<'a> From<&'a str> for ChainId {
fn from(value: &'a str) -> Self {
let version = Self::chain_version(value);
Self {
id: String::from(value),
version,
}
}
}

impl From<String> for ChainId {
fn from(value: String) -> Self {
let version = Self::chain_version(&value);
Self { id: value, version }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panicking is not acceptable; then one could easily crash a chain by sending a ClientState object with a very long version number (here).

However, our previous solution of storing 0 in that case is also not that great. I suggest we simply implement TryFrom<String> and TryFrom<&'a str> instead, and bubble the errors up instead of hiding them. In tests, we can just unwrap(). Note that returning an error is consistent with ibc-go's implementation (where a panic is equivalent to failing to transaction).

Good catch though, I never noticed this behavior before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to clarify, the desired behaviour is:

  • foo → valid not in epoch format,
  • foo-bar → valid not in epoch format,
  • foo-0 → valid not in epoch format (?),
  • foo-1 → valid in epoch format and
  • foo-18446744073709551616 → invalid.

PS. Regarding panicking, I was under the impression that this code is
executed within the VM where panics result in transaction failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to clarify, the desired behaviour is:

That's correct. I use this golang playground (extracted from ibc-go's implementation) to figure out which strings should be considered epoch format.

Note that foo-18446744073709551616 is considered valid given this regexp, but they later fail when trying to parse the version number into a u64.

PS. Regarding panicking, I was under the impression that this code is executed within the VM where panics result in transaction failure.

We make no such assumption. And as it turns out, most (if not all) of our users don't use a VM at all; everything is compiled directly in the binary. So a panic would make the whole node crash (unless of course they guard against it, but we don't assume they do).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s the output of ChainId::chain_version("foo-18446744073709551616")?

What’s the output of ChainId::from(tendermint::chain::Id("foo-18446744073709551616"))? That conversion is used for serde deserialisation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s the output of ChainId::chain_version("foo-18446744073709551616")?

That should be an error; so the return type should be changed to Result<u64, IdentifierError>

What’s the output of ChainId::from(tendermint::chain::Id("foo-18446744073709551616"))?

We should implement TryFrom<tendermint::chain::Id> for ChainIdinstead, and return anIdentifierError`.

}
let (car, cdr) = version.as_bytes().split_first()?;
if b'1' <= *car && *car <= b'9' && cdr.iter().all(u8::is_ascii_digit) {
Some((name, version.parse().unwrap()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in other comments, we'd need to return an IdentifierError when the version parsing here fails

Comment on lines 133 to 134
/// **Panics** if chain id is in epoch format but the version overflows
/// `u64`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to remove this after removing the panicking behavior

@mina86 mina86 force-pushed the a branch 2 times, most recently from b724ea9 to d8ce548 Compare June 10, 2023 13:15
ChainId::new allocates a new String so there’s no point in passing
ownership of name argument.  The constructor can accept it as a str
slice without loss of functionality.  This reduces allocations when
creating ids since caller no longer has to have an owned string.

Replace From implementations with TryFrom implementations.  Conversion
now fails if string is not in an epoch format.  There’s still a way to
construct an identifier without epoch format if it comes from
tendermint::chain::Id (or by using ChainId::new with zero version;
behaviour of that might change).  And since there’s From implementation
available, also get rid of ChainId::from_string.

Optimise epoch format parsing.  from_string used to check if argument
was in epoch format and then call chain_version which did it again.
To avoid duplicating work, introduce split_chain_id method which does
the parsing and returns result which includes all the information
callers might want.

Similarly, calling split and collecting values into a vector is
wasteful if all we need is the last token.  Furthermore, regexes are
quite a heavy machinery for the task of checking if string ends with
a number.  To address that, use split_last and parse the number to
check if it’s valid.
@mina86 mina86 changed the title Improve ChainId implementation Improve ChainId implementation; this is API breaking change Jun 10, 2023
@plafer
Copy link
Contributor

plafer commented Jun 21, 2023

Is #721 meant to replace this PR?

@mina86
Copy link
Contributor Author

mina86 commented Jun 21, 2023

Is #721 meant to replace this PR?

Not quite. Since this PR is growing I’ve moved some of the changes to separate PR.

@Farhad-Shabani
Copy link
Member

Given that #762 is being merged, this PR gets outdated. Feel free to open an issue, if still any of the above has not been addressed.

@mina86 mina86 deleted the a branch October 31, 2023 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants