Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement builder DSL for light client initialization, and other things #583

Merged
merged 31 commits into from
Sep 30, 2020

Conversation

romac
Copy link
Member

@romac romac commented Sep 24, 2020

Apologies for this huge, many-headed, multi-faced PR which would deserve to be split up but that has become quite difficult as I have been evolving it alongside the work on the ibc-rs side. Let me know if it's too much to handle, and I'll try to do it by the end of the week.


Here's what it does:

  • Expose a builder DSL for initializing light client instances and their supervisor (partially addresses light-node: Initialization Follow up #484)
  • Adapt the light node commands to make use of the new DSL
  • Refactor the Io component a bit to enable the aforementioned DSL
  • Improves error handling in the light node, just a bit (partially addresses light-node: Error Handling and Exiting #486)
  • Improves the definition of the supervisor Handle for use within ibc-rs

Disclaimer 0: Documentation for the builder DSL is really minimal at the moment.

Disclaimer 1: This PR does NOT address #497.

Disclaimer 2: This PR is lacking unit tests for the new builder DSL.

Disclaimer 3: I have not tested whether the light node CLI still works.

Disclaimer 4: The builder DSL could definitely still be improved, but I'd first like to use it within ibc-rs before making other changes.


Given all this, I think that the easiest way to review this PR is to:

  • Start with the first 6 commits (up to, and including, 07a2427).
  • Then review the builders themselves in light-client/src/builder.rs and light-client/src/builder/.
  • Then focus on the rest of the changes within the light-client directory.
  • And finally review the changes within the light-node directory.

I won't assign any reviewers given the poor state of the PR, so feel free to assign yourselves.


  • Referenced an issue explaining the need for the change
  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG.md

@codecov-commenter
Copy link

Codecov Report

Merging #583 into master will decrease coverage by 0.9%.
The diff coverage is 6.4%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #583     +/-   ##
========================================
- Coverage    43.0%   42.1%   -1.0%     
========================================
  Files         157     161      +4     
  Lines       10254   10472    +218     
  Branches     2502    2544     +42     
========================================
- Hits         4415    4413      -2     
- Misses       5482    5696    +214     
- Partials      357     363      +6     
Impacted Files Coverage Δ
light-client/examples/light_client.rs 0.0% <0.0%> (ø)
light-client/src/builder.rs 0.0% <ø> (ø)
light-client/src/builder/error.rs 0.0% <0.0%> (ø)
light-client/src/builder/light_client.rs 0.0% <0.0%> (ø)
light-client/src/builder/supervisor.rs 0.0% <0.0%> (ø)
light-client/src/components/io.rs 0.0% <0.0%> (-2.8%) ⬇️
light-client/tests/integration.rs 0.0% <0.0%> (ø)
light-node/src/commands/initialize.rs 0.0% <0.0%> (ø)
light-node/src/commands/start.rs 0.0% <0.0%> (ø)
light-node/src/rpc.rs 52.1% <0.0%> (-5.0%) ⬇️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 856084e...ee4bbaa. Read the comment docs.

) -> Result<LightClientBuilder<HasTrustedState>, Error> {
self.validate(&trusted_state)?;

// TODO(liamsi, romac): it is unclear if this should be Trusted or only Verified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be a good idea to cross-verify the first header to check if all witnesses have the same view.

but in the end, you have to trust the state I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a good idea to cross-verify the first header to check if all witnesses have the same view.

Sounds like a good idea, indeed!

@ebuchman What do you think? Should we address this in a follow-up PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, if anyone disagrees should probably throw an error and refuse to start

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree. We currently just check that the chain ID's are the same but it would make more sense to cross-verify the first header. This way we can also check for responsiveness i.e. if a witness doesn't respond we drop them, and for correctness i.e. if one of the headers are different we halt and return the error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have collected the outcome of the discussions regarding initialization into
tendermint/spec#159

We have to trust the given initial header. Cross-check doesn't do anything useful. It only opens the possibility that one fault peer can prevent the light node to start.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well catch that error at startup and get the user to provide a different witness rather than catch it later and just drop the witness without replacement, no ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well catch that error at startup and get the user to provide a different witness rather than catch it later and just drop the witness without replacement, no ?

We can do that. Still, the user might need to do that several times, if the user is unlucky and always chooses another faulty peer. And these faulty peers don't even need to be clever but just return garbage with the right height. Still, if it makes sense in the practical setting we can do that. It is definitely safe.

/// Run a future to completion on a new thread, with the given timeout.
///
/// This function will block the caller until the given future has completed.
pub fn block_on<F>(f: F, peer: PeerId, timeout: Option<Duration>) -> Result<F::Output, IoError>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we're going to solve the sync/async conundrum in the short term (and definitely not in this PR), but to get the two to play nicely together we should try to figure out a simple, low-effort way to spawn a single Tokio runtime in a separate thread to which we can submit async tasks from synchronous code.

thanethomson
thanethomson previously approved these changes Sep 29, 2020
Copy link
Contributor

@thanethomson thanethomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally say that this looks good enough to merge if it works for ibc-rs 👍

Co-authored-by: Thane Thomson <thane@informal.systems>
@ebuchman
Copy link
Member

@romac what about the failing integration test?

@romac
Copy link
Member Author

romac commented Sep 29, 2020

Weird! Will take a look asap

ebuchman
ebuchman previously approved these changes Sep 29, 2020
Copy link
Member

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. A bunch of comments but don't think any of them need to block, just need to figure out / fix the failing test


/// Builder for a light client [`Instance`]
#[must_use]
pub struct LightClientBuilder<State> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't generic parameters usually just one letter? How does one decide whether this should be S or State ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there are clear guidelines on that in the Rust community, it's true that single letter parameter are more common overall, but I thought that spelling it out in this case would make it more obvious what the type param is about, especially since it this case it represents the state of the builder and not any other type, like eg. with Vec<T> where T just stands for any type. Happy to change it to a single letter regardless if you think that's more consistent.

light_store: Box<dyn LightStore>,

#[allow(dead_code)]
state: State,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the field here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a field here because in Rust type parameters of a struct have to appear in its fields.

We could use PhantomData<State>, but since the two type states are zero-sized anyway (same as PhantomData), this saves us an import here and some potential confusion wrt to PhantomData.

state: State,
}

impl<Current> LightClientBuilder<Current> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Current is another generic parameter, and this private method lets us go from any arbitrary state to any other arbitrary state.

It feels a bit like we're jumping through a few hoops to do this with the type parameters and seems like it might be cleaner to just have structs for each type that wrap the underlying LightClientBuilder, like we discussed, but maybe that's just me or I'm not used to seeing some of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern can definitely be expressed both ways. I think it's just a matter of preference. I prefer the type param approach as it to me it expresses more clearly that the purpose of the struct stays the same, it's just in a different state. Perhaps it's just because I am more used to this pattern from Haskell or Scala, but in the end I don't feel that strongly either way. So if you feel like the wrapper structs would be more readable, I am happy to make the change.

/// Private method to move from one state to another
fn with_state<Next>(self, state: Next) -> LightClientBuilder<Next> {
LightClientBuilder {
peer_id: self.peer_id,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use .. for everything except state ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not, using ..self yields the following error because self does not have the right type:

error[E0308]: mismatched types
  --> light-client/src/builder/light_client.rs:53:39
   |
50 | impl<Current> LightClientBuilder<Current> {
   |      ------- found type parameter
51 |     /// Private method to move from one state to another
52 |     fn with_state<Next>(self, state: Next) -> LightClientBuilder<Next> {
   |                   ---- expected type parameter
53 |         LightClientBuilder { state, ..self }
   |                                       ^^^^ expected type parameter `Next`, found type parameter `Current`
   |
   = note: expected struct `builder::light_client::LightClientBuilder<Next>`
              found struct `builder::light_client::LightClientBuilder<Current>`
   = note: a type parameter was expected, but a different one was found; you might be missing a type parameter or trait bound
   = note: for more information, visit https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol ok. I think this and the need to have a dummy state field both motivate using the wrapper approach as I think that would solve both, but no biggie

) -> Result<LightClientBuilder<HasTrustedState>, Error> {
self.validate(&trusted_state)?;

// TODO(liamsi, romac): it is unclear if this should be Trusted or only Verified
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, if anyone disagrees should probably throw an error and refuse to start

}

/// Fetch and set the latest block from the primary peer as the trusted state.
pub fn trust_primary_latest(self) -> Result<LightClientBuilder<HasTrustedState>, Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit dangerous and should either come with a big warning or be removed all together

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove it since we don't have any use for it at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

/// Set the given light block as the initial trusted state.
pub fn trust_light_block(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be public?

Copy link
Member Author

@romac romac Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It should not, as it is quite dangerous and is only currently used in the light client example. The latter could be rewritten to use trust_from_store instead so it's not needed at all. Will make it private.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// TODO(ismail, romac): actually verify more predicates of light block?

self.predicates
.validator_sets_match(light_block, &*self.hasher)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important thing is actually the next_validators!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

fn validate(&self, light_block: &LightBlock) -> Result<(), Error> {
// TODO(ismail, romac): actually verify more predicates of light block?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably check:

  • is_within_trust_period
  • is_header_from_past
  • validator_sets_match (though this doesn't really matter since we will never use these vals)
  • next_validators_match

Everything else depends on the commit and/or previous blocks. We could verify the commit but it's not really necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let mut builder = PeerList::builder();
builder.primary(primary, primary_instance);
builder.witness(witness, witness_instance);
builder.build()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't these still be chained?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not since the API change in 26d1efe.

I figured I would follow the current recommendations of the popular derive_builder crate. I agree that the chained version looks a lot better, so I would definitely not mind going back on that.

@romac
Copy link
Member Author

romac commented Sep 29, 2020

@thanethomson @ebuchman Thanks a lot for your inputs!

Just pushed a few commits which address some of the comments above. Will address the rest + CI failure tomorrow morning.

@romac
Copy link
Member Author

romac commented Sep 30, 2020

@ebuchman @thanethomson This PR is ready for merge imho, now that we have figured out where the CI failure came from (#600).

Copy link
Member

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants