
Execution driver uses NodeSyncState for certificate execution #3323

Merged 5 commits into main from mlogan-exec-driver on Jul 21, 2022

Conversation

@mystenmark (Contributor) commented Jul 20, 2022

This takes care of the performance TODOs in execution driver (for now, at least).

It also makes it acceptable for checkpoints to pend digests for which the node doesn't yet have certs (they will be fetched).

Note that this does not automatically fetch parent certs, and assumes that they will be enqueued as well (or are already executed). If we're not comfortable with this assumption we can certainly add parent fetching to NodeSyncState.

@mystenmark changed the title from "Mlogan exec driver" to "Execution driver uses NodeSyncState for certification execution" Jul 20, 2022
@mystenmark force-pushed the mlogan-exec-driver branch from 7c6b4a1 to ec51269 July 20, 2022 05:23
@mystenmark requested review from gdanezis and lxfind July 20, 2022 05:24
@mystenmark changed the title from "Execution driver uses NodeSyncState for certification execution" to "Execution driver uses NodeSyncState for certificate execution" Jul 20, 2022
@mystenmark marked this pull request as ready for review July 20, 2022 05:24
@gdanezis (Collaborator) commented:

> Note that this does not automatically fetch parent certs, and assumes that they will be enqueued as well (or are already executed). If we're not comfortable with this assumption we can certainly add parent fetching to NodeSyncState.

Unsure what the result of this is: can we guarantee that all previous certs will be enqueued, or do we need extra logic for this? I am particularly concerned about the precursor transactions to shared objects, which, if not present, may block the execution of a shared-object cert, slowing everyone down.

@gdanezis (Collaborator) left a comment:

There is a serious amount of stuff in the PR which is quite complex -- due to the nature of what we do. Do try to provide a block interface to drive / request execution rather than a tx-by-tx interface -- it will make our job down the line easier. I am still unclear whether we fetch deps for shared-object transactions -- but I think not.
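
To make the suggestion concrete, here is a minimal sketch (not from the PR) of the kind of block-oriented interface being asked for; the trait and the digest type are hypothetical stand-ins:

```rust
// Hypothetical digest type standing in for the PR's TransactionDigest.
type TransactionDigest = [u8; 32];

// Hypothetical trait contrasting the two interface shapes.
trait ExecutionRequester {
    // Per-transaction interface: one call per digest.
    fn request_execution(&self, digest: TransactionDigest);

    // Block interface: handing over a whole batch at once lets the driver
    // schedule, batch, and parallelize across the full set of digests.
    fn request_execution_block(&self, digests: Vec<TransactionDigest>);
}
```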

```rust
.handle_execution_request(pending_transactions.iter().map(|(_, digest)| *digest))
// zip results back together with seq
.zip(stream::iter(pending_transactions.iter()))
// filter out errors
```
@gdanezis (Collaborator):

Should we not do something about the errors? This is the place where, if some dependent previous transaction has not been processed, objects will not be available, etc. We should probably print / log something about these? And also ensure we process them using full sync?

```diff
  // this pattern for limiting concurrency is from
  // https://github.com/tokio-rs/tokio/discussions/2648
  let limit = Arc::new(Semaphore::new(MAX_NODE_SYNC_CONCURRENCY));
  let mut stream = Box::pin(stream);

- while let Some(DigestsMessage { digests, peer, tx }) = stream.next().await {
+ while let Some(DigestsMessage { sync_arg, peer, tx }) = stream.next().await {
```
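
For reference, a minimal self-contained sketch of the tokio semaphore pattern linked in that comment; the constant and the per-item work are placeholders, not the PR's actual code:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Placeholder for the PR's MAX_NODE_SYNC_CONCURRENCY constant.
const MAX_CONCURRENCY: usize = 16;

#[tokio::main]
async fn main() {
    let limit = Arc::new(Semaphore::new(MAX_CONCURRENCY));
    let mut handles = Vec::new();

    for digest in 0..100u64 {
        // Acquire an owned permit *before* spawning, which caps the number
        // of in-flight tasks at MAX_CONCURRENCY.
        let permit = limit.clone().acquire_owned().await.unwrap();
        handles.push(tokio::spawn(async move {
            let _permit = permit; // released when dropped at end of task
            // ... process `digest` here ...
            let _ = digest;
        }));
    }

    for h in handles {
        h.await.unwrap();
    }
}
```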
@gdanezis (Collaborator):

Heads up: efficient parallel execution is best implemented if you pass in blocks of transactions rather than passing transactions in one at a time.

@mystenmark (Contributor Author):

What's your argument here for efficiency? I can imagine some theoretical benefits (e.g. cache coherency, less synchronization overhead in the channels) that might be obtained from better batching, but I wouldn't expect that to have a noticeable effect here.

Also, keep in mind that even if we pass in TXes in blocks, we are going to want to farm the execution out to multiple tasks for execution parallelism anyway. (This is a TODO right now - I'm waiting until I have a working devnet to do that so I can measure how much of a speedup it is.)

```rust
    })?;
}
trace!(?parent, ?digest, "waiting for parent");
// Since we no longer hold the semaphore permit, can be sure that our parent will be
```
@gdanezis (Collaborator):
Not sure this is good enough to prevent a deadlock: what if something else takes this permit and also blocks? Is that possible?

@mystenmark (Contributor Author):
If you read from the top of process_digest, you'll see that the only blocking task we do while holding the permit is downloading the cert and effects, after which we drop the permit. Downloading will always complete (successfully or otherwise), so permits cannot be held indefinitely.
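
A minimal sketch of the permit scoping described here, with placeholder types and functions (not the PR's actual process_digest):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Placeholder types standing in for the PR's certificate/effects types.
type Digest = u64;
type Effects = Vec<Digest>;

async fn download_cert_and_effects(_digest: Digest) -> Effects {
    // In the real code this is a network fetch that always completes,
    // successfully or otherwise.
    vec![]
}

async fn wait_for_parent(_parent: Digest) {
    // In the real code this waits until the parent cert has been executed.
}

async fn process_digest(limit: Arc<Semaphore>, digest: Digest) {
    // The permit is held only across the bounded download step...
    let permit = limit.clone().acquire_owned().await.unwrap();
    let effects = download_cert_and_effects(digest).await;
    drop(permit);
    // ...so waiting on parents happens without the permit, and a blocked
    // child cannot exhaust the concurrency slots needed by its parents.
    for parent in effects {
        wait_for_parent(parent).await;
    }
}
```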

```rust
        .forget_effects(&effects.effects.digest());
}
CertAndEffects::Validator(cert) => {
    self.state.handle_certificate(cert).await?;
```
@gdanezis (Collaborator):
This will not work, of course, if we do not have the dependencies.

@mystenmark (Contributor Author):
Correct - an error will be reported to the caller.

```rust
) -> impl Stream<Item = SuiResult> {
    let futures: FuturesOrdered<_> = checkpoint_contents
        .iter()
        .map(|digests| {
```
@gdanezis (Collaborator):
Honestly, I would make the base case the one where we send and process a whole block of transactions, and the special case the one where we have a block of 1.

```rust
    digests: impl Iterator<Item = TransactionDigest>,
) -> impl Stream<Item = SuiResult> {
    let futures: FuturesOrdered<_> = digests
        .map(|digest| {
```
@gdanezis (Collaborator):
Same as above, better to handle blocks of transactions.

```rust
where
    A: AuthorityAPI + Send + Sync + 'static + Clone,
{
    async fn handle_digest(&self, follower: &Follower<A>, digests: ExecutionDigests) -> SuiResult {
```
@gdanezis (Collaborator):
Here again, I would send blocks of transactions from the follower on follower block boundaries.

@mystenmark (Contributor Author) commented:

> There is a serious amount of stuff in the PR which is quite complex -- due to the nature of what we do.

Yes, my apologies - this code got fairly complex when dealing with the "wait for finality / trustworthy effects" problem, and it hasn't gotten any simpler since then. If I had time I would probably love to rewrite it and make it simpler, but we may be stuck with it for now.

@mystenmark (Contributor Author) commented:

>> Note that this does not automatically fetch parent certs, and assumes that they will be enqueued as well (or are already executed). If we're not comfortable with this assumption we can certainly add parent fetching to NodeSyncState.
>
> Unsure what the result of this is: can we guarantee that all previous certs will be enqueued, or do we need extra logic for this? I am particularly concerned about the precursor transactions to shared objects, which, if not present, may block the execution of a shared-object cert, slowing everyone down.

That's a good point - I was thinking more about checkpoints (since that's what motivated this work), and in that case I'm pretty sure it's impossible to have an orphaned tx in a fragment.

I will add support for parent fetching/execution.

@mystenmark force-pushed the mlogan-exec-driver branch from ec51269 to b6ac1c4 July 20, 2022 20:25
@mystenmark (Contributor Author) commented:

I thought some more about fetching parents. I think this issue was somewhat underexplored in the old code. Let's restrict the discussion to shared-object TXes that have been scheduled on a validator (checkpoints and nodesync are simple because we have final effects in hand in those cases):

  • It's not actually guaranteed that anyone else knows what the parent certs are yet - someone has to be the first validator to try executing the cert and determine whether there are lock errors - which, if there are, will not tell us what the missing certs are.
  • We do know that the signers of the cert can execute the cert, of which at least f+1 are honest.
  • So we could ask the signers of the cert to execute it for us - all honest signers must succeed and return effects.
  • However, we can't actually trust the effects given to us by any single signer (as sync_authority_source_to_destination does); we need to observe f+1 identical effects before we can trust them.
    • A byzantine validator could tell us about a made-up cert digest that never existed and send us chasing our tail.
    • More worryingly, if the point of doing this work is to avoid deadlocking (or suffering high latency), we shouldn't open the door for a byzantine validator to feed us a set of parents that it believes will trigger a deadlock (or high latency).

Ok, so, all that said, it seems that to do this correctly, after encountering owned-object lock errors, we should execute the cert on other validators until we get an f+1 quorum that all give us identical effects in response. Then we can ask those same validators for the missing certs (recursively). This could potentially cause O(n^2) network traffic, so it seems we should be cautious about doing this. Also, this is a decent chunk of additional code.
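
For illustration only, a minimal sketch of the f+1 check described above; the digest type and helper name are hypothetical, and the real check would weight validators by stake rather than counting them:

```rust
use std::collections::HashMap;

// Hypothetical digest type; in the real code this would be an effects digest.
type EffectsDigest = [u8; 32];

// Return the effects digest once at least f + 1 validators have reported
// identical effects: any set of f + 1 reports must contain at least one
// honest validator, so the matching effects can be trusted.
fn quorum_effects(
    reports: impl IntoIterator<Item = EffectsDigest>,
    f: usize,
) -> Option<EffectsDigest> {
    let mut counts: HashMap<EffectsDigest, usize> = HashMap::new();
    for digest in reports {
        let count = counts.entry(digest).or_insert(0);
        *count += 1;
        if *count >= f + 1 {
            return Some(digest);
        }
    }
    None
}
```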

Alternatively, the entire problem can be avoided by putting the responsibility on the client to push certs to as many validators as it can (which is what QuorumDriver already does) - if it fails to do so, and if gossip also fails to propagate the certs in question, then yes, some additional latency is possible. But I feel like going with the simpler, client-driven model is "the Sui way" and avoids a lot of potential pitfalls.

Currently, the only user of the execution driver that doesn't also have true effects to start with is the shared-object tx case. So my questions are:

  1. Do we expect any other users?
  2. What is the likely impact of not doing parent fetching in this case? I believe it should be small - we already have two completely independent mechanisms for mitigating this issue (gossip and quorum driver). I think before we add a third mechanism we should wait for some evidence that the existing mechanisms are insufficient.

@mystenmark requested a review from gdanezis July 20, 2022 21:05
@mystenmark (Contributor Author) commented:

Discussed parent syncing with @gdanezis - the decision is to add parent syncing so that the execution driver is guaranteed to make progress and acts as a backstop to the fallible processes of gossip and quorum driver.

@mystenmark (Contributor Author) commented:

Re processing transactions in blocks - will do some benchmarking or profiling to determine whether that is necessary. Checkpoint sync (when it is working) may also serve as a good test case since it will eliminate much of the bookkeeping overhead.

@mystenmark enabled auto-merge (squash) July 21, 2022 17:04
@mystenmark merged commit 6e78098 into main Jul 21, 2022
@mystenmark deleted the mlogan-exec-driver branch July 21, 2022 17:15