-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Execution] Fix stop control #5327
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #5327 +/- ##
=======================================
Coverage 55.70% 55.70%
=======================================
Files 1004 1003 -1
Lines 97080 97017 -63
=======================================
- Hits 54082 54048 -34
+ Misses 38929 38899 -30
- Partials 4069 4070 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
49c4415
to
407cd94
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code looks OK. given this has been working, I'm assuming this is a corner case. can you document this some more either in the comments or in this PR
@@ -205,7 +205,8 @@ func (e *Engine) BlockProcessable(b *flow.Header, _ *flow.QuorumCertificate) { | |||
|
|||
// TODO: this should not be blocking: https://github.com/onflow/flow-go/issues/4400 | |||
|
|||
// skip if stopControl tells to skip | |||
// skip if stopControl tells to skip, so that we can avoid fetching collections | |||
// for this block | |||
if !e.stopControl.ShouldExecuteBlock(b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, this is the only place we check.
@@ -363,6 +364,12 @@ func (e *Engine) executeBlock( | |||
ctx context.Context, | |||
executableBlock *entity.ExecutableBlock, | |||
) { | |||
|
|||
// don't execute the block if the stop control says no | |||
if !e.stopControl.ShouldExecuteBlock(executableBlock.Block.Header) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stop control should check again here, because otherwise there is an edge case that might execute blocks above the stop height. See the comment on L457
The edge case is that, when a block becomes processable, the stop height might not be set, so it's added to the queue, however, when it becomes executable and is able to execute, the stop height is set, and should not be executed. Without this check, it will be executed, but should not.
@@ -454,8 +463,6 @@ func (e *Engine) executeBlock( | |||
e.executionDataPruner.NotifyFulfilledHeight(executableBlock.Height()) | |||
} | |||
|
|||
e.stopControl.OnBlockExecuted(executableBlock.Block.Header) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stopControl.OnBlockExecuted is to report a block has been executed and triggers stop control to check whether it should stop/crash or not.
However, this is currently called after e.onBlockExecuted
is called, which checks if the child block (next block) can be executed and executes if completed. So imaging the following case:
Given two blocks: A <- B and we set stop height as B, meaning stop as soon as A is executed, and don't execute B. OK, now both block A and B's collections have been received, after A is executed, B becomes ready to be executed, if onBlockExecued(A)
is called before stopControl.OnBlockExecuted
, then it will execute block B in another go routine (which shouldn't). But the reason we never ran into this, is probably because before B is executed, stop.OnBlockExecuted
might have been called and crashed the node. However, B technically still have a chance to be executed, I didn't notice this until looking into a flakey test TestStopAtHeightWhenExecutedBeforeFinalized
which captured this edge case.
So my solution here is to 1) move this line up before the onBlockExecuted
, so that we let stop control check stop condition first. And 2) add ShouldExecuteBlock
into the executeBlock
function, so that even if stop control decides not to crash, and the function onBlockExecuted
found the next block has received its collection and can be executed, and we can still stop the next block from being executed.
@peterargue sorry, I just realized my comments in the PR was in "pending". I just published them. |
407cd94
to
c21cda3
Compare
This PR fixes the stop control when ingestion engine is about to execute next block. See details in the comments of the PR.