Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

op-node/rollup/sequencing: Fix temporary engine error handling #12258

Merged
merged 1 commit into from
Oct 16, 2024

Conversation

sebastianst
Copy link
Member

Description

Do schedule a next sequencer action even if the building state is empty in onEngineTemporaryError.
This can happen if the previous successful payload cleared the building state and then when startBuildingBlock is entered, it hits a temp error but never modified the cleared building state, so the temp error handler would never schedule the next action.

Tests

Additional context

Metadata

Towards #12240 #12041 (but not necessarily fixing yet, if the current solution is wrong)

@bearpebble
Copy link

@sebastianst just my two cents for a potential minimally invasive fix.

Add a StartedSequencing bool member to the BuildingState struct and then assign true right before the PreparePayloadAttributes() call, i.e.

d.log.Info("Started sequencing new block", "parent", l2Head, "l1Origin", l1Origin)
d.latest.StartedSequencing = true

This assumes setting d.latest.Started earlier than currently in onBuildStarted() is not an option. Because if that's fine as well, it can just be assigned in the same place with the same result I think.

@sebastianst
Copy link
Member Author

@sebastianst just my two cents for a potential minimally invasive fix.

Add a StartedSequencing bool member to the BuildingState struct and then assign true right before the PreparePayloadAttributes() call, i.e.

d.log.Info("Started sequencing new block", "parent", l2Head, "l1Origin", l1Origin)
d.latest.StartedSequencing = true

This assumes setting d.latest.Started earlier than currently in onBuildStarted() is not an option. Because if that's fine as well, it can just be assigned in the same place with the same result I think.

Yep that would also have been my 2nd try. But removing the early return might just be the better fix.

@mdehoog
Copy link
Contributor

mdehoog commented Oct 11, 2024

@sebastianst to add some data to the testing story: we've been running this change on some internal L3 testnets and it's working well

Copy link
Contributor

@protolambda protolambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the change itself is valid, to get the sequencer unstuck, if it ever does encounter a combination of nextActionOK == false and temporary engine errors.

But I am a little puzzled why nextActionOK is false to begin with. It should only be false if the sequencer isn't able to sequence (during sync from L1, during reset of L1 sync, during extraordinary safe-head lag, or when meant to be inactive). So that may be worth investigating more, and I think we can drop nextActionOK state altogether, if we capture the above no-sequence cases individually.

Without the nextActionOK reason, I don't see how we can put together a coherent test. So I am in favor of merging this, based on what we have seen, and then refactoring it to not have a nextActionOK at all anymore.

@protolambda protolambda marked this pull request as ready for review October 16, 2024 11:28
Do schedule a next sequencer action even if the building state is empty
in `onEngineTemporaryError`.
This can happen if the previous successful payload cleared the building
state and then when `startBuildingBlock` is entered, it hits a temp
error but never modified the cleared building state, so the temp error
handler would never schedule the next action.

Towards #12240 #12041
@sebastianst sebastianst added this pull request to the merge queue Oct 16, 2024
Merged via the queue into develop with commit 86e5f63 Oct 16, 2024
49 checks passed
@sebastianst sebastianst deleted the seb/sequencer-temp-err-fix branch October 16, 2024 13:26
samlaf pushed a commit to samlaf/optimism that referenced this pull request Nov 10, 2024
…eum-optimism#12258)

Do schedule a next sequencer action even if the building state is empty
in `onEngineTemporaryError`.
This can happen if the previous successful payload cleared the building
state and then when `startBuildingBlock` is entered, it hits a temp
error but never modified the cleared building state, so the temp error
handler would never schedule the next action.

Towards ethereum-optimism#12240 ethereum-optimism#12041
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants