
core/txpool, eth/catalyst: fix racey simulator due to txpool background reset #28837

Merged: 3 commits into ethereum:master on Jan 23, 2024

Conversation

@karalabe (Member) commented on Jan 18, 2024

This PR fixes an issue in the new simulated backend. The root cause is that the transaction pool's internal reset operation runs on a background thread.

When a new transaction is added to the pool via RPC, it first lands in a non-executable queue and is moved to its final location by a background thread. If the machine is overloaded (or simply due to timing), the simulated backend may try to produce the next block while the pool has not yet marked the newly added transaction executable, so the block ends up without it. This breaks the determinism we want from the simulator: add a tx, mine a block, and the tx should be in the block.

The PR fixes it by adding a Sync function to the txpool, which waits for the current reset operation (if any) to finish and then runs an entire reset round on top. The extra round is needed because resets are only triggered by new-head events, so newly added transactions do not trigger the outer resets that we can wait on. The transaction pool does eventually run an internal reset on transaction addition too, but there is no easy way to wait on that, and no meaningful reason to bubble it across everything. A clean outer reset is at worst a small no-op goroutine.
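Below is a minimal, self-contained Go sketch of the pattern described above. It is not the actual go-ethereum code: the `Pool` type, the channel names, and the `doReset` stub are illustrative stand-ins for `core/txpool.TxPool` and its internal reset machinery.

```go
package main

import (
	"errors"
	"fmt"
)

// Pool is a stripped-down stand-in for core/txpool.TxPool, showing only the
// synchronization pattern: resets run on a background loop, and Sync hands
// the loop a channel to reply on once a full reset round has completed.
type Pool struct {
	reset chan struct{}   // head-event driven reset requests
	sync  chan chan error // explicit sync requests from Sync()
	term  chan struct{}   // closed when the pool shuts down
}

func NewPool() *Pool {
	p := &Pool{
		reset: make(chan struct{}, 1),
		sync:  make(chan chan error),
		term:  make(chan struct{}),
	}
	go p.loop()
	return p
}

// loop is the background goroutine that serializes resets. Because sync
// requests are handled on the same goroutine, a reply on the waiter channel
// guarantees that any in-flight reset has fully finished and that one more
// reset round ran on top of it.
func (p *Pool) loop() {
	for {
		select {
		case <-p.reset:
			p.doReset() // head event: promote queued transactions, etc.
		case done := <-p.sync:
			p.doReset() // run a fresh reset round before replying
			done <- nil
		case <-p.term:
			return
		}
	}
}

// doReset is a placeholder for the real reset: revalidate and promote
// queued transactions into the executable set.
func (p *Pool) doReset() {}

// Sync blocks until the pool's loop has executed a full reset round, making
// previously added transactions executable (or dropped) deterministically.
func (p *Pool) Sync() error {
	done := make(chan error)
	select {
	case p.sync <- done:
		return <-done
	case <-p.term:
		return errors.New("pool already terminated")
	}
}

func main() {
	p := NewPool()
	// ... add transactions via RPC here ...
	if err := p.Sync(); err != nil { // wait until the pool has settled
		panic(err)
	}
	fmt.Println("pool synced; safe to build the next block")
}
```

Because sync requests are served by the same goroutine that performs head-triggered resets, a reply on the waiter channel implies both that any in-flight reset finished and that one more full round ran on top. The simulator can therefore call Sync after submitting transactions and before building a payload, and the resulting block deterministically includes them.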

@karalabe added this to the 1.13.11 milestone on Jan 18, 2024
@MariusVanDerWijden (Member) left a comment:

LGTM. We only ever send nil over the resetWaiter channel, but I guess it's okay.
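For context on this remark: the waiter channel is typed to carry an error even though successful syncs only ever send nil. A tiny sketch of that trade-off, reusing the resetWaiter name from the review comment (the surrounding code is illustrative, not the actual pool internals):

```go
package main

import "fmt"

func main() {
	// resetWaiter is an assumed name taken from the review comment; the
	// pool's loop would receive waiters and reply once a reset finished.
	resetWaiter := make(chan chan error, 1)

	go func() {
		for wait := range resetWaiter {
			wait <- nil // only nil is ever sent today
		}
	}()

	wait := make(chan error)
	resetWaiter <- wait
	fmt.Println("sync result:", <-wait) // prints "sync result: <nil>"
}
```

Keeping the channel typed as chan error leaves room to report failures such as pool shutdown later, without changing the channel's type.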

@lightclient (Member) left a comment:

Couple of spelling nits, but LGTM.

Resolved review threads: eth/catalyst/api.go, core/txpool/txpool.go (two threads)
@rjl493456442 (Member) left a comment:

Nitpick, otherwise LGTM.

@holiman (Contributor) left a comment:

LGTM

@holiman merged commit 542c861 into ethereum:master on Jan 23, 2024 (2 of 3 checks passed)
Dergarcon pushed a commit to specialmechanisms/mev-geth-0x2mev that referenced this pull request on Jan 31, 2024:
core/txpool, eth/catalyst: fix racey simulator due to txpool background reset (ethereum#28837)