Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Longer epoch transition time since capella #5409

Closed
dapplion opened this issue Apr 24, 2023 · 10 comments
Closed

Longer epoch transition time since capella #5409

dapplion opened this issue Apr 24, 2023 · 10 comments
Assignees
Labels
meta-investigate Issues found that require further investigation and may not have a specific resolution/fix prio-medium Resolve this some time soon (tm).

Comments

@dapplion
Copy link
Contributor

Epoch transition avg time on mainnet up from 700ms, 900ms

image

Also noting that there's significant variability on the times looking as histogram. The variability is previous to capella tho, would be cool to ensure it's a temporary issue (high GC) or that some states do indeed take more time to process

image

@dapplion dapplion added the prio-medium Resolve this some time soon (tm). label Apr 24, 2023
@matthewkeil
Copy link
Member

Found bug attempting to resolve this. Put up issue #5440 and am attempting to debug that to pull the historical data to run flamegraph and benchmarks for the pre-capella state on 4/1

@matthewkeil
Copy link
Member

matthewkeil commented May 5, 2023

@dapplion @tuyennhv On initial investigation it does not appear that the transitions should be taking significantly longer post capella. The full transition is actually testing at 2% faster than pre-merge. There are some inner functions that are running much hotter but the overall speed doesn't speak to the 700-900ms jump.

Before Capella (slot 6123583):

Test Name Ops/Second Time per Op Runs Duration
altair processEpoch - mainnet_e191362 2.013578 496.6283 ms 12 6.49 s
mainnet_e191362 - altair beforeProcessEpoch 12.52470 79.84222 ms 33 3.27 s
mainnet_e191362 - altair processJustificationAndFinalization 51203.28 19.53000 us 5183 7.32 s
mainnet_e191362 - altair processInactivityUpdates 93.13199 10.73745 ms 159 2.47 s
mainnet_e191362 - altair processRewardsAndPenalties 15.85619 63.06684 ms 89 6.30 s
mainnet_e191362 - altair processRegistryUpdates 32946.76 30.35200 us 4820 7.00 s
mainnet_e191362 - altair processSlashings 464468.2 2.153000 us 3742 5.51 s
mainnet_e191362 - altair processEth1DataReset 1600000 625.0000 ns 445 1.68 s
mainnet_e191362 - altair processEffectiveBalanceUpdates 460.9181 2.169583 ms 268 1.72 s
mainnet_e191362 - altair processSlashingsReset 112981.6 8.851000 us 3504 5.33 s
mainnet_e191362 - altair processRandaoMixesReset 102280.9 9.777000 us 6139 8.37 s
mainnet_e191362 - altair processHistoricalRootsUpdate 484261.5 2.065000 us 4162 6.11 s
mainnet_e191362 - altair processParticipationFlagUpdates 98765.43 10.12500 us 1357 2.80 s
mainnet_e191362 - altair processSyncCommitteeUpdates 454338.9 2.201000 us 858 2.19 s
mainnet_e191362 - altair afterProcessEpoch 4.684762 213.4580 ms 12 3.58 s

After Capella (slot 6260415):

Test Name Ops/Second Time per Op Runs Duration
altair processEpoch - mainnet_e195638 2.066960 483.8023 ms 17 8.81 s
mainnet_e195638 - altair beforeProcessEpoch 11.24121 88.95836 ms 42 4.39 s
mainnet_e195638 - altair processJustificationAndFinalization 50443.91 19.82400 us 1630 3.15 s
mainnet_e195638 - altair processInactivityUpdates 100.6657 9.933875 ms 51 1.14 s
mainnet_e195638 - altair processRewardsAndPenalties 16.76064 59.66359 ms 140 9.17 s
mainnet_e195638 - altair processRegistryUpdates 30983.73 32.27500 us 3022 4.82 s
mainnet_e195638 - altair processSlashings 598802.4 1.670000 us 1820 3.29 s
mainnet_e195638 - altair processEth1DataReset 669344.0 1.494000 us 1829 3.29 s
mainnet_e195638 - altair processEffectiveBalanceUpdates 360.4157 2.774574 ms 79 1.00 s
mainnet_e195638 - altair processSlashingsReset 91684.24 10.90700 us 2374 4.14 s
mainnet_e195638 - altair processRandaoMixesReset 52273.92 19.13000 us 509 1.82 s
mainnet_e195638 - altair processHistoricalRootsUpdate 461042.0 2.169000 us 4367 6.34 s
mainnet_e195638 - altair processParticipationFlagUpdates 169520.3 5.899000 us 3086 4.84 s
mainnet_e195638 - altair processSyncCommitteeUpdates 453103.8 2.207000 us 2250 3.79 s
mainnet_e195638 - altair afterProcessEpoch 4.751082 210.4784 ms 12 3.55 s

Diff Between Runs

Test Name % Δ Ops/Second % Δ Time per Op Δ Runs Δ Duration
altair processEpoch 2.586% -2.558% 5 2.32 s
beforeProcessEpoch -10.245% 11.431% 9 1.12 s
processJustificationAndFinalization -1.504% 1.505% -3553 -4.17 s
processInactivityUpdates 8.107% -7.470% -108 -1.33 s
processRewardsAndPenalties 5.707% -5.413% 51 2.87 s
processRegistryUpdates -5.964% 6.337% -1798 -2.18 s
processSlashings 28.924% -22.289% -1922 -2.22 s
processEth1DataReset -58.274% 139.891% 1384 1.61 s
processEffectiveBalanceUpdates -21.791% 27.916% -189 -0.72 s
processSlashingsReset -18.839% 30.323% -1130 -1.19 s
processRandaoMixesReset -48.854% 95.383% -5630 -6.55 s
processHistoricalRootsUpdate -4.804% 5.016% 205 0.23 s
processParticipationFlagUpdates 72.031% -41.807% 1729 2.04 s
processSyncCommitteeUpdates -0.273% 0.272% 1392 1.60 s
afterProcessEpoch 1.419% -1.389% 0 -0.03 s

@dapplion
Copy link
Contributor Author

dapplion commented May 5, 2023

@matthewkeil got it! Thanks for looking into this. Closing issue since them it's likely caused for other factors on hardware. We can revisit latter if we have more evidence

@dapplion dapplion closed this as completed May 5, 2023
@twoeths
Copy link
Contributor

twoeths commented Jul 24, 2023

As of Jul 2023, sometimes it takes more than 3s to process epoch, this consistently happens on all mainnet nodes:

Screenshot 2023-07-24 at 09 30 23

The average epoch transition time is way higher than what was described originally in this issue:

Screenshot 2023-07-24 at 09 31 28

@twoeths
Copy link
Contributor

twoeths commented Aug 4, 2023

a lot of time could be for gc, v1.10.0 has way less gc and the epoch transition time was improved (this is from a mainnet node)

Screenshot 2023-08-04 at 15 50 45 Screenshot 2023-08-04 at 15 51 46

@philknows philknows added the meta-investigate Issues found that require further investigation and may not have a specific resolution/fix label Nov 7, 2023
@philknows
Copy link
Member

@twoeths
Copy link
Contributor

twoeths commented Nov 9, 2023

@twoeths
Copy link
Contributor

twoeths commented Dec 22, 2023

an idea to improve processRewardsAndPenalties(): #6229

@wemeetagain
Copy link
Member

closing since we've done a lot of work to reduce epoch transition time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta-investigate Issues found that require further investigation and may not have a specific resolution/fix prio-medium Resolve this some time soon (tm).
Projects
None yet
Development

No branches or pull requests

6 participants