
Consider increasing max_semi_space_size #2115

Closed
dapplion opened this issue Feb 28, 2021 · 11 comments
Labels
prio-low This is nice to have. scope-performance Performance issue and ideas to improve performance.

Comments

@dapplion
Contributor

dapplion commented Feb 28, 2021

According to GC metrics on master, the beacon node spends 10% of its time doing Scavenge GC runs.

Screenshot from 2021-02-28 20-45-51

If the metrics are correct, the article below suggests increasing max_semi_space_size via the --max_semi_space_size flag to reduce the frequency of these runs.

https://www.alibabacloud.com/blog/better-node-application-performance-through-gc-optimization_595119
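
For context, a quick way to see how large the new space (the pair of semi-spaces) currently is on a running node is Node's built-in v8 module. A minimal sketch, not part of Lodestar's code:

// Sketch: inspect the current new-space sizing of this Node process.
import {getHeapSpaceStatistics} from "node:v8";

const newSpace = getHeapSpaceStatistics().find((s) => s.space_name === "new_space");
if (newSpace) {
  console.log(`new_space size: ${(newSpace.space_size / 1024 / 1024).toFixed(1)} MB`);
  console.log(`new_space used: ${(newSpace.space_used_size / 1024 / 1024).toFixed(1)} MB`);
}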

@stale

stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the bot:stale label Jun 2, 2021
@dapplion dapplion self-assigned this Jun 3, 2021
@stale stale bot removed the bot:stale label Jun 3, 2021
@dapplion dapplion added the scope-performance Performance issue and ideas to improve performance. label May 12, 2022
@dapplion dapplion removed their assignment May 12, 2022
@stale

stale bot commented Sep 21, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the meta-stale label Sep 21, 2022
@dapplion dapplion added the prio-low This is nice to have. label Sep 29, 2022
@wemeetagain
Member

cc @matthewkeil re #5829

@matthewkeil
Member

@wemeetagain I had this set on beta when we discovered the memory leak in #5851, so it was hard to see the full effect. It did cut the Scavenge time in line with the worker new-space adjustment before things started to get wonky as the heap grew from the leak. It will most definitely be a nice tune-up, though, and I will get this set correctly once the leak is addressed and we can see the fruits of the change!

@matthewkeil
Member

As a note, the docs are not super clear about what "semi" space is. After digging into the V8 codebase, it's related to the young generation and kNewLargeObjectSpaceToSemiSpaceRatio, which always equals 1:

size_t Heap::YoungGenerationSizeFromSemiSpaceSize(size_t semi_space_size) {
  return semi_space_size * (2 + kNewLargeObjectSpaceToSemiSpaceRatio);
}
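
So the young generation works out to roughly three times the semi-space size (two semi-spaces plus an equally sized new large object space). A small sketch of that arithmetic, mirroring the V8 function above (the 64 is an example value, not a recommendation):

// young generation = 2 semi-spaces + new large object space (ratio 1)
const kNewLargeObjectSpaceToSemiSpaceRatio = 1;

function youngGenerationSizeFromSemiSpaceSizeMb(semiSpaceSizeMb: number): number {
  return semiSpaceSizeMb * (2 + kNewLargeObjectSpaceToSemiSpaceRatio);
}

// e.g. --max-semi-space-size=64 implies a young generation of about 192 MB
console.log(youngGenerationSizeFromSemiSpaceSizeMb(64)); // 192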

@matthewkeil
Member

matthewkeil commented Aug 10, 2023

Memory leak #5851 is resolved. Deploying unstable to feat2 with --max-semi-space-size=64. The max Scavenge time witnessed currently was on mainnet in the unstable group; all other instances were lower, some substantially lower.
Unsure how the larger value will affect performance on the smaller instances, so I will let it run for a few days without the worker to see what turns up.

unstable-mainnet-hzax41
Screenshot 2023-08-10 at 1 23 13 PM

unstable-novc-ctvpss
Screenshot 2023-08-10 at 1 25 03 PM

unstable-sm1v-ctvpss
Screenshot 2023-08-10 at 1 25 16 PM

@matthewkeil
Member

matthewkeil commented Aug 13, 2023

Results

  • feat2 with --max-semi-space-size=64
  • feat1 with --max-semi-space-size=128
  • beta with --max-semi-space-size=256

All runs are with use_worker=false.

group-mainnet-hzax41

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691814600000&to=1691901000000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cfeat2-mainnet-hzax41&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691814600000&to=1691901000000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cfeat1-mainnet-hzax41&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691832600000&to=1691904600000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cbeta-mainnet-hzax41&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

feat2

Screenshot 2023-08-12 at 10 05 09 PM

feat1

Screenshot 2023-08-12 at 10 53 57 PM

beta

Screenshot 2023-08-12 at 11 10 09 PM

group-lg1k-hzax41

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691814600000&to=1691901000000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cfeat2-lg1k-hzax41&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691814600000&to=1691901000000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cfeat1-lg1k-hzax41&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691832600000&to=1691904600000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cbeta-lg1k-hzax41&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

feat2

Screenshot 2023-08-12 at 11 22 24 PM

feat1

Screenshot 2023-08-12 at 11 22 34 PM

beta

Screenshot 2023-08-12 at 11 22 45 PM

feat2-md16-ctvpsm

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691814600000&to=1691901000000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cfeat2-md16-ctvpsm&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691814600000&to=1691901000000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cfeat1-md16-ctvpsm&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691832600000&to=1691904600000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cbeta-md16-ctvpsm&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

feat2

Screenshot 2023-08-12 at 11 23 46 PM

feat1

Screenshot 2023-08-12 at 11 23 53 PM

beta

Screenshot 2023-08-12 at 11 24 01 PM

feat2-sm1v-ctvpss

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691814600000&to=1691901000000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cfeat2-sm1v-ctvpss&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691814600000&to=1691901000000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cfeat1-sm1v-ctvpss&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

https://grafana-lodestar.chainsafe.io/d/lodestar_vm_host/lodestar-vm-host?from=1691832600000&to=1691904600000&var-DS_PROMETHEUS=default&var-rate_interval=1h&var-Filters=instance%7C%3D%7Cbeta-sm1v-ctvpss&orgId=1&var-beacon_job=$%7BVAR_BEACON_JOB%7D

feat2

Screenshot 2023-08-12 at 11 24 34 PM

feat1

Screenshot 2023-08-12 at 11 24 40 PM

beta

Screenshot 2023-08-12 at 11 24 46 PM

@matthewkeil
Member

Deployed 512mb to beta at the timestamp below. Will pull metrics in a day, after it stabilizes, to see how they compare to the 256mb that was also on beta.

Screenshot 2023-08-16 at 12 00 09 AM

@matthewkeil
Member

Moving the new space to 512mb was detrimental. On mainnet the Scavenge time dropped, but for some reason mark-and-sweep started to climb considerably. I am not sure of the phenomenon here, but it's not worth investigating.

Screenshot 2023-08-22 at 5 27 38 AM

The sweet spot for the new-space setting is a value similar to the net rate at which objects are created/collected, such that (as a net average) objects live only in the from-space and do not get moved to the to-space when collection occurs. GC tends to drop a bit further (as a % of CPU time) up to roughly a threshold of two times the rate of object creation. At that point, though, the space is so large that it starts to affect node performance, likely due to searching for objects during runtime.

The current creation rate on unstable-mainnet-hzax41, on a 30-day timeline with a 7d $rate_interval, is roughly 150mb. To keep the number on an even page interval, a setting of 148mb or 152mb is recommended.

Screenshot 2023-08-22 at 5 38 07 AM

Using a 6h $rate_interval to confirm makes those values seem appropriate for a first go. We can reassess in another couple of weeks after setting it to see how things proceed.

Screenshot 2023-08-22 at 5 42 33 AM

This value is set at the command line; however, it is possible to programmatically adjust it during startup from historical data. The same is true of the new-space adjustment for the network worker, and the assumptions made here apply to the worker as well, since most of the scavenged garbage is network related. This investigation started with #5829 and that fact was proven out there. As a note, once the heap size is set at startup it is not possible to change the value in either case.
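
For reference, a hedged sketch of how the startup-time knobs could be wired, assuming Node's standard CLI flag and worker_threads resourceLimits; the values and the ./networkWorker.js path are illustrative, not Lodestar's actual settings:

// Main thread: the flag must be passed before the heap is sized, e.g.
//   node --max-semi-space-size=64 ./beacon-node.js
// Worker thread: the closest knob is resourceLimits.maxYoungGenerationSizeMb,
// which caps the young generation (about 3x the semi-space size per the V8 formula above).
import {Worker} from "node:worker_threads";

const tunedYoungGenMb = 192; // illustrative: derived from a ~64 MB semi-space target

const networkWorker = new Worker("./networkWorker.js", {
  resourceLimits: {maxYoungGenerationSizeMb: tunedYoungGenMb},
});

networkWorker.on("online", () => {
  console.log("network worker started with tuned young-generation size");
});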

@matthewkeil
Member

Final note for reference:

  • A value that is too low results in excess Scavenge GC time.
  • A value that is marginally too high results in node performance degradation from increased variable lookup time.
  • A value that is very high results in mark-and-sweep collection (reason unknown; not researched further).

To set this value in the future: run the node without setting a maxYoungGeneration size, observe the net Scavenge collection rate, and set the flag to that same value (see the example in the images above and the sketch below). That value seems to be near the sweet spot. If this methodology is refined in the future, another note will be added below.
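
As a minimal sketch of observing Scavenge activity from inside the process (assuming Node >= 16, where GC details live on entry.detail; Lodestar's real numbers come from its Prometheus metrics, not this snippet):

// Track time spent in minor GC (Scavenge) to help pick a new-space size.
import {PerformanceObserver, constants} from "node:perf_hooks";

let scavengeMs = 0;

const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const kind = (entry as unknown as {detail?: {kind?: number}}).detail?.kind;
    if (kind === constants.NODE_PERFORMANCE_GC_MINOR) {
      scavengeMs += entry.duration;
    }
  }
});
obs.observe({entryTypes: ["gc"]});

// Report the Scavenge share of wall time once a minute.
setInterval(() => {
  console.log(`scavenge: ${scavengeMs.toFixed(1)} ms over 60s (${((scavengeMs / 60_000) * 100).toFixed(2)}%)`);
  scavengeMs = 0;
}, 60_000).unref();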

@matthewkeil
Member

Closed via #5829
