Extreme I/O lag on overloaded nodes #4002
Labels: epic, scope-performance
Closed. As of Oct 2023, we have already implemented the network thread and batch attestation validation, so this is no longer an issue.
Node.js, due to its single-threaded nature, experiences exponential degradation of performance under heavy load. We have circumstantial evidence that on low-power machines, performance suffers when attempting to run too many validator keys. Overall time to perform tasks increases, but the time of any network call (e.g. a validator client call to the beacon API) increases by 10x-100x relative to internal function times (e.g. block processing).
When running large numbers of keys on reasonably powerful servers the issue is not apparent. However, it still poses a future risk.
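To make the failure mode concrete, here is a minimal, self-contained sketch (not Lodestar code; the 2-second busy-wait stands in for real work) of how a synchronous CPU-bound task on the single event loop delays unrelated I/O callbacks:

```ts
// Minimal sketch (not Lodestar code): a synchronous CPU-bound task blocks the single
// Node.js event loop, so a timer scheduled for 50 ms fires roughly 2 s late. The same
// happens to sockets and HTTP responses, which is the I/O lag described above.
import {monitorEventLoopDelay} from "node:perf_hooks";

const loopDelay = monitorEventLoopDelay({resolution: 10});
loopDelay.enable();

const scheduledAt = Date.now();
setTimeout(() => {
  console.log(`timer fired after ${Date.now() - scheduledAt} ms (expected ~50 ms)`);
  console.log(`event loop delay p99: ${(loopDelay.percentile(99) / 1e6).toFixed(0)} ms`);
}, 50);

// Busy-wait standing in for heavy CPU work (e.g. bulk signature verification).
const blockUntil = Date.now() + 2000;
while (Date.now() < blockUntil) {
  // nothing else can run until this loop returns
}
```

On an overloaded beacon node the blocking work is real (state transitions, serialization, signature verification) rather than a busy-wait, but the effect on pending network calls is the same.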
Step 1: quantify
Circumstantial evidence is not enough; we need properly documented metrics showing when this issue manifests and how bad it is. We will run Lodestar in different configurations and on different servers:
Validator configurations:
Servers:
Data to collect:
vc_rest_api_client_request_time_seconds_bucket{routeId="produceAttestationData"}. Average, median, p95, p99.
Step 2: reproduce
If the data above proves that this issue is of sufficient severity, we need a simple reproducible case to gather help and potential solutions (a minimal sketch follows below). A reproducible case must be:
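As a starting point (port, timings, and workload here are hypothetical, not taken from Lodestar), a reproduction could be as small as an HTTP server whose event loop is periodically blocked by synchronous work:

```ts
// Hypothetical reproduction sketch (not Lodestar code): an HTTP server whose event loop is
// periodically blocked by synchronous work, standing in for an overloaded beacon node.
import http from "node:http";

const PORT = 8080; // assumption: any free local port

http
  .createServer((_req, res) => {
    // Trivial handler: responses take ~1 ms when the event loop is free.
    res.end("ok");
  })
  .listen(PORT, () => {
    console.log(`listening on :${PORT}`);
  });

// Every 5 s, hog the event loop for 2 s. While this loop runs, no request can be
// accepted or answered, so observed request latency jumps from ~1 ms to ~2000 ms.
setInterval(() => {
  const until = Date.now() + 2000;
  while (Date.now() < until) {
    /* CPU-bound stand-in, e.g. bulk signature verification */
  }
}, 5000);
```

Probing it from a separate process, e.g. `curl -s -o /dev/null -w "%{time_total}\n" http://127.0.0.1:8080/` in a loop, shows the latency inflation without the measurement itself being skewed by the blocked loop.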
Step 3: mitigate
Provided that a Lodestar node is under heavy load, we want to minimize this I/O lag to acceptable levels so that validator performance doesn't degrade. If a VC request time increases from 50 ms to 500 ms, that is tolerable. However, if it increases from 50 ms to 5000 ms, the validator may miss an attestation and reduce profitability.
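As noted in the Oct 2023 status above, the eventual fix moved heavy work off the main event loop (a dedicated network thread plus batch attestation validation). As a generic illustration of that technique, not Lodestar's actual implementation, here is a minimal worker_threads sketch; the workload is hypothetical:

```ts
// offload.ts — minimal worker_threads sketch (not Lodestar's implementation): CPU-heavy work
// runs on a worker thread, so the main event loop stays free to serve I/O on time.
// Note: assumes CommonJS output so that __filename refers to the compiled .js file.
import {Worker, isMainThread, parentPort, workerData} from "node:worker_threads";

// Stand-in for CPU-bound work such as attestation signature verification.
function heavyWork(iterations: number): number {
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc = (acc + i * 31) % 1_000_003;
  return acc;
}

if (isMainThread) {
  // Main thread: spawn a worker and keep serving I/O while it crunches numbers.
  const worker = new Worker(__filename, {workerData: {iterations: 500_000_000}});
  worker.on("message", (result) => console.log("worker result:", result));
  worker.on("error", (err) => console.error(err));

  // Because the heavy loop runs elsewhere, this timer fires close to on time.
  const start = Date.now();
  setTimeout(() => console.log(`timer lag: ${Date.now() - start - 100} ms`), 100);
} else {
  parentPort?.postMessage(heavyWork(workerData.iterations as number));
}
```

Lodestar's actual mitigation is more involved (structured message passing and batching), but the principle is the same: work that would otherwise monopolize the event loop is moved where it cannot delay attestation-critical I/O.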
Related issues
May be the cause of these issues: