Garbage collector seems to have an impact on webserver responsiveness #2757

Closed
sanderegg opened this issue Jan 24, 2022 · 1 comment
Labels: bug (buggy, it does not work as expected), High Priority (a totally crucial bug/feature to be fixed asap)

Comments

@sanderegg
Member

As discovered today with DK, the interval of the garbage collector seems to affect the webserver: it periodically becomes unable to answer the healthcheck within 1 second.

DK used a script to bombard the webserver with healthcheck requests using curl from the simcore_traefik node, showing that periodically a request takes more than 1 second to return. (reference here)
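For anyone who wants to reproduce this kind of measurement, here is a minimal Python sketch of such a probe. It is not DK's script (that one used curl). The names `fetch_once` and `probe` and the default pause are made up for illustration; only the 1 second threshold and the `/v0/health` path come from this issue.

```python
import time
import urllib.request
from typing import Callable, List, Tuple


def fetch_once(url: str, timeout: float = 5.0) -> int:
    """Issue one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def probe(
    url: str,
    count: int,
    threshold_s: float = 1.0,
    pause_s: float = 0.1,
    fetch: Callable[[str], int] = fetch_once,
) -> List[Tuple[int, float]]:
    """Send `count` requests one after another and return (index, seconds)
    for every request that took longer than `threshold_s`."""
    slow: List[Tuple[int, float]] = []
    for index in range(count):
        start = time.monotonic()
        try:
            fetch(url)
        except OSError:
            # URLError, HTTPError and socket timeouts all derive from OSError;
            # a failed request is still timed like any other.
            pass
        elapsed = time.monotonic() - start
        if elapsed > threshold_s:
            slow.append((index, elapsed))
        time.sleep(pause_s)
    return slow
```

Calling `probe("http://<webserver-host>:<port>/v0/health", count=600)` (host and port are placeholders for your deployment) returns the indices and durations of the slow requests. If the slow responses are tied to the garbage collector, they should show up at a roughly constant spacing.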

Changing the WEBSERVER_GARBAGE_COLLECTION_INTERVAL_SECONDS variable from 30 seconds to 300 seconds made the error disappear completely.
It is now set to 90 seconds.

But it seems that whatever the garbage collector is doing completely prevents the webserver from doing its work correctly.

@sanderegg added the "bug" label on Jan 24, 2022
@mrnicegyu11
Member

I'll quickly add that while the garbage collector is running, it blocks all endpoints, including /v0/health and /v0/. If traefik checks these endpoints during that time, it marks the webserver as unhealthy and stops forwarding requests to the container. Instead, traefik answers every request to the webserver with a 503. With high confidence, this is the origin of the statusping failures.
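To illustrate how a periodic task can stall every endpoint of an asyncio-based server, here is a small self-contained sketch. It assumes the stall comes from synchronous work running on the event loop, which this issue does not confirm. The names `blocking_cleanup`, `offloaded_cleanup` and `worst_loop_lag` are hypothetical and not taken from the webserver code.

```python
import asyncio
import time


async def blocking_cleanup(duration_s: float) -> None:
    # time.sleep stands in for synchronous work that never yields
    # control back to the event loop.
    time.sleep(duration_s)


async def offloaded_cleanup(duration_s: float) -> None:
    # Same work, but run in the default thread pool so the loop stays free.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, time.sleep, duration_s)


async def worst_loop_lag(cleanup, duration_s: float = 0.3, tick_s: float = 0.01) -> float:
    """Run `cleanup` as a background task and return the largest delay, in
    seconds, by which a short asyncio.sleep overshot while it was running."""
    worst = 0.0
    task = asyncio.create_task(cleanup(duration_s))
    while not task.done():
        start = time.monotonic()
        await asyncio.sleep(tick_s)
        worst = max(worst, time.monotonic() - start - tick_s)
    await task
    return worst


async def compare() -> dict:
    # Measure both variants one after the other on the same loop.
    return {
        "blocking": await worst_loop_lag(blocking_cleanup),
        "offloaded": await worst_loop_lag(offloaded_cleanup),
    }
```

Running `asyncio.run(compare())` should report a lag close to the full cleanup duration for the blocking variant and only a few milliseconds for the offloaded one. A health handler sharing the loop would be delayed by about the same amount. Offloading to a thread is just one possible mitigation and not necessarily the right fix here.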
