Conda-store pods get evicted on AWS #738
Comments
This is most likely related to #735.
Some more logging
This is due to the default node size on AWS, which is 20 GB. We need to make this customisable. For GCP it's 100 GB, IIRC.
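For anyone who wants to confirm this on their own cluster, here is a rough boto3 sketch that prints the size and zone of the root EBS volumes attached to the worker nodes. The `eks:cluster-name` tag filter and the cluster name are assumptions; adjust them for your deployment.

```python
# Rough sketch (not part of qhub) to check the root EBS volume size of the
# EKS worker nodes. The "eks:cluster-name" tag and the cluster name below
# are assumptions -- adjust for your deployment.
import boto3

CLUSTER_NAME = "qhub-cluster"  # hypothetical cluster name

ec2 = boto3.client("ec2")

# Find running instances tagged as belonging to the EKS cluster.
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:eks:cluster-name", "Values": [CLUSTER_NAME]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

volume_ids = []
for reservation in reservations:
    for instance in reservation["Instances"]:
        for mapping in instance.get("BlockDeviceMappings", []):
            volume_ids.append(mapping["Ebs"]["VolumeId"])

# Report each attached EBS volume; the default node disk on AWS is 20 GB,
# which is the disk that fills up and triggers the evictions.
volumes = ec2.describe_volumes(VolumeIds=volume_ids)["Volumes"] if volume_ids else []
for volume in volumes:
    print(volume["VolumeId"], f'{volume["Size"]} GiB', volume["AvailabilityZone"])
```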
@aktech does this mean that the EKS-provisioned nodes need to have more disk space? Or is it the size of the shared filesystem?
It's the former: the EKS nodes need to have more disk space.
After my redeployment - and installing from the latest commit of qhub (
Deploying
I think you would need a new deployment; if the volume is already spun up in a conflicting zone, then it's very unlikely that it will be moved after updating to the latest version.
That makes sense. Using the AWS console to confirm, the Availability Zones for the
To get back to a working state, I drained the
And then I will manually kill any pods that won't drain. This will put the node in a "cordoned" state and a new node should spin up soon after; if you're lucky and the new node is launched in the same AZ as your volume mounts, the pods that were drained will be scheduled on the new node.
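Before draining, a quick way to see whether the zones actually conflict is to compare the zones of the current nodes against the zones recorded on the PersistentVolumes. A rough sketch with the kubernetes Python client follows; note that CSI-provisioned volumes may record the zone in `spec.nodeAffinity` rather than in labels, so treat this only as a starting point.

```python
# Minimal sketch: compare node zones against the zones recorded on the
# PersistentVolumes, using the kubernetes Python client. Older clusters may
# only set the "failure-domain.beta.kubernetes.io/zone" label variant.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

ZONE_KEYS = ("topology.kubernetes.io/zone", "failure-domain.beta.kubernetes.io/zone")


def zone_of(labels):
    labels = labels or {}
    for key in ZONE_KEYS:
        if key in labels:
            return labels[key]
    return "<unknown>"


node_zones = {n.metadata.name: zone_of(n.metadata.labels) for n in v1.list_node().items}
pv_zones = {pv.metadata.name: zone_of(pv.metadata.labels) for pv in v1.list_persistent_volume().items}

print("nodes:", node_zones)
print("persistent volumes:", pv_zones)

# Any PV whose zone has no matching node can never be mounted, which is what
# leaves the conda-store pods stuck until a node comes up in that AZ.
missing = {name: zone for name, zone in pv_zones.items() if zone not in node_zones.values()}
print("PVs without a node in their zone:", missing)
```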
Describe the bug
Seems like conda-store pods get evicted on AWS on a fresh deployment. This was tested with the latest main commit: e799211
Describe output on the pod:
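The same eviction details can also be pulled for every pod at once. Here is a minimal sketch with the kubernetes Python client, assuming kubeconfig access to the cluster; it lists the pods whose status reason is `Evicted` along with the eviction message recorded on them.

```python
# Rough sketch: list evicted pods and the reason/message recorded in their
# status -- roughly the same information `kubectl describe pod` shows.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_pod_for_all_namespaces().items:
    if pod.status.reason == "Evicted":
        print(
            f"{pod.metadata.namespace}/{pod.metadata.name}: "
            f"{pod.status.reason} -- {pod.status.message}"
        )
```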