Kubernetes Batch Worker Manager #3549

Merged: 14 commits merged on Jun 20, 2021

@teetone (Collaborator) commented May 26, 2021

The GCP approach to GKE authentication doesn't seem to work for long-running processes. I fixed this by authenticating with Kubernetes directly. I also generalized the GCP worker manager by converting it into a Kubernetes worker manager.
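This page does not show the implementation. To illustrate what authenticating with Kubernetes directly can look like, here is a minimal sketch that sends a bearer token straight to the cluster's API server instead of going through a GCP auth plugin. It assumes the API server address, a token, and optionally the cluster CA certificate are supplied through configuration. The functions build_list_jobs_request and make_ssl_context, the namespace, and the example address are hypothetical and do not come from the PR. The sketch uses only the Python standard library and builds the request without sending it.

```python
import ssl
import urllib.request
from typing import Optional


def build_list_jobs_request(
    api_server: str, token: str, namespace: str = "default"
) -> urllib.request.Request:
    """Build a GET request that lists batch Jobs, authenticated with a bearer token.

    Hypothetical helper: api_server, token, and namespace are assumed to come
    from the worker manager's configuration.
    """
    # Jobs live in the batch/v1 API group of the Kubernetes REST API.
    url = f"{api_server.rstrip('/')}/apis/batch/v1/namespaces/{namespace}/jobs"
    request = urllib.request.Request(url, method="GET")
    # The API server accepts the token directly; no cloud-provider plugin is involved.
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/json")
    return request


def make_ssl_context(ca_cert_path: Optional[str] = None) -> ssl.SSLContext:
    """Verify the API server against the cluster CA, or the system CAs if None."""
    return ssl.create_default_context(cafile=ca_cert_path)


if __name__ == "__main__":
    req = build_list_jobs_request("https://203.0.113.10", "example-token", "codalab")
    print(req.full_url)
    print(req.get_header("Authorization"))
    # Sending it requires a reachable cluster, for example:
    # urllib.request.urlopen(req, context=make_ssl_context("/path/to/ca.crt"))
```

Because the token is passed explicitly, a long-running process can reuse or replace it on its own schedule rather than depending on an external credential helper.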

@teetone requested a review from epicfaace May 27, 2021 01:26
@teetone marked this pull request as ready for review May 27, 2021 01:26
@teetone changed the title from "Fix GCP Kubernetes cluster authentication" to "Kubernetes Batch Worker Manager" Jun 4, 2021

@epicfaace (Member) left a comment

Looks good. I guess it's concerning that we don't know exactly why the GKE auth failed, but connecting directly to k8s is a good step anyway, since it lets the worker manager be made more general.

Review thread on codalab/worker_manager/kubernetes_batch_worker_manager.py (outdated, resolved)
Review thread on codalab_service.py (outdated, resolved)
@teetone requested a review from epicfaace June 19, 2021 16:34
@mergify (bot) merged commit ea31035 into master Jun 20, 2021
@mergify (bot) deleted the gcp-auth branch Jun 20, 2021 19:46
@teetone mentioned this pull request Jun 25, 2021
2 participants