containerMode option to allow running jobs in k8s instead of docker #1546
Conversation
Added container mode "kubernetes" envs
update errors
…ctions-runner-controller into nikola-jokic/work-volume-mounts
update manifests
…pods added concurrent cleanup before runner pod is deleted
…od-delete Check linked pod is in deleting phase status to skip it
@@ -76,6 +76,16 @@ func (r *Runner) Validate() error {
		errList = append(errList, field.Invalid(field.NewPath("spec", "repository"), r.Spec.Repository, err.Error()))
	}

	err = r.Spec.ValidateWorkVolumeClaimTemplate()
Thanks for adopting the existing pattern of ValidateFoo here!
This made me realize that we might want to introduce another centralized function like RunnerSpec.Validate() that calls out to the three Validate* functions in the future, so that it's a bit more maintainable (less chance of accidentally missing a new ValidateFoo call in one of the three places).
// Just take this as a pure comment, not a request for change. It's out of the scope of this pull request. I'd greatly appreciate it if you could submit another PR for that though!
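For illustration, a rough sketch of such a centralized helper could look like the following. This is a sketch only: it assumes the existing RunnerSpec type and the k8s.io/apimachinery/pkg/util/validation/field package; validateRepository and validateWorkDir are hypothetical stand-ins for the two existing Validate* helpers, while ValidateWorkVolumeClaimTemplate is the one added in this PR.

func (rs *RunnerSpec) Validate(rootPath *field.Path) field.ErrorList {
	var errList field.ErrorList

	// Every per-field validator is wired in exactly one place, so a future
	// ValidateFoo only needs to be added here.
	if err := rs.validateRepository(); err != nil { // hypothetical helper
		errList = append(errList, field.Invalid(rootPath.Child("repository"), rs.Repository, err.Error()))
	}
	if err := rs.validateWorkDir(); err != nil { // hypothetical helper
		errList = append(errList, field.Invalid(rootPath.Child("workDir"), rs.WorkDir, err.Error()))
	}
	if err := rs.ValidateWorkVolumeClaimTemplate(); err != nil {
		errList = append(errList, field.Invalid(rootPath.Child("workVolumeClaimTemplate"), rs.WorkVolumeClaimTemplate, err.Error()))
	}

	return errList
}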
That makes a lot of sense, I'll follow up with a separate pr for that if that works for you!
controllers/runner_pod_controller.go
Outdated
r.List(ctx, &runnerLinkedPodList, client.MatchingLabels(
	map[string]string{
		"runner-pod": pod.ObjectMeta.Name,
	},
))
I believe we need to handle the potential error here. Also, the lack of client.InNamespace seems to result in errors like Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:actions-runner-system:actions-runner-controller" cannot list resource "secrets" in API group "" at the cluster scope.
I also noticed that our helm chart doesn't provide the RBAC resources to let ARC list and get secrets, which resulted in the runnerpod reconcile hanging after this line (perhaps it panicked silently? Not sure, but anyway this function neither returned an error nor continued processing other terminating pods).
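A minimal sketch of the suggested fix, assuming the reconcile context from the snippet above (r is the reconciler, pod is the runner pod being processed, runnerLinkedPodList is a corev1.PodList, and the usual controller-runtime imports client "sigs.k8s.io/controller-runtime/pkg/client" and ctrl "sigs.k8s.io/controller-runtime" are in scope):

// Scope the List to the runner pod's namespace and surface the error
// instead of silently dropping it.
if err := r.List(ctx, &runnerLinkedPodList,
	client.InNamespace(pod.ObjectMeta.Namespace),
	client.MatchingLabels(map[string]string{
		"runner-pod": pod.ObjectMeta.Name,
	}),
); err != nil {
	return ctrl.Result{}, err
}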
9cd1272 fixes these issues!
Let me add a fix for this small nit: 758c2a3
After you've enabled the feature flag on the backend, I was finally able to verify that it's working!
Here's my workflow definition:
jobs:
  test0:
    runs-on: test-fzowlprkxh
    container: "golang:1.18"
    steps:
    - uses: actions/checkout@v2
    - run: go version
    - run: go build .
name: E2E TestE2E fzowlprkxh
"on":
  push:
    branches:
    - main
It's interesting to see that, even though I've set container: "golang:1.18" for running various go commands, somehow nodejs-based actions like actions/checkout@v2 seem to be working as well! In the workflow logs I see it called out to /runner/k8s/index.js, so perhaps it created a dedicated nodejs pod for running the action?
Anyway, it did work as advertised and I love it! Thank you for all your efforts to make it happen 🎉
We're currently using GKE Workload Identity to give our runners access to GCP. With this new containerMode, there doesn't seem to be a way to pass a KSA (Kubernetes service account) to the pod that runs the container job?
@tedchang77 Yep! I think so. Honestly, I'm not even sure how it could work or how ARC should support that. There's a similar feature for AWS and it won't work. All the data that the runner and job pods share should be contained within the PV-backed work dir. Can you copy the identity-related file(s) to the work dir and point the K8s client (or kubectl run within a container-based job step) to it?
@tedchang77 BTW, could you clarify a bit why you need the workload identity in the kubernetes container mode? If all you need is to run e.g. kubectl with the workload identity, you might be able to just disable dind and set
Our jobs need to deploy GCP resources like VPCs, GKE, Cloud SQL, etc., which require GCP credentials, and we don't want to store and pass in long-lived SA keys through env variables. For deployments to GKE, even if we download kubectl or have it baked into the image, we still need to have GCP creds.
One workaround that I can think of (not sure how good it is) is to create a separate namespace for each runner and use workload identity to map the default KSA to a GCP SA. If the runner creates the pod and doesn't specify a KSA, it will be assigned the default KSA for the namespace and have the correct GCP permissions. The runner pod wouldn't use workload identity, and its KSA would only have permissions to create the job container pods. No need to have any long-lived GCP credentials with this solution.
@tedchang77 Yeah, that might work if you need to use this kubernetes container mode. Or perhaps you don't even need a new namespace. The updated RunnerDeployment spec accepts
But it would just work if you used
@mumoshu My preference is to separate the runner image from the container job image, but your suggestion should work. We'll try this next week and report back. Thanks for all the help!
@tedchang77 Makes sense. Thanks for the clarification and your support! I'm looking forward to hearing back from you.
@mumoshu Our custom runner image has all of our tools installed now, and we can deploy using the custom runner image and the SA that we configured using GKE Workload Identity. I think it's still possible to use k8s container mode, but it would require us to:
If the runner-container-hooks project implements actions/runner-container-hooks#1, it would solve the second issue. The challenge with k8s container mode is that some of the k8s configurations, like k8s SAs, don't have an equivalent in the docker container options: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idcontaineroptions
For now, we will use the custom runner, but if we implement container mode I'll report back on the results. Thanks!
@tedchang77 Hey! Thanks for the feedback.
I'm interested in this part! May I ask your goal for that once again? Do you want job container pods to share the SA with the runner pod, and/or to have a separate SA per job container pod? My impression was that your goal is the former. And the former could be implemented using some mechanism to copy the SA token or any identity-related information from the runner pod to the job container pods. Probably that's within the scope of ARC and can be a valid feature request to ARC if we could agree on the exact requirements of the feature. If you could try building a PoC of it, you could modify
The "work" directory is shared across the runner and the job container pods. In your workflow definition, you could prepend a "cp" to copy the files from the work dir to
@thboop Could you elaborate on what you mean by this? I currently have the following workflow:
My goal is to avoid running any
Apologies @mumoshu, but I had to put this on hold as some other priorities came up. I found this PR, which looks like it will solve the problem once it's merged.
Description
This PR adds the ability to use the runner container hooks to invoke container jobs and steps via kubernetes, rather than relying on the docker implementation. This allows us to avoid the use of privileged containers while still being able to run container scenarios. By running the job in a separate pod (not on the runner pod), we can safely pass a service account with elevated permissions to the runner pod, so it can dynamically spin up pods and k8s jobs to run the workflow.
If you set containerMode to kubernetes, you will be required to provide a service account with the appropriate permissions and a storage class from which we can provision the runner working directory. More details are in the readme updates in this PR.
Volume mount
A work volume mount is used to pass the working directory between pods. By default, we require that the mounts we spin up for jobs exist on the same worker node as the runner pod. If you use a storage class with ReadWriteMany, we will schedule it on any available worker node.
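As a rough illustration of that volume-mount behavior, here is a hedged sketch, not the PR's actual code: the helper name, the "-work" claim name, the "runner-pod" label, the 10Gi default size, and the use of corev1.ResourceRequirements (as in the k8s.io/api version vendored at the time) are all assumptions. It builds a per-runner work PVC and reports whether job pods must stay on the runner's node.

package controllers

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// workVolumeClaim builds a work PVC for one runner pod from user-supplied
// settings and returns whether job pods need to be pinned to the runner's
// node (only required when the storage class is not ReadWriteMany-capable).
func workVolumeClaim(runnerPodName, storageClass string, readWriteMany bool) (corev1.PersistentVolumeClaim, bool) {
	accessMode := corev1.ReadWriteOnce
	if readWriteMany {
		accessMode = corev1.ReadWriteMany
	}

	pvc := corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{
			Name:   runnerPodName + "-work", // assumed naming convention
			Labels: map[string]string{"runner-pod": runnerPodName},
		},
		Spec: corev1.PersistentVolumeClaimSpec{
			StorageClassName: &storageClass,
			AccessModes:      []corev1.PersistentVolumeAccessMode{accessMode},
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("10Gi"), // assumed default size
				},
			},
		},
	}

	// A ReadWriteOnce volume can only be mounted on one node, so job pods
	// must be scheduled onto the runner pod's node; ReadWriteMany lifts
	// that restriction and any worker node can mount the shared work dir.
	return pvc, !readWriteMany
}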
Decisions