Add option to use the kubernetes scheduler for workflow pods #111
Conversation
This should only be used when ReadWriteMany (RWX) volumes are supported or when using a single-node cluster.
If the Kubernetes scheduler is used to hold jobs until sufficient resources are available, then the prepare-job step needs to wait longer for the workflow pod to be running. This timeout mostly needs to be increased when many jobs are triggered at once and together exceed the resources available in the cluster; the workflows can then be handled gracefully once sufficient resources become available again.
@nikola-jokic addressed the comments. I had the idea yesterday that this could maybe be taken further with pod affinity (in a separate PR?). ReadWriteOnce volumes will still fail early, as the workflow pod still always gets the node name assigned, but we could go through the scheduler and land on the same node by using pod affinity to attract the workflow pod to the runner pod. The remaining problem is that a unique label would need to be added to the runner pod (e.g. …). Let me know if this is something I should look into a bit further 😄
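For illustration only, a rough sketch of the pod-affinity idea described above could look like the following; the label name used here is hypothetical and is exactly the missing piece mentioned in the comment:

```yaml
# Sketch only: attract the workflow pod to the node of its runner pod instead of
# setting spec.nodeName. The label "hypothetical-runner-pod-id" does not exist in
# ARC today; adding such a unique per-runner-pod label is the open question above.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            hypothetical-runner-pod-id: gke-runner-default-rp68p-runner-vw49d
        topologyKey: kubernetes.io/hostname
```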
FTR, I've been testing this PR for the last couple of days and, apart from some fiddling with EFS permissions in RWX mode, it works just fine and solves my issue. Can't wait to see this merged, thanks a lot 🙏 If you're looking for examples to document, here is my EFS config in EKS that seems to be bulletproof:
Setting uid/gid to "0" is required by the workflow pod (when using actions/checkout@v3, for example, which expects the filesystem to be owned by the user, i.e. root).
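The commenter's actual manifest isn't reproduced in this thread; purely as a hedged illustration, an EFS setup along these lines, assuming the AWS EFS CSI driver with dynamic provisioning via access points and a recent driver version that supports the uid/gid parameters, would give ReadWriteMany volumes with root-owned access points (the fileSystemId is a placeholder):

```yaml
# Illustrative sketch only (not the config shared above): a StorageClass for the
# AWS EFS CSI driver that provisions ReadWriteMany volumes through EFS access
# points, with uid/gid forced to 0 as described in the comment.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-rwx
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0   # placeholder, use your own filesystem ID
  directoryPerms: "700"
  uid: "0"
  gid: "0"
reclaimPolicy: Delete
volumeBindingMode: Immediate
```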
Excellent contribution. I will wait for the merge/release before testing on GKE. This solution is much more elegant than what I would have suggested, which was to attach the […]. This PR essentially decouples the runner from the workflow container and, IMO, should be the default configuration, with the caveat of having ReadWriteMany volumes as a prerequisite. Looking forward to this release. Thanks!
I've recently come to use Actions after being in GitLab for a few years. The way GitLab handles self-hosted runners in k8s is simple: you can specify whatever Docker image you like in the pipeline code, and that's the one it uses for the runner. There's no concept of runner+worker, no second pod at all. If you want a service, like postgres, or redis, or even DinD, it's a sidecar in that pod. So there's no need for ephemeral storage, and no worries about scheduling, because everything happens in a single pod. I hope that doesn't come off as coming in here and complaining "why can't you be like GitLab" — that's not my intention at all. I would just like to understand why things are this way with Actions runners. The runner+worker pattern makes configuring and using self-hosted k8s runners a lot more complicated than I was expecting.
LGTM! Great change, thank you!
Hey @johnjeffers, thank you for raising the question! The GitHub runner is structured to have a listener and a worker. The RunnerListener picks up the job, evaluates it, and then passes it down to the worker; that way, the runner can be ready to receive a job before the job is known to it (while the runner is idle). What I understood from your message would be equivalent to pulling the listener's responsibilities out into the controller and then creating a runner worker with sidecar containers. Since the listener is part of the runner, the same runner can be configured to run docker or hook commands. Having the listener accept the job allows us to pre-emptively scale up or down (the min-runners field in ARC), so that as soon as a job is acquired, a runner is able to pick it up. It is a trade-off with volumes, but the runner is designed to work on any kind of machine, whether inside a virtual machine or inside Kubernetes.
@Wielewout as for the suggestion here, it does sound good to me, but it will slightly complicate the hook. The hook was meant to be a simple starting point. However, the suggestion is great in my opinion, so I will propose it to the team and see if there are any objections. Again, thank you for these awesome contributions!
@Wielewout @nikola-jokic regarding the suggestion here, is there a way to do this with the existing implementation? I'm afraid RWX is not an option for us, as the minimum size for a single dynamic GCP Filestore CSI persistent volume is 1 TiB 😞. Can the existing labels on the runner pod be used for this?

> k get po --show-labels
NAME                                    READY   STATUS    RESTARTS   AGE   LABELS
gke-runner-default-rp68p-runner-vw49d   1/1     Running   0          36m   actions-ephemeral-runner=True,actions.github.com/organization=companyb,actions.github.com/scale-set-name=gke-runner-default,actions.github.com/scale-set-namespace=gh-actions-runners,app.kubernetes.io/component=runner,app.kubernetes.io/part-of=gha-runner-scale-set,app.kubernetes.io/version=0.7.0,pod-template-hash=74647967fb
With the current implementation of the hook, it is not; we are setting the node name explicitly. I think having this feature would require a trivial change on the ARC side to set a label on the pod and potentially an environment variable. Then, with the hook extension, I suppose it could be done, but I have to say it would be tricky. I will bring this solution up to the team.
I'm in the middle of trying to switch a self-hosted k8s CI setup from GitLab to GitHub myself. While I certainly don't know why ARC is architected exactly how it is, I can say that the requirements imposed by the feature set of GitHub Actions demand a more complex implementation. The biggest requirement that I can see is that a single job can involve multiple containerized actions executed in series (in the job's steps), each working on the same filesystem (which belongs to the job), and with the Kubernetes hooks in this repo, this is done with PersistentVolumes. As far as I know, GitLab has nothing like this. I've seen many more usages of PersistentVolumes in the k8s hooks implementation, but that one seems to be the most onerous from an implementer's perspective. I am dreaming of an implementation of the hooks that simply doesn't support features like that one, so that PVs could be ditched 🌞. It seems that the maintainers have tried to find alternatives to using PVs, to no avail. As it is, I agree with you that it seems pretty hard to achieve the same kinds of outcomes with ARC's k8s mode as you could with GitLab's k8s runner, with respect to things like autoscaling and scheduling.
As described in #112, workflow pods don't go via the Kubernetes scheduler, resulting in failed "Initialize containers" steps when the already selected node is out of resources. With the following changes I was able to get this working.

This PR adds two new environment variables: `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` and `ACTIONS_RUNNER_PREPARE_JOB_TIMEOUT_SECONDS`.

When set to `true`, the former skips setting the node name and lets the Kubernetes scheduler handle the workflow pod. This only works on a single-node cluster (important for the newly added test) or when `ReadWriteMany` volumes are used.

The latter can be used to increase the timeout from the default 10 minutes to, for example, 6 hours. This is necessary for scenarios where the cluster runs out of resources and we want to handle them more gracefully: when using the Kubernetes scheduler, the pod will remain pending until resources are available somewhere in the cluster.

Also, a quick fix was added to skip the name override warning, which always appears when using the pod template.
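Not part of the PR itself, but as a hedged illustration of how these variables might be consumed, they could presumably be set on the runner container of a gha-runner-scale-set deployment in Kubernetes container mode, roughly like this (the exact placement is an assumption, and other required settings such as the work-volume claim are omitted):

```yaml
# Sketch: wiring the two environment variables introduced by this PR into a
# gha-runner-scale-set values file (kubernetes container mode). Adjust to your
# own values file; work-volume claim configuration is omitted for brevity.
containerMode:
  type: "kubernetes"
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_USE_KUBE_SCHEDULER
            value: "true"
          - name: ACTIONS_RUNNER_PREPARE_JOB_TIMEOUT_SECONDS
            value: "21600"   # 6 hours, as in the example above
```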