
Add option to use the kubernetes scheduler for workflow pods #111

Merged
merged 5 commits on Oct 31, 2023

Conversation

Wielewout
Contributor

As described in #112, workflow pods aren't going through the Kubernetes scheduler, resulting in failed "Initialize containers" steps if the already selected node is out of resources. With the following changes I was able to get this working.

This PR adds two new environment variables: ACTIONS_RUNNER_USE_KUBE_SCHEDULER and ACTIONS_RUNNER_PREPARE_JOB_TIMEOUT_SECONDS.

When set to true, the former skips setting the node name and lets the Kubernetes scheduler handle the workflow pod. This only works on a single-node cluster (important for the newly added test) or when ReadWriteMany volumes are used.

The latter can be used to increase the timeout, for example to 6 hours from the default of 10 minutes. This is necessary for scenarios where the cluster runs out of resources and we want to handle it more gracefully: when the Kubernetes scheduler is used, the pod will remain pending until resources are available somewhere in the cluster.

A quick fix was also added to skip the name override warning, which always appears when the pod template is used.

This should only be used when RWX volumes are supported or when using a single-node cluster.

If the kube scheduler is used to hold jobs until sufficient resources are available, then the prepare job step needs to wait longer for the workflow pod to be running. This timeout will mostly need to be increased when many jobs are triggered that together exceed the resources available in the cluster; the workflows can then be handled gracefully once sufficient resources become available again.
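For reference, here is a minimal sketch of how these two variables could be wired into a gha-runner-scale-set installation; the values.yaml structure (template.spec with a container named runner) is assumed from the chart and should be adapted to your setup:

# Assumed gha-runner-scale-set values.yaml snippet; adapt to your chart version.
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_USE_KUBE_SCHEDULER          # let kube-scheduler place the workflow pod
            value: "true"
          - name: ACTIONS_RUNNER_PREPARE_JOB_TIMEOUT_SECONDS # e.g. 6 hours instead of the 10 minute default
            value: "21600"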
@Wielewout requested review from a team as code owners on October 23, 2023 07:59
@Wielewout
Contributor Author

@nikola-jokic addressed the comments.

I had the idea yesterday that this could maybe be taken further with pod affinity (in a separate PR?). ReadWriteOnce volumes will otherwise still fail early, as the workflow pod always gets the node name assigned. But we can go through the scheduler and still end up on the same node by using pod affinity to attract the workflow pod to the runner pod.

The remaining problem is that a unique label needs to be added to the runner pod (e.g. actions.github.com/runner-name=<ephemeral-runner-name-also-pod-name>) within actions-runner-controller when an ephemeral runner creates its pod. The hooks can access the name via the environment (ACTIONS_RUNNER_POD_NAME) to inject it into the affinity expression (which then needs to go under requiredDuringSchedulingIgnoredDuringExecution for ReadWriteOnce volumes). The kube scheduler can then use the label to schedule the workflow pod on the same node as the runner, or keep the pod pending until resources are available.
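For illustration, a minimal sketch of the affinity the hook could inject into the workflow pod spec under this proposal; note that the actions.github.com/runner-name label does not exist yet and is exactly the label proposed above:

# Sketch only: the runner-name label below is the proposed (not yet existing) label.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname   # co-locate with the runner pod's node
        labelSelector:
          matchLabels:
            actions.github.com/runner-name: <ephemeral-runner-name-also-pod-name>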

Let me know if this is something I should look into a bit further 😄

@guillaumevillemont

guillaumevillemont commented Oct 25, 2023

FTR I've been testing this PR for the last couple of days and, aside from some fiddling with EFS permissions in RWX mode, it works just fine and solves my issue. Can't wait to see this merged, thanks a lot 🙏

If you're looking for some examples to document, here is my EFS config in EKS, which seems to be bulletproof:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-dynamic-provisioning
  annotations:
    storageclass.kubernetes.io/is-default-class: 'false'
provisioner: efs.csi.aws.com
parameters:
  basePath: /dynamic_provisioning
  directoryPerms: '777'
  ensureUniqueDirectory: 'true'
  fileSystemId: fs-xxxxxxxxxx
  uid: '0'
  gid: '0'
  provisioningMode: efs-ap
  reuseAccessPoint: 'false'
  subPathPattern: ${.PVC.namespace}/${.PVC.name}
reclaimPolicy: Delete
volumeBindingMode: Immediate

uid/gid set to "0" is required by the workflow pod (when using actions/checkout@v3, for example, which expects the filesystem to be owned by its user, i.e. root), and perms 777 so the runner pod can write to _work while running with uid 1001 and gid 123.
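To complete the picture, a sketch of how a scale set could request a ReadWriteMany work volume from this storage class; the containerMode/kubernetesModeWorkVolumeClaim field names are assumed from the gha-runner-scale-set chart and the storage size is arbitrary:

# Assumed gha-runner-scale-set values.yaml snippet; field names taken from the chart docs.
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteMany"]
    storageClassName: "efs-dynamic-provisioning"
    resources:
      requests:
        storage: 10Gi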

@israel-morales

israel-morales commented Oct 27, 2023

Excellent contrib. I will wait for the merge/release before testing on GKE.

This solution is much more elegant than what I would have suggested, which was to attach the -workflow pod to the runner as a sidecar. The sidecar approach would also address the volume mount dependency and ensure both containers are properly scheduled by the kube-scheduler as a single pod, respecting container-level requests/limits. However, I'm sure there are pitfalls with the sidecar logic.

This PR essentially decouples the runner from the workflow container and, IMO, should be the default configuration, with the caveat of having ReadWriteMany volumes as a prerequisite.

Looking forward to this release. Thanks!

@johnjeffers

I've recently come to use Actions after being on GitLab for a few years. The way GitLab handles self-hosted runners in k8s is simple: you can specify whatever Docker image you like in the pipeline code, and that's the one it uses for the runner. There's no concept of runner+worker, no second pod at all. If you want a service, like postgres, redis, or even DinD, it's a sidecar in that pod. So there's no need for ephemeral storage, and no worries about scheduling, because everything happens in a single pod.

I hope that doesn't come off as coming in here and complaining "why can't you be like GitLab". That's not my intention at all. I would just like to understand why things are this way with Actions runners. The runner+worker pattern makes configuring and using self-hosted k8s runners a lot more complicated than I was expecting.

Contributor

@nikola-jokic left a comment


LGTM! Great change, thank you!

@nikola-jokic
Contributor

Hey @johnjeffers,

Thank you for raising the question! The GitHub runner is structured to have a listener and a worker. The RunnerListener picks up the job, evaluates it, and then passes it down to the worker. That way, the runner can be ready to receive a job before the job is known to it (while the runner is idle). What I understood from your message would be equivalent to pulling the listener's responsibilities out into the controller and then creating a runner worker with sidecar containers. Since the listener is part of the runner, the same runner can be configured to run docker or hook commands. Having the listener accept the job allows us to pre-emptively scale up or down (the min-runners field in ARC), so as soon as a job is acquired, a runner is able to pick it up. It is a trade-off with volumes, but the runner is designed to work on any kind of machine, whether inside a virtual machine or inside k8s.

@nikola-jokic
Contributor

@Wielewout as for the suggestion here, it does sound good to me, but it will slightly complicate the hook. The hook was meant to be a simple starting point. However, the suggestion is great in my opinion. So I will propose it to the team and see if there are any objections. Again, thank you for these awesome contributions!
And thank you, @guillaumevillemont, for raising this issue and testing it out!

@israel-morales

israel-morales commented Nov 23, 2023

@Wielewout @nikola-jokic regarding the suggestion here, is there a way to do this with the existing implementation?

I'm afraid RWX is not an option for us, as the minimum size for a single dynamically provisioned GCP Filestore CSI persistent volume is 1 TiB 😞.

Can the actions.github.com/scale-set-name and pod-template-hash labels be used somehow? Thanks!

> k get po --show-labels
NAME                                             READY   STATUS    RESTARTS   AGE   LABELS
gke-runner-default-rp68p-runner-vw49d            1/1     Running   0          36m   actions-ephemeral-runner=True,actions.github.com/organization=companyb,actions.github.com/scale-set-name=gke-runner-default,actions.github.com/scale-set-namespace=gh-actions-runners,app.kubernetes.io/component=runner,app.kubernetes.io/part-of=gha-runner-scale-set,app.kubernetes.io/version=0.7.0,pod-template-hash=74647967fb

@nikola-jokic
Contributor

With the current implementation of the hook, it is not possible; we are setting the node name explicitly. I think having this feature would require a trivial change on the ARC side to set the label on the pod, and potentially an env variable. Then, with the hook extension, I suppose it could be done, but I have to say it would be tricky. I will bring this solution up to the team.

@isker

isker commented Apr 6, 2024

> I would just like to understand why things are this way with Actions runners. The runner+worker pattern makes configuring and using self-hosted k8s runners a lot more complicated than I was expecting.

I'm in the middle of trying to switch a self-hosted k8s CI setup from GitLab to GitHub myself. While I certainly don't know why ARC is architected exactly the way it is, I can say that the requirements imposed by the feature set of GitHub Actions demand a more complex implementation.

The biggest requirement that I can see is that a single job can involve multiple containerized actions executed in series (in the job's steps), each working on the same filesystem (which belongs to the job). And with the kubernetes hooks in this repo, this is done with PersistentVolumes. As far as I know, GitLab has nothing like this.
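For illustration, a hypothetical workflow showing that requirement: the checkout step and a container action run as separate containers in series, yet both operate on the job's workspace, which the hooks back with a PersistentVolume (the runs-on label is an assumed scale set name):

# Hypothetical workflow; separate containerized steps share the job's workspace.
on: push
jobs:
  build:
    runs-on: my-k8s-scale-set        # assumed scale set name
    container: node:20               # jobs in Kubernetes mode run in a job container
    steps:
      - uses: actions/checkout@v4    # writes the repo into the shared workspace
      - uses: docker://alpine:3.19   # separate container action, sees the same files
        with:
          args: ls -la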

I've seen many more usages of PersistentVolumes in the k8s hooks implementation, but that one seems to be the most onerous, from an implementer's perspective. I am dreaming of an implementation of the hooks that simply doesn't support features like that one, so that PVs could be ditched 🌞. It seems that the maintainers have tried to find alternatives to using PVs, to no avail.

As it is, I agree with you that it seems pretty hard to achieve the same kinds of outcomes with ARC's k8s mode as you could with GitLab's k8s runner, with respect to things like autoscaling and scheduling.
