
minRunners not being respected #3619

Open · 4 tasks done
gfrid opened this issue Jun 23, 2024 · 5 comments

Labels
bug · gha-runner-scale-set · needs triage

Comments

@gfrid

gfrid commented Jun 23, 2024

Checks

Controller Version

0.9.2

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Deploy the latest ARC runner scale set and the latest ARC controller on Kubernetes 1.29.

Describe the bug

Deploying the official Helm chart oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set with the custom values below

## maxRunners is the max number of runners the autoscaling runner set will scale up to.
maxRunners: 100

## minRunners is the min number of idle runners. The target number of runners created will be
## calculated as a sum of minRunners and the number of jobs assigned to the scale set.
minRunners: 6

containerMode:
  type: "dind"

template:
  spec:
    securityContext:
      fsGroup: 1000
    imagePullSecrets:
      - name: regcred
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/bin/bash","-c","sudo apt-get update && sudo apt-get install curl unzip jq wget python3-pip git-all -y && /home/runner/run.sh"]
        resources:
          requests:
            memory: 2Gi
            cpu: 1.0
          limits:
            cpu: 4.0
            memory: 8Gi
        volumeMounts:
          - name: docker-secret
            mountPath: /home/runner/config.json
            subPath: config.json
          - name: docker-config-volume
            mountPath: /home/runner/.docker
    initContainers:
      - name: dockerconfigwriter
        image: alpine
        command:
          - sh
          - -c
          - cat /home/runner/config.json > /home/runner/.docker/config.json
        volumeMounts:
          - name: docker-secret
            mountPath: /home/runner/config.json
            subPath: config.json
          - name: docker-config-volume
            mountPath: /home/runner/.docker
    volumes:
      - name: docker-secret
        secret:
          secretName: regcred
          items:
            - key: .dockerconfigjson
              path: config.json
      - name: docker-config-volume
        emptyDir: {}
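
For reference, a minimal sketch of the corresponding install command, assuming the values above are saved as values.yaml (the required githubConfigUrl and githubConfigSecret settings are omitted here):

# install the scale set with the custom values shown above
helm install arc-runner-set \
  --namespace arc-runners --create-namespace \
  -f values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set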

After the initial installation there are 6 runner pods (the configured minimum), but after some time the count drops to 2 for no apparent reason.

Describe the expected behavior

The runners respect the minRunners setting.

Additional Context

The ARC runners do not respect the minRunners flag:

arc-runner-set-dk6ts-runner-dkbtd   2/2     Running   0          60m
arc-runner-set-dk6ts-runner-dpdzv   2/2     Running   0          61m

 listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "arc-runner-set-dk6ts", "replicas": 6}

Controller Logs

-0-

Runner Pod Logs

2024-06-23T10:41:35Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "arc-runner-set-dk6ts", "replicas": 6}
2024-06-23T10:41:35Z    INFO    listener-app.listener    Getting next message    {"lastMessageID": 0}
2024-06-23T10:42:26Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count    {"assigned job": 0, "decision": 6, "min": 6, "max": 100, "currentRunnerCount": 6, "jobsCompleted": 0}
2024-06-23T10:42:26Z    INFO    listener-app.worker.kubernetesworker    Compare    {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTim
2024-06-23T10:42:26Z    INFO    listener-app.worker.kubernetesworker    Preparing EphemeralRunnerSet update    {"json": "{\"spec\":{\"patchID\":0,\"replicas\":6}}"}
W0623 10:42:26.096882       1 warnings.go:70] unknown field "spec.patchID"
2024-06-23T10:42:26Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "arc-runner-set-dk6ts", "replicas": 6}
2024-06-23T10:42:26Z    INFO    listener-app.listener    Getting next message    {"lastMessageID": 0}
2024-06-23T10:43:16Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count    {"assigned job": 0, "decision": 6, "min": 6, "max": 100, "currentRunnerCount": 6, "jobsCompleted": 0}
2024-06-23T10:43:16Z    INFO    listener-app.worker.kubernetesworker    Compare    {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\
@gfrid added the bug, gha-runner-scale-set, and needs triage labels on Jun 23, 2024
@nikola-jokic
Contributor

Hi @gfrid,

Can you please inspect the ephemeralrunners? Are they in a failed state maybe?
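
For example, something along these lines (the arc-runners namespace is taken from the report above):

kubectl get ephemeralrunners -n arc-runners
kubectl describe ephemeralrunners -n arc-runners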

@gfrid
Author

gfrid commented Jun 24, 2024

Hi @gfrid,

Can you please inspect the ephemeralrunners? Are they in a failed state maybe?

Going to check that. I removed the arc-runners, then found some leftover sets in ephemeralrunners, removed them manually, and reinstalled the sets. I will take a look at those.

@casey-robertson-paypal

Our minRunners started doing something similar last night/today. We have 2 scale sets, each with minRunners set to 1. Jobs are still being scheduled and run in Actions, but when I look at the pods in the arc-runners namespace with minRunners = 1, I see no pods. I bumped minRunners to 2; now I see 1 running pod. The listener logs say 2 replicas, but from the Kubernetes perspective there is only 1.

@nikola-jokic
Contributor

Can you please post the controller and the listener logs? And can you please share the output of describe on the ephemeralrunners resource?
Maybe the pod had more than 5 failures, which means it is going to be cleaned up. The ephemeral runner resource is kept so you can inspect the latest failure. That might be the disconnect: ARC counts the number of ephemeral runners, not the number of pods.
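
A rough way to see that disconnect is to compare the two counts directly, e.g. (namespace again assumed from the original report):

kubectl get ephemeralrunners -n arc-runners --no-headers | wc -l
kubectl get pods -n arc-runners --no-headers | wc -l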

@gfrid
Copy link
Author

gfrid commented Jul 2, 2024

Update: I spoke with AWS support and we found some jobs stuck in the CRDs; this might be related to running on Spot Ocean. We moved to on-demand nodes and, after removing all the Helm charts, cleaned up all the stuck CRD jobs and updated their finalizers (--merge). It has been stable for a week now.
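
For anyone hitting the same thing, clearing a stuck finalizer generally looks something like the sketch below. The resource name is a placeholder, and removing finalizers bypasses ARC's normal cleanup, so it should only be used on objects that are already orphaned:

kubectl patch ephemeralrunners <stuck-runner-name> -n arc-runners \
  --type merge -p '{"metadata":{"finalizers":null}}'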
