Autoscaler does not scale down #248
kubectl get RunnerReplicaSet
NAME                               DESIRED   CURRENT   READY
starcoin-runner-deployment-tqxmv   1         6         6

If I delete the RunnerReplicaSet, it will scale down to the desired runner count.
@jolestar Hey! Thanks for reporting.
Does this mean that the autoscaler did update the desired count for your RunnerDeployment as expected, but it didn't update the RunnerReplicaSet's desired count?
The RunnerReplicaSet's DESIRED count is right, but CURRENT and READY are always 6. How can I get more info to diagnose it?
@jolestar Thanks! That's helpful.
So perhaps your runnerreplicaset controller isn't working somehow? To diagnose, I suggest grepping and investigating the logs from actions-runner-controller for lines containing
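To illustrate the grep-the-controller-logs approach, here is a minimal sketch. The log lines below are invented for demonstration (they are not real actions-runner-controller output); in a live cluster you would capture them with `kubectl logs` on the controller pod instead of the heredoc.

```shell
# Hypothetical log lines, invented for illustration. In a real cluster:
#   kubectl --namespace actions-runner-system logs <controller-pod> manager > controller.log
cat <<'EOF' > controller.log
2021-01-06T18:05:55Z INFO runner created registration token
2021-01-06T18:06:10Z ERROR runnerreplicaset failed to sync: registration token expired
2021-01-06T18:06:20Z INFO runnerdeployment desired replicas: 1
EOF

# Keep only lines mentioning the runnerreplicaset controller or errors.
grep -iE 'runnerreplicaset|error' controller.log
```

With the sample above, only the ERROR line survives the filter, which points straight at the token-expiry problem discussed later in this thread.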
There are some errors:

kubectl get runner
NAME                                     ORGANIZATION   REPOSITORY             LABELS   STATUS
starcoin-runner-deployment-kgbb6-78776                  starcoinorg/starcoin            Running
starcoin-runner-deployment-kgbb6-9w2bc                  starcoinorg/starcoin            Running
starcoin-runner-deployment-kgbb6-cltr7                  starcoinorg/starcoin            Pending
starcoin-runner-deployment-kgbb6-ffzbz                  starcoinorg/starcoin            Running
starcoin-runner-deployment-kgbb6-h4bh9                  starcoinorg/starcoin            Running
starcoin-runner-deployment-kgbb6-wr88j                  starcoinorg/starcoin            Running
starcoin-runner-deployment-kgbb6-xfggn                  starcoinorg/starcoin            Running

kubectl --context do get RunnerReplicaSet
NAME                               DESIRED   CURRENT   READY
starcoin-runner-deployment-kgbb6   1         5         4
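The mismatch above (CURRENT stuck above DESIRED) can be detected mechanically. A small sketch, run against a captured copy of the table since we cannot assume a live cluster here; the live equivalent would pipe `kubectl get RunnerReplicaSet --no-headers` into the same awk filter:

```shell
# Sample row from the `kubectl get RunnerReplicaSet` output in this issue.
cat <<'EOF' > rrs.txt
starcoin-runner-deployment-kgbb6 1 5 4
EOF

# Columns: NAME DESIRED CURRENT READY. Flag rows where CURRENT > DESIRED,
# i.e. the replicaset has failed to scale down.
awk '$3 > $2 { print $1 ": desired=" $2 " current=" $3 " (stuck scale-down)" }' rrs.txt
```

For the sample row this prints `starcoin-runner-deployment-kgbb6: desired=1 current=5 (stuck scale-down)`.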
@jolestar Thanks! The error message is definitely misleading - maybe the runner named
Would you mind browsing

What does running
starcoin-runner-deployment-kgbb6-cltr7 is Pending, so there is no log output. I tried deleting a runner; cltr7 started running, then output the error "token expired".
I tried deleting the RunnerReplicaSet with kubectl delete RunnerReplicaSet starcoin-runner-deployment-kgbb6, and all runners were rebuilt.
@jolestar Thanks for your help. I might be a bit confused, but the issue could be that our "runner controller" has a bug that leaves runner pods with an expired registration token forever, which prevents the corresponding RunnerReplicaSet from working(?) It's a bit involved, but it SHOULD NOT happen, as our runner controller periodically (every sync period) checks the registration token, and once it expires it replaces the token and the pod. Perhaps that isn't working as expected, or there are edge cases. Out of curiosity, does creating a new RunnerDeployment and immediately decreasing the desired count work? In other words, does your issue happen only when you scale down after an hour or so?
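The periodic expiry check described above can be sketched in shell. The timestamps are invented for illustration; the real controller implements this in Go inside its reconcile loop:

```shell
# Invented timestamps for illustration. ISO-8601 UTC strings compare
# correctly with plain lexicographic comparison, so no date parsing is needed.
token_expires_at="2021-01-06T17:00:00Z"
now="2021-01-06T18:05:55Z"

if [[ "$now" > "$token_expires_at" ]]; then
  echo "registration token expired: issue a new token and replace the runner pod"
else
  echo "registration token still valid"
fi
```

If the check never fires (or fires but the pod replacement fails), the runner pod is left holding an expired token, which matches the symptom reported in this issue.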
It's strange that it now scales down. Let me watch it for a while longer.

kubectl --context do get RunnerReplicaSet
NAME                               DESIRED   CURRENT   READY
starcoin-runner-deployment-vt5bk   1         1         1

I removed the old offline runner.
Error again:

kubectl --namespace actions-runner-system logs controller-manager-5879594668-wp7mn manager

kubectl logs starcoin-runner-deployment-vt5bk-2tvk8 runner
Http response code: Unauthorized from 'POST https://api.github.com/actions/runner-registration'
{"message":"Token expired.","documentation_url":"https://docs.github.com/rest"}
Response status code does not indicate success: 401 (Unauthorized).
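A watchdog outside the controller could key off that exact 401 body. A minimal sketch: the JSON is the one captured from the runner logs above, and the recovery command is shown only as a comment since we cannot run kubectl here:

```shell
# The 401 response body captured from the runner logs in this issue.
resp='{"message":"Token expired.","documentation_url":"https://docs.github.com/rest"}'

if printf '%s' "$resp" | grep -q '"message":"Token expired."'; then
  echo "runner registration token expired; pod needs a fresh token"
  # In a live cluster, deleting the pod lets the controller recreate it
  # with a newly issued registration token, e.g.:
  #   kubectl delete pod starcoin-runner-deployment-vt5bk-2tvk8
fi
```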
@jolestar Thanks. I think we're close. Would you mind sharing the result of
There's another error:

kubectl get po -o yaml starcoin-runner-deployment-68gdh-wwgzm

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2021-01-06T18:05:55Z"
  labels:
    pod-template-hash: 749fd4569f
    runner-template-hash: 6d59d7cd4b
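For context on those labels: the `*-template-hash` values let the controller match pods to the current template, so a pod carrying a stale hash was created from an older spec. A sketch of that comparison; the pod hash is the one shown above, while the "current" template hash is hypothetical:

```shell
# Hash taken from the pod above; the "current" template hash is invented
# for illustration (in reality it comes from the RunnerDeployment's template).
pod_hash="6d59d7cd4b"
current_hash="6d59d7cd4b"

if [ "$pod_hash" = "$current_hash" ]; then
  echo "pod matches the current runner template"
else
  echo "pod was built from an outdated template and should be replaced"
fi
```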
@jolestar Hey! Are you still using actions-runner-controller? FYI, I've recently summarized how our controller can get stuck due to runners being unable to register for various reasons. We're far from "fixing" all the root causes because they vary a lot, but the universal fix may be #297, which I'm currently working on.
I tried actions-runner-controller v0.18.2; this bug is resolved.
@jolestar Thanks for reporting! Glad to hear it worked.
runner config:

But if the runner autoscales to 6, it does not scale down; even if I delete a runner pod manually, a new runner pod is auto-created.
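That recreate-on-delete behavior is the expected replicaset-style reconciliation: the controller drives CURRENT toward DESIRED, so deleting a pod by hand only triggers a replacement, and scale-down happens only when DESIRED itself drops. A sketch of the reconcile decision, using the counts reported in this issue:

```shell
desired=1   # DESIRED from the RunnerReplicaSet
current=6   # CURRENT observed runner pods

if [ "$current" -gt "$desired" ]; then
  echo "reconcile: remove $((current - desired)) runner pod(s)"
elif [ "$current" -lt "$desired" ]; then
  echo "reconcile: create $((desired - current)) runner pod(s)"
else
  echo "reconcile: in sync"
fi
```

With desired=1 and current=6 this prints `reconcile: remove 5 runner pod(s)`, which is exactly the scale-down that the bug prevented from ever executing.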