Invalid values for the metrics gha_registered_runners and gha_idle_runners in ghalistener #3546

Open
4 tasks done
verdel opened this issue May 26, 2024 · 3 comments
Labels: bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode)

Comments


verdel commented May 26, 2024

Checks

Controller Version

0.9.2

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

-

Describe the bug

We have a GitHub Actions workflow that runs once a day. A dedicated type of runner is allocated specifically for it. While the workflow runs, we receive the last batch of messages about job execution. In this final message, statistics.totalIdleRunners and statistics.totalRegisteredRunners contain non-zero values.

The controller publishes these values as Prometheus metrics. After this last message, the metric values do not change until the next workflow run the following day.

Is it possible to fix this behavior, or does it require changes on the GitHub side?

Describe the expected behavior

The values of the ghalistener Prometheus metrics should reflect the actual state of the runners.

Additional Context

-

Controller Logs

2024-05-26T07:24:29Z	INFO	listener-app.listener	Getting next message	{"lastMessageID": 1089}
2024-05-26T07:24:37Z	INFO	listener-app.listener	Processing message	{"messageId": 1090, "messageType": "RunnerScaleSetJobMessages"}
2024-05-26T07:24:37Z	INFO	listener-app.listener	New runner scale set statistics.	{"statistics": {"totalAvailableJobs":0,"totalAcquiredJobs":3,"totalAssignedJobs":3,"totalRunningJobs":3,"totalRegisteredRunners":4,"totalBusyRunners":3,"totalIdleRunners":0}}
2024-05-26T07:24:37Z	INFO	listener-app.listener	Job completed message received.	{"RequestId": 669571, "Result": "succeeded", "RunnerId": 83622, "RunnerName": "terraform-drift-checker-hxppm-runner-9q2hx"}
2024-05-26T07:24:37Z	INFO	listener-app.listener	Deleting last message	{"lastMessageID": 1090}
2024-05-26T07:24:38Z	INFO	listener-app.worker.kubernetesworker	Calculated target runner count	{"assigned job": 3, "decision": 3, "min": 0, "max": 30, "currentRunnerCount": 3, "jobsCompleted": 1}
2024-05-26T07:24:38Z	INFO	listener-app.worker.kubernetesworker	Compare	{"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}", "patch": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":3,\"patchID\":6917,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-05-26T07:24:38Z	INFO	listener-app.worker.kubernetesworker	Preparing EphemeralRunnerSet update	{"json": "{\"spec\":{\"patchID\":6917,\"replicas\":3}}"}
2024-05-26T07:24:38Z	INFO	listener-app.worker.kubernetesworker	Ephemeral runner set scaled.	{"namespace": "github-actions-runner", "name": "terraform-drift-checker-hxppm", "replicas": 3}
2024-05-26T07:24:38Z	INFO	listener-app.listener	Getting next message	{"lastMessageID": 1090}
2024-05-26T07:24:50Z	INFO	listener-app.listener	Processing message	{"messageId": 1091, "messageType": "RunnerScaleSetJobMessages"}
2024-05-26T07:24:50Z	INFO	listener-app.listener	New runner scale set statistics.	{"statistics": {"totalAvailableJobs":0,"totalAcquiredJobs":0,"totalAssignedJobs":0,"totalRunningJobs":0,"totalRegisteredRunners":2,"totalBusyRunners":0,"totalIdleRunners":1}}
2024-05-26T07:24:50Z	INFO	listener-app.listener	Job completed message received.	{"RequestId": 669572, "Result": "succeeded", "RunnerId": 83625, "RunnerName": "terraform-drift-checker-hxppm-runner-6dmvl"}
2024-05-26T07:24:50Z	INFO	listener-app.listener	Job completed message received.	{"RequestId": 669573, "Result": "succeeded", "RunnerId": 83623, "RunnerName": "terraform-drift-checker-hxppm-runner-lcc6k"}
2024-05-26T07:24:50Z	INFO	listener-app.listener	Job completed message received.	{"RequestId": 669574, "Result": "succeeded", "RunnerId": 83624, "RunnerName": "terraform-drift-checker-hxppm-runner-d2bv7"}
2024-05-26T07:24:50Z	INFO	listener-app.listener	Deleting last message	{"lastMessageID": 1091}
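
To make the stale values concrete, here is a minimal Go sketch that decodes the statistics payload from message 1091 in the logs above. The JSON field names come from the log line itself; the `ScaleSetStatistics` type and the `parseStatistics` helper are illustrative names made up for this example, not the listener's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScaleSetStatistics mirrors the fields shown in the
// "New runner scale set statistics." log line.
// The Go type and field names are hypothetical.
type ScaleSetStatistics struct {
	TotalAvailableJobs     int `json:"totalAvailableJobs"`
	TotalAcquiredJobs      int `json:"totalAcquiredJobs"`
	TotalAssignedJobs      int `json:"totalAssignedJobs"`
	TotalRunningJobs       int `json:"totalRunningJobs"`
	TotalRegisteredRunners int `json:"totalRegisteredRunners"`
	TotalBusyRunners       int `json:"totalBusyRunners"`
	TotalIdleRunners       int `json:"totalIdleRunners"`
}

// lastPayload is the statistics payload logged for message 1091.
const lastPayload = `{"statistics": {"totalAvailableJobs":0,"totalAcquiredJobs":0,"totalAssignedJobs":0,"totalRunningJobs":0,"totalRegisteredRunners":2,"totalBusyRunners":0,"totalIdleRunners":1}}`

// parseStatistics extracts the "statistics" object from a log payload.
func parseStatistics(payload string) (ScaleSetStatistics, error) {
	var wrapper struct {
		Statistics ScaleSetStatistics `json:"statistics"`
	}
	err := json.Unmarshal([]byte(payload), &wrapper)
	return wrapper.Statistics, err
}

func main() {
	stats, err := parseStatistics(lastPayload)
	if err != nil {
		panic(err)
	}
	// These are the values the metrics keep reporting until the next run.
	fmt.Printf("registered=%d idle=%d busy=%d\n",
		stats.TotalRegisteredRunners, stats.TotalIdleRunners, stats.TotalBusyRunners)
}
```

Running this prints `registered=2 idle=1 busy=0`, which matches the values that gha_registered_runners and gha_idle_runners hold after the last message.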

Runner Pod Logs

-
@verdel added the bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), and needs triage (Requires review from the maintainers) labels on May 26, 2024
@verdel changed the title from "Invalid values for the metrics gha_registered_runners and gha_idle_runners in ghalistener<Please write what didn't work for you here>" to "Invalid values for the metrics gha_registered_runners and gha_idle_runners in ghalistener" on May 26, 2024
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@nikola-jokic removed the needs triage (Requires review from the maintainers) label on May 30, 2024
@nikola-jokic
Contributor

Hey @verdel,

You are right, we receive an empty batch if no activity is needed, so the metric would be incorrect when the cluster becomes idle.
Ideally, to reflect the correct metric, the changes should be made on the API side. However, we can optimistically set this metric to the desired count when the cluster becomes idle. Let me discuss it with the team, and I'll get back to you with more information ☺️
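
The optimistic approach described above could look roughly like the following Go sketch. It is not the controller's code: `RunnerStats`, `Gauges`, and `Update` are hypothetical names, an empty batch is modelled as a nil pointer, and the sketch assumes that on an idle cluster both the registered and the idle runner counts equal the desired count.

```go
package main

import "fmt"

// RunnerStats holds the two counters discussed in this issue (hypothetical type).
type RunnerStats struct {
	TotalRegisteredRunners int
	TotalIdleRunners       int
}

// Gauges stands in for the gha_registered_runners and gha_idle_runners metrics.
type Gauges struct {
	RegisteredRunners int
	IdleRunners       int
}

// Update applies one received batch to the gauges.
// stats is nil when the batch is empty, i.e. no activity was needed.
func (g *Gauges) Update(stats *RunnerStats, desiredRunners int) {
	if stats != nil {
		// A batch with statistics is authoritative: copy its values.
		g.RegisteredRunners = stats.TotalRegisteredRunners
		g.IdleRunners = stats.TotalIdleRunners
		return
	}
	// Empty batch: assume the cluster is idle and fall back to the desired count.
	// Assumption: every desired runner is both registered and idle at this point.
	g.RegisteredRunners = desiredRunners
	g.IdleRunners = desiredRunners
}

func main() {
	g := &Gauges{}

	// Last batch with statistics, values taken from message 1091 in the logs.
	g.Update(&RunnerStats{TotalRegisteredRunners: 2, TotalIdleRunners: 1}, 0)
	fmt.Println(g.RegisteredRunners, g.IdleRunners)

	// Subsequent empty batch with a desired count of 0 ("min": 0 in the logs).
	g.Update(nil, 0)
	fmt.Println(g.RegisteredRunners, g.IdleRunners)
}
```

With a minimum of 0 runners, the first empty batch resets both gauges to 0 instead of leaving them at 2 and 1 until the next day. The trade-off is that the reported values are then an estimate rather than something the API returned.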

@nicolas-laduguie

Hello team, we would also be interested in this fix.
Do you have an update on this by any chance?
