listener metrics unavailable with 0.8.1 #3190

Closed
4 tasks done
julien-michaud opened this issue Dec 26, 2023 · 5 comments · Fixed by #3193
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode

Comments


julien-michaud commented Dec 26, 2023

Checks

Controller Version

0.8.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Use chart 0.8.1

Describe the bug

With version 0.8.1, the listener pod doesn't expose all of the available metrics.

Here are the metrics available on /metrics:

# HELP gha_max_runners Maximum number of runners.
# TYPE gha_max_runners gauge
gha_max_runners{enterprise="",name="test-hosted-beta-2n8tv",namespace="actions-runner-system",organization="test",repository=""} 100
# HELP gha_min_runners Minimum number of runners.
# TYPE gha_min_runners gauge
gha_min_runners{enterprise="",name="test-hosted-beta-2n8tv",namespace="actions-runner-system",organization="test",repository=""} 3
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0

I had no issues with version 0.7.0.

Describe the expected behavior

Being able to list all the metrics from the listener; see https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#available-metrics-for-arc
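
To make the gap between the observed and expected output concrete, here is a minimal Python sketch that lists the metric names present in a /metrics response and reports which expected names are missing. The function names and the `EXPECTED` list are hypothetical and only illustrative: the list contains just the metric names mentioned in this thread, not the full set from the linked documentation.

```python
import re

# Matches the metric name at the start of a sample line, e.g.
# 'gha_max_runners{...} 100' -> 'gha_max_runners'.
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def exposed_metric_names(exposition_text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(exposition_text, expected):
    """Return the expected metric names that are absent, sorted by name."""
    return sorted(set(expected) - exposed_metric_names(exposition_text))


# Output copied from the report above.
SAMPLE = """
# HELP gha_max_runners Maximum number of runners.
# TYPE gha_max_runners gauge
gha_max_runners{enterprise="",name="test-hosted-beta-2n8tv",namespace="actions-runner-system",organization="test",repository=""} 100
# HELP gha_min_runners Minimum number of runners.
# TYPE gha_min_runners gauge
gha_min_runners{enterprise="",name="test-hosted-beta-2n8tv",namespace="actions-runner-system",organization="test",repository=""} 3
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
"""

# Assumption: an illustrative subset of expected metrics, not the full list.
EXPECTED = ["gha_max_runners", "gha_min_runners", "gha_registered_runners"]

if __name__ == "__main__":
    print(missing_metrics(SAMPLE, EXPECTED))
```

Replacing `EXPECTED` with the names from the linked documentation would give a quick check for any given controller version.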

Additional Context

gha-runner-scale-set-controller:
  podLabels:
    finops.test.net/cloud_provider: gcp
    finops.test.net/cost_center: compute
    finops.test.net/product: tools
    finops.test.net/service: actions-runner-controller
    finops.test.net/region: europe-west1
  replicaCount: 3
  podAnnotations:
    ad.datadoghq.com/manager.checks: |
      {
        "openmetrics": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics",
              "histogram_buckets_as_distributions": true,
              "namespace": "actions-runner-system",
              "metrics": [".*"]
            }
          ]
        }
      }
  metrics:
    controllerManagerAddr: ":8080"
    listenerAddr: ":8080"
    listenerEndpoint: "/metrics"


gha-runner-scale-set:
  githubConfigUrl: https://github.com/test
  githubConfigSecret:
    github_token: <path:secret/github_token/actions_runner_controller#token>

  maxRunners: 100
  minRunners: 3

  containerMode:
    type: "dind"  ## type can be set to dind or kubernetes

  listenerTemplate:
    metadata:
      labels:
        finops.test.net/cloud_provider: gcp
        finops.test.net/cost_center: compute
        finops.test.net/product: tools
        finops.test.net/service: actions-runner-controller
        finops.test.net/region: europe-west1
      annotations:
        ad.datadoghq.com/listener.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics",
                  "histogram_buckets_as_distributions": true,
                  "namespace": "actions-runner-system",
                  "metrics": [".*"]
                }
              ]
            }
          }
    spec:
      containers:
      - name: listener
        securityContext:
          runAsUser: 1000
  template:
    metadata:
      labels:
        finops.test.net/cloud_provider: gcp
        finops.test.net/cost_center: compute
        finops.test.net/product: tools
        finops.test.net/service: actions-runner-controller
        finops.test.net/region: europe-west1
    spec:
      imagePullSecrets:
        - name: test-prod-registry
      containers:
        - name: runner
          image: eu.gcr.io/test-production/devex/gha-runners:v1.0.0-snapshot5
          command: ["/home/runner/run.sh"]

  controllerServiceAccount:
    namespace: actions-runner-system
    name: actions-runner-controller-gha-rs-controller

Controller Logs

https://gist.github.com/julien-michaud/3b25f14e678a365362a5270c4101acab

Runner Pod Logs

https://gist.github.com/julien-michaud/812ec630f1b63867a3be2bc23888ad38
@julien-michaud julien-michaud added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Dec 26, 2023
@julien-michaud julien-michaud changed the title <Please write what didn't work for you here> listener metrics unavailable with 0.8.1 Dec 26, 2023

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@nikola-jokic nikola-jokic removed the needs triage Requires review from the maintainers label Dec 27, 2023
@nikola-jokic nikola-jokic self-assigned this Dec 27, 2023
@nikola-jokic
Contributor

Hey @julien-michaud,

Thank you so much for bringing this to our attention! Until this is fixed, if you need metrics quickly, could you please follow the instructions I posted here?


Flasheh commented Jan 24, 2024

Hello @nikola-jokic

I'm running into the same issue on 0.8.1, but I'm still unable to get metrics from the listener after falling back to the old listener.

I've enabled metrics in the controller chart. I can see the args in the controller pod description and can get metrics from the controller's port:

    Args:
      --auto-scaling-runner-set-only
      --log-level=debug
      --log-format=text
      --update-strategy=immediate
      --listener-metrics-addr=:8080
      --listener-metrics-endpoint=/metrics
      --metrics-addr=:8080

However, the listener doesn't seem to respond on the port. It's not that certain metrics are missing; the listener doesn't respond at all.

I'm not sure if this is relevant, but digging a bit further I found the config file /etc/gha-listener/config.json. In that JSON, "metricsAddr":"" and "metricsEndpoint":"" are both empty, so perhaps the metrics aren't being enabled for some reason?

The listener did fall back to the old version, as the command in the pod changed to:
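
The check described above can be sketched in a few lines of Python. This is only an illustration: the helper name `empty_metrics_fields` is hypothetical, and the sketch assumes the config file is a flat JSON object containing the `metricsAddr` and `metricsEndpoint` keys quoted above. It makes no claim about how the listener itself interprets those fields.

```python
import json

# Keys quoted from /etc/gha-listener/config.json in the comment above.
METRICS_KEYS = ("metricsAddr", "metricsEndpoint")


def empty_metrics_fields(config_text):
    """Return the metrics-related keys that are missing or empty."""
    config = json.loads(config_text)
    return [key for key in METRICS_KEYS if not config.get(key)]


if __name__ == "__main__":
    # The state observed in the comment: both fields are empty strings.
    observed = '{"metricsAddr":"","metricsEndpoint":""}'
    print(empty_metrics_fields(observed))
```

Running it against the contents of the file copied out of the listener pod would show whether the values passed to the controller made it into the listener's config.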

Command:             
      /github-runnerscaleset-listener

If you need any more input let me know.


Flasheh commented Jan 24, 2024

UPDATE: After removing the runner set Helm release and redeploying it, the listener does respond on the port and returns metrics. I guess some config is "sticking" somewhere?


Flasheh commented Jan 26, 2024

Hi @nikola-jokic

Not sure if I should open a new ticket for this, but after upgrading to 0.8.2 with the #3193 fix for metrics, some metrics seem to return incorrect values. I haven't checked them all, but I've noticed it with gha_registered_runners specifically.

For listeners deployed with a non-zero minRunners, the minimum available runners don't seem to be counted as registered runners.

Runners that get spun up on demand do seem to be counted in the metric; however, after the replicas are scaled down, the metric value does not go down again.

EDIT: Just to add, the value does seem to go down sometimes, but it still doesn't match the actual number of currently registered runners.
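
One way to quantify the mismatch described above is to read the reported value from the /metrics output and subtract the number of runners actually registered. The Python sketch below is a rough illustration: the function names are hypothetical, and `actual_count` is assumed to be obtained separately (for example by counting runners by hand); nothing here queries GitHub or the cluster.

```python
def metric_values(exposition_text, metric_name):
    """Return the sample values reported for exactly `metric_name`."""
    values = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        if not line.startswith(metric_name):
            continue
        rest = line[len(metric_name):]
        # Reject other metrics that merely share the same prefix.
        if rest[:1] not in ("{", " "):
            continue
        # Drop the label set, if any; the value follows the closing brace.
        if rest.startswith("{") and "}" in rest:
            rest = rest[rest.rindex("}") + 1:]
        fields = rest.split()
        if fields:
            values.append(float(fields[0]))
    return values


def registered_runner_drift(exposition_text, actual_count):
    """Reported gha_registered_runners minus the actual count.

    A positive result means the metric over-reports; a negative one
    means it under-reports.
    """
    reported = sum(metric_values(exposition_text, "gha_registered_runners"))
    return reported - actual_count
```

Sampling the drift before and after a scale-down would show whether the value fails to decrease or simply lags behind.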
