Listener randomly turns non-responsive after some time #3100

Closed
4 tasks done
audunsolemdal opened this issue Nov 23, 2023 · 8 comments · Fixed by #3103
Labels
bug: Something isn't working
gha-runner-scale-set: Related to the gha-runner-scale-set mode
needs triage: Requires review from the maintainers

Comments

@audunsolemdal

audunsolemdal commented Nov 23, 2023

Checks

Controller Version

0.6.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Create a runner group, e.g. `my-group`
Create a self-hosted runner via ARC and link it to `my-group`
In the GitHub Actions workflow YAML, make your jobs target the runner group:


runs-on:
  group: my-group

Run some jobs
Wait patiently
This works fine for some days, then the listener suddenly freezes for no obvious reason

Describe the bug

The listener works fine at first and jobs are assigned as usual. After some time (days), the listener appears to freeze; when this happens seems random. As a result, requested jobs are no longer assigned.

Once the listener pod is deleted and recreated, jobs are assigned immediately as expected. Until it is restarted, the listener pod stays stuck and produces no output indicating that it failed. I do not see any blocked network traffic, so I do not believe that is the cause.

Describe the expected behavior

Business as usual

Additional Context

The listener does not detect new jobs for some time. Restarting the listener pod solves the issue
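
Since the listener normally logs a token refresh about once an hour (see the listener logs in this report), a long silence in its log is a usable stall signal. The following is a minimal Python sketch of such a check, not part of ARC: the two-hour threshold, the function names, and the pod and namespace names are all assumptions chosen for illustration. It only builds the `kubectl delete pod` command that mirrors the manual workaround; it does not run it.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the listener logs at least hourly (token refresh), so two
# hours of silence is treated as a stall. Tune this for your setup.
MAX_SILENCE = timedelta(hours=2)


def is_stalled(last_log_time: datetime, now: datetime,
               max_silence: timedelta = MAX_SILENCE) -> bool:
    """Return True if the listener has been silent longer than max_silence."""
    return now - last_log_time > max_silence


def restart_command(pod: str, namespace: str) -> list:
    """Build the kubectl command that deletes the pod (the manual workaround)."""
    return ["kubectl", "delete", "pod", pod, "--namespace", namespace]


# Illustrative values: the last timestamp in the log excerpt, and a check
# performed three hours later.
last_seen = datetime(2023, 11, 23, 8, 35, 30, tzinfo=timezone.utc)
checked_at = last_seen + timedelta(hours=3)

if is_stalled(last_seen, checked_at):
    # "my-listener-pod" and "arc-systems" are placeholder names.
    print(" ".join(restart_command("my-listener-pod", "arc-systems")))
```

A check like this only automates the workaround described here; it does not address whatever causes the listener to stop.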

YAML for helm chart


securityContext:
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000

labels:
  app: actions-runner-controller

priorityClassName: platform-high

tolerations:
  - key: CriticalAddonsOnly
    operator: Exists



YAML for runnerset:

githubConfigUrl: "https://github.com/myorg"
githubConfigSecret: "pre-defined-secret"

runnerGroup: mygroup

template:
  spec:
    containers:
      - name: runner
        image: mycustomrunnerimage:latest
        command: ["/home/runner/run.sh"]
        resources:
          limits:
            memory: "1Gi"
            cpu: "700m"
          requests:
            cpu: "75m"
            memory: "250Mi"

minRunners: 0

Controller Logs

Logs are completely empty in the relevant period

Runner Pod Logs

Listener logs:
--- several previous successful refreshing_client runs on an hourly basis before this
2023-11-22T18:35:24Z   INFO    refreshing_client    message queue token is expired during GetNextMessage, refreshing...
2023-11-22T18:35:24Z    INFO    refreshing token        {"githubConfigUrl": "https://github.com/myorg"}
2023-11-22T18:35:24Z    INFO    getting access token for GitHub App auth        {"accessTokenURL": "https://api.github.com/app/installations/myappid/access_tokens"}
2023-11-23T08:35:29Z    INFO    getting runner registration token       {"registrationTokenURL": "https://api.github.com/orgs/myorg/actions/runners/registration-token"}
2023-11-23T08:35:30Z    INFO    getting Actions tenant URL and JWT      {"registrationURL": "https://api.github.com/actions/runner-registration"}
-- No further output until pod is restarted
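
The excerpt above contains a jump of roughly 14 hours between consecutive lines (18:35:24 on Nov 22 to 08:35:29 on Nov 23), even though refreshes had been logged hourly before that. Below is a minimal Python sketch for finding such silent periods in a saved listener log, assuming every log line starts with a UTC timestamp in the format shown. `parse_timestamp` and `largest_gap` are hypothetical helper names, and the messages are shortened copies of the lines above.

```python
from datetime import datetime, timedelta
from typing import Optional

# Shortened copies of the log lines from the excerpt.
LOG_LINES = [
    "2023-11-22T18:35:24Z INFO refreshing_client message queue token is expired",
    "2023-11-22T18:35:24Z INFO refreshing token",
    "2023-11-22T18:35:24Z INFO getting access token for GitHub App auth",
    "2023-11-23T08:35:29Z INFO getting runner registration token",
    "2023-11-23T08:35:30Z INFO getting Actions tenant URL and JWT",
]


def parse_timestamp(line: str) -> Optional[datetime]:
    """Parse the leading UTC timestamp of a log line; None if absent."""
    parts = line.split(maxsplit=1)
    if not parts:
        return None
    try:
        return datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None


def largest_gap(lines):
    """Return (gap, before, after) for the longest silence, or None."""
    stamps = [t for t in map(parse_timestamp, lines) if t is not None]
    gaps = [(b - a, a, b) for a, b in zip(stamps, stamps[1:])]
    return max(gaps, key=lambda g: g[0]) if gaps else None


gap, before, after = largest_gap(LOG_LINES)
print(f"longest silence: {gap} (from {before} to {after})")
# prints: longest silence: 14:00:05 (from 2023-11-22 18:35:24 to 2023-11-23 08:35:29)
```

Lines without a leading timestamp, such as the annotations in the excerpt, are skipped rather than treated as errors.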
@audunsolemdal added the bug, gha-runner-scale-set, and needs triage labels Nov 23, 2023
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@antoninguyot

Also experiencing this on 0.7.0

@audunsolemdal changed the title from "Listener turns randomly turns non-responsive after some time" to "Listener randomly turns non-responsive after some time" Nov 23, 2023
@nikola-jokic
Contributor

Hey @audunsolemdal @antoninguyot,

Can you please create support tickets ☺️?

@audunsolemdal
Author

Created ticket here #3102

@nikola-jokic
Contributor

@audunsolemdal
Author

OK, I have created a regular support ticket now. Notice the header that popped up, though; it should be updated if this is the way to go for official support.
[image]

@nikola-jokic
Contributor

Thank you! The message is right: configuring a Kubernetes cluster and/or ARC is not within the scope of support. However, investigating this issue may require information that shouldn't be visible to the public (such as private repositories), which is why I asked you to create a support ticket. This issue does not fall into the scope of configuring ARC.

@audunsolemdal
Author

I have run multiple listeners on version 0.8.1 for 14 days now without issues. It seems to be working fine from tag 0.8.x onwards, thanks.
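
For anyone checking whether a deployment is in the range reported here, the versions mentioned in this thread can be compared as follows. This is a small Python sketch that relies only on the reports above (0.6.1 and 0.7.0 affected, 0.8.1 running without issues). The `0.8.0` cutoff and the function names are assumptions, not an official statement of where the fix landed.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as "0.8.1" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def at_least(version: str, minimum: str = "0.8.0") -> bool:
    """Return True if version is at or above the minimum."""
    return parse_version(version) >= parse_version(minimum)


# Versions mentioned in this thread.
for v in ("0.6.1", "0.7.0", "0.8.1"):
    print(v, "ok" if at_least(v) else "reported affected")
```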

3 participants