Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runner stuck at BrokerMigration message received. Polling Broker for messages... #3371

Open
martijnbastiaan opened this issue Jul 4, 2024 · 21 comments
Labels
bug Something isn't working

Comments

@martijnbastiaan
Copy link

Describe the bug
As of ~last night (4th of July CEST) self hosted runners do not pick up any new jobs. Runners do emit a message ~every minute:

Runner stuck at `BrokerMigration message received. Polling Broker for messages...`

This is similar to what's reported here: ChristopherHX/github-act-runner#186 (comment).

Expected behavior
Runners should pick up jobs

Runner Version and Platform

Runner

v2.317.0

OS:

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy

Runner and Worker's Diagnostic Logs

[2024-07-04 11:43:31Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-04 11:43:31Z INFO MessageListener] Session created.
[2024-07-04 11:43:31Z INFO Terminal] WRITE LINE: Current runner version: '2.317.0'
[2024-07-04 11:43:31Z INFO Terminal] WRITE LINE: 2024-07-04 11:43:31Z: Listening for Jobs
[2024-07-04 11:43:31Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-07-04 11:43:31Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:43:31Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-07-04 11:43:31Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-04 11:44:22Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:45:12Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:46:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:46:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:47:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:48:33Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:49:23Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:50:13Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:51:04Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...

Question

Is there anything we can do to get more debug information out of the runners?

@martijnbastiaan martijnbastiaan added the bug Something isn't working label Jul 4, 2024
@benjyw
Copy link

benjyw commented Jul 5, 2024

We're seeing the same thing on our MacOS self-hosted runner, also as of that point in time.

EDIT: Also on v2.317.0

@leonschoorl
Copy link

It looks like the BrokerMigration message has stopped.
But our runners still aren't picking up jobs.

Now logs keep repeating:

[2024-07-05 09:26:16Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 09:56:20Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 10:16:23Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-07-05 10:16:23Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-05 10:26:25Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 10:56:29Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 11:11:32Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-07-05 11:11:32Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown

This is using v2.317.0

@luketomlinson
Copy link
Contributor

@leonschoorl Do you have an example run that's not being picked up?

@martijnbastiaan
Copy link
Author

https://github.com/bittide/bittide-hardware/actions/runs/9798992881

This is the runner page:

Screenshot 2024-07-05 a

(I'm a colleague of @leonschoorl)

@luketomlinson
Copy link
Contributor

Hey @martijnbastiaan I noticed your runners no longer have the compute label, so those jobs won't match. Can you try adding those runners with the compute label or changing the workflow to only require self-hosted?

@luketomlinson
Copy link
Contributor

@benjyw You mentioned having the same issue, do you have any more logs or run links you can share?

@martijnbastiaan
Copy link
Author

Oh wow, yeah I totally forgot to apply the labels when reinstalling the runners to see if that would fix this issue. Re-adding the labels works.

@luketomlinson
Copy link
Contributor

For the earlier issue, @martijnbastiaan, you weren't getting jobs assigned when you were receiving the BrokerMigration message. Is your runner the standard actions/runner (this repo, not a fork)? Do your runners have any kind if firewall settings that might be preventing it from hitting the Broker? Trying to make sure I'm capturing the original issue too.

@martijnbastiaan
Copy link
Author

Standard runner, not a fork. No firewall except NAT.

@matbee-eth
Copy link

matbee-eth commented Jul 5, 2024

As of yesterday my self hosted runner is also not picking up any new jobs, latest _diag message:
https://gist.github.com/matbee-eth/b3c2bdca9f560b9b9788d1b1fbedef02

<trunc>
[2024-07-05 16:12:02Z INFO Terminal] WRITE LINE: Current runner version: '2.317.0'
[2024-07-05 16:12:02Z INFO Terminal] WRITE LINE: 2024-07-05 16:12:02Z: Listening for Jobs
[2024-07-05 16:12:02Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-07-05 16:12:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:12:02Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/fatimaapi/backend/.credentials_rsaparams
[2024-07-05 16:12:02Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-05 16:12:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:13:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:14:33Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:15:23Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:16:13Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:17:03Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...

@luketomlinson
Copy link
Contributor

@matbee-eth do you have a session ID for your runner? Or an example link to a run that isn't being picked up?

@matbee-eth
Copy link

It's a private repo, not sure if GH staff can access anything, but https://github.com/FytechCorp/AI-Backend/actions/runs/9799514721

Unsure of how to get a Runner Session ID specifically though

@luketomlinson
Copy link
Contributor

All good @matbee-eth. That's helpful. Thanks!

@benjyw
Copy link

benjyw commented Jul 5, 2024

Hi @luketomlinson, thanks for looking into this. Things seem to be working now, with no changes on our end. So presumably some server-side issue that was resolved?

@luketomlinson
Copy link
Contributor

Yep @benjyw, I'm looking into some server side changes.

@ajax-ryzhyi-r
Copy link

Hi @luketomlinson,

Any updates regarding this issue? We faced it as well. Original runner with no special firewall rules.
If you need any additional information, I will be glad to provide it.

@fwosar
Copy link

fwosar commented Jul 17, 2024

We have seen this issue as well. Same as for the other people here: Started about 2 weeks ago around the 4th of July. Then it was working again (albeit a bit spotty), but since yesterday complete outage of all our self-hosted runners on both Windows as well as Linux. This is really making me regret using Github Actions as a CI/CD pipeline right now. Identical log as for the previous people as well. For example:

[2024-07-17 13:07:35Z INFO Terminal] WRITE LINE: 2024-07-17 13:07:35Z: Listening for Jobs
[2024-07-17 13:07:35Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-07-17 13:07:35Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:07:35Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Buildserver\actions-runner\.credentials_rsaparams
[2024-07-17 13:07:35Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-17 13:08:26Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:09:16Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:10:06Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:10:57Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:11:47Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:12:38Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:13:28Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:14:18Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:15:08Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:15:59Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:16:49Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:17:39Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:18:30Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:19:20Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:20:10Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:21:01Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:21:51Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:22:41Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:23:32Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:24:22Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:25:12Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:26:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:26:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:27:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:28:34Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:29:24Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:30:15Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:31:05Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:31:55Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:32:45Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:33:36Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:34:26Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:35:16Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...

@TingluoHuang
Copy link
Member

@fwosar can you share the full runner diag log with me? tingluohuang@github.com 🙇
Sorry for the trouble, I can help you take a look when i got the diag log. 😄

@fwosar
Copy link

fwosar commented Jul 18, 2024

Actually, I think this particular problem might be related to #3393. I went through all the logs again, and we hit the same error as in #3393 on our runners:

[2024-07-17 12:59:03Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Buildserver\actions-runner\.credentials_rsaparams
[2024-07-17 12:59:03Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Buildserver\actions-runner\.credentials_rsaparams
[2024-07-17 12:59:03Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-17 12:59:04Z INFO JobDispatcher] Shutting down JobDispatcher. Make sure all WorkerDispatcher has finished.
[2024-07-17 12:59:04Z INFO Runner] Deleting Runner Session...
[2024-07-17 12:59:05Z ERR  Terminal] WRITE ERROR: An error occurred: Object reference not set to an instance of an object.
[2024-07-17 12:59:05Z ERR  Listener] System.NullReferenceException: Object reference not set to an instance of an object.
   at GitHub.Runner.Listener.JobDispatcher.Run(AgentJobRequestMessage jobRequestMessage, Boolean runOnce)
   at GitHub.Runner.Listener.Runner.RunAsync(RunnerSettings settings, Boolean runOnce)
   at GitHub.Runner.Listener.Runner.RunAsync(RunnerSettings settings, Boolean runOnce)
   at GitHub.Runner.Listener.Runner.RunAsync(RunnerSettings settings, Boolean runOnce)
   at GitHub.Runner.Listener.Runner.ExecuteCommand(CommandSettings command)
   at GitHub.Runner.Listener.Program.MainAsync(IHostContext context, String[] args)

After hitting these errors, the runner was effectively dead. Removing and redeploying the runner seems to have resolved the problem.

@marko-k0
Copy link

We are experiencing this behavior now:

[RUNNER 2024-07-25 15:15:45Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[RUNNER 2024-07-25 15:15:45Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:15:45Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2024-07-25 15:15:45Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2024-07-25 15:16:36Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:17:26Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:18:16Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:19:07Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:19:57Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:20:47Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:21:37Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:22:28Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:23:18Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:24:08Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:24:59Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:26:03Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:26:54Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:27:44Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:28:34Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
...
[RUNNER 2024-07-25 15:30:22Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2024-07-25 15:30:22Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2024-07-25 15:30:22Z ERR  GitHubActionsService] POST request to https://pipelinesghubeus21.actions.githubusercontent.com/izjmlKLllI8mQvN3RNUEys667oyyTtmTTtNj33gFgdUfkp3yzp/_apis/oauth2/token failed. HTTP Status: BadRequest
[RUNNER 2024-07-25 15:30:22Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener] Catch exception during create session.
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener] GitHub.Services.OAuth.VssOAuthTokenRequestException: Registration c8051515-ad19-4910-a34e-36559f1f57d8 was not found.
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.OAuth.VssOAuthTokenProvider.OnGetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.Common.IssuedTokenProvider.GetTokenOperation.GetTokenAsync(VssTraceActivity traceActivity)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.Common.IssuedTokenProvider.GetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Runner.Listener.MessageListener.CreateSessionAsync(CancellationToken token)
[RUNNER 2024-07-25 15:30:22Z ERR  Terminal] WRITE ERROR: Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner registrations are automatically deleted for runners that have not connected to the service recently.
Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner registrations are automatically deleted for runners that have not connected to the service recently.
[RUNNER 2024-07-25 15:30:22Z INFO Listener] Runner execution has finished with return code 1
Runner listener exit with terminated error, stop the service, no retry needed.
Exiting runner...

@sby-yu-katou
Copy link

In my case, after adding a self-hosted runner from the GitHub Enterprise Settings, I encountered the same issue as described. However, when I created the self-hosted runner from the GitHub Organization Settings, it worked correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

10 participants