BrokerMigration message received is rolling out that might cause a downtime until implementation #186

Closed
ChristopherHX opened this issue Jun 29, 2024 · 13 comments · Fixed by #188
Labels
breaking external Out of control of this Project protocol

Comments

@ChristopherHX
Owner

To implement this reliably, I need access to the rollout of the service update.

Found in this behavior change issue: actions/runner#3366 (comment)

@igagis
Contributor

igagis commented Jul 4, 2024

Starting from yesterday evening, my self-hosted runners do not pick up the jobs anymore, e.g. see https://github.com/cppfw/opros/actions/runs/9785008298

Could this be related to this breaking change?

@ChristopherHX
Owner Author

Yes, this is indeed possible, but I need the runner log to be certain.

If this is the case, access to a repo of your org would help me; otherwise I'm still waiting to be affected myself.

I assume all repos of your org have that feature enabled on the backend, even a temporary one that you would create for me.

@ChristopherHX
Owner Author

The log might contain an `Ignoring incoming message of type:` line, which would confirm that this is preventing jobs from being run.
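
To check a runner log for that marker, something like the following could help. It is only a sketch: it assumes the text right after the colon is the message type, and the function name `ignored_message_types` is made up for illustration.

```python
import re
from collections import Counter

# Marker quoted in this thread. Assumption: the token after the colon
# is the message type that the runner skipped.
IGNORED = re.compile(r"Ignoring incoming message of type:\s*(\S+)")


def ignored_message_types(log_lines):
    """Count the ignored message types found in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = IGNORED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file (or the lines of a `journalctl` dump) to `ignored_message_types` returns a counter per type; an empty counter means the marker never appeared.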

@ChristopherHX
Owner Author

ChristopherHX commented Jul 4, 2024

My minecraft-linux org also got the update: `[2024-07-04 09:06:11Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...`

However, not my self-hosted runners.

@igagis
Contributor

igagis commented Jul 4, 2024

For one of my runners, these are the only logs I have for today:

Jul 04 09:07:12 stahl runner[2790276]: Failed to get message, waiting 10 sec before retry: http failure: Http GET Request finished 503 https://pipelines.actions.githubusercontent.com/eMc1GyGdYp1Pn3AIiLBt1AQy39VBhA0ak6WnGv2vWqR163Rx43/_apis/distributedtask/pools/1/messages?api-version=5.1-preview&sessionId=542f6715-7fe1-4f62-90b9-e0b3aff904d1
Jul 04 09:07:12 stahl runner[2790276]: Headers:
Jul 04 09:07:12 stahl runner[2790276]: Cache-Control: no-store
Jul 04 09:07:12 stahl runner[2790276]: Content-Length: 231
Jul 04 09:07:12 stahl runner[2790276]: Content-Type: text/html
Jul 04 09:07:12 stahl runner[2790276]: Date: Thu, 04 Jul 2024 09:07:12 GMT
Jul 04 09:07:12 stahl runner[2790276]: X-Msedge-Ref: Ref A: C7C25762D71C45169960299F1093BFA5 Ref B: STOEDGE1707 Ref C: 2024-07-04T09:07:12Z
Jul 04 09:07:12 stahl runner[2790276]: Body: `{ "message": "GitHub Actions is temporarily unavailable. Please visit https://www.githubstatus.com/ for the status of our services.", "ref": "Ref A: C7C25762D71C45169960299F1093BFA5 Ref B: STOEDGE1707 Ref C: 2024-07-04T09:07:12Z" }`
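
The `waiting 10 sec before retry` line in the log above suggests a fixed-delay retry around the message poll. A minimal sketch of that pattern follows; it is not the runner's actual code, and `poll_with_retry`, `fetch`, and the other parameter names are hypothetical.

```python
import time


def poll_with_retry(fetch, retry_delay=10.0, max_attempts=None,
                    sleep=time.sleep, log=print):
    """Call fetch() until it succeeds, waiting retry_delay seconds after each failure.

    fetch is any callable that returns a message or raises on failure
    (for example on an HTTP 503). All names here are illustrative.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return fetch()
        except Exception as err:
            # Give up only if the caller set an attempt limit.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            log(f"Failed to get message, waiting {retry_delay:g} sec before retry: {err}")
            sleep(retry_delay)
```

With a fixed delay like this, a runner keeps retrying through an outage and should resume on its own once the service answers again.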

@ChristopherHX
Owner Author

Recovering from a GitHub outage might require a runner service restart; otherwise I'm not sure.

@igagis
Contributor

igagis commented Jul 4, 2024

I restarted the runner service, but it didn't help

@ChristopherHX
Owner Author

I have no idea what happened on your end.

Are newly registered runners also broken for you?

The runner has a `--trace` flag for the run command that enables verbose logging of almost all HTTP traffic, but these logs contain credentials that need to be removed manually.

Registering runners and running jobs are still working for my user and org, so the change mentioned in my original post has not been fully rolled out to me yet.
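
As a first pass before sharing a trace log, a small redaction script could look like this. It is a sketch under assumptions: the patterns guess at what credentials look like in the trace output (authorization headers, JWT-shaped tokens, and query parameters such as `sessionId`), the real format may differ, and the result still needs a manual review.

```python
import re

# Assumed credential shapes; none of these are taken from the runner's source.
PATTERNS = [
    # Everything after an "Authorization:" header name on the same line.
    (re.compile(r"(authorization\s*:\s*).*", re.IGNORECASE), r"\g<1>[REDACTED]"),
    # JWT-shaped tokens: three base64url segments starting with "eyJ".
    (re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED_JWT]"),
    # Sensitive-looking query parameters in logged URLs.
    (re.compile(r"([?&](?:sessionId|token|access_token)=)[^&\s]+"), r"\g<1>[REDACTED]"),
]


def redact(text):
    """Apply every pattern in order and return the redacted text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running the log through `redact` line by line or as one string gives the same result, since none of the patterns match across line breaks.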

@igagis
Contributor

igagis commented Jul 4, 2024

Maybe the breaking change is not involved here and the problem is something else... I'll try to observe more.

@martijnbastiaan

@igagis FWIW, we're seeing the same thing. The only log line we see is `BrokerMigration message received. Polling Broker for messages...`, together with the same symptom (runners not picking up any jobs). This is with actions/runner, however. We'll see if we can enable some debugging flags too.

@igagis
Contributor

igagis commented Jul 4, 2024

My problem is gone now. Perhaps it was a temporary outage of something, unrelated to this breaking change.

@martijnbastiaan

No such luck here. Fingers crossed it fixes itself for us too.

@ChristopherHX
Owner Author

Thanks to @igagis, this could be solved just before my cron job alert system was impacted and sent an alert.

As of one hour ago I see these log entries from v0.8.0, which means the change rolled out to my private test repo a few hours after I finished the update:

golang_2c7158bb-9e5a-4316-b6fb-0f5e3b5550ec ( https://github.com/ChristopherHX/github-act-runner-test ): Warning: TaskAgentMessage.MessageType is RunnerJobRequest, which has not been properly tested due to missing access to test servers of the new protocol before rollout. Please report any failures to https://github.com/ChristopherHX/github-act-runner/issues.
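
For readers following along, the message types named in this thread can be pictured as a simple dispatch. This is an illustrative sketch only, not the code of github-act-runner or actions/runner: the `TaskAgentMessage` dataclass, the `handle` function, and the returned action names are all hypothetical, and only the message type names and log texts come from the thread.

```python
from dataclasses import dataclass


@dataclass
class TaskAgentMessage:
    """Hypothetical stand-in for a message returned by the poll endpoint."""
    message_type: str
    body: str = ""


def handle(message, log=print):
    """Return a made-up action name for the given message type."""
    if message.message_type == "BrokerMigration":
        log("BrokerMigration message received. Polling Broker for messages...")
        return "poll_broker"
    if message.message_type == "RunnerJobRequest":
        log("Warning: TaskAgentMessage.MessageType is RunnerJobRequest")
        return "run_job"
    # An unrecognized type is skipped, which is how jobs can silently stop running.
    log(f"Ignoring incoming message of type: {message.message_type}")
    return "ignored"
```

A runner without the first two branches would fall through to the last one for every new-protocol message, which matches the symptom of runners staying online but never picking up jobs.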
