Workflow run is blocked at the automatic update for the self-hosted runner application #422

Closed
BrightRan opened this issue Apr 13, 2020 · 5 comments
Labels
bug Something isn't working Runner Auto-update 😞

Comments

@BrightRan

In this ticket, the user installed a self-hosted macOS runner and configured a proxy for it.
When he uses this runner to run a workflow, the run is blocked at the automatic update of the self-hosted runner application.
The user shared the following debug logs:

[2020-04-09 12:48:42Z INFO RSAFileKeyManager] Loading RSA key parameters from file /Volumes/beyourself/dev_tools/github_action/actions-runner/.credentials_rsaparams
[2020-04-09 12:48:42Z INFO MessageListener] Message '8' received from session 'd4def077-beb1-4999-a88c-4a5333940ce5'.
[2020-04-09 12:48:43Z INFO Runner] Refresh message received, kick-off selfupdate background process.
[2020-04-09 12:48:45Z INFO SelfUpdater] Version '2.168.0' of 'agent' package available in server.
[2020-04-09 12:48:45Z INFO SelfUpdater] Current running runner version is 2.165.2
[2020-04-09 12:48:45Z INFO SelfUpdater] An update is available.
[2020-04-09 12:48:45Z INFO Terminal] WRITE LINE: Runner update in progress, do not shutdown runner.
[2020-04-09 12:48:46Z INFO Terminal] WRITE LINE: Downloading 2.168.0 runner
[2020-04-09 12:48:46Z INFO HostContext] Well known directory 'Bin': '/Volumes/beyourself/dev_tools/github_action/actions-runner/bin'
[2020-04-09 12:48:46Z INFO HostContext] Well known directory 'Root': '/Volumes/beyourself/dev_tools/github_action/actions-runner'
[2020-04-09 12:48:46Z INFO HostContext] Well known directory 'Work': '/Volumes/beyourself/dev_tools/github_action/actions-runner/_work'
[2020-04-09 12:48:47Z INFO SelfUpdater] Attempt 1: save latest runner into /Volumes/beyourself/dev_tools/github_action/actions-runner/_work/_update/runner1.tar.gz.
[2020-04-09 12:48:47Z INFO SelfUpdater] Download runner: begin download
[2020-04-09 13:03:47Z WARN SelfUpdater] Runner download has timed out after 900 seconds
[2020-04-09 13:03:47Z WARN SelfUpdater] Failed to get package '/Volumes/beyourself/dev_tools/github_action/actions-runner/_work/_update/runner1.tar.gz' from 'https://github.com/actions/runner/releases/download/v2.168.0/actions-runner-osx-x64-2.168.0.tar.gz'. Exception System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
---> System.Net.Sockets.SocketException (89): Operation canceled
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
at System.Net.Security.SslStream.<FillBufferAsync>g__InternalFillBufferAsync|215_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)
at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
at System.Net.Http.HttpConnection.FillAsync()
at System.Net.Http.HttpConnection.CopyToContentLengthAsync(Stream destination, UInt64 length, Int32 bufferSize, CancellationToken cancellationToken)
at System.Net.Http.HttpConnection.ContentLengthReadStream.CompleteCopyToAsync(Task copyTask, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.ContentLengthReadStream.CompleteCopyToAsync(Task copyTask, CancellationToken cancellationToken)
at GitHub.Runner.Listener.SelfUpdater.DownloadLatestRunner(CancellationToken token)
[2020-04-09 13:03:47Z INFO SelfUpdater] Attempt 2: save latest runner into /Volumes/beyourself/dev_tools/github_action/actions-runner/_work/_update/runner1.tar.gz.
[2020-04-09 13:03:47Z INFO SelfUpdater] Download runner: begin download

It looks like something causes the download of the latest runner application package to fail, and the runner then retries the download again and again.
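
As a quick check on the log above, the gap between the "begin download" line and the timeout warning can be computed from the timestamps. This is a minimal Python sketch that assumes the `[YYYY-MM-DD HH:MM:SSZ LEVEL Component]` prefix seen in this log; the helper name `log_time` is made up for illustration and is not part of the runner.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the runner log above,
# e.g. "[2020-04-09 12:48:47Z INFO SelfUpdater] ...".
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z ")

def log_time(line: str) -> datetime:
    """Parse the leading timestamp of a runner log line (hypothetical helper)."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")

# Two lines copied from the log in this issue.
begin = "[2020-04-09 12:48:47Z INFO SelfUpdater] Download runner: begin download"
timeout = "[2020-04-09 13:03:47Z WARN SelfUpdater] Runner download has timed out after 900 seconds"

elapsed = (log_time(timeout) - log_time(begin)).total_seconds()
print(elapsed)  # 900.0
```

The whole 900 seconds pass before the warning is logged, which suggests the transfer stalls rather than failing quickly, and attempt 2 starts in the same second as the timeout.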

@ericsciple
Collaborator

Was it an intermittent issue? Or is it still failing?

Try downloading https://github.com/actions/runner/releases/download/v2.168.0/actions-runner-osx-x64-2.168.0.tar.gz using curl. If you are behind a proxy, the download may be failing because of a redirect to a different hostname.
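
To see which hostnames a proxy would need to allow, the redirect chain can be walked one hop at a time. This is a rough Python sketch using only the standard library. `redirect_hosts`, `http_location` and the injectable `get_location` parameter are names invented here, and the sketch assumes `urllib` picks up the proxy from the usual `https_proxy` environment variable.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlparse

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # unhandled redirect surfaces as an HTTPError

def http_location(url):
    """Return the Location header if url answers with a redirect, else None."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=30):
            return None  # non-redirect response: final hop reached
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
        raise

def redirect_hosts(url, get_location=http_location, max_hops=5):
    """List the hostname of each hop in the redirect chain starting at url."""
    hosts = []
    for _ in range(max_hops):
        hosts.append(urlparse(url).hostname)
        location = get_location(url)
        if location is None:
            break
        url = urljoin(url, location)  # Location may be relative
    return hosts
```

Calling `redirect_hosts` on the release URL above from the runner machine would show whether the download leaves `github.com`. Any host in the result that the proxy does not allow would be a plausible cause of the stalled download.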

@TingluoHuang TingluoHuang added bug Something isn't working runner Runner Auto-update 😞 and removed runner labels Jun 8, 2020
@BrightRan
Author

BrightRan commented Jul 20, 2020

In this ticket, the customer reported that his workflow runs always fail at the automatic update of the self-hosted runner application.
https://github.community/t/auto-updating-of-actions-runner-is-failing/120799

Last time, when the runner application was trying to update to version '2.169.0' during the workflow run, he got the following error:

Runner update in progress, do not shutdown runner.
Downloading 2.169.0 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.
An error occurred: The operation was canceled.

At that time, I suggested that the customer manually download the then-latest version '2.263.0' of the runner application and re-install the runner on his machine.
But now, when the runner application tries to update to the current latest version '2.267.1', the workflow run fails again with the same error.

@krissmarteragent

I'm not the original reporter that BrightRan opened this issue for, but I'm having similar issues when my self-hosted agents try to auto-update. This has happened on 2 successive version updates, to 2.267.1 and to 2.273.0, both times from the previous version.
What I've noticed is that the auto-update feature seems to pull down the new version and create 2 new folders (in the last instance bin.2.273.0 & externals.2.273.0), but the original bin & external folders from the previous version remain. From there my jobs just queue up. If I restart the runners with systemctl or the init script and then restart my jobs, the runners just go back into an auto-update cycle & the new jobs queue up, as if waiting for the auto-update to complete.

It seems to me there's a final step either missing or unable to run in the auto-update process, so the old bin & external folders are not replaced by the new versions, whether through some symlinking or straight-up overwriting.
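
One way to check whether a runner is in this half-updated state is to look for versioned folders sitting next to the originals. This is a small Python sketch that assumes the `bin.<version>` / `externals.<version>` naming described above; `pending_update_dirs` is a hypothetical helper, not part of the runner.

```python
import re
from pathlib import Path

# Matches folder names like "bin.2.273.0" or "externals.2.273.0".
VERSIONED = re.compile(r"^(bin|externals)\.(\d+\.\d+\.\d+)$")

def pending_update_dirs(runner_root):
    """Return {version: [folder names]} for versioned folders found in runner_root."""
    found = {}
    for entry in sorted(Path(runner_root).iterdir()):
        match = VERSIONED.match(entry.name)
        if entry.is_dir() and match:
            found.setdefault(match.group(2), []).append(entry.name)
    return found
```

A non-empty result while jobs are queuing would be consistent with the update having downloaded the new version without completing the swap.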

@jhonny-me

Same here, the old version is actions-runner-osx-x64-2.274.2.tar.gz and the newly released version is actions-runner-osx-x64-2.275.0.tar.gz.

Had to completely remove the old version and then re-configure the new version to make it work.

Also, for those who get stuck on Apple's Gatekeeper, remember to disable it before running the config script:
sudo spctl --master-disable

fgalind1 added a commit to fgalind1/runner that referenced this issue Oct 28, 2021
When using infrastructure as code, containers and recipes, pinning
versions for reproducibility is a good practice. Usually docker
container images contain static tools and binaries and the docker image
is tagged with a specific version.

Constructing a docker image of a runner with a specific runner version
and then self-updating itself doesn't seem that natural, instead the
docker image should use whatever that binary was built/tagged to.

In addition, this concept doesn't play well with
ephemeral runners and Kubernetes. First of all, we need to pay the price
of downloading/self-updating in every single ephemeral pod for every single
job, which causes delays in execution. Secondly, this doesn't work well
and containers may get stuck.

Related issues that will be solved with this:
- actions#1396
- actions#246
- actions#485
- actions#422
- actions#442
@thboop
Collaborator

thboop commented Feb 1, 2022

Going to close this out since it's been a while since this was reported. If we see it again, we can reopen.

@thboop thboop closed this as completed Feb 1, 2022