Add ability to opt out of agent auto-update #246

Closed
myoung34 opened this issue Dec 20, 2019 · 55 comments

@myoung34

Describe the enhancement

Running the agent within Docker with auto-update enabled causes the container to throttle and never succeed, unless PID 1 is managed explicitly.

It would be nice to have something in config.sh or run.sh, possibly environment variables, to opt out of auto-update.

Code Snippet

n/a

Additional information

n/a

myoung34 added the enhancement (New feature or request) label on Dec 20, 2019
@bryanmacfarlane
Member

We're also looking at producing an image as part of the runner build, so if you are doing single-use container builds you never hit a required update: you just pick up the container with the runner version the service demands.

Question - are you doing single use containers or does the runner / container lifetime span jobs and builds in your scenario?

@myoung34
Author

@bryanmacfarlane
Member

Instead of the entrypoint running run.sh (interactive), it should probably run runsvc.sh, which handles the update return codes. That, coupled with an image released with every runner version, would probably eliminate the issue.

https://github.com/actions/runner/blob/master/src/Misc/layoutbin/runsvc.sh
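
For illustration, a container entrypoint along those lines might look like the sketch below. The install path and the RUNNER_URL / RUNNER_TOKEN variables are placeholders for however you build and parameterize your image, not something the runner defines:

```sh
#!/bin/bash
# Minimal entrypoint sketch: configure once, then exec the service wrapper
# so self-update exit codes are handled instead of killing the container.
set -e
cd /actions-runner   # assumed install location inside the image

# Register the runner non-interactively on first start.
./config.sh --unattended --url "${RUNNER_URL}" --token "${RUNNER_TOKEN}"

# runsvc.sh (linked above) wraps the runner and relaunches it after an
# update, whereas run.sh simply exits on the update return code.
exec ./runsvc.sh
```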

@bryanmacfarlane
Member

^^ I'm working on some docker containers to be part of the build - I'll look over your files

@myoung34
Author

Just make sure the Dockerfiles work for arm/arm64, please 😃

@bryanmacfarlane
Member

#239

@SvanBoxel

I have a customer requesting this. At the moment the self-hosted runner self-updates, making it potentially unpredictable for production workloads.

@ioquatix

ioquatix commented Apr 1, 2020

Please don't include docker by default. I already run this inside a container.

@ioquatix

ioquatix commented Apr 1, 2020

Here are my suggestions:

  • We don't want self-updating features; it's deployed as a versioned package. If an update is needed, we will release a new package.
  • Please separate the files according to Unix conventions: a read-only area for binaries/assets and a directory for runtime state, e.g. the user's home directory, or allow it to be specified on the command line.

@bryanmacfarlane
Member

bryanmacfarlane commented Apr 1, 2020

It's the service that demands and pulls the runner forward. There are new capabilities and features on a cloud service that's rolling forward which need changes in the runner. Even if an update is demanded only when you use a new feature, it still means a runner can get pulled forward.

So, when it gets pulled forward, if the container is running ./runsvc.sh, it will handle the update code and get pulled forward. If you're using a container built with the latest runner, you will minimize how often updates happen.

Also note that if we build some containers and release them alongside the runner, that doesn't preclude you from building your own containers as you're doing now. It would be a pure convenience for others.

@ioquatix

ioquatix commented Apr 1, 2020

I'd rather it stopped working than auto updated.

@bryanmacfarlane
Member

bryanmacfarlane commented Apr 1, 2020

That's possible as well. Call ./run.sh; it will exit and not handle the update code. However, please be aware that all your workflows will stop running until you build another container or manually configure all your self-hosted runners again, and you will only be able to do that with the version the service demands, which you didn't want to auto-update to. So I'm not sure what stopping buys you.

Note that the runner is a simple step runner. The yaml and the orchestration run service side (that's why things like matrixing sets with outputs work) and delegate steps to this app. We're also not talking about the hosted images, dev tools, etc., which usually affect the work being done. The tools in the container or on the self-hosted runner will stay the same.

@bryanmacfarlane
Member

☝️ Also note that we are painfully aware of how this affects container scenarios and building your own. We're thinking about that so I don't want to discount the problem here. Just trying to explain how it works and what the current state is. Hope that helps.

@ioquatix

ioquatix commented Apr 1, 2020

I understand where you are coming from.

With CI, you are running a lot of untrusted code so maybe my argument doesn't hold much water.

That being said, you can isolate untrusted code using KVM, run it as an unprivileged user, and trash the results.

However, the way the system is currently designed and distributed makes this difficult.

One would normally expect a package to have a clean split between "executables/assets" and "run-time state". For whatever reason, this package doesn't have that. Not even the configuration is separated out.

Ideally, you'd install something like github-actions into /usr/bin and start it by running github-actions --work-dir=/tmp/github-actions or something similar. It would read a per-user and per-system configuration file, or maybe accept a command-line argument pointing to a configuration file if required.
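
Purely as an illustration of that split (all paths and flags here are hypothetical, following the naming above):

```sh
# Package-owned, read-only binaries plus per-instance runtime state:
#   /usr/bin/github-actions            executable installed by the package
#   /etc/github-actions/runner.conf    system-wide configuration
#   /tmp/github-actions/               disposable work directory / runtime state
github-actions --config /etc/github-actions/runner.conf --work-dir /tmp/github-actions
```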

Some of the consequences of this design are:

  • Impossible to multiplex - e.g. have multiple instances of github-actions running on the same system.
  • Huge package size because it includes C# and NodeJS.
  • Hard to integrate with existing package managers that expect a more "unix"-like model.
  • Hard to isolate using traditional unix approach, e.g. drop privileges, chroot, etc.

So, github-actions necessitates containers and other isolation constructs because of its design choices.

Auto-update makes the package unpredictable. Some package systems provide verification functions to check whether there are any errors on the file system or whether something has been manipulated. Some package managers provide a permissions-check system which validates file permissions, and some provide file-ownership queries and other operations. If the package files are changing, these functions stop working. Additionally, it makes it hard to remove the package if the files are not known. For example, removing this package on Arch Linux boils down to rm -rf, which is kind of rude.
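
As a concrete illustration of what breaks (the package names are placeholders for however the runner ends up being packaged):

```sh
# Package verification that self-updating defeats.
pacman -Qkk github-actions   # Arch: verify file sizes, permissions, and checksums
rpm -V github-actions        # RPM-based: report files that differ from the package
# Once the runner has rewritten its own files, both commands flag paths the
# package manager believes it owns but which no longer match what it installed.
```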

So, my advice is:

  • There should be a static part of the package which includes executables, assets, and other files.
  • A nice-to-have would be the ability to build using the system's own C# / Node.js packages - i.e. not needing to distribute them as part of the package.
  • It should be possible to run the github-actions runner as any user, with a per-instance configuration/working directory, without needing containers to achieve this.

@ioquatix

ioquatix commented Apr 1, 2020

Oh, I also wanted to add: auto-update is a massive target for black hats, so I completely agree with @SvanBoxel - hard to audit, hard to reproduce, hard to validate, etc.

@bryanmacfarlane
Member

We'll reflect on ☝️ for a bit. But note that there's a ton of separation right now: the runner knows nothing about yaml, the language, and other service-side / user features. That's all compiled/processed service side into the simplest "job" message, which is one node of the whole run graph (a set of steps). Those use contexts and are passed through fairly opaquely to the runner to minimize changes.

However, there are times when a feature is added which requires runner changes. As an example, js or container actions can have post-job scripts, and we're about to add pre. Another example would be sidecar containers for tests (expressed in yaml but executed by the runner).

So there are actually multiple moving parts: (1) the service, (2) the runner, and (3) the actions that 3rd parties are releasing. :o

So now, when an action packages a pre/post script (start an emulator, shut down, etc.), its author expects it to run, and even though the outer workflow runs service side, actions run VM side and the steps are run by the runner.

In Azure (where this code started) we actually did a really complicated thing. We allowed the runners to stay back and only get updated when the user used a new service-side feature that demanded a new runner, and we also allowed action authors to demand a minimum version of the runner. The service would compute a minimum based on the features used, enumerate the actions, calculate the minimum, and, if the runner was too old, send an update message. It still got updated, just less often and much less predictably. Task authors couldn't figure it out, didn't get it right, and it led to really hard-to-debug situations.

I think the fundamental design decision is that the yaml is not run by the runners. It's run service side with a durable, stateful orchestration which dishes out work to runners. That has a lot of benefits for things like parallel graphs of nodes that dish out simple instructions to a runner. But it does mean that the runner binary is an RPC remote extension of that orchestration, and actions are separate from that.

Completely open to refactoring the runner and to ideas. But I think the root of the problem is the multiple moving parts, with the other two moving forward and getting new features. :)

If you got this far, thanks for reading my wall of text. Hope it provides some background.

@ioquatix

ioquatix commented Apr 1, 2020

Yeah, I can understand what you are saying.

I agree it's not simple and to be honest, I see both sides of that coin.

But as someone who was trying to put together a package for users, this was non-trivial to deal with.

Right now, all the files are installed into /var/lib/github-actions and owned by the user of the same name. It's not ideal.

So somehow we need to find a middle ground that allows someone like myself to package it up in a more traditional way, but also support things like on the fly updates.

Whether you just vendor all the self-updating code into the runtime directory is up to you/the engineers behind the code, but at least it should be possible. In essence, the package should install the core code, the system integration, the user account setup, systemd units, and dependencies which may be provided by the system (Node.js, C#, etc.). Then you might start up an instance, which would self-update its own scripts into the runtime directory; but that is largely state which could be wiped clean without affecting the package being reinstalled, running the update process again in the future, and so on.

@myoung34
Author

myoung34 commented Apr 1, 2020

It doesn't have to stop working in general if you show deprecation warnings saying "a new version is available" for one version:

Running X
Get alert: "version Y is available"
Both X and Y work
Z is released
X stops working
Y starts showing "version Z is available"

This would work for most scenarios. Deprecation is hard, for sure, but everyone would have an upgrade window in this scenario.

@bryanmacfarlane
Member

bryanmacfarlane commented Apr 1, 2020

We follow the model above when we (rarely) deprecate features. When we do it, that process ☝️ spans months.

However, this is not about deprecation. This is about new features getting added to the service (which users adopt when they're released, e.g. a new yaml step construct) and the runner having to add new code (not deprecate anything) for that feature to work. This is also about the runner adding new features to support new capabilities for action authors (and for the others who use those actions and maintain runners).

@SvanBoxel

SvanBoxel commented Apr 1, 2020

An important consideration I would like us to think about is that companies in some industries are simply not compliant if they run software that hasn't been explicitly verified/validated. So having a service that runs on your own infrastructure and automatically updates could mean your business is not compliant.

An explicit upgrade process, or a way to opt out of automatic updates, would solve this compliance conflict. As an example, software used within companies that require FDA compliance needs to go through this process before a service is upgraded: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=820&showFR=1&subpartNode=21:8.0.1.1.12.7

@myoung34
Author

myoung34 commented Apr 1, 2020

☝️ is what I was going to follow up with.
All patches need to go through a change-control process of "some sort" in some regulatory environments.

@bryanmacfarlane
Member

bryanmacfarlane commented Apr 1, 2020

Yep. We've worked with those regulated environments in both Azure and GitHub.

In those scenarios, other things do not move forward. Everything is controlled or air-gapped, including the on-premises server (TFS at MS, the upcoming GHES at GitHub). It's not just the runner that's under scrutiny; the tasks/actions were also scrutinized and controlled.

In those scenarios, the runner (and actions) do not update, since the service is also controlled.
We also allowed for explicit but convenient updates of the runner by placing the runner in a configured location.

Even on-premises and not air-gapped, the service and actions do not continuously update, so the runners can stay back. The minimum runner version is recorded and controlled server / service side.

Which goes back to the key point: if the other parts (service and actions) move forward, then the code that executes those features has to move forward.

@bryanmacfarlane
Member

Note that I think we should still look at investing in options here. The points above are clarifying where we currently are with the current model.

@bryanmacfarlane
Member

bryanmacfarlane commented Apr 17, 2020

Also note that in the short term, if you are running the runner in a container, you should use the ./runsvc.sh entrypoint and not ./run.sh, which is meant for interactive running. Calling ./runsvc.sh will at least handle the update and not exit.

In the medium term, we also want to support .container in the yaml when running in K8s, so at least the container your job runs in (the steps) can stay static and be decoupled from the runner. That also means you don't have to build the runner into your containers. That's how it works with VM self-hosted: the runner on the host execs steps in the container you bring.

Finally, we should consider refactoring, as @ioquatix suggested, into the base runner (listener) and the steps portion, which is derived from the service-side yaml orchestration.

@ioquatix

@bryanmacfarlane one thing I considered several months ago was writing a compatible daemon in Ruby just for fun. Do you have any published specs for the GitHub Actions runner? I started reading through the C# source code but... trying to derive a protocol/specification by reading through C# is not really my thing.

@matthewhembree

I'm a bit concerned about the runner updating to a pre-release version. When I build my container to get around the auto-update autokill, I query https://api.github.com/repos/actions/runner/releases/latest. Currently, this returns a version that needs to update itself.
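
A sketch of that kind of release-pinning query, assuming the standard GitHub releases API and jq; the tarball name follows the published release assets, so adjust the platform suffix for your runners:

```sh
# Resolve the latest runner release tag and fetch the matching Linux x64 tarball.
VERSION=$(curl -fsSL https://api.github.com/repos/actions/runner/releases/latest \
          | jq -r '.tag_name' | sed 's/^v//')
curl -fsSL -o "actions-runner-linux-x64-${VERSION}.tar.gz" \
  "https://github.com/actions/runner/releases/download/v${VERSION}/actions-runner-linux-x64-${VERSION}.tar.gz"
```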

@tetchel
Contributor

tetchel commented Apr 21, 2021

I also did a double-take at the "pre-release" tag on the last 3 versions: v2.277.0, v2.277.1, and v2.278.0. It's not clear to me why those specifically are pre-releases.

As far as I can tell, only the Linux arm/arm64 builds are pre-release, but those were added prior to the listed versions.

@matthewhembree

Promises 👀
Promises 🔍
Promises 🙄

@pietroalbini

Rust's builders also got stuck yesterday when a pre-release was pushed as an update to our runners. We can't use auto-updates for security reasons.

@nehagargSeequent

Is there any update on this issue? I am using Azure Kubernetes to host the runners, and every time a build is triggered the runners first go offline to auto-update, which is super annoying.

@bryanmacfarlane
Member

@pietroalbini I think you may be confusing the runner application (this repo) and the hosted images. Even if you could pin this application, the OS images and all the toolsets slide underneath it, so I don't think this issue will address your issue and others' concerns.

@pietroalbini

@bryanmacfarlane the reason Rust can't use auto-updates is not related to the hosted images; it's a byproduct of running a custom runner fork as a workaround for another runner limitation.

The main problem Rust has with the self-hosted runner is #494 (the runner executing builds from PRs, causing security issues), and to work around it we're using a fork of the runner that rejects any build that's not a push event. In isolation this works fine and avoids the security issue, but running that fork means we can never use the auto-update feature: any auto-update would replace it with the upstream version of the runner, which doesn't have the security protections we added to our fork.

In our fork we also added a change to prevent auto-updates, but since the GitHub backend only sends jobs to a runner if it's up to date, preventing auto-updates that way effectively disables the runner whenever a new runner release is made. We have to watch for new releases and rebase our fork before the update rolls out to our organization, and when we don't get around to it fast enough our queue stops, which is annoying.

@patryk-kozak-branch

@bryanmacfarlane @hross
I feel that for v2.280.1 and v2.280.2 there was at least a couple of days' grace period after release, as runners with an older version could still connect and work; I didn't need to upgrade them immediately.
However, the version released today, v2.280.3, broke runners and CI for the entire company almost immediately (~5 hr after release). Is there some kind of default time / scheduled plan for releases before they get applied and force runners to be upgraded?

Do you have plans to create some kind of communication channel for upgrades? With a grace period to upgrade? Or to support the last X versions for compatibility?
Sorry for all the different questions, but knowing the answers to them is critical to real-world use of GitHub CI and its usefulness 😉

@TingluoHuang
Member

@patryk-kozak-branch Sorry about what you experienced. Do you have any runner diag logs still available for me to check? I would like to understand why the 2.280.3 upgrade broke everything for you. 🙇

@jenschelkopf

This comment has been minimized.

@myoung34
Author

@jenschelkopf is that private? I'm unable to see that issue.

@jenschelkopf

Ah, yes @myoung34. That issue is part of a repository where we track customer feedback internally. I got a little link happy :) I'll hide the comment so other people don't try it as well.

@ChristopherHX
Contributor

ChristopherHX commented Sep 11, 2021

A small but effective workaround from my own runner implementation, written in Go, which obviously doesn't want to be replaced by the official runner:
to opt out of auto-update, the connecting runner just needs to report a version >= the latest release during session creation.

In our fork we also added a change to prevent auto-updates, but since the GitHub backend only sends jobs to a runner if it's up to date, preventing auto-updates that way effectively disables the runner whenever new runner release is made.

As you can compile the runner in a fork or locally, why not change the runner version to 3.0.0 (file https://github.com/actions/runner/blob/main/src/runnerversion)? If you run the dev script / workflow now, your runner won't get updates for some months or even years. (No idea when GitHub runners will reach 3.x.x; then use 4.0.0, and so on.)
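
A rough sketch of that (the dev script invocation follows the repo's contributing docs; exact arguments and output paths may differ):

```sh
# Build a fork whose version string sorts above every official release so the
# service never demands an update. 3.0.0 is arbitrary, as suggested above.
git clone https://github.com/actions/runner.git
cd runner
echo "3.0.0" > src/runnerversion
cd src
./dev.sh layout   # produces a runnable runner layout (e.g. under ../_layout)
```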

This method bears the risk that GitHub requires a new runner feature, like a new message encryption method (FIPS was the last one added), and your old runner version doesn't have that feature and fails to connect, because the Actions service assumes the feature is present.

@edudip-thomasp

edudip-thomasp commented Sep 24, 2021

So, is there anything new on this topic? Auto-updating when running the runner in Kubernetes constantly fails. An option to opt out would be nice.

fgalind1 added a commit to fgalind1/runner that referenced this issue Oct 28, 2021
When using infrastructure as code, containers, and recipes, pinning
versions for reproducibility is a good practice. Usually docker
container images contain static tools and binaries, and the docker image
is tagged with a specific version.

Constructing a docker image of a runner with a specific runner version
that then self-updates doesn't seem natural; instead the docker image
should use whatever binary it was built/tagged with.

Additionally, this concept doesn't play well when using ephemeral
runners and Kubernetes. First of all, we need to pay the price of
downloading/self-updating every single ephemeral pod for every single
job, which causes delays in execution. Secondly, this doesn't work well
and containers may get stuck.

Related issues that will be solved with this:
- actions#1396
- actions#246
- actions#485
- actions#422
- actions#442
@na-jakobs

I would like to see this as well 🥇

@thboop
Collaborator

thboop commented Feb 2, 2022

This has been implemented in runner 2.287.1 and is live on github.com. It will come to GHES/GHAE in subsequent releases! 🎉
Please see this changelog for more details.
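
For anyone landing here later, a minimal sketch of the opt-out at configuration time; the --disableupdate flag name is based on my reading of that release, so confirm it against the linked changelog and ./config.sh --help for your version:

```sh
# Register a self-hosted runner with self-updates disabled (runner 2.287.1+).
# The URL and token below are placeholders. A runner with updates disabled
# must be upgraded manually before the service drops support for its version.
./config.sh --url https://github.com/myorg/myrepo \
            --token <REGISTRATION_TOKEN> \
            --disableupdate
./run.sh
```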
