From bdf139cd4028eab65a629dc0fcf6602512695514 Mon Sep 17 00:00:00 2001
From: Tom Sparrow <793763+sparrowt@users.noreply.github.com>
Date: Wed, 8 Nov 2023 15:59:57 +0000
Subject: [PATCH] Update readme with more details on Amazon Linux repo_upgrade race issue

This adds more explanatory text, and also a specific example of how to set
`repo_upgrade: none`.

I spent a long time trying to modify the cloud-init config within the AMI
before realising I could just set the 'User data' on the Jenkins Clouds
config for the AMI, so hopefully this will help someone else too.

Also list other symptoms of the same problem when Docker is being used: it
took a while to realise that this was due to Amazon Linux security patching,
so again hopefully this reference will help others facing the same issue.
---
 README.md | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 3548b51e1..9db4f92f6 100644
--- a/README.md
+++ b/README.md
@@ -629,9 +629,17 @@ If you are using a Amazon Linux AMI and encounter exceptions like
 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-0.amzn2.x86\_64/jre/lib/currency.data
 (No such file or directory)*** or ***Remote call on EC2 \[...\]
 failed*** then chances are that the Amazon Linux is doing some
-security upgrades in the background and causes the slave to be in an
+security upgrades in the background and causes the agent to be in an
 invalid state.
 
+If you run Docker on your agents, another symptom of the same problem can be
+Docker containers randomly exiting with ExitCode 137, or "EOF" error messages
+partway through pulling or running images. This can occur when there is a
+security patch for Docker itself, because applying the update stops Docker.
+These upgrades run asynchronously on Amazon Linux 2 (whereas for the older
+Amazon Linux the docs stated that the SSH service was not started until they
+completed), so your Jenkins job may already have started using Docker.
+
 From the [documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/amazon-linux-ami-basics.html#security-updates)
 of the Amazon Linux it's stated:
 
@@ -642,15 +650,22 @@ of the Amazon Linux it's stated:
 > default, Amazon Linux performs the security upgrades at launch for any
 > packages installed at that time
 
+You can check whether your AMI does this by looking for `repo_upgrade:` in the cloud-init settings file `/etc/cloud/cloud.cfg`.
+
 This issue can be solved in different ways:
 
-1. Update your AMI and execute the security fixes =\> this will mean
-   however that the issue could eventually come back
-2. Update your AMI and disable auto update of security fixes (see
-   amazon
-   [documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/amazon-linux-ami-basics.html#security-updates))
-3. Force security upgrade through the "init-script" in the Cloud
-   configuration
+1. Rebuild your AMI so the latest security fixes are baked in =\> however, this
+   means the issue could come back whenever further patches are published.
+2. Update the Jenkins config used to launch your AMI so that it disables automatic installation of security fixes
+   (see Amazon [documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/amazon-linux-ami-basics.html#security-updates))
+   by adding the following to 'User Data' under 'Advanced' for the AMI in the Jenkins Clouds config:
+
+       #cloud-config
+       repo_upgrade: none
+
+3. If you disable `repo_upgrade` as in the previous point, you can optionally still ensure patches are
+   applied by adding your own `yum` command to the 'Init script' for the AMI in the Jenkins Clouds config.
+   The advantage is that Jenkins will not start using the agent until this init script has finished.
 
 ![](docs/images/init-scripts.png)
 