Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setup-buildx-action hangs with 99% CPU when running on latest Github Action Runner v2.285.0 #117

Closed
dresselm opened this issue Dec 1, 2021 · 61 comments

Comments

@dresselm
Copy link

dresselm commented Dec 1, 2021

Behaviour

Our runners were auto-updated this morning from v2.284.0 to v2.285.0 and began to hang on the Download and Install Buildx step:

Screen Shot 2021-12-01 at 5 24 52 PM

Steps to reproduce this issue

  1. Upgrade your runner to version v2.285.0
  2. Run the docker/setup-buildx-action@v1 in your workflow
  3. You should see heavy CPU utilization that did not exist in v2.284.0
    image

Expected behaviour

The step should complete in under 10s per our typical experience

Actual behaviour

The step is hanging indefinitely

Configuration

  • Repository URL (if public): private
  • Build URL (if public): private

Note: all steps prior to the failing step pass as expected

name: Test Environment

on:
  pull_request:
    types: [labeled, reopened, synchronize, ready_for_review]
  workflow_dispatch:

env:
  HEAD_SHA: ${{ github.event.pull_request.head.sha }}

jobs:
  build:
    name: Build Image
    runs-on: self-hosted
    steps:
      - name: Set outputs
        id: vars
        run: |
          echo "::set-output name=head_sha::$(echo $PR_HEAD_SHA | cut -c1-7)"
        env:
          PR_HEAD_SHA: ${{ env.HEAD_SHA }}

      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_HUB_USERNAME }}
          password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: redacted
          role-to-assume: redacted

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Set up Docker Context for Buildx
        id: buildx-context
        run: |
          docker context create test-builder

      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
        with:
          endpoint: test-builder

      - name: Build and push Docker images
        uses: docker/build-push-action@v2.7.0

Logs

Download the log file of your build
and attach it to this issue.

Our logs are littered with private, implementation details. Could you help me isolate which part of the logs you need for diagnosis?

2021-12-01T19:25:54.2902593Z Found online and idle self-hosted runner in the  current repository's enterprise account that matches the required labels: 'self-hosted'
2021-12-01T19:25:54.4048239Z Waiting for the runner to pick up this job...
2021-12-01T19:26:54.7363687Z Job was not picked up by the preferred runner, try requesting additional runners to run this job
2021-12-01T19:26:54.8282602Z Waiting for a self-hosted runner to pickup this job...
2021-12-01T19:32:40.3175224Z Job is about to start running on the runner: self-hosted-runner-group-6zjgq-67xpv (enterprise)
2021-12-01T19:32:44.8744015Z Current runner version: '2.285.0'
2021-12-01T19:32:44.8753670Z Runner name: 'self-hosted-runner-group-6zjgq-67xpv'
2021-12-01T19:32:44.8755541Z Runner group name: 'self-hosted'
2021-12-01T19:32:44.8758083Z Machine name: 'self-hosted-runner-group-6zjgq-67xpv'
...
2021-12-01T19:32:49.5211515Z ##[group]Download and install buildx
2021-12-01T19:32:49.5825561Z Downloading https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64
2021-12-01T20:12:24.6445002Z ##[error]The operation was canceled.
2021-12-01T20:12:24.6564749Z Post job cleanup.
2021-12-01T20:12:24.7568406Z Post job cleanup.
2021-12-01T20:12:24.9055991Z Post job cleanup.
2021-12-01T20:12:24.9780442Z Post job cleanup.
2021-12-01T20:12:25.0473555Z [command]/usr/local/bin/docker logout 
2021-12-01T20:12:25.1428510Z Removing login credentials for https://index.docker.io/v1/
2021-12-01T20:12:25.1518891Z Cleaning up orphan processes
@ghostsquad
Copy link

I just ran into the same issue, as I attempt a new setup using private hosted runners. I thought it was related to running this in k8s, but this issue describes the exact same behavior. Downloading... and it just hangs there.

@ghostsquad
Copy link

I'd also like to confirm that when I switch to v2.284.0 this works.

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: example-runner
  namespace: actions-runner-system
spec:
  repository: <redacted>
  image: summerwind/actions-runner:v2.284.0-ubuntu-20.04
  env: []

@dresselm
Copy link
Author

dresselm commented Dec 2, 2021

@ghostsquad when you make this change (summerwind/actions-runner:v2.284.0-ubuntu-20.04) to the runner, it uses the older node version (v12.22.7) here instead of the updated node version (v12.13.1)? I haven't tried this yet, but will check it out tomorrow. I have been shelling into the runner and checking this: /runner/externals/node12/bin/node -v and it keeps using the new version and failing.

Another interesting thing I discovered is that this action runs runs using v12 but the latest GitHub runner (v2.285.0) shipped with v16 support as well.

@crazy-max
Copy link
Member

@dresselm

Another interesting thing I discovered is that this action runs runs using v12 but the latest GitHub runner (v2.285.0) shipped with v16 support as well.

Interesting indeed, are you on GHE?

@crazy-max
Copy link
Member

Our e2e workflow works fine with runner version: '2.285.0':

image

Are you using a self-hosted runner?

@jnsvd
Copy link

jnsvd commented Dec 2, 2021

We're seeing the same error on self-hosted runners. Version auto-changed overnight and now the job is timing out on the uses: docker/setup-buildx-action@v1.5.0 step:

  WARNING: Plugin "/home/runner/.docker/cli-plugins/docker-buildx" is not valid: failed to fetch metadata: fork/exec /home/runner/.docker/cli-plugins/docker-buildx: permission denied
  WARNING: No swap limit support
Download and install buildx
  Error: The operation was canceled.

@damyan
Copy link

damyan commented Dec 2, 2021

Yet another "same here". Self-hosted k8s runners on a private repo stuck forever at the buildx step.

@rjhenry
Copy link

rjhenry commented Dec 2, 2021

Seeing the same here, first noted on Tuesday afternoon (UK time)

@crazy-max
Copy link
Member

Can someone provide the workflow logs of an affected runner please (private info redacted ofc)? Thanks.

@jnsvd
Copy link

jnsvd commented Dec 2, 2021

@crazy-max

Set-up job logs:

2021-12-02T13:24:49.5232557Z Current runner version: '2.285.0'
2021-12-02T13:24:49.5238376Z Runner name: 'shared-vjx25-5cz76'
2021-12-02T13:24:49.5239071Z Runner group name: 'Default'
2021-12-02T13:24:49.5240198Z Machine name: 'shared-vjx25-5cz76'
2021-12-02T13:24:49.5243337Z ##[group]GITHUB_TOKEN Permissions
2021-12-02T13:24:49.5244528Z Actions: write
2021-12-02T13:24:49.5245038Z Checks: write
2021-12-02T13:24:49.5245509Z Contents: write
2021-12-02T13:24:49.5246054Z Deployments: write
2021-12-02T13:24:49.5246576Z Discussions: write
2021-12-02T13:24:49.5247045Z Issues: write
2021-12-02T13:24:49.5247521Z Metadata: read
2021-12-02T13:24:49.5247996Z Packages: write
2021-12-02T13:24:49.5248538Z Pages: write
2021-12-02T13:24:49.5249020Z PullRequests: write
2021-12-02T13:24:49.5249610Z RepositoryProjects: write
2021-12-02T13:24:49.5250185Z SecurityEvents: write
2021-12-02T13:24:49.5250721Z Statuses: write
2021-12-02T13:24:49.5251310Z ##[endgroup]
2021-12-02T13:24:49.5253761Z Secret source: Actions
2021-12-02T13:24:49.5254650Z Prepare workflow directory
2021-12-02T13:24:49.6124677Z Prepare all required actions
2021-12-02T13:24:49.6133014Z Getting action download info
2021-12-02T13:24:49.9042420Z Download action repository 'actions/checkout@v2' (SHA:ec3a7ce113134d7a93b817d10a8272cb61118579)
2021-12-02T13:24:50.3898900Z Download action repository 'actions/setup-node@v2.2.0' (SHA:38d90ce44d5275ad62cc48384b3d8a58c500bb5f)
2021-12-02T13:24:50.9495641Z Download action repository 'actions/cache@v2' (SHA:937d24475381cd9c75ae6db12cb4e79714b926ed)
2021-12-02T13:24:51.4209320Z Download action repository 'azure/CLI@v1' (SHA:7378ce2ca3c38b4b063feb7a4cbe384fef978055)
2021-12-02T13:24:51.9278970Z Download action repository 'docker/login-action@v1.10.0' (SHA:f054a8b539a109f9f41c372932f1ae047eff08c9)
2021-12-02T13:24:52.2985638Z Download action repository 'docker/setup-buildx-action@v1.5.0' (SHA:e673438944759779e411a0f7ceef3ba437dccfa0)
2021-12-02T13:24:52.7750364Z Download action repository 'docker/build-push-action@v2.6.1' (SHA:1bc1040caef9e604eb543693ba89b5bf4fc80935)

docker/setup-buildx-action@v1.5.0 logs:

2021-12-02T13:26:53.7736865Z ##[group]Run docker/setup-buildx-action@v1.5.0
2021-12-02T13:26:53.7737409Z with:
2021-12-02T13:26:53.7737792Z   driver: docker
2021-12-02T13:26:53.7738728Z   buildkitd-flags: --allow-insecure-entitlement security.insecure --allow-insecure-entitlement network.host
2021-12-02T13:26:53.7739669Z   install: false
2021-12-02T13:26:53.7740122Z   use: true
2021-12-02T13:26:53.7740488Z env:
2021-12-02T13:26:53.7741443Z   EXPOSE_DOMAIN_NAME: <redacted>
2021-12-02T13:26:53.7742918Z   ACR_PREVIEW: <redacted>
2021-12-02T13:26:53.7743756Z   VERSION: pr-3-18
2021-12-02T13:26:53.7886312Z   TOKEN: ***
2021-12-02T13:26:53.7886738Z ##[endgroup]
2021-12-02T13:26:53.8429557Z ##[group]Docker info
2021-12-02T13:26:53.8458069Z [command]/usr/local/bin/docker version
2021-12-02T13:26:53.8912983Z Client:
2021-12-02T13:26:53.8915314Z  Version:           20.10.8
2021-12-02T13:26:53.8915801Z  API version:       1.41
2021-12-02T13:26:53.8918136Z  Go version:        go1.16.6
2021-12-02T13:26:53.8918888Z  Git commit:        3967b7d
2021-12-02T13:26:53.8919435Z  Built:             Fri Jul 30 19:50:40 2021
2021-12-02T13:26:53.8921369Z  OS/Arch:           linux/amd64
2021-12-02T13:26:53.8921845Z  Context:           default
2021-12-02T13:26:53.8922455Z  Experimental:      true
2021-12-02T13:26:53.8923729Z 
2021-12-02T13:26:53.8925459Z Server: Docker Engine - Community
2021-12-02T13:26:53.8926831Z  Engine:
2021-12-02T13:26:53.8927325Z   Version:          20.10.9
2021-12-02T13:26:53.8928714Z   API version:      1.41 (minimum version 1.12)
2021-12-02T13:26:53.8929542Z   Go version:       go1.16.8
2021-12-02T13:26:53.8930283Z   Git commit:       79ea9d3
2021-12-02T13:26:53.8930946Z   Built:            Mon Oct  4 16:07:30 2021
2021-12-02T13:26:53.8931662Z   OS/Arch:          linux/amd64
2021-12-02T13:26:53.8932425Z   Experimental:     false
2021-12-02T13:26:53.8933640Z  containerd:
2021-12-02T13:26:53.8934337Z   Version:          v1.4.11
2021-12-02T13:26:53.8935253Z   GitCommit:        5b46e404f6b9f661a205e28d59c982d3634148f8
2021-12-02T13:26:53.8936558Z  runc:
2021-12-02T13:26:53.8937150Z   Version:          1.0.2
2021-12-02T13:26:53.8938645Z   GitCommit:        v1.0.2-0-g52b36a2d
2021-12-02T13:26:53.8939681Z  docker-init:
2021-12-02T13:26:53.8940315Z   Version:          0.19.0
2021-12-02T13:26:53.8941084Z   GitCommit:        de40ad0
2021-12-02T13:26:53.8952687Z [command]/usr/local/bin/docker info
2021-12-02T13:26:53.9405279Z Client:
2021-12-02T13:26:53.9406048Z  Context:    default
2021-12-02T13:26:53.9406650Z  Debug Mode: false
2021-12-02T13:26:53.9407019Z 
2021-12-02T13:26:53.9407499Z Server:
2021-12-02T13:26:53.9407932Z  Containers: 0
2021-12-02T13:26:53.9408407Z   Running: 0
2021-12-02T13:26:53.9408826Z   Paused: 0
2021-12-02T13:26:53.9409270Z   Stopped: 0
2021-12-02T13:26:53.9409669Z  Images: 1
2021-12-02T13:26:53.9410099Z  Server Version: 20.10.9
2021-12-02T13:26:53.9410593Z  Storage Driver: overlay2
2021-12-02T13:26:53.9411109Z   Backing Filesystem: extfs
2021-12-02T13:26:53.9411612Z   Supports d_type: true
2021-12-02T13:26:53.9412103Z   Native Overlay Diff: false
2021-12-02T13:26:53.9412589Z   userxattr: false
2021-12-02T13:26:53.9413999Z  Logging Driver: json-file
2021-12-02T13:26:53.9414541Z  Cgroup Driver: cgroupfs
2021-12-02T13:26:53.9415026Z  Cgroup Version: 1
2021-12-02T13:26:53.9415474Z  Plugins:
2021-12-02T13:26:53.9415882Z   Volume: local
2021-12-02T13:26:53.9416428Z   Network: bridge host ipvlan macvlan null overlay
2021-12-02T13:26:53.9417424Z   Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
2021-12-02T13:26:53.9418092Z  Swarm: inactive
2021-12-02T13:26:53.9418942Z  Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
2021-12-02T13:26:53.9419836Z  Default Runtime: runc
2021-12-02T13:26:53.9420492Z  Init Binary: docker-init
2021-12-02T13:26:53.9421142Z  containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8
2021-12-02T13:26:53.9421911Z  runc version: v1.0.2-0-g52b36a2d
2021-12-02T13:26:53.9422417Z  init version: de40ad0
2021-12-02T13:26:53.9423218Z  Security Options:
2021-12-02T13:26:53.9423614Z   apparmor
2021-12-02T13:26:53.9423999Z   seccomp
2021-12-02T13:26:53.9424432Z    Profile: default
2021-12-02T13:26:53.9425035Z  Kernel Version: 5.4.0-1063-azure
2021-12-02T13:26:53.9425667Z  Operating System: Alpine Linux v3.14 (containerized)
2021-12-02T13:26:53.9426213Z  OSType: linux
2021-12-02T13:26:53.9426628Z  Architecture: x86_64
2021-12-02T13:26:53.9427054Z  CPUs: 8
2021-12-02T13:26:53.9427474Z  Total Memory: 31.36GiB
2021-12-02T13:26:53.9428268Z  Name: shared-vjx25-5cz76
2021-12-02T13:26:53.9428857Z  ID: MKF2:72OS:YKOV:2E5D:635C:TQSY:YW63:FN5L:JMGK:HTZQ:X6CY:C4XG
2021-12-02T13:26:53.9429427Z  Docker Root Dir: /var/lib/docker
2021-12-02T13:26:53.9429926Z  Debug Mode: false
2021-12-02T13:26:53.9430539Z  Registry: https://index.docker.io/v1/
2021-12-02T13:26:53.9431080Z  Labels:
2021-12-02T13:26:53.9431509Z  Experimental: false
2021-12-02T13:26:53.9432013Z  Insecure Registries:
2021-12-02T13:26:53.9432402Z   127.0.0.0/8
2021-12-02T13:26:53.9432838Z  Live Restore Enabled: false
2021-12-02T13:26:53.9433380Z  Product License: Community Engine
2021-12-02T13:26:53.9433715Z 
2021-12-02T13:26:53.9434164Z WARNING: No swap limit support
2021-12-02T13:26:53.9435253Z ##[endgroup]
2021-12-02T13:26:53.9851668Z ##[group]Download and install buildx
2021-12-02T13:26:54.1115979Z Downloading https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64
2021-12-02T13:38:51.7240440Z ##[error]The operation was canceled.

@crazy-max
Copy link
Member

@jnsvd I'm really confused. Can you add this step before the setup-buildx-action step to see if it's linked to the actions/tool-cache pkg that downloads buildx?:

      - name: Install Docker Buildx
        run: |
          mkdir -p "$HOME/.docker/cli-plugins"
          curl -SsL "https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64" -o "$HOME/.docker/cli-plugins/docker-buildx"
          chmod +x "$HOME/.docker/cli-plugins/docker-buildx"

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

@jnsvd
Copy link

jnsvd commented Dec 2, 2021

@crazy-max If we add the extra Install Docker Buildx step the pipeline finished successfully.

@jnsvd
Copy link

jnsvd commented Dec 2, 2021

@crazy-max Step actions/cache@v2 doesn't seem to be related to the problem, when removing it (and your additional step), the pipeline is still freezing on the docker/setup-buildx-action step. When applying your step and the actions/cache@v2 the pipeline ran to completion.

@richardpeng
Copy link

I can confirm that the Install Docker Buildx step above does allow the setup-buildx-action to run successfully.

@crazy-max
Copy link
Member

crazy-max commented Dec 2, 2021

Ok thanks for your feedback! One last thing, can you add the ACTIONS_RUNNER_DEBUG secret to true and give me the output? It could tell us if this is an issue with the actions/tool-cache.

@jnsvd

Step actions/cache@v2 doesn't seem to be related to the problem

This is not about the actions/cache action but the node module actions/tool-cache we use in the setup-buildx-action: https://github.com/actions/toolkit/tree/main/packages/tool-cache.

@richardpeng
Copy link

richardpeng commented Dec 2, 2021

Let me know if you need more logs than this, but you can see that the fixed version goes to Creating a new builder instance instead of getting stuck on Download and install buildx

CleanShot 2021-12-02 at 11 07 21@2x

CleanShot 2021-12-02 at 11 07 12@2x

The operation was canceled. is misleading. The action never times out. That message appears when you cancel the workflow.

@jnsvd
Copy link

jnsvd commented Dec 2, 2021

Ok thanks for your feedback! One last thing, can you add the ACTIONS_RUNNER_DEBUG secret to true and give me the output? It could tell us if this is an issue with the actions/tool-cache.

There's no more output when enabling diagnostic logging. The pipeline really seems to get stuck on Downloading https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64 somehow.

@richardpeng
Copy link

Would it be possible to add more logging for each of the steps? That might help future debugging

@crazy-max
Copy link
Member

@jnsvd

There's no more output when enabling diagnostic logging. The pipeline really seems to get stuck on Downloading https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64 somehow.

Oh my bad it's ACTIONS_STEP_DEBUG to true and you should have smth like this:

image

@crazy-max
Copy link
Member

Friendly ping @thboop @luketomlinson. Are you aware of some issues with the actions/tool-cache module on self-hosted runners updated to the latest v2.285.0?

We use the downloadTool and cacheFile functions here from actions/tool-cache and it seems to be frozen while caching. Any idea?

@dresselm
Copy link
Author

dresselm commented Dec 2, 2021

@crazy-max @richardpeng

Regarding adding more debug logging, I was stuck yesterday even though I had ACTIONS_STEP_DEBUG=true setup correctly. Primary reason was that I saw the last line Plugin path... but wasn't seeing Fixing perms and there was no additional debug details describing what was happening while it was stuck.

debug-logs

@richardpeng
Copy link

richardpeng commented Dec 2, 2021

Github is slowly rolling out updates to their runners so it took many tries to get a run on an upgraded self-hosted runner....

Here are the logs:

CleanShot 2021-12-02 at 13 05 35@2x

There are no further logs. It's just stuck at that spot

Update:

CleanShot 2021-12-02 at 13 18 10@2x

@crazy-max
Copy link
Member

It looks like this line is the culprit:

fs.copyFileSync(path.join(toolPath, filename), pluginPath);

I wonder if this is not a bug with fs.copyFileSync in the node version 12.22.7 added in v2.285.0 of the runner as shown in nodejs/node#40200

No specific reports on your side @TingluoHuang @thboop?

@TingluoHuang
Copy link

@crazy-max
This is the first report we got since we just bump the node12 packaged with the runner (v2.285.0) to that latest v12 LTS version.
The issue you linked is for only node v16, isn't it?

@crazy-max
Copy link
Member

@TingluoHuang

@crazy-max This is the first report we got since we just bump the node12 packaged with the runner (v2.285.0) to that latest v12 LTS version. The issue you linked is for only node v16, isn't it?

Yes indeed I try to find the root cause maybe it has been backported to 12.22.7 I'm not quite sure but anyway I tried again on a fresh runner 2.285.0 and can't repro (Ubuntu 20.04):

image

@richardpeng @dresselm @jnsvd @rjhenry @ghostsquad and others with this issue, can you show your Set up job logs like my screenshot? I wonder if the virtual env image should not been updated on your side. And also is this happening only on self-hosted runners for you all?

@TingluoHuang
Copy link

I think it might have something to do with the docker container volume mount. I am playing with it right now.

@crazy-max
Copy link
Member

@TingluoHuang Thanks!

@TingluoHuang
Copy link

node16 doesn't work, it hangs as well. 😢

image

@TingluoHuang
Copy link

it's related to the double copy... 😵‍💫

  • fs.copy(tmp, /opt/toolcache)
  • fs.copy(/opt/toolcache, /home/runner/...)
Fri, 03 Dec 2021 21:53:15 GMT
  export NODE_DEBUG=fs
Fri, 03 Dec 2021 21:53:15 GMT
  whoami      
Fri, 03 Dec 2021 21:53:15 GMT
  echo "try download and copy with curl and cp"
Fri, 03 Dec 2021 21:53:15 GMT
  curl -O -L https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64
Fri, 03 Dec 2021 21:53:15 GMT
  mkdir -p /opt/hostedtoolcache/buildx/0.7.1/x64
Fri, 03 Dec 2021 21:53:15 GMT
  cp -v ./buildx-v0.7.1.linux-amd64 /opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx
Fri, 03 Dec 2021 21:53:15 GMT
  mkdir -p /home/runner/.docker/cli-plugins
Fri, 03 Dec 2021 21:53:15 GMT
  cp -v /opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx /home/runner/.docker/cli-plugins/docker-buildx
Fri, 03 Dec 2021 21:53:15 GMT
  ls -l /home/runner/.docker/cli-plugins/docker-buildx
Fri, 03 Dec 2021 21:53:15 GMT
  
Fri, 03 Dec 2021 21:53:15 GMT
  node -v
Fri, 03 Dec 2021 21:53:15 GMT
  echo "try use node to copy from /opt/hostedtoolcache to /home/runner/.docker/cli-plugins"
Fri, 03 Dec 2021 21:53:15 GMT
  node -e "fs=require('fs'); fs.copyFileSync('/opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx', '/home/runner/.docker/cli-plugins/docker-buildx2'); "
Fri, 03 Dec 2021 21:53:15 GMT
  #node ./setup-buildx-action-master/dist/index.js 
Fri, 03 Dec 2021 21:53:15 GMT
  ls -l /home/runner/.docker/cli-plugins/docker-buildx2
Fri, 03 Dec 2021 21:53:15 GMT
  
Fri, 03 Dec 2021 21:53:15 GMT
  echo "try use node to copy from ./buildx-v0.7.1.linux-amd64 to /opt/hostedtoolcache first and then use cp to copy to /home/runner/.docker/cli-plugins"
Fri, 03 Dec 2021 21:53:15 GMT
  node -e "fs=require('fs'); fs.copyFileSync('./buildx-v0.7.1.linux-amd64', '/opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx3'); "
Fri, 03 Dec 2021 21:53:15 GMT
  ls -l /opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx3
Fri, 03 Dec 2021 21:53:15 GMT
  cp -v /opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx3 /home/runner/.docker/cli-plugins/docker-buildx4
Fri, 03 Dec 2021 21:53:15 GMT
  ls -l /home/runner/.docker/cli-plugins/docker-buildx4
Fri, 03 Dec 2021 21:53:15 GMT
  
Fri, 03 Dec 2021 21:53:15 GMT
  echo "try use node to copy from ./buildx-v0.7.1.linux-amd64 to /opt/hostedtoolcache first and then copy to /home/runner/.docker/cli-plugins"
Fri, 03 Dec 2021 21:53:15 GMT
  node -e "fs=require('fs'); fs.copyFileSync('./buildx-v0.7.1.linux-amd64', '/opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx5'); "
Fri, 03 Dec 2021 21:53:15 GMT
  ls -l /opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx5
Fri, 03 Dec 2021 21:53:15 GMT
  node -e "fs=require('fs'); fs.copyFileSync('/opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx5', '/home/runner/.docker/cli-plugins/docker-buildx6'); "
Fri, 03 Dec 2021 21:53:15 GMT
  ls -l /home/runner/.docker/cli-plugins/docker-buildx6
Fri, 03 Dec 2021 21:53:15 GMT
  shell: /usr/bin/bash -e {0}
Fri, 03 Dec 2021 21:53:15 GMT
runner
Fri, 03 Dec 2021 21:53:15 GMT
try download and copy with curl and cp
Fri, 03 Dec 2021 21:53:15 GMT
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Fri, 03 Dec 2021 21:53:15 GMT
                                 Dload  Upload   Total   Spent    Left  Speed
Fri, 03 Dec 2021 21:53:15 GMT

Fri, 03 Dec 2021 21:53:16 GMT
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Fri, 03 Dec 2021 21:53:16 GMT
100   664  100   664    0     0   5487      0 --:--:-- --:--:-- --:--:--  5487
Fri, 03 Dec 2021 21:53:16 GMT

Fri, 03 Dec 2021 21:53:16 GMT
 28 44.9M   28 13.0M    0     0  33.8M      0  0:00:01 --:--:--  0:00:01 33.8M
Fri, 03 Dec 2021 21:53:16 GMT
100 44.9M  100 44.9M    0     0  65.6M      0 --:--:-- --:--:-- --:--:--  107M
Fri, 03 Dec 2021 21:53:16 GMT
'./buildx-v0.7.1.linux-amd64' -> '/opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx'
Fri, 03 Dec 2021 21:53:16 GMT
'/opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx' -> '/home/runner/.docker/cli-plugins/docker-buildx'
Fri, 03 Dec 2021 21:53:16 GMT
-rw-r--r-- 1 runner runner 47112192 Dec  3 21:53 /home/runner/.docker/cli-plugins/docker-buildx
Fri, 03 Dec 2021 21:53:16 GMT
v16.13.1
Fri, 03 Dec 2021 21:53:16 GMT
try use node to copy from /opt/hostedtoolcache to /home/runner/.docker/cli-plugins
Fri, 03 Dec 2021 21:53:16 GMT
-rw-r--r-- 1 runner runner 47112192 Dec  3 21:53 /home/runner/.docker/cli-plugins/docker-buildx2
Fri, 03 Dec 2021 21:53:16 GMT
try use node to copy from ./buildx-v0.7.1.linux-amd64 to /opt/hostedtoolcache first and then use cp to copy to /home/runner/.docker/cli-plugins
Fri, 03 Dec 2021 21:53:16 GMT
-rw-r--r-- 1 runner runner 47112192 Dec  3 21:53 /opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx3
Fri, 03 Dec 2021 21:53:16 GMT
'/opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx3' -> '/home/runner/.docker/cli-plugins/docker-buildx4'
Fri, 03 Dec 2021 21:53:16 GMT
-rw-r--r-- 1 runner runner 47112192 Dec  3 21:53 /home/runner/.docker/cli-plugins/docker-buildx4
Fri, 03 Dec 2021 21:53:16 GMT
try use node to copy from ./buildx-v0.7.1.linux-amd64 to /opt/hostedtoolcache first and then copy to /home/runner/.docker/cli-plugins
Fri, 03 Dec 2021 21:53:17 GMT
-rw-r--r-- 1 runner runner 47112192 Dec  3 21:53 /opt/hostedtoolcache/buildx/0.7.1/x64/docker-buildx5
Fri, 03 Dec 2021 21:53:54 GMT
Error: The operation was canceled.

@crazy-max
Copy link
Member

@TingluoHuang

it's related to the double copy... 😵‍💫

Ok so cleaning dest beforehand should mitigate this issue in the meantime.

@maxnowack
Copy link

I've tried to downgrade to summerwind/actions-runner:v2.284.0-ubuntu-20.04, but it doesn't work. Is there a workaround for this issue?

@rjhenry
Copy link

rjhenry commented Dec 6, 2021

I've tried to downgrade to summerwind/actions-runner:v2.284.0-ubuntu-20.04, but it doesn't work. Is there a workaround for this issue?

Yeah, unfortunately the runners self-update. I'm running a very hacky loop on my local machine to keep pipelines running for myself - fortunately, I'm the one triggering most of these pipelines so it's only me inconvenienced.

while true; do
  for pod in $(k get po -n github-actions-runner -l runner-deployment-name=assureddt -o jsonpath='{.items[*].metadata.name}'); do
    k exec -it -n github-actions-runner ${pod} -c runner -- rm -r /opt/hostedtoolcache/buildx/
  done
  sleep 60
done

It's dirty and I'm not proud of it, but it is working.

@crazy-max
Copy link
Member

@maxnowack #117 (comment) should mitigate your issue in the meantime.

@crazy-max
Copy link
Member

@maxnowack
Copy link

Should be solved since actions/runner@v2.285.1 (release)

I can confirm, that it is resolved with the new runner version 🙂

@novascreen
Copy link

we are now running into this issue since our runners updated to v2.288.0

the extra Install Docker Buildx step described earlier does not help

@crazy-max
Copy link
Member

@novascreen Must be linked to actions/runner#1651 which updates node 12 with the same issue. Suggest to open an issue on https://github.com/actions/runner. cc @TingluoHuang @thboop

@TingluoHuang
Copy link

@novascreen what Linux kernel version your environment is on? there was a kernel bug get patched in Linux 5.4 which break node.js

You can read more from actions/runner#1536

@thril
Copy link

thril commented Mar 3, 2022

FYI, this is also happening on runner version 2.288.1, which I believe is expected from what I've read here and in the linked issues. We're using a self-hosted runner and this job is using container: node:14.17.6.

In the "Docker info" section for docker/setup-buildx-action@v1, it does display a warning:

WARNING: Plugin "/github/home/.docker/cli-plugins/docker-buildx" is not valid: failed to fetch metadata: fork/exec /github/home/.docker/cli-plugins/docker-buildx: permission denied

Not sure if that is significant or not. Then on the Download and install buildx section it hangs. This issue is now blocking our deploys. 😢

@dresselm
Copy link
Author

dresselm commented Mar 3, 2022

@crazy-max @thril

This latest GHA runner, 2.288.1, has a NODE update again (https://github.com/actions/runner/releases/tag/v2.288.1, actions/runner#1651) and we are seeing the same issue as before. This change is slowly propagating out as it did last time and some of our runners are running 2.287.1 and some have the latest.

Theoretically, GHA-folk believe that since the Kernel issue is fixed, they are good to go with the fs.copyFileSync

@dresselm
Copy link
Author

dresselm commented Mar 3, 2022

@crazy-max @thril based on a comment with @crazy-max, it appears that we will need to update our runners to use the latest Linux (or other *nix) kernel. Seems like this is not the most customer-friendly way for GHA to manage this upgrade. Seems like recommending the OS-level upgrade first (with a node deprecation warning) would have been better.

@dresselm
Copy link
Author

dresselm commented Mar 3, 2022

@crazy-max any inkling of which runner OS will work today?

@thril
Copy link

thril commented Mar 4, 2022

We ended up pinning our runner to version 2.287.1, at least temporarily. This has us up and running again until the images we use contain the patch.

@crazy-max
Copy link
Member

@dresselm

any inkling of which runner OS will work today?

Any idea @TingluoHuang? If you're using Ubuntu you can check https://bugs.launchpad.net/ubuntu/+source/linux-base/+bug/1953199

To avoid this bug, I'd suggest avoid Linux v5.6.0 - v5.10.83 for now
...
I've confirmed that 5.5.19 does NOT have this bug, and it was introduced in 5.6.0 via torvalds/linux@1a980b8.

@afritzler
Copy link

Can we re-open this issue since we (and other people in this thread) are still running into this problem?

@afritzler
Copy link

afritzler commented Mar 7, 2022

OK, what solved the issue for us was pinning the version to 287.1 and disabling the auto update of the runner:

image: ghcr.io/actions-runner-controller/actions-runner-controller/actions-runner:v2.287.1-ubuntu-20.04
env:
  # Disables automatic runner updates
  - name: DISABLE_RUNNER_UPDATE
    value: "true"

@crazy-max
Copy link
Member

Can we re-open this issue since we (and other people in this thread) are still running into this problem?

Afaik it's a runner/kernel issue, not the action itself.

@tjg184
Copy link

tjg184 commented Apr 8, 2022

Just to help anyone else out looking at this thread, a Linux kernel fix (at least 5.4) will address this issue.

fs.copyFileSync hang

@jankaderabek
Copy link

Hi, I used version v2.287 of github runner as a workarround to the reportred issue. Everything was fine, but from today GH dont't want to process my jobs because is says that I have unsupported version. I've updated to version 2.289.2 but the result of that is that the actions is stucked like is reported in this issue.

Do you have any solution for that?

@crazy-max
Copy link
Member

@jankaderabek

is says that I have unsupported version

do you have more details/logs?

@jankaderabek
Copy link

GH runner version:

Current runner version: '2.289.2'

This is the result of setup-buildx-action:

Run docker/setup-buildx-action@v1
  with:
    driver: docker-container
    buildkitd-flags: --allow-insecure-entitlement security.insecure --allow-insecure-entitlement network.host
    install: false
    use: true
  env:
    DOCKER_REGISTRY: registry.digitalocean.com/tapclick
Docker info
  /usr/local/bin/docker version
  Client:
   Version:           [2](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:2)0.10.12
   API version:       1.41
   Go version:        go1.16.12
   Git commit:        e91ed57
   Built:             Mon Dec 1[3](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:3) 11:[4](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:4)0:[5](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:5)7 2021
   OS/Arch:           linux/amd[6](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:6)4
   Context:           default
   Experimental:      true
  
  Server: Docker Engine - Community
   Engine:
    Version:          20.10.12
    API version:      1.41 (minimum version 1.12)
    Go version:       go1.16.12
    Git commit:       459d0df
    Built:            Mon Dec 13 11:46:12 2021
    OS/Arch:          linux/amd64
    Experimental:     false
   containerd:
    Version:          v1.4.12
    GitCommit:        [7](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:7)b11cfaabd73bb[8](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:8)0[9](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:9)07dd23182b9347b4245eb5d
   runc:
    Version:          1.0.2
    GitCommit:        v1.0.2-0-g52b36a2d
   docker-init:
    Version:          0.19.0
    GitCommit:        de40ad0
  /usr/local/bin/docker info
  Client:
   Context:    default
   Debug Mode: false
  
  Server:
   Containers: 0
    Running: 0
    Paused: 0
    Stopped: 0
   Images: 0
   Server Version: 20.[10](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:10).12
   Storage Driver: overlay2
    Backing Filesystem: extfs
    Supports d_type: true
    Native Overlay Diff: true
    userxattr: false
   Logging Driver: json-file
   Cgroup Driver: cgroupfs
   Cgroup Version: 1
   Plugins:
    Volume: local
    Network: bridge host ipvlan macvlan null overlay
    Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
   Swarm: inactive
   Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
   Default Runtime: runc
   Init Binary: docker-init
   containerd version: 7b[11](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:11)cfaabd73bb80907dd23182b9347b4245eb5d
   runc version: v1.0.2-0-g52b36a2d
   init version: de40ad0
   Security Options:
    seccomp
     Profile: default
   Kernel Version: 5.10.0-0.bpo.9-amd64
   Operating System: Ubuntu 20.04.4 LTS (containerized)
   OSType: linux
   Architecture: x86_64
   CPUs: 4
   Total Memory: 7.772GiB
   Name: tapclick-runner-dchth-m8vmd
   ID: F77F:R5YT:YWEY:W4DE:64DV:GLXI:F3EO:WU3T:6DA6:BTNX:OQVD:FKPC
   Docker Root Dir: /var/lib/docker
   Debug Mode: false
   Registry: https://index.docker.io/v1/
   Labels:
   Experimental: false
   Insecure Registries:
    [12](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:12)7.0.0.0/8
   Live Restore Enabled: false
   Product License: Community Engine
  
Download and install buildx
  Downloading https://github.com/docker/buildx/releases/download/v0.8.2/buildx-v0.8.2.linux-amd[64](https://github.com/tapclick-cz/vouchers-management-platform/runs/5955822277?check_suite_focus=true#step:6:64)
  Error: The operation was canceled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests