Step seems to hang for about 3 minutes after all work has completed #10211

Closed
3 of 13 tasks
komish opened this issue Jul 10, 2024 · 6 comments

@komish

komish commented Jul 10, 2024

Description

I have an action that typically takes 1 to 15 seconds to complete, but it has suddenly started taking about 3 minutes (give or take a few seconds). Log timestamps show that the action's actual work finishes in the expected timeframe, but the step keeps hanging until roughly 3 minutes have elapsed.

E.g. https://github.com/komish/actions-workflow-call-test/actions/runs/9879802864

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • Ubuntu 24.04
  • macOS 12
  • macOS 13
  • macOS 13 Arm64
  • macOS 14
  • macOS 14 Arm64
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

Image: ubuntu-22.04
Version: 20240630.1.0 and 2024070708

Is it regression?

Not sure. The last successful run was on:

Image: ubuntu-22.04
Version: 20240624.1.0

Linked below in "Expected behavior".

Expected behavior

I expect the step to terminate relatively quickly after the code completes, give or take a few seconds for things to cleanly shut down.

E.g. https://github.com/openshift-helm-charts/sandbox/actions/runs/9788116992/job/27025670858#step:12:15 showing a 1s runtime

Actual behavior

The step remains active for about 3 minutes on average before it completes, adding roughly 2m45s to this step alone.

Repro steps

Run any of these jobs:

https://github.com/komish/actions-workflow-call-test/blob/c35f81829c1f80cc651d7e1d6852cf83590e8cb2/.github/workflows/tools-installer.yml#L30-L48
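A minimal, self-contained repro along these lines might look like the sketch below. It assumes the hang is triggered when `skip_cache` is `true` (the configuration discussed later in this thread); the workflow name, the job ID, and the `STEP_START` variable are hypothetical. The reported time also includes download and step overhead, so treat it as a rough comparison against the expected 1 to 15 seconds rather than the exact hang duration.

```yaml
name: tools-installer-timing     # hypothetical workflow name
on: workflow_dispatch
jobs:
  install-oc-skip-cache:         # hypothetical job ID
    runs-on: ubuntu-22.04
    steps:
      # Save a Unix timestamp for later steps via the GITHUB_ENV file.
      - name: Record start time
        run: |
          echo "STEP_START=$(date -u +%s)" >> "$GITHUB_ENV"
      # The configuration reported as hanging: the cache is skipped.
      - name: Install oc
        uses: redhat-actions/openshift-tools-installer@v1
        with:
          oc: latest
          skip_cache: true
      # Print roughly how long the install step kept the job busy.
      - name: Report elapsed time
        run: |
          echo "Install step took about $(( $(date -u +%s) - STEP_START )) seconds"
```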

@hemanthmanga
Contributor

@komish Thank you for bringing this to our attention. We are investigating and will update you once we have findings.

@hemanthmanga
Contributor

Hi @komish,

Could you please try the following workaround:

    runs-on: ubuntu-latest
    steps:
      - name: Install chart-verifier
        uses: redhat-actions/openshift-tools-installer@v1
        with:
          source: github
          skip_cache: false   # workaround: allow the action to use its cache
          chart-verifier: 0.1.0

  install-oc-ubuntu-latest:
    runs-on: ubuntu-22.04
    steps:
      - name: Install oc
        uses: redhat-actions/openshift-tools-installer@v1
        with:
          oc: latest
          skip_cache: false   # workaround: allow the action to use its cache

@komish
Author

komish commented Jul 15, 2024

@hemanthmanga Unfortunately, no: in some cases I need to be able to run this without accessing the cache. Is there an environmental factor related to the cache that causes the hang when the cache is skipped?

@komish
Author

komish commented Jul 16, 2024

@hemanthmanga are there any leads on where the hang might be occurring, either in the images or in the runner code itself?

I very much appreciate the investigation you've done so far; I know this is a strange and elusive problem. Soon I'll need to decide whether to re-architect our workflows to work around this consistently added time. If there are no leads on what might be happening, I can make that decision sooner rather than later.

I realize 3 minutes seems like a small amount of time, but because we call this workflow in many places, it is pushing our workflows into the 6-hour timeout before they are anywhere near complete. The extra 3 minutes really adds up at scale for us.

@komish
Author

komish commented Jul 17, 2024

@hemanthmanga if you think this is more likely related to the runner than to the image, I'm happy to start a thread in the runner repository instead.

@komish
Author

komish commented Jul 18, 2024

Closing out. My issue seems to be related to https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/.

On Node 20, the action hangs; on Node 16/18, it does not. I don't know why, but I have to assume the runner image is out of scope. Thanks for all of your work.
