Windows runners are consistently extremely slow compared to Linux and macOS #7320

Open
3 of 10 tasks
myitcv opened this issue Mar 21, 2023 · 32 comments

Comments

@myitcv

myitcv commented Mar 21, 2023

Description

In the CUE project we are seeing incredibly slow run times on Windows 2022 runners.

For a recent example see https://github.com/cue-lang/cue-trybot/actions/runs/4477594562/.

Roughly speaking, here are the numbers we are seeing averaged out across 20-30 builds per day.

| OS | actions/checkout | go test (cache hit) |
|---|---|---|
| ubuntu-22.04 | 2-3 secs | 8 secs |
| macOS-11 | 5 secs | 9 secs |
| windows-2022 | 15-20 secs | 120+ secs |

Notice that actions/checkout is consistently slower on Windows. Yes, there is a network access element to this, but it is minimal.

The go test comparison is the cleanest comparison. As indicated in the column heading, these are the timings for a full cache hit, i.e. no network access is required, no rebuilds are required, and no tests actually run (because they hit the test cache). So the go test command is purely a function of disk access and CPU. For this command, Windows is consistently 15 times slower than Linux (120+ secs versus 8 secs). macOS is, pleasingly, comparable with Linux.

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 18.04
  • Ubuntu 20.04
  • Ubuntu 22.04
  • macOS 11
  • macOS 12
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

For windows-2019:

Current runner version: '2.303.0'
Operating System
  Microsoft Windows Server 2019
  10.0.17763
  Datacenter
Runner Image
  Image: windows-2019
  Version: 20230314.1
  Included Software: https://github.com/actions/runner-images/blob/win19/20230314.1/images/win/Windows2019-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/win19%2F20230314.1
Runner Image Provisioner
  2.0.127.1

For windows-2022:

Current runner version: '2.303.0'
Operating System
  Microsoft Windows Server 2022
  10.0.20348
  Datacenter
Runner Image
  Image: windows-2022
  Version: 20230314.1
  Included Software: https://github.com/actions/runner-images/blob/win22/20230314.1/images/win/Windows2022-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20230314.1
Runner Image Provisioner
  2.0.127.1

Is it regression?

Unclear

Expected behavior

Windows runners to be comparable in terms of speed to Linux and macOS for actions/checkout and go test steps.

Actual behavior

Windows runners consistently taking 15 times as long as Linux and macOS for CPU and disk-intensive commands.

Repro steps

The CUE repo itself is quite involved. So as a proxy for something that is relatively CPU and disk intensive we have created a slimmed down repo using actions/checkout.

https://github.com/myitcvscratch/slow-windows-actions

See the most recent run for results:

https://github.com/myitcvscratch/slow-windows-actions/actions/runs/4477601338

Looking at averages of this setup across a number of runs we see similar figures for actions/checkout to those seen in our CUE setup:

| Runner | actions/checkout |
|---|---|
| ubuntu-20.04 | 3 secs |
| ubuntu-22.04 | 3 secs |
| macos-11 | 5 secs |
| macos-12 | 5 secs |
| windows-2019 | 20 secs |
| windows-2022 | 20 secs |

So whilst this doesn't include the go test step (because getting a warm cache is a tricky step to reproduce) the use of actions/checkout is a sufficiently good proxy to show the problem.
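
For readers who want to reproduce the comparison, a minimal sketch of such a workflow might look like the following. It assumes a single job that does nothing but run actions/checkout on each runner label from the table above. The workflow name, trigger, job id and the `actions/checkout@v3` pin are assumptions for illustration and are not copied from the linked repository.

```yaml
# Hypothetical workflow: time actions/checkout on every runner label listed above.
name: checkout-timing
on: [push]

jobs:
  checkout:
    strategy:
      # Let every OS finish even if one of them fails.
      fail-fast: false
      matrix:
        os:
          - ubuntu-20.04
          - ubuntu-22.04
          - macos-11
          - macos-12
          - windows-2019
          - windows-2022
    runs-on: ${{ matrix.os }}
    steps:
      # The duration of this step is what the table reports.
      - uses: actions/checkout@v3
```

The per-step durations shown in the Actions UI for each matrix leg can then be compared directly.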

@ilia-shipitsin
Contributor

I suspect it might be Windows Defender in action.
Can you try the following (to check whether disabling Windows Defender helps)?

    - run: Set-MpPreference -DisableRealtimeMonitoring $true
      shell: powershell

@mvdan

mvdan commented Mar 21, 2023

Here are many other people similarly seeing that actions/checkout is very slow on Windows: actions/checkout#1150

@myitcv
Author

myitcv commented Mar 21, 2023

@ilia-shipitsin

> I suspect it might be Windows Defender in action.
> Can you try the following (to check whether disabling Windows Defender helps)?

See commit myitcvscratch/slow-windows-actions@8b71de3 which resulted in run https://github.com/myitcvscratch/slow-windows-actions/actions/runs/4480846656/jobs/7876717612. It has basically no effect.

I also tried making this change in the CUE project in https://review.gerrithub.io/c/cue-lang/cue/+/551317. That resulted in https://github.com/cue-lang/cue-trybot/actions/runs/4476128503 which again showed no effect.

So assuming my testing is valid, turning off Windows Defender does not appear to have any effect in our case.
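
One way to check that assumption would be to print Defender's status right after changing the preference. This is a sketch only: the step name is made up, and it assumes the `Get-MpComputerStatus` cmdlet from the Defender PowerShell module is available on the runner image.

```yaml
# Hypothetical step: disable real-time monitoring, then report the resulting state.
- name: Disable Defender real-time monitoring and report status
  shell: powershell
  run: |
    Set-MpPreference -DisableRealtimeMonitoring $true
    # Reports whether real-time protection is still enabled after the change.
    (Get-MpComputerStatus).RealTimeProtectionEnabled
```

If the reported value shows protection still enabled, the timings above would not say anything about Defender's impact.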

@myitcv
Author

myitcv commented Mar 21, 2023

> Here are many other people similarly seeing that actions/checkout is very slow on Windows: actions/checkout#1150

Thanks, @mvdan.

Just to emphasise however that the use of actions/checkout in my example above is a proxy for the much bigger problem we are seeing in the go test step in the CUE project. It's very likely not the best proxy, but hopefully good enough.

The reason I flag this is that ultimately we will consider this issue "fixed" when the go test step is "fast" and not just an improvement in the actions/checkout step.

@ilia-shipitsin
Contributor

I've also tried to disable Windows Defender, no significant difference so far

https://github.com/ilia-shipitsin/slow-windows-actions/actions/runs/4483502669/jobs/7882868552

@ilia-shipitsin
Contributor

ilia-shipitsin commented Mar 21, 2023

@myitcv, as for the go test performance degradation, do you see acceptable performance on a standalone Windows Server 2019/2022 machine (one not related to GitHub Actions)?

I mean, is the degradation runner-specific or platform-specific?

@myitcv
Author

myitcv commented Mar 21, 2023

> @myitcv, as for the go test performance degradation, do you see acceptable performance on a standalone Windows Server 2019/2022 machine (one not related to GitHub Actions)?
>
> I mean, is the degradation runner-specific or platform-specific?

We only have the numbers from GitHub actions workflow runs.

@ilia-shipitsin
Contributor

We need to narrow this down: is a standalone Windows server just as slow, or not?

@mikhailkoliada
Contributor

Duplicate of #5166

@mikhailkoliada mikhailkoliada marked this as a duplicate of #5166 Mar 22, 2023
@myitcv
Author

myitcv commented Mar 22, 2023

@mikhailkoliada - isn't #5166 demonstrating a slowdown between 2019 and 2022?

The numbers we are seeing show consistent slowness on both 2019 and 2022.

Therefore I'm not clear this is a duplicate.

@ilia-shipitsin
Contributor

Side note: I did some investigation into the "checkout slowness". It looks like there is some delay between git and the automation task. git itself takes 2-3 seconds (I put the commands into cmd and wrapped them with Measure-Command { ... }).

image

I can believe that agent communication could add 15-20 sec, but it does not look like a root cause for the go test slowness.
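
A rough sketch of that kind of measurement as a workflow step is shown below. It is not the exact command set used above: the step name and the `plain-clone` directory are made up, and it assumes a public repository so that an unauthenticated shallow clone works.

```yaml
# Hypothetical step: time git on its own, outside the actions/checkout step.
- name: Time a plain git clone
  shell: powershell
  run: |
    # Measure-Command returns a TimeSpan; output of the wrapped command is discarded.
    $elapsed = Measure-Command {
      git clone --quiet --depth 1 "$env:GITHUB_SERVER_URL/$env:GITHUB_REPOSITORY" plain-clone
    }
    Write-Host "git clone took $($elapsed.TotalSeconds) seconds"
```

Comparing this figure with the duration the Actions UI reports for the actions/checkout step gives an estimate of the overhead outside git.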

@nebuk89

nebuk89 commented Mar 23, 2023

Hey! let me chat with the team and see what is going on and see if we need a separate wrap up ticket for Windows Perf (or I will get someone to re-open this one!)

@jianges jianges reopened this Mar 24, 2023
@nebuk89

nebuk89 commented Mar 24, 2023

@mvdan @myitcv we think this may be the same as the checkout issue. Given that 'hunch', we will tackle that and track our progress here: actions/checkout#1186

We will keep this issue open until we can validate if it is a dupe and go from there (or start working on a new root cause for this after I guess 😱 let's hope not!)

@myitcv
Author

myitcv commented Mar 24, 2023

@nebuk89 - thanks for looking into this and the detailed update. Much appreciated.

nicholasbishop added a commit to nicholasbishop/uefi-rs that referenced this issue Mar 28, 2023
This job is still timing out sometimes. There are some open bugs about slow
Windows runners, maybe relevant:
actions/runner-images#7320
phip1611 pushed a commit to nicholasbishop/uefi-rs that referenced this issue Mar 28, 2023
This job is still timing out sometimes. There are some open bugs about slow
Windows runners, maybe relevant:
actions/runner-images#7320
nicholasbishop added a commit to rust-osdev/uefi-rs that referenced this issue Mar 28, 2023
This job is still timing out sometimes. There are some open bugs about slow
Windows runners, maybe relevant:
actions/runner-images#7320
cueckoo pushed a commit to cue-lang/cue-trybot that referenced this issue Jan 3, 2024
Trying a suggestion mentioned at:

    actions/runner-images#7320 (comment)

Signed-off-by: Paul Jolly <paul@myitcv.io>
Change-Id: I37200a5c6bc936f2de3bb8b96034a2892404e48a
Dispatch-Trailer: {"type":"trybot","CL":1174197,"patchset":1,"ref":"refs/changes/97/1174197/1","targetBranch":"master"}
@astrojuanlu

I switched from windows-2022 to windows-2019 and the speed increased tremendously.

Any chance we can get Windows XP runners?

@myitcv
Author

myitcv commented Jan 31, 2024

> I switched from windows-2022 to windows-2019 and the speed increased tremendously.

For completeness, this had no impact in our situation.

@zanieb

zanieb commented Mar 22, 2024

Switching to windows-2019 and disabling Windows Defender did not improve performance in my use-case.

@Somfic

Somfic commented May 22, 2024

Also having issues with Windows runners, they are extremely slow compared to their MacOS and Ubuntu counterparts:

image

(Can't link as the actions are in a private repo)

@harshavardhana

harshavardhana commented May 22, 2024

Our tests now take 50+ minutes on Windows, compared to under 10 minutes on Linux. Moving to windows-2019 didn't fix anything, and neither did disabling Windows Defender.

We are moving away from GitHub managed runners for these, moving to in-house self-hosted runners.

@zanieb

zanieb commented May 23, 2024

For what it's worth, over at Astral we switched to using a Dev Drive with ReFS, to great benefit: astral-sh/uv#3522 (the GitHub Windows Runners are still the bane of my existence though).
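
As a rough illustration of the idea, a step along these lines could create a ReFS-formatted virtual disk and expose its drive letter to later steps. This is a sketch under assumptions, not the contents of the linked PR: the VHD path, the 10GB size and the `DEV_DRIVE` variable name are made up, it assumes the Hyper-V cmdlets (`New-VHD`, `Mount-VHD`) are available on the runner, and it does no Dev Drive-specific configuration beyond formatting the volume as ReFS.

```yaml
# Hypothetical step: create and mount a ReFS volume for build and cache directories.
- name: Create a ReFS volume
  shell: pwsh
  run: |
    # Create a virtual disk, mount it, partition it and format it as ReFS.
    $Volume = New-VHD -Path C:/ci_dev_drive.vhdx -SizeBytes 10GB |
      Mount-VHD -Passthru |
      Initialize-Disk -Passthru |
      New-Partition -AssignDriveLetter -UseMaximumSize |
      Format-Volume -FileSystem ReFS -Confirm:$false -Force
    $Drive = "$($Volume.DriveLetter):"
    # Export the drive letter so that later steps can place their work on it.
    Add-Content -Path $env:GITHUB_ENV -Value "DEV_DRIVE=$Drive"
```

Later steps would then need to point their working, cache or temporary directories at `${{ env.DEV_DRIVE }}` to see any benefit.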

@texadactyl

End of July 2024 - any progress on this oldish Windows anomaly?

My partner and I test our work on Ubuntu, MacOS, and Windows (significantly slower) to make sure we can run on all 3 environments. We are lucky that our runs are short at the moment (< 8 minutes) but expect project growth and therefore longer run times in the future.

@JacopoMadaluni

JacopoMadaluni commented Sep 5, 2024

Our benchmarks are also incredibly slow on Windows.
Our MacOS build takes about 5m (checkout, setup-rust, setup-node, build)
The windows counterpart takes 20+ minutes.

The build in our case is a Rust package; Windows spends 10x more time downloading and compiling crates than macOS and Linux.

Screenshot 2024-09-05 at 12 59 49

lukebakken added a commit to rabbitmq/rabbitmq-dotnet-client that referenced this issue Sep 5, 2024
lukebakken added a commit to rabbitmq/rabbitmq-dotnet-client that referenced this issue Sep 6, 2024
Discovered while updating `rabbitmq/rabbitmq-tutorials` to version `7.0.0-rc.8` of this library.

* Add `trackConfirmations` argument to `ConfirmSelectAsync` to allow disabling internal confirm tracking.

* Increase CI timeouts since GHA Windows runners are slow (actions/runner-images#7320)
@stephen-carter-at-sf

stephen-carter-at-sf commented Sep 6, 2024

Yeah, this is awful. Besides jobs taking way longer, I also always get failures from network timeouts, on Windows only.

yarn install v1.22.22
[1/5] Validating package.json...
[2/5] Resolving packages...
[3/5] Fetching packages...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
error Error: https://registry.yarnpkg.com/rxjs/-/rxjs-6.6.7.tgz: ESOCKETTIMEDOUT
...

It only ever happens on windows runners... never on macos or linux runners.
:-(

@texadactyl

Not a Windoze lover but our project workflow runs on 3 OSes; Windows is the slowewwwwwwwwwwweest leg in our jobs.

cc: @platypusguy

@torokati44

This is getting ridiculous, to the point where "optimizations" actually turn out to be "pessimizations", because of this.
See: ruffle-rs/ruffle#17529 (comment)

Together with https://github.com/orgs/community/discussions/42335, not to mention actions/runner#1182 (to potentially reduce copy-pasting somewhat, as a workaround for the workaround for the slowness...), it really makes Windows+GHA difficult to like...

@marvin-j97

image
🥲

@QuantumQuin

I recently had to use a Windows runner and came here to say the experience was awful due to how slow it is. On the same size runner, it can take more than 8x the time vs using a Mac or Linux runner. I had to use Windows to simulate a specific environment.
