windows-2022 lose network connection when using wsl1 (regression from windows-2019) #5151
Comments
I have also spent a lot of time trying to figure this out with windows-2022 runners + WSL and the hard crashes make it impossible to really do any troubleshooting. The debug logs don't have any information, there's nowhere left to go from the outside, so we really need some engineering help here. |
Hey @ssbarnea, |
@ssbarnea quick update — we were unable to investigate the issue in the last three weeks due to unpredictable circumstances. We will continue the investigation soon. Sorry for the delay! |
@miketimofeev Thanks for the update. Luckily for us the old windows-2019 is still running. Hopefully you will narrow down the source of these issues. |
@ssbarnea , I am able to reproduce this issue on a self-hosted agent. After activating WSL and msys2 we are getting BSOD( Maybe, it makes sense to ask As a workaround you can run tmate in WSLv1 - https://github.com/mxschmitt/action-tmate/blob/master/src/index.js:
|
I am going to close the thread as an external issue. |
Let me be clear: we attempted to add tmate because the runner was crashing anyway. Originally we did not have tmate on it because we did not need it. |
In that case you should create an issue in the repo - https://github.com/microsoft/WSL . Or provide steps to reproduce the issue without the tmate step. |
Not really, because using WSL2 on my own Windows machine on Azure works fine, without these problems. AFAIK, this issue is 100% related to GitHub runners, and it is a clear regression. Keep in mind that many users might get WSL2 automatically as a side effect of environments being upgraded. Also, due to the nature of the service, our hands are tied, as we cannot debug the lost network connectivity ourselves. |
@ssbarnea the issue is reproducible when installing GitHub runner agent on the windows-2022 VM, so it can be debugged locally I believe |
Windows Server 2022 doesn't support WSLv2 - microsoft/WSL#6301 (comment) |
Is "not supported" a euphemism for "broken"? I think we need clearer messaging here. All users would prefer to know whether a team is working on a fix, or planning one, or not.
https://docs.microsoft.com/en-us/windows/wsl/install-on-server does not indicate in any way that this is not supported for Windows Server editions. AFAIK, WSL2 is implied on newer OS when you do If you run install on 2022, you get v2, not v1, and without any red warning about this being an unsupported platform.
https://github.com/actions/virtual-environments#available-environments does not list any Windows non-server options. Putting these together, should we expect that Microsoft/GitHub do not provide any hosted runners that can run WSL2 under GitHub Actions? If that is true, maybe it is time to state this clearly on that page, preferably in bold letters.
To clarify, WSL2 is required by any POSIX tools that use containers, as container engines (podman or docker) would not run under WSL1.
Somehow I have the impression that the dead cat is being sent back and forth between the virtual-environments and WSL teams, with neither willing to address the issue or at least acknowledge that they are working on it in some way, for example by adding a runner like windows-10 or windows-11, which apparently are not affected by these issues.
As a maintainer of a VS Code Ansible extension, I find it harder and harder to support the use of Microsoft Windows because it is impossible to run GitHub Actions CI/CD under it. Should we start pushing everyone to avoid Windows and go for either Linux or macOS? I really do not want to end up having to put up a big popup that says "Use of this extension under Windows is unsupported, please do not file any bug reports about it." I hope we can find a solution for this. |
> Is "not supported" a euphemism for "broken"? I think we need clearer messaging here. All users would prefer to know whether a team is working on a fix, or planning one, or not.
We could try to help if you provide steps to reproduce the issue without using the msys2 subsystem with WSLv1.
> If you run install on 2022, you get v2, not v1, and without any red warning about this being an unsupported platform.
Currently, only WSLv1 is supported on Windows Server 2022: microsoft/WSL#6301 (comment)
> Putting these together, should we expect that Microsoft/GitHub do not provide any hosted runners that can run WSL2 under GitHub Actions? If that is true, maybe it is time to state this clearly on that page, preferably in bold letters.
We have never mentioned WSLv2 support on GitHub Actions.
> To clarify, WSL2 is required by any POSIX tools that use containers, as container engines (podman or docker) would not run under WSL1.
In that case you should use self-hosted runners, which support WSLv2. There is nothing we can do from our side. |
Description
We were able to identify that certain commands run inside WSL can crash the windows-2022 runners, leaving no way to debug the failure.
It seems to always happen with windows-2022 and has never happened with windows-2019.
I spent a few days trying to narrow down the issue that causes the windows-2022 runner to stop responding, and I have come to believe that it is a crash caused by some instability.
I even recently got another run stuck at https://github.com/ssbarnea/bug-gha-windows-2022/runs/5362695105?check_suite_focus=true in a task that normally works without problems.
Please note that I am not the only engineer facing these problems; I know of at least two others who have reported similar issues.
Virtual environments affected
Image version and build link
Build link https://github.com/ssbarnea/bug-gha-windows-2022/runs/5362695105?check_suite_focus=true
Is it regression?
YES
Expected behavior
Not a crash.
Actual behavior
The runner stops responding and the job appears stuck at the step that was running until the timeout is reached.
Note that the step-level timeout does not work in this case; only the job-level timeout seems to take effect.
Attempts to cancel the workflow have no effect.
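The difference between the two timeout levels can be sketched as a workflow fragment. This is a minimal illustration, not the workflow from the linked repository: the workflow name, job name, step, and minute values are all assumptions. `timeout-minutes` is accepted at both levels, but per the report above only the job-level value took effect once the runner stopped responding.

```yaml
name: wsl-timeout-sketch        # hypothetical workflow name
on: push

jobs:
  wsl-check:                    # hypothetical job name
    runs-on: windows-2022
    # Job-level limit: per the report, the only timeout that still
    # applies after the runner stops responding. Without it, a hung job
    # can occupy a GitHub-hosted runner up to the default 360 minutes.
    timeout-minutes: 15
    steps:
      - name: List WSL distributions   # placeholder step, an assumption
        # Step-level limit: reported as ineffective when the runner hangs.
        timeout-minutes: 5
        shell: cmd
        run: wsl --list --verbose
```

Setting a short job-level limit does not fix the hang, but it bounds how long each affected run stays stuck.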
Repro steps
Try https://github.com/ssbarnea/bug-gha-windows-2022/pull/1/files which has a simple workflow that was used as a way to reproduce the problems with minimal amount of code.
Keep in mind that the same actions seem to work on the older windows-2019 image.
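One way to make the comparison between the two images explicit is a build matrix that runs identical steps on both. The fragment below is a sketch under stated assumptions and is not the content of the linked pull request: the workflow name, job name, and the placeholder step are hypothetical, and the real commands under test would replace that step.

```yaml
name: wsl-image-comparison      # hypothetical workflow name
on: workflow_dispatch

jobs:
  compare:                      # hypothetical job name
    strategy:
      # Keep the windows-2019 leg running even if the windows-2022 leg fails.
      fail-fast: false
      matrix:
        os: [windows-2019, windows-2022]
    runs-on: ${{ matrix.os }}
    timeout-minutes: 15         # job-level bound in case a leg hangs
    steps:
      - name: Placeholder for the WSL commands under test   # assumption
        shell: cmd
        run: wsl --list --verbose
```

With both legs in one run, a result where windows-2019 passes and windows-2022 times out is visible side by side in the same workflow summary.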
Related thread: