Build Hangs for libxml2 with MSYS/MinGW in Windows Docker Container #77
Comments
Are you sure linking hangs indefinitely for you? Linking can take pretty long on mingw, especially if you are low on RAM and the OS starts swapping to disk... |
LibXML2 normally builds in just a minute or so on the host using a non-MSYS build of the MinGW toolchain, and building LibXML2 on the same image with CMake and the MSYS toolchain takes a similarly small amount of time - I've let the build process go overnight in the case described above without seeing any progress. Even if it were to succeed after more than 12 hours, I'd say something is still not working quite right when the same source builds several orders of magnitude faster under a different build system. |
I'm sorry if my issue is not relevant to the topic, but it seems to me that it might have the same root cause - we've started experiencing very similar hangs, at least for
But for some reason this only happens for builds triggered by Jenkins - manual executions work fine. The builds are run directly on Windows VMs, without Docker. The previous build environment, built from scratch on December 3rd, works with Jenkins without problems. |
There seem to be some pipe deadlocks going on in cygwin. Their pipe code has been getting some overhaul lately; I'm not sure if all of the current fixes are in 3.3.3 or if there have been more since then. |
Is there any way I could help track down if this is indeed related to pipe deadlocks in the runtime? |
If you install |
Here's the And then these are the results of running The leaf processes have threads with either |
Well, a cursory look at the last 5 files shows sed and grep apparently blocked on reads from pipes. nm, being native, doesn't show much but appears to be blocked on a write. The intervening sh processes are just waiting on child processes. |
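For anyone trying to collect the same kind of evidence, a minimal sketch of grabbing thread backtraces from the hung processes might look like the following (this assumes the msys gdb package is installed; the exact tool used above was lost in the formatting of this thread):

# List the hung process tree from an MSYS2 shell and note the leaf PIDs.
ps -ef
# Attach to one leaf process (1234 is a placeholder PID), dump every thread's
# backtrace non-interactively, then detach.
gdb -p 1234 -batch -ex "thread apply all bt" -ex detach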
That does sound like it's related to the changes in I might build an |
there is no real |
/cc @tyan0 any thoughts on this? |
I'll take a look and will give that a shot if there's been any work since the last |
I don't know if this is related or not, but I just saw some hangs with g-ir-scanner calling
|
No, I think this may be arm64 specific. |
The g-ir-scanner thing seems to have gotten better after a reboot. |
I am having the same issue when building mingw. I managed to reduce the issue to the following shell script:

#!/bin/sh
seq 1 99999 > big_file
eval '$(eval cmd.exe //c "type big_file" | : )'

When running as a normal user this completes immediately, but when run as a system service it hangs forever. The issue appears to be that when running under the
My suspicion is that this is caused by f79a461 (which keeps the read end of the pipe open) and b531d6b (which changes the behavior depending on whether or not the program is running as a service). |
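For anyone who wants to reproduce the service case without setting up a real service, one possible shortcut (my own suggestion using the Sysinternals PsExec tool, not something taken from this thread) is to start the shell under the SYSTEM account:

# From an elevated cmd.exe, start an MSYS2 login shell as LocalSystem.
# C:\msys64 is the default install prefix and is only an assumption here.
psexec -accepteula -s -i C:\msys64\usr\bin\bash.exe -l
# Inside that shell, run the reproducer from the comment above
# (reproducer.sh is just a placeholder name for that script).
sh ./reproducer.sh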
Thanks for the detailed investigation! That could explain why I've never seen this (I've never managed to find time to learn how to set up/use docker on Windows; I was thinking about trying to leverage GHA to test it...). I think the next step would be to verify that this dupes on upstream cygwin (I expect it would) and report it there. I wonder why they avoided the better code if running as |
Dupes on upstream cygwin. I've sent a report to them: https://cygwin.com/pipermail/cygwin/2022-March/251097.html |
Interesting. I'm running things as a regular user, but I wonder if the fact that Docker is also involved is triggering the same |
Does that test case hang in a Docker container? Can you run (Windows) |
https://cygwin.com/pipermail/cygwin/2022-March/251100.html
And a proposed patch: |
If it would be helpful, I can open a PR with that patch applied, so that a binary for testing will be available in the CI artifacts. |
#88 has the proposed patch applied |
As requested upstream, here is a repo with a GitHub action that reproduces the hang (despite the proposed patch applied in #88): |
The latest proposed patch (https://cygwin.com/pipermail/cygwin-patches/2022q1/011859.html) was applied to #88. This worked as expected in my test action. @pananton (or anyone else who experiences this issue): please test with the current msys-2.0.dll from the artifacts of #88 (this would be https://github.com/msys2/msys2-runtime/suites/5802411523/artifacts/193963886 assuming the URL is stable) |
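For anyone unsure how to try such a DLL, a rough sketch of swapping it in by hand (my own description; the paths assume a default C:\msys64 install, and the layout of the extracted artifact may differ):

# Close every MSYS2 process first, then run this from a shell that is NOT itself
# using C:\msys64\usr\bin\msys-2.0.dll (e.g. Git Bash). Keep a backup copy.
cp /c/msys64/usr/bin/msys-2.0.dll /c/msys64/usr/bin/msys-2.0.dll.bak
cp /path/to/extracted-artifact/usr/bin/msys-2.0.dll /c/msys64/usr/bin/msys-2.0.dll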
I can confirm that my problem is fixed with this patch. |
A patch for this issue has landed upstream: e9c96f0. |
I ran into the exact same issue as @pananton when trying to build a package with conan in a Windows Docker Gitlab CI container. In my case I have issues with the m4 package: I already tried to use the provided
Locally, on a Windows PC, the same m4 conan recipe with the same msys2 conan package (cci.latest) works without any issue. The version of this msys2 conan package is this one: http://repo.msys2.org/distrib/x86_64/msys2-base-x86_64-20220118.tar.xz
My interpretation of the error message above is that the file
If I restart the job, which file fails is not deterministic, but it is always one of the first files it tries to compile. I will now try to use an older msys2 conan package, as mentioned by @pananton here:
@Pro I can confirm that I also had problems with building m4, but I used libiconv as an example. And sadly, simply replacing msys2.dll did not help. As you've mentioned, the build does not get stuck with the new version, but then it fails during make. Actually, at least for libiconv, it fails from time to time but sometimes succeeds - that's why I mistakenly commented earlier that the problem was solved. It's pretty annoying that the m4 conan recipe build fails, because it is used for building some other recipes. |
If there's some other issue besides the hang, I recommend opening a new issue here with details/standalone steps to reproduce like in this issue.
My interpretation of the message above is that something is trying to access an empty filename:
|
Yep, but this output is misleading. It tricked me too. After looking into the depcomp script, you will see the following lines: https://git.savannah.gnu.org/cgit/gnulib.git/tree/build-aux/depcomp#n525
And the
Nonetheless, thanks for your hint! I was able to solve my issue: I was finally able to build the
There are two things that made up the final solution:
---
recipes/msys2/all/conanfile.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/recipes/msys2/all/conanfile.py b/recipes/msys2/all/conanfile.py
index 797bec9..440a3cf 100644
--- a/recipes/conan-center/msys2/all/conanfile.py
+++ b/recipes/conan-center/msys2/all/conanfile.py
@@ -74,6 +74,8 @@ class MSYS2Conan(ConanFile):
self._kill_pacman()
self.run('bash -l -c "pacman --debug --noconfirm --ask 20 -Syuu"') # Normal update
self._kill_pacman()
+ self.run('bash -l -c "pacman --debug --noconfirm --ask 20 -U https://repo.msys2.org/msys/x86_64/msys2-runtime-3.2.0-8-x86_64.pkg.tar.zst https://repo.msys2.org/msys/x86_64/msys2-runtime-devel-3.2.0-8-x86_64.pkg.tar.zst"')
+ self._kill_pacman()
self.run('bash -l -c "pacman --debug -Rc dash --noconfirm"')
except ConanException:
self.run('bash -l -c "cat /var/log/pacman.log || echo nolog"')
@@ -179,6 +181,6 @@ class MSYS2Conan(ConanFile):
self.output.info("Appending PATH env var with : " + msys_bin)
self.env_info.path.append(msys_bin)
-
+
self.conf_info["tools.microsoft.bash:subsystem"] = "msys2"
self.conf_info["tools.microsoft.bash:path"] = os.path.join(msys_bin, "bash.exe")
--
2.17.1 |
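For reference, the core of that change is just a runtime downgrade; the equivalent manual step from an MSYS2 shell would be (package URLs taken from the diff above):

# Downgrade msys2-runtime to 3.2.0-8, which predates the reworked pipe code.
pacman --noconfirm -U \
    https://repo.msys2.org/msys/x86_64/msys2-runtime-3.2.0-8-x86_64.pkg.tar.zst \
    https://repo.msys2.org/msys/x86_64/msys2-runtime-devel-3.2.0-8-x86_64.pkg.tar.zst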
It should only fail like that if the |
Ah, correct! https://git.savannah.gnu.org/cgit/gnulib.git/tree/build-aux/depcomp#n126
Not sure why, but I guess one of these commands (maybe sed) was in conflict with the pre-installed
Can only guess, but now it's solved for me, at least when downgrading |
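A quick way to check for that kind of PATH conflict from an MSYS2 shell (my own suggestion, not something proposed in this thread):

# Show every sed and grep reachable on PATH; the first hit wins, and a copy
# from another toolchain earlier in PATH can behave differently.
type -a sed grep
# Print the Windows path of the sed that actually gets used.
cygpath -w "$(command -v sed)"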
@Pro Does the msys-2.0.dll you used have the latest patch e9c96f0 applied? Where did you download the patched msys-2.0.dll from? The binaries at https://github.com/msys2/msys2-runtime/suites/5802411523/artifacts/193963886 seem to have an older version of the patch applied (perhaps the v3 patch). The latest patch is v6. |
It doesn't answer your question as to which was used, but I have kept #88 up to date with the iterations (it should currently be sitting with the committed version cherry-picked). I was thinking about trying to cherry-pick this and maybe some of the other console patches currently on the cygwin-3_3-branch, since there are some important fixes there, but it sounded like we were going to wait for a cygwin release and rebase onto that instead. |
Go to the 'Checks' tab, hit the 'Artifacts' dropdown near the top right, and 'install' is the only artifact. (that's https://github.com/msys2/msys2-runtime/suites/5900014396/artifacts/200749613) |
@jeremyd2019 I can confirm the latest artifact seems to have the v6 patch applied. Thanks. |
This should be fixed now in 3.3.5 |
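For anyone landing here later, a quick way to check whether an installation already has a new enough runtime (standard commands, nothing specific to this thread):

# The installed runtime package; per the comment above, 3.3.5 or newer should
# contain the fix.
pacman -Q msys2-runtime
# uname also reports the version of the running msys DLL.
uname -r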
I'm noticing an odd hang while trying to build libxml2 with MSYS + MinGW that I'm really struggling to dig into... I'm using MSYS/MinGW within a Windows Docker container - place both of these files into a directory and build with Windows Docker Desktop using docker build -t test-image . while in said directory.

Dockerfile
mirrorupgrade.hook

Once I'm running that image as a container, I'm doing:

The build reliably hangs trying to link:

Here's output from a build using make V=1 instead in order to show the actual compilation commands: make_output_libxml2.txt

I suspect the issue is something happening within libtool or is related to that warning message from ar when building non-verbosely - if I build libxml2 with CMake instead, everything works out fine. I've browsed the patches at https://github.com/msys2/MINGW-packages/tree/master/mingw-w64-libxml2, but none of them seem relevant to this failure mode - and I've seen a very similar failure mode happen in the same container while trying to build sqlite3 as well.

I'll note that building other Autotools packages on the same image - notably libffi, libyaml, and openssl - seems to work fine. Any ideas on what could be going on? I'm not sure how to proceed on debugging this further.
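The exact build commands used inside the container are not shown above; purely for orientation, a generic MinGW autotools build of libxml2 (my own sketch, not the reporter's steps, with an arbitrary example version) would look roughly like:

# In an MSYS2 MinGW 64-bit shell with the mingw-w64-x86_64 toolchain,
# autotools and make installed:
tar xf libxml2-2.9.12.tar.xz && cd libxml2-2.9.12
./configure --without-python
make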