GCC fails because argument list created by nix is too long #41340

Open
nh2 opened this issue May 31, 2018 · 96 comments

Comments

@nh2
Contributor

nh2 commented May 31, 2018

Quick summary

  • nix blows up number of -L flags passed to GCC
  • GCC passes all -L flags to its subprogram cc1 via an environment variable
  • If that's longer than 128 KB, then everything crashes with E2BIG

Details

So once again, I'm building some Haskell with lots of library dependencies.

Trying to upgrade this build to 18.03, the nix-build of the Haskell package newly fails with

Setup: Missing dependencies on foreign libraries:
* Missing C libraries: glog

This error message is wrong (cabal doesn't correctly propagate the argument list too long error shown below; I filed a bug about it being misleading at haskell/cabal#5355); the dependency exists.

The problem is that in some eventual gcc invocation, the argument list passed to GCC turns out to be very long.

strace confirms:

strace -fye execve -s 100000000 -v runhaskell Setup.hs build 2>&1 | grep E2BIG
execve("cc1", [arguments here], [env vars here, "COLLECT_GCC_OPTIONS='-fno-stack-protector -L... 150KB of -L flags here'"]) = -1 E2BIG (Argument list too long)

The failure is E2BIG (Argument list too long), in this case because COLLECT_GCC_OPTIONS is longer than 128 KB (MAX_ARG_STRLEN, 32 * 4 KB pages; see here, and a repro script I made here).

What is the COLLECT_GCC_OPTIONS environment variable? It is an environment variable set by gcc before calling out to cc1, over which it communicates flags to cc1. Most (if not all?) flags given to gcc will make it into this variable. So it can grow very big (easily larger than the 128 KB limit, especially on nix).
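The per-variable limit described above can be checked without GCC at all. The following is a minimal sketch, not the repro script linked from this issue: it assumes a Linux kernel that rejects any single argument or environment string longer than 32 pages, and the helper name `exec_errno_with_env_size` is made up for illustration.

```python
import errno
import os
import subprocess
import sys

# 32 pages per string, as quoted in the report (131072 bytes with 4 KiB pages).
MAX_ARG_STRLEN = 32 * os.sysconf("SC_PAGE_SIZE")


def exec_errno_with_env_size(value_bytes):
    """Exec a trivial child with one environment variable of the given size.

    Returns None if the exec succeeded, otherwise the errno of the failure.
    """
    env = dict(os.environ)
    env["COLLECT_GCC_OPTIONS"] = "x" * value_bytes
    try:
        subprocess.run([sys.executable, "-c", "pass"], env=env)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    for size in (1024, MAX_ARG_STRLEN):
        err = exec_errno_with_env_size(size)
        print(size, "ok" if err is None else errno.errorcode[err])
```

The point of the sketch is that the failure happens in execve itself, before the child program runs, so it does not matter which program is being started.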

Note that even flags given in a "response file" via gcc @myresponsefile.rsp (which was designed to pass GCC flags via a file instead of command line args to circumvent command line arg limits) will be put into COLLECT_GCC_OPTIONS by gcc itself to communicate them to cc1 (I have just confirmed that with a small example on my Ubuntu 16.04). So using @myresponsefile.rsp is not a workaround. (Yes, this seems to defeat the purpose of response files, but I suspect those were originally made to circumvent a much smaller limit on command line length on Windows, where the limit is well below the 128 KB limit for environment variable lengths on Linux.)


Aside: nix inflates the number of -L flags because each -L option to gcc is present multiple times, but those duplicates account for only a factor of 4x or so. Even if they were deduplicated, I'd already be at half of MAX_ARG_STRLEN with my medium-size Haskell project. If I added a couple more dependencies to my Haskell project (all recursive nix Haskell dependencies make it as -L options into the gcc command line), I'd quickly exceed that limit again even without duplication.


Problems to fix

  1. Deduplicating the -L flags passed to GCC will help the issue by a small constant factor and make a couple more projects compile, but won't help with projects with many dependencies.
  2. The fundamental issue seems to be a GCC problem (its way of passing arbitrarily sized information via environment variables to a direct child program doesn't work), so technically it's not nix's department. But we need to do something about it, because otherwise we can't build large Haskell projects (on nix or otherwise).

Steps to reproduce

  • Build a Haskell project with lots of dependencies on nixpkgs 18.03.
    • update: and lots of native C dependencies, and lots of executable sections in the cabal file, see comment below

Environment

  • on top of nixpkgs commit a0b977b; tested on both NixOS and Ubuntu 16.04
@ElvishJerricco
Contributor

Build a Haskell project with lots of dependencies on nixpkgs 18.03.

How many dependencies are we talking here? I've worked with quite a few without error.

@nh2
Contributor Author

nh2 commented May 31, 2018

CCing people who were involved in the past in dealing with large numbers of args to gcc (#26554, #26974, #27609, #27657):

@domenkozar @edolstra @orivej @copumpkin @Ericson2314 @ryantrinkle

@nh2
Contributor Author

nh2 commented May 31, 2018

How many dependencies are we talking here? I've worked with quite a few without error.

@ElvishJerricco

  • the top-level cabal file has 155 direct build-depends
  • the nix-shell's ghc-pkg list shows 330 total deps in the ghc-pkg database
  • COLLECT_GCC_OPTIONS has 1790 -L flags (+200 chars that aren't -L flags)
    • deduplicating these -L flags ends up with 700 of them
    • each -L flag is 64 chars long on average
    • most have the form /nix/store/hash...-haskelllibname-1.2.3/lib but non-Haskell buildInputs are in here as well
    • deduplicated, there are ~330 different Haskell libraries with 2 -L entries each:
      • one for /lib and one for e.g. /lib/ghc-8.2.2/tasty-0.11.3
  • 1790 -L entries * (65 chars + some spaces and quotes I stripped away in my grep) >= 128 K chars
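The arithmetic in the list above can be made explicit. This is a rough sketch using only the counts quoted in this comment (1790 flags, 700 after deduplication, about 200 other characters) and assuming the 128 KB limit with 4 KB pages; the function names are made up for illustration.

```python
MAX_ARG_STRLEN = 32 * 4096  # 128 KiB, assuming 4 KiB pages as in the report


def break_even_bytes_per_flag(flag_count, other_bytes=200, limit=MAX_ARG_STRLEN):
    """Average bytes per -L entry at which the variable reaches the limit."""
    return (limit - other_bytes) / flag_count


def estimated_size(flag_count, bytes_per_flag, other_bytes=200):
    """Estimated length of the variable for a given average entry size."""
    return flag_count * bytes_per_flag + other_bytes


if __name__ == "__main__":
    per_flag = break_even_bytes_per_flag(1790)
    print(f"1790 flags hit the limit at about {per_flag:.1f} bytes per entry")
    print(f"700 flags at that size: {estimated_size(700, per_flag):.0f} bytes")
```

With 1790 entries, the limit is reached once each entry averages a little over 73 bytes, counting the path, the `-L` prefix, separators and quoting. The same entry size with 700 deduplicated entries stays well below the limit, which is why deduplication only buys a constant factor.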

@ElvishJerricco
Contributor

Could we use GCC spec files instead of command line arguments? This wouldn't work with Clang, but creating a Clang wrapper for supporting spec files is something that has sounded useful for a while now.

@nh2
Contributor Author

nh2 commented Jun 1, 2018

Deduplicating the -L flags passed to GCC will help the issue by a small constant factor and make a couple more projects compile, but won't help with projects with many dependencies.

I am relatively certain that this involves inserting an ordNub into this line: https://github.com/haskell/cabal/blob/db05f8dd42bf28bfe9afa7992f3ca51e0f1af0c1/Cabal/Distribution/Simple/Configure.hs#L1684

just as in the line below that. I have confirmed that this line produces the duplicates.
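For readers unfamiliar with ordNub: it removes duplicates while keeping the first occurrence of each element in its original position, which matters here because the order of -L flags determines the library search order. The snippet below is a Python analogue for illustration only, not the Cabal patch; the store paths are made-up placeholders.

```python
def dedupe_keep_first(flags):
    """Order-preserving de-duplication: keep the first occurrence of each flag."""
    # dict preserves insertion order, so the keys come back in first-seen order.
    return list(dict.fromkeys(flags))


if __name__ == "__main__":
    flags = [
        "-L/nix/store/aaa-foo-1.0/lib",  # hypothetical path
        "-L/nix/store/bbb-bar-2.0/lib",  # hypothetical path
        "-L/nix/store/aaa-foo-1.0/lib",  # duplicate of the first entry
    ]
    for flag in dedupe_keep_first(flags):
        print(flag)
```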

@nh2
Contributor Author

nh2 commented Jun 1, 2018

Could we use GCC spec files instead of command line arguments?

@ElvishJerricco I don't know that. It may well be that gcc still passes options to cc1 via the env var if we do it, but I'm not familiar with them.

Do you know how to use them? If yes, could you make a small example that we can strace to check?

nh2 added a commit to nh2/cabal that referenced this issue Jun 1, 2018
… and `PD.extraLibDirs`.

Should help with big invocations as found in
NixOS/nixpkgs#41340.
@nh2
Contributor Author

nh2 commented Jun 1, 2018

I can see the duplicated -L paths already in echo $NIX_LDFLAGS in nix-shell.

The nixpkgs manual says

Bintools Wrapper's setup hook causes any lib and lib64 subdirectories to be added to NIX_LDFLAGS.

but I haven't quite found yet where it's done and whether we can dedupe it there.
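A quick way to see which -L entries are repeated in NIX_LDFLAGS from inside nix-shell is to count them. This is a small sketch that assumes the flags are whitespace-separated and written in the joined `-L/path` form; the function name is made up for illustration.

```python
import os
from collections import Counter


def duplicate_lib_dirs(ldflags):
    """Return {flag: count} for every -L flag that occurs more than once."""
    counts = Counter(tok for tok in ldflags.split() if tok.startswith("-L"))
    return {flag: n for flag, n in counts.items() if n > 1}


if __name__ == "__main__":
    dupes = duplicate_lib_dirs(os.environ.get("NIX_LDFLAGS", ""))
    # Most frequently repeated entries first.
    for flag, n in sorted(dupes.items(), key=lambda kv: -kv[1]):
        print(n, flag)
```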

@nh2
Contributor Author

nh2 commented Jun 1, 2018

I found there's another part to it (haskell/cabal#5356 (comment)):

Independent of Haskell dependencies, for system dependencies (so, stuff that makes it in via non-Haskell-package buildInputs), Cabal adds a duplicated -L flag for each executable, test-suite and benchmark in the .cabal file.

My case has 11 of those, so that also contributes to some blow-up.

@Ericson2314
Member

Great cabal fixes! We should set strictDeps = true; in generic builder to further deduplicate things.

@colonelpanic8
Contributor

@Ericson2314 I think the strictDeps = true; fix does not work for stack. We'll need the cabal fix to get that working.

colonelpanic8 pushed a commit to colonelpanic8/cabal that referenced this issue Jun 4, 2018
… and `PD.extraLibDirs`.

Should help with big invocations as found in
NixOS/nixpkgs#41340.
@i-am-the-slime

So is there a workaround/trick how I can build a stack project now? Can I downgrade the version of something?

nh2 added a commit to nh2/cabal that referenced this issue Jun 10, 2018
… and `PD.extraLibDirs`.

Should help with big invocations as found in
NixOS/nixpkgs#41340.
@nh2
Contributor Author

nh2 commented Jun 10, 2018

@i-am-the-slime Building stack against a version of the Cabal library that has this patch haskell/cabal#5356 should fix it.

nh2 added a commit to nh2/cabal that referenced this issue Jun 14, 2018
… and `PD.extraLibDirs`.

Should help with big invocations as found in
NixOS/nixpkgs#41340.
23Skidoo pushed a commit to nh2/cabal that referenced this issue Jun 20, 2018
… and `PD.extraLibDirs`.

Should help with big invocations as found in
NixOS/nixpkgs#41340.
23Skidoo pushed a commit to haskell/cabal that referenced this issue Jun 20, 2018
…LibDirs`.

Should help with big invocations as found in
NixOS/nixpkgs#41340.
@nh2
Contributor Author

nh2 commented Jun 22, 2018

My cabal PR has been merged which should provide a temporary mitigation: haskell/cabal#5356

For the long run we still have to fix GCC upstream to not use the length-limited COLLECT_GCC_OPTIONS environment variable for passing things around.

@nh2
Contributor Author

nh2 commented Jun 22, 2018

@Mistuke I saw your comment and a mentioned fix on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86030

What confuses me is that everything there talks about -L flags being passed via the command line, while what we observe here is that they are being passed between gcc and cc1 via an environment variable COLLECT_GCC_OPTIONS (which also breaks due to the length limit on env vars).

Do you know if this should be fixed by that upstream patch?

@Mistuke

Mistuke commented Jun 22, 2018

@nh2 probably, the code filters out the -L very early on. So what comes out of do_spec_1 will already be shortened so COLLECT_GCC_OPTIONS should use it. It does honor response files so it shouldn't have re-expanded the options yet.

If you want confirmation I can test using a GCC 9 build on Monday.

@Mistuke

Mistuke commented Jun 25, 2018

@nh2 So I'm not sure it'll fix your particular problem. It definitely adds the -L flags to a response file now, but it seems it also still expands them into COLLECT_GCC_OPTIONS, which seems like a bug.

But also, more annoyingly, collect2 doesn't seem to pass them on in a response file when it calls ld. So it seems like the fix is too superficial.

@nh2
Contributor Author

nh2 commented Jun 28, 2018

@Mistuke Thanks for double-checking that for us. Will you follow up with upstream GCC on your issue, or should I do that?

@Mistuke

Mistuke commented Jun 29, 2018 via email

@nh2
Contributor Author

nh2 commented Jun 30, 2018

@Mistuke Sure, just let me know when I shall comment!

@domenkozar
Member

domenkozar commented Jul 1, 2018

I have a similar issue on macOS using GHC 8.4.3:

Linking Setup ...
clang-5.0: warning: argument unused during compilation: '-nopie' [-Wunused-command-line-argument]
/nix/store/ckq71kkymh1ji2b44xn80wmr7fmi6wr5-clang-wrapper-5.0.2/bin/cc: line 183: /nix/store/bcl9zj60h52p47dy85s326mdrqx52417-clang-5.0.2/bin/clang: Argument list too long
`cc' failed in phase `Linker'. (Exit code: 126)

@Gabriella439
Contributor

Gabriella439 commented Jan 31, 2023

Also, for ghc specifically this might also help: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9897

Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this issue Feb 3, 2023
The motivation behind this is to alleviate the problem
described in NixOS#41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in NixOS#112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up a
PR to fix that:

tpoechtrager/cctools-port#131

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.
Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this issue Feb 22, 2023
The motivation behind this is to alleviate the problem
described in NixOS#41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in NixOS#112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up two
PRs to fix that:

tpoechtrager/cctools-port#131
tpoechtrager/cctools-port#132

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.
Gabriella439 added a commit that referenced this issue Feb 24, 2023
The motivation behind this is to alleviate the problem
described in #41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in #112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up a
PR to fix that:

tpoechtrager/cctools-port#131

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.
@Gabriella439
Contributor

We've verified internally that the combination of #213831 and https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9897 fixes Haskell builds on macOS so that they no longer exceed ARG_MAX, by consistently using response files throughout the chain (ghc, cc-wrapper, and ld-wrapper).

Fixing Linux builds still might require a tiny bit more work (since ld-wrapper currently only enables response files for cctools, which is only used on macOS).

This obviates the need to deduplicate or compress command-line arguments, but that's still a good thing to do anyway.
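As an illustration of what "using response files" means at each step of the chain: the caller writes the arguments to a file and passes a single `@file` argument instead. This is a simplified sketch and not the wrapper code from the PRs above. It assumes GCC-style `@file` handling (whitespace-separated options) and refuses arguments that would need quoting; the function names are made up.

```python
import os
import tempfile


def write_response_file(args):
    """Write one argument per line to a temp file and return its path."""
    for arg in args:
        # Quoting rules are deliberately not implemented in this sketch.
        if any(ch.isspace() or ch in "\"'\\" for ch in arg):
            raise ValueError(f"argument needs quoting, not handled here: {arg!r}")
    fd, path = tempfile.mkstemp(suffix=".rsp", text=True)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(args) + "\n")
    return path


def command_with_response_file(compiler, args):
    """Build an argv that passes all arguments through a single @file."""
    return [compiler, "@" + write_response_file(args)]


if __name__ == "__main__":
    # Hypothetical flags; the command is only printed, not executed.
    print(command_with_response_file("gcc", ["-L/some/lib", "-lfoo"]))
```

This only shortens argv. As noted earlier in the thread, gcc on Linux was observed to expand response-file contents back into COLLECT_GCC_OPTIONS, so this alone does not avoid the cc1 failure.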

@jsoo1
Contributor

jsoo1 commented Sep 15, 2023

Unfortunately, limiting argv is not enough to avoid E2BIG on linux. The kernel checks if there is enough stack space for both argv and environ before execing anything: https://github.com/torvalds/linux/blob/4eb2bd24756e0c8e254de8931ba7ee4346e75bbc/fs/exec.c#L509

Edit: This link does not really capture just how much of a risk E2BIG is. For reference, just search through this file for E2BIG.
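To get a feeling for how close a process is to the combined limit, the space an exec needs for argv plus environ can be estimated. This is an approximation under stated assumptions: each string costs its bytes plus a terminating NUL plus one pointer (8 bytes assumed here), and the kernel's real accounting also depends on the stack rlimit. The function name is made up for illustration.

```python
import os


def exec_footprint(argv, env, pointer_size=8):
    """Approximate bytes needed on the new stack for argv and environ."""
    strings = list(argv) + [f"{key}={value}" for key, value in env.items()]
    # Each string is stored with a trailing NUL byte.
    string_bytes = sum(len(os.fsencode(s)) + 1 for s in strings)
    # One pointer per string in the argv/envp arrays (pointer size is assumed).
    return string_bytes + pointer_size * len(strings)


if __name__ == "__main__":
    used = exec_footprint(["true"], dict(os.environ))
    print("approx. bytes for argv + environ:", used)
    print("SC_ARG_MAX reported by the system:", os.sysconf("SC_ARG_MAX"))
```

Note that this total is a separate limit from the per-string one: a single oversized variable such as COLLECT_GCC_OPTIONS fails even when the total is small.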

@bbarker
Contributor

bbarker commented Jun 11, 2024

With GHC 9.4.5 on NixOS 24.05, it seems I am still encountering the issue when trying to build xmobar:

/ghc-bignum-1.3/include -I/nix/store/k2nxkk1glg5blqjpak6incqyjyhgdnlm-elfutils-0.191-dev/include -I/nix/store/1xcnwyx7pdmdq66xdz20fbl2q0arjzrl-libffi-3.4.6-dev/include -I/nix/store/lnp12bydmfwjf19wbnw0xzynq1xx86ad-ghc-9.6.4/lib/ghc-9.6.4/lib/../lib/x86_64-linux-ghc-9.6.4/rts-1.0.2/include -I/nix/store/lnp12bydmfwjf19wbnw0xzynq1xx86ad-ghc-9.6.4/include/
xmobar> error: gcc: fatal error: cannot execute ‘/nix/store/14c6s4xzhy14i2b05s00rjns2j93gzz4-gcc-13.2.0/libexec/gcc/x86_64-unknown-linux-gnu/13.2.0/cc1’: execv: Argument list too long
xmobar> compilation terminated.

Quite possibly, I'm going about this the wrong way - I have detailed instructions and all the relevant files at https://github.com/bbarker/dotxmonad/tree/0fdc2c9f4efb2895350474c2ceb7784cbf1c919f - any suggestions would be welcome.

@nh2
Contributor Author

nh2 commented Jun 16, 2024

I'm encountering it on NixOS 24.05 with the Haskell opencv package; Cabal outputs:

Error: Setup: Missing dependencies on foreign libraries:
* Missing (or bad) C libraries: stdc++, opencv_gapi, opencv_stitching,
opencv_alphamat, opencv_aruco, opencv_bgsegm, opencv_bioinspired,
opencv_ccalib, opencv_dnn_objdetect, opencv_dnn_superres, opencv_dpm,
opencv_face, opencv_freetype, opencv_fuzzy, opencv_hdf, opencv_hfs,
opencv_img_hash, opencv_intensity_transform, opencv_line_descriptor,
opencv_mcc, opencv_quality, opencv_rapid, opencv_reg, opencv_rgbd,
opencv_saliency, opencv_stereo, opencv_structured_light,
opencv_phase_unwrapping, opencv_superres, opencv_optflow,
opencv_surface_matching, opencv_tracking, opencv_highgui, opencv_datasets,
opencv_text, opencv_plot, opencv_videostab, opencv_videoio,
opencv_xfeatures2d, opencv_shape, opencv_ml, opencv_ximgproc, opencv_video,
opencv_xobjdetect, opencv_objdetect, opencv_calib3d, opencv_imgcodecs,
opencv_features2d, opencv_dnn, opencv_flann, opencv_xphoto, opencv_photo,
opencv_imgproc, opencv_core

because the g++ invocations that probe for the libraries produce E2BIG (Argument list too long) (only visible in strace due to haskell/cabal#5355).

@jsoo1
Contributor

jsoo1 commented Jun 16, 2024

This is mostly (in my mind) a gcc bug. Most everything that gcc drives gets invoked with a response file. There is one exception: collect2. It gets invoked with an env var that is multiple megabytes in size by the time it gets there. Please see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86030 and maybe push on upstream a bit if you can.

I think the problem is a hard one to solve but would make a lot of people's lives a lot better.

There is a workaround - put your problematic modules in separate packages altogether. That's what we ended up doing at $WORK.

@nh2
Contributor Author

nh2 commented Jun 16, 2024

@jsoo1 Yes, definitely a GCC bug.

@nh2
Contributor Author

nh2 commented Jun 16, 2024

A workaround for #41340 (comment) is to set

__propagatePkgConfigDepends = false;

which is from

, # Cabal 3.8 which is shipped by default for GHC >= 9.3 always calls
# `pkg-config --libs --static` as part of the configure step. This requires
# Requires.private dependencies of pkg-config dependencies to be present in
# PKG_CONFIG_PATH which is normally not the case in nixpkgs (except in pkgsStatic).
# Since there is no patch or upstream patch yet, we replicate the automatic
# propagation of dependencies in pkgsStatic for allPkgConfigDepends for
# GHC >= 9.3 by default. This option allows overriding this behavior manually
# if mismatching Cabal and GHC versions are used.
# See also <https://github.com/haskell/cabal/issues/8455>.
__propagatePkgConfigDepends ? lib.versionAtLeast ghc.version "9.3"

That this option exists is to some extent my fault:

@spacekitteh
Contributor

A workaround for #41340 (comment) is to set

__propagatePkgConfigDepends = false;

Any idea how to do this with haskell-flake?

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/trouble-building-xmobar-argument-list-too-long/46897/3
