-
-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cudaPackages: use the same libstdc++ as the rest of nixpkgs #223664
Conversation
overrideCC stdenv (wrapCCWith rec { | ||
cc = compatibleStdenv.cc.cc; | ||
libcxx = stdenv.cc.cc.lib; | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never used wrapCCWith
and overrideCC
before, so I've no idea if I'm doing this correctly. I'm also probably abusing the libcxx
argument, I guess it's a hatch for clang, but it did the job for me somehow
I'd really appreciate a comment from @FRidh who had suggested overrideCC
at some point
Interesting! Curious to see if such an approach could work! |
Result of 18 packages failed to build:
94 packages built:
|
Failed derivations
|
# NB: We don't actually know if this is the right thing to do | ||
backendStdenv.cc.cc.lib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably should leave a warning somewhere, but if one accidentally uses backendStdenv.cc.cc.lib
they'll end up breaking things by linking against the old libstdc++
: .lib
is just an output that cc
happens to have, and wrapCCWith
doesn't expose any passthru for libcxx
Result of 293 packages failed to build:
1290 packages built:
|
Failed derivations
|
Well, I'm sorry, it's been emotional |
|
So, IIIUC, with this PR we would only need to use |
Yes. We'll build with (at the time of writing) gcc11, but we'll link to the same libstdc++ we expect other nixpkgs citizens to link to. This way we should be able to co-exist within the same process's namespace I'll repeat once again that |
Could we drop/override it somehow to prevent the footgun? Something like backendStdenv = backendStdenv.override { cc.cc.lib = throw "deprecated! use X instead!"; } |
As such, |
Indeed, I have previously been a follower of such advice :) I like the idea of preemptively avoiding footguns entirely, but perhaps doing so is excessively onerous
I love the idea of this change, but I don't think I can support merging something that breaks JAX since it is so central to the CUDA/ML package ecosystem. What do you think would be required to unf#ck jax? Also, perhaps we should target staging for a change this far-reaching? Wdyt? |
JAX has been failing on master for quite some time now, last time I checked was 2 days ago: https://hercules-ci.com/github/SomeoneSerge/nixpkgs-unfree/jobs/3696 It's just that I was hoping this PR would fix jax as well. My hypothesis is that it didn't because we use
Actually, unless one sets
...but maybe I just got too accustomed to cuda changes triggering rebuilds of the entire universe?
I'm absolutely with you on this one. Only that in this case I feel we'd have to touch some internals of Nix, which would likely mean creating a new footgun
Again, I apologize for the language (although I had rather got it out and apologized, than kept it). |
ah yes, those failures are a PITA! However, they should be fixed by #219778 which i'm hoping to get merged tomorrow. Maybe that will fix JAX 🤞
Mmm gotcha, I didn't realize this would require Nix-internals meddling...
Oh no apologies necessary -- i thought it was funny! Yeah, this would be an ideal scenario... Perhaps one day we can get there haha |
As far as I can see, it's a different failure. The one observed in hercules logs is a libstdc++ mismatch on |
Yes, that's correct. Perhaps I should be more specific: As far as I'm aware, what has happened recently is
Anyhow, I was assuming that you were commenting build failures with JAX as opposed to runtime libstdc++ failures! |
Gotcha! I was referring to |
In the hindsight, the way we (I) handled libstdc++ in #218265 was very much a mistake and we should remember it, because the failure is rather nasty: I just unet-corresp/notebooks on master [$!?] via 🐍 v3.10.10 via ❄️ impure (nix-shell-env) took 2s
❯ python 2023-03-29.py
Traceback (most recent call last):
File "/home/ss/Sources/unet-corresp/notebooks/2023-03-29.py", line 21, in <module>
import zmq
File "/nix/store/rjdykq4lhq711y7gbidap60754ma5wmc-python3-3.10.10-env/lib/python3.10/site-packages/zmq/__init__.py", line 103, in <module>
from zmq import backend
File "/nix/store/rjdykq4lhq711y7gbidap60754ma5wmc-python3-3.10.10-env/lib/python3.10/site-packages/zmq/backend/__init__.py", line 31, in <module>
raise original_error from None
File "/nix/store/rjdykq4lhq711y7gbidap60754ma5wmc-python3-3.10.10-env/lib/python3.10/site-packages/zmq/backend/__init__.py", line 26, in <module>
_ns = select_backend(first)
File "/nix/store/rjdykq4lhq711y7gbidap60754ma5wmc-python3-3.10.10-env/lib/python3.10/site-packages/zmq/backend/select.py", line 31, in select_backend
mod = import_module(name)
File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/nix/store/rjdykq4lhq711y7gbidap60754ma5wmc-python3-3.10.10-env/lib/python3.10/site-packages/zmq/backend/cython/__init__.py", line 6, in <module>
from . import (
ImportError: /nix/store/205vsmbfhq1q2vhgskpqyymqvba4mscp-gcc-11.3.0-lib/lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /nix/store/j4jznffk0l87frb77llh0ap3j0zahl5b-zeromq-4.3.4/lib/libzmq.so.5) Moving
|
|
Result of 14 packages failed to build:
99 packages built:
|
Failed derivations
|
Good idea! I think that makes thing much simpler, which is always a good thing I guess my only remaining question (#223664 (comment)) is why don't we remove |
Oh, we still are using it, in downstream packages 🙃 E.g. in tensorflow we're currently swapping the stdenv for backendStdenv, so that tf does use gcc11. With this PR though, tensorflow will not only consume gcc11, it'll consume gcc11 with gcc12's libstdc++. The footgun is at |
FWIW, I feel like these ad hoc issues are worth a separate PR. I'm willing to bet this must be caused by some hacks applied upstream |
inherit (cudaFlags) cudaCapabilities dropDot; | ||
|
||
stdenv = if cudaSupport then backendStdenv else inputs.stdenv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samuela look, this worked 😆 😅
import faiss
import zmq
works now.
So yeah, further on the roadmap is to add hooks for nvcc and FindCUDAToolkit.cmake, and rename this into cudaStdenv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stdenv = if cudaSupport then backendStdenv else inputs.stdenv; | |
stdenv = if cudaSupport then backendStdenv else stdenv; |
@samuela I cherry-picked the This one, however, seems to be ready to go. And I already have ideas for a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I'm starting to wrap my head around what's going on here. So IIUC
- Compilers, like gcc, are derivations just like all derivations, usually named
cc
. They also havecc.lib
outputs where their correspondinglibcxx
Libraries live. A Compiler derivation,cc
, outputs a naive executable that does not have any knowledge of the Libraries incc.lib
. - A Compiler,
cc
, is wrapped with wrapCCWith or wrapCC, producing a Wrapped Compiler that is automatically configured to find certain libraries, possibly (but not necessarily) includingcc.lib
. Confusingly, the Wrapped Compilers are also often namedcc
in which casecc.cc
is the underlying Compiler andcc.cc.lib
is the Compiler's correspondinglibcxx
. - Wrapped Compilers and Libraries (including
libcxx
?) are bundled together into Standard Environments, usually namedstdenv
. What wrapping Standard Environments do in addition to Wrapped Compilers is not entirely clear to me. But in this case,stdenv.cc
is the Wrapped Compiler,stdenv.cc.cc
is the underlying Compiler, andstdenv.cc.cc.lib
is the Library corresponding to the underlying Compiler.
The goal of this PR is to build CUDA-related packages with gcc11's Compiler cc
(as required by CUDA) in tandem with gcc12's cc.lib
Library (as is used throughout the rest of Nixpkgs). After this PR, backendStdenv
will be a Standard Environment with a compiler toolchain including gcc11's Compiler cc
, but building against gcc12's libcxx
Library.
Is this understanding correct?
Btw, given a Standard Environment, say backendStdenv
, how does one access the appropriate libcxx
Library? I understand that backendStdenv.cc.cc.lib
is Not The Way...
@@ -25,17 +25,19 @@ | |||
builtins.head optLevels | |||
, faiss # To run demos in the tests | |||
, runCommand | |||
}: | |||
}@inputs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
}@inputs: | |
}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stdenv = if cudaSupport then backendStdenv else stdenv;
This causes infinite recursion in (whatever version of Nix I've tried this with a month ago)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about something like
effectiveStdenv = if cudaSupport then backendStdenv else stdenv;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that earlier in tensorflow, I think. Sandro pointed out that this way it's really easy for someone to accidentally refer to the old stdenv
since it's still in the scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok that's a fair point. I'm happy to go either way
inherit (cudaFlags) cudaCapabilities dropDot; | ||
|
||
stdenv = if cudaSupport then backendStdenv else inputs.stdenv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stdenv = if cudaSupport then backendStdenv else inputs.stdenv; | |
stdenv = if cudaSupport then backendStdenv else stdenv; |
|
||
overrideCC stdenv (wrapCCWith rec { | ||
cc = compatibleStdenv.cc.cc; | ||
libcxx = stdenv.cc.cc.lib; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we simply set this to null
a la
libcxx = null; |
on that note, could we just use
overrideCC stdenv compatibleStdenv.cc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: would it be possible to build against gcc11's libcxx
using overrideCC stdenv compatibleStdenv.cc
and then link/patchelf with gcc12's libcxx
in buildInputs
? Here we would be taking advantage of backwards compatibility between libcxx
versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we simply set this to null
We need exactly the opposite: we want this to be gcc12's libstdc++. If we use libstdc++ from gcc11 we get the mistmatch as soon as we import any C++ library from the normal nixpkgs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we not build with the standard gcc11 toolchain and then link/patchelf against gcc12's libcxx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, theoretically, we could. In fact, that was pennae's original suggestion. I think this would be more error-prone: we can forget to remove a reference to gcc11 in one place, we can forget to add a reference to gcc12 in another place, and we'd also have to do that in some hooks. I absolutely despise hooks, though I usually have to submit to them.
The approach in this PR, however, means that gcc11's libstdc++ never enters our build environment, unless a user accidentally adds it. The latter is a possibility, and I myself have made this mistake several times already. I think we want to run sanity checks, not patchelf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh ok, I gotcha! Thanks for explaining!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a blocker for this PR, but would it be possible to include a simple check for gcc11's libstdc++ in preBuild
or something like that in backendStdenv
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a number arguments that pkgs/build-support/cc-wrapper/default.nix
accepts. I would guess that both extraBuildCommands and {extra,native}{Tools,Packages}
must allow setting up with a bit of bash hackery. I have not yet explored them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned, I'd also like cudaStdenv
to include a working nvcc
, so there's another reason to learn more about these inputs
overrideCC nixpkgsStdenv (wrapCCWith { | ||
cc = nvccCompatibleStdenv.cc.cc; | ||
|
||
# This option is for libcxx, but we (ab)use it for gcc's libstdc++. Note that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This option is for clang" typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I meant it this way, but maybe "for clang" is clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I didn't realize that libcxx was a clang-specific library. I assumed that gcc also called it libcxx. Perhaps "for clang's libcxx" would be clearest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(it also "rhymes" with the end of the sentence: "gcc's libstdc++")
Yes, you understand the purpose of this PR exactly right!1
Ideally, we don't have to access it. I still do not know of a good way to access it through Nix. Mind that some stdenv's do not come with a c++ standard library at all. We could try and provide a passthru, but we'd be diverging from the rest of nixpkgs Footnotes
|
d5fffbe
to
3af299f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM any blockers to merging?
@samuela I think it's ready to go. All the extra work I can think of goes outside the current scope |
Woo! Thanks for all your hard work and patient explanations @SomeoneSerge ! |
Description of changes
A potential solution to #220341, work in progress.
TLDR: we're dealing with libstdc++ mismatches when using cuda-enabled c++ packages (built using gcc11, because nvcc requires so) with any other c++ packages built via gcc12.
@pennae suggested we could try and rely on libstdc++ backward compatibility, specifically we could build with gcc11 and patchelf the resulting artifacts to consume libstdc++ from gcc12
In a similar spirit, this PR explores the possibility of directly passing the new libstdc++ to the old gcc11, avoiding the need to patchelf (and maintain the patchelfing logic)
Things done
easyocr
fails without this patch, but succeeds with itsandbox = true
set innix.conf
? (See Nix manual)nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)CC @NixOS/cuda-maintainers