Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cudaPackages: append $ORIGIN to Runpaths #226038

Merged
merged 2 commits into from
Apr 14, 2023

Conversation

SomeoneSerge
Copy link
Contributor

Description of changes

Re-opening #225955 to target staging because modifying autopatchelf triggers rebuilds for some measly 21k packages...

@NixOS/cuda-maintainers

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@SomeoneSerge SomeoneSerge changed the base branch from master to staging April 13, 2023 16:14
@samuela
Copy link
Member

samuela commented Apr 13, 2023

Ah I had not previously considered rebuilds caused by autopatchelf that are not CUDA-related

@samuela
Copy link
Member

samuela commented Apr 13, 2023

LGTM any objections to merging?

@SomeoneSerge
Copy link
Contributor Author

Hmm, does anyone in particular own the autopatchelf script?

@@ -38,6 +38,12 @@ backendStdenv.mkDerivation {
stdenv.cc.cc.lib
];

# Picked up by autoPatchelf
# Needed e.g. for libnvrtc to locate (dlopen) libnvrtc-builtins
appendRunpaths = [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I wanted to name this autoPatchelfAppendRunpaths so that it would be clear which hook this variable is picked up by, but it was a little too mouthful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather have the long, complex name. There is no point trying to provide short names, that give little indication of how they work. Also, this variable should not be used too often. The only reason the other variables are so short is mostly historical. runtimeDependencies was already a mistake. Let's not clutter the namespace with more generic variables. More recent additions are of the form autoPatchelfIgnoreMissingDeps.

Copy link
Member

@layus layus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SomeoneSerge, I do happen to own the autoPatchelf script. Mostly because I re-wrote that in python. I do however have some questions:

  1. Do you really need $ORIGIN added to the libraries ? If you only need the binaries, you could already use --runtime-dependencies to add fixed rpath values. Although it is a bit cumbersome because it adds /lib.
  2. Do you really need an indirect, relative path like $ORIGIN ? I mean, you already know what $ORIGIN is at this stage, right ? So you could use the final, resolved value.
  3. Do you need $ORIGIN added everywhere ? If it is only a few libraries that need to dlopen, you could fix these manually, with patchelf.
  4. Why do you append these paths instead of prepending ? Won't we soon need another flag to prepend paths ?

What I worry about here is feature creep. autoPatchelfHook is intended to discover and fix as many things as possible, but this extension makes it a bit too magical, less disciplined. It will be really tempting to just add whatever path is missing/undetected in that argument, and call it a day. And that is rather a job for a while loop applying patchelf. I do not want autoPatchelfHook to become the go-to tool to replace all custom patchelf invocations with mkDerivation attributes.

Now, I agree this extension has its uses, because dlopen() is a pain sometimes. I would have loved to have that when packaging matlab. But in the end there was only one matlab lib that needed x11 libs, and once identified adding x11 to the RPATH of all the binaries seemed wasteful.

Doing a cursory glance over nixpkgs, it seems like the feature is not commonly needed. I found only one use case of autoPatchelfHook and $ORIGIN together, in cudann:

# Without --add-needed autoPatchelf forgets $ORIGIN on cuda>=8.0.5.
postFixup = strings.optionalString (strings.versionAtLeast versionTriple "8.0.5") ''
patchelf $out/lib/libcudnn.so --add-needed libcudnn_cnn_infer.so
'';

I find their solution very elegant. It makes the implicit dlopen dependency explicit, so that autoPatchelf can do it's magic correctly. So I guess this question supersedes all the above: Would that solution work for you, or do you really need a big nuke adding to all the rpaths ?

@SomeoneSerge
Copy link
Contributor Author

Hello @layus, and thank you! I start at the end and work back to the beginning

--add-needed ...
... I find their solution very elegant. It makes the implicit dlopen dependency explicit

Yes, and this is also what I started with when we have encountered libnvrtc.so failing to load its builtings (libnvrtc-builtins.so): SomeoneSerge@f55d2a1.
This works, however as you just noted, this is not the first time we encounter this issue: libcudnn_cnn_infer.so has troubled as first and long time ago, and for all this long time we didn't even notice that our libnvrtc.so didn't always work. Looking at cuda_nvrtc.src, all the libraries there come with an empty runpath and empty DT_NEEDED. In other words, they're all suspect to randomly calling dlopen() with any of their neighbours as targets. Ultimately, I think in this situation we should save developers' time and let cuda libraries "see" adjacent files when they dlopen, without waiting for the next report

With this assumption, we should either put $n^2$ entries in each DT_NEEDED (taking care to skip certain libraries, the so called "stubs"), or we append an equivalent of $ORIGIN to the runpaths. The remaining question is $ORIGIN vs the store path. I tried to inquire on matrix, but ended up arbitrarily picking $ORIGIN, because it meant that larger part of the outputs' content is constant. I'm aware this means that copying these binaries to other locations affects the linker's search order. I'm not particularly opposed to replacing it with ${placeholder "out"}/lib, or with making autopatchelf compute an actual equivalent of $ORIGIN (thinking of libraries that live elsewhere than $out/lib)

What I worry about here is feature creep

Yes, this was my exact concern and the reason I wanted to wait for someone contributing to autoPatchelf.
As a matter of fact, I have caught myself wishing for an unconditional --append-rpath and --prepend-rpath more than, but you're right that usually there's a better way without them. In #225240 however I specifically want to ignore DT_NEEDED and append a path unconditionally. I'm still not sure if I'd rather solve this by introducing --append-rpath or --append-parent-dir

Won't we soon need another flag to prepend paths

Tbh, I feel like providing both

Why do you append these paths instead of prepending

This is a good question. I can provide only a few tangential arguments in favour of keeping the append way:

  • I'm trying to address a particular failure mode where cuda libraries try to locate their builtins and do not find anything at all. Lowest priority/--append-rpath is sufficient
  • I feel like dependencies explicitly defined in DT_NEEDED (if any) should have a higher priority

Do you need $ORIGIN added everywhere

Current intent is to patchelf all shared libraries

Do you really need an indirect, relative path like $ORIGIN

As admitted above, I don't really need it. I tried to find if there is a preferred way in nixpkgs and it seemed like there really isn't.

To speculate a little more on this, it seems to me that $ORIGIN is only dangerous when one copies the binaries. I didn't test this, but it seems like the dynamic linker should resolve symlinks before evaluating $ORIGIN. In other respects, $ORIGIN even has nice properties: e.g. fewer unique files in the store when using several nixpkgs revisions.

Do you really need $ORIGIN added to the libraries ? If you only need the binaries

I specifically want to address the libraries. This PR modifies the executables as well, but it's more of a side effect that I don't mind. We probably should do this to the executables as well, but they're a more complicated story we didn't really touch upon


This turned to be a pretty long reply, but I think it fills in most of the missing context. As you see there are indeed alternatives available. What do you think?

@SomeoneSerge
Copy link
Contributor Author

SomeoneSerge commented Apr 14, 2023

To clarify, I think this PR is approximately the smallest code change that satisfies #225240: e.g. no --prepend-rpath, and no ORIGIN to absolute path resolution, because neither was required. It is not the smallest change in terms of side effects and debt. I'm somewhat OK with merging it and observing the potential fallout. I'm open to incorporating any suggestions

@layus
Copy link
Member

layus commented Apr 14, 2023

Thanks you for replying in detail. I see the same problems surfacing over and over. I guess I am being pedantic here about forcing users in the "right" way to use the tool. Maybe they will prefer fast rather than precise, and that is valid too. I can only guess the amount of time you already spent on this.

About $ORIGIN, I think it makes a lot of sense. It is already used in several places in nixpkgs, too.

So, I am frustrated that there seems to be no perfect, clean way to do automatic patching, but that does not make this PR invalid. Let's merge :-).

@layus layus merged commit 46a39c4 into NixOS:staging Apr 14, 2023
@SomeoneSerge
Copy link
Contributor Author

SomeoneSerge commented Apr 14, 2023

I actually came up with a argument for expanding ORIGIN over the night: if we do make a copy of libnvrtc.so and it contains $ORIGIN, we have to remember to copy libnvrtc-builtins.so as well. If we had expanded $ORIGIN into the respective absolute path, the copied .so's runpath would've signaled an explicit dependency on "something in the old directory"

@SomeoneSerge
Copy link
Contributor Author

Oh, and also I didn't update appendRunpaths to autoPatchelfAppendRunpaths yet. I'll try to quickly open a follow-up when I get to the lab

@SomeoneSerge
Copy link
Contributor Author

SomeoneSerge commented Apr 14, 2023

And more on the feature creep, another feature I still crave for is being able to pass an attrset with the extra needed/extra runpaths to a derivation. Sort of

mkDerivation {
  elfEnsureNeeded = {
    "$out/lib/libnvrtc.so" = "libnvrtc-builtins.so";
  };
  elfEnsureRunpath = {
    "**/*.so" = addOpenGLRunpath.driverLink;
    "$out/lib/*.so" = "${placeholder "out"}/lib";
  };
  ...
}

@layus
Copy link
Member

layus commented Apr 14, 2023

I actually came up with a argument for expanding ORIGIN over the night: if we do make a copy of libnvrtc.so and it contains $ORIGIN, we have to remember to copy libnvrtc-builtins.so as well. If we had expanded $ORIGIN into the respective absolute path, the copied .so's runpath would've signaled an explicit dependency on "something in the old directory"

This is valid, but why would you copy that somewhere else ? And again, this PR only improves the packaging, it does not make it perfect. And that is fine. Thank you for fixing the dependencies, that is already a big improvement.

Oh, and also I didn't update appendRunpaths to autoPatchelfAppendRunpaths yet. I'll try to quickly open a follow-up when I get to the lab

Oops :-). Ping me on the updated PR. And if you do not find time to do it, it is also fine. It's not like this is worse than runtimeDependencies ;-).

@layus
Copy link
Member

layus commented Apr 14, 2023

And more on the feature creep, another feature I still crave for is being able to pass an attrset with the extra needed/extra runpaths to a derivation. Sort of

Not sure why you would want that ? As assertions to ensure that autopatchelf did work as intended ? Or as an advanced api to autoPatchelf for extra dlopen dependencies ?

Do not get me wrong, more features can be a good thing, I just like to have a clear use-case with them.

@SomeoneSerge
Copy link
Contributor Author

Not sure why you would want that ? As assertions to ensure that autopatchelf did work as intended ? Or as an advanced api to autoPatchelf for extra dlopen dependencies ?

As a declarative way to prepare instructions (and assertions) for autopatchelf (or a similar tool) to perform (and verify). This way I'd be able to inspect these instructions with a simple nix eval too

@layus
Copy link
Member

layus commented Apr 14, 2023

While assertions kind of make sense, instructions seems like they would require a different tool. If your instructions are complete, there is no need for auto patchelf. It's just advanced patchelf instructions. If you want partial requirements and assertions, well, that indeed looks like a whole new feature for autoPatchelf 🤩

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants