cudaPackages: append $ORIGIN to Runpaths #226038

SomeoneSerge · 2023-04-13T16:14:20Z

Description of changes

Re-opening #225955 to target staging because modifying autopatchelf triggers rebuilds for some measly 21k packages...

@NixOS/cuda-maintainers

Things done

samuela · 2023-04-13T16:30:24Z

Ah I had not previously considered rebuilds caused by autopatchelf that are not CUDA-related

samuela · 2023-04-13T20:01:17Z

LGTM any objections to merging?

SomeoneSerge · 2023-04-13T20:51:45Z

Hmm, does anyone in particular own the autopatchelf script?

SomeoneSerge · 2023-04-13T20:52:34Z

pkgs/development/compilers/cudatoolkit/redist/build-cuda-redist-package.nix

@@ -38,6 +38,12 @@ backendStdenv.mkDerivation {
    stdenv.cc.cc.lib
  ];

+  # Picked up by autoPatchelf
+  # Needed e.g. for libnvrtc to locate (dlopen) libnvrtc-builtins
+  appendRunpaths = [


Btw, I wanted to name this autoPatchelfAppendRunpaths so that it would be clear which hook this variable is picked up by, but it was a little too mouthful

I would rather have the long, complex name. There is no point trying to provide short names, that give little indication of how they work. Also, this variable should not be used too often. The only reason the other variables are so short is mostly historical. runtimeDependencies was already a mistake. Let's not clutter the namespace with more generic variables. More recent additions are of the form autoPatchelfIgnoreMissingDeps.

layus

Hi @SomeoneSerge, I do happen to own the autoPatchelf script. Mostly because I re-wrote that in python. I do however have some questions:

Do you really need $ORIGIN added to the libraries ? If you only need the binaries, you could already use --runtime-dependencies to add fixed rpath values. Although it is a bit cumbersome because it adds /lib.
Do you really need an indirect, relative path like $ORIGIN ? I mean, you already know what $ORIGIN is at this stage, right ? So you could use the final, resolved value.
Do you need $ORIGIN added everywhere ? If it is only a few libraries that need to dlopen, you could fix these manually, with patchelf.
Why do you append these paths instead of prepending ? Won't we soon need another flag to prepend paths ?

What I worry about here is feature creep. autoPatchelfHook is intended to discover and fix as many things as possible, but this extension makes it a bit too magical, less disciplined. It will be really tempting to just add whatever path is missing/undetected in that argument, and call it a day. And that is rather a job for a while loop applying patchelf. I do not want autoPatchelfHook to become the go-to tool to replace all custom patchelf invocations with mkDerivation attributes.

Now, I agree this extension has its uses, because dlopen() is a pain sometimes. I would have loved to have that when packaging matlab. But in the end there was only one matlab lib that needed x11 libs, and once identified adding x11 to the RPATH of all the binaries seemed wasteful.

Doing a cursory glance over nixpkgs, it seems like the feature is not commonly needed. I found only one use case of autoPatchelfHook and $ORIGIN together, in cudann:

nixpkgs/pkgs/development/libraries/science/math/cudnn/generic.nix

Lines 94 to 97 in bca7aaf

    
               # Without --add-needed autoPatchelf forgets $ORIGIN on cuda>=8.0.5. 
        
               postFixup = strings.optionalString (strings.versionAtLeast versionTriple "8.0.5") '' 
        
                 patchelf $out/lib/libcudnn.so --add-needed libcudnn_cnn_infer.so 
        
               '';

I find their solution very elegant. It makes the implicit dlopen dependency explicit, so that autoPatchelf can do it's magic correctly. So I guess this question supersedes all the above: Would that solution work for you, or do you really need a big nuke adding to all the rpaths ?

SomeoneSerge · 2023-04-14T01:16:45Z

Hello @layus, and thank you! I start at the end and work back to the beginning

--add-needed ...
... I find their solution very elegant. It makes the implicit dlopen dependency explicit

Yes, and this is also what I started with when we have encountered libnvrtc.so failing to load its builtings (libnvrtc-builtins.so): SomeoneSerge@f55d2a1.
This works, however as you just noted, this is not the first time we encounter this issue: libcudnn_cnn_infer.so has troubled as first and long time ago, and for all this long time we didn't even notice that our libnvrtc.so didn't always work. Looking at cuda_nvrtc.src, all the libraries there come with an empty runpath and empty DT_NEEDED. In other words, they're all suspect to randomly calling dlopen() with any of their neighbours as targets. Ultimately, I think in this situation we should save developers' time and let cuda libraries "see" adjacent files when they dlopen, without waiting for the next report

With this assumption, we should either put $n^2$ entries in each DT_NEEDED (taking care to skip certain libraries, the so called "stubs"), or we append an equivalent of $ORIGIN to the runpaths. The remaining question is $ORIGIN vs the store path. I tried to inquire on matrix, but ended up arbitrarily picking $ORIGIN, because it meant that larger part of the outputs' content is constant. I'm aware this means that copying these binaries to other locations affects the linker's search order. I'm not particularly opposed to replacing it with ${placeholder "out"}/lib, or with making autopatchelf compute an actual equivalent of $ORIGIN (thinking of libraries that live elsewhere than $out/lib)

What I worry about here is feature creep

Yes, this was my exact concern and the reason I wanted to wait for someone contributing to autoPatchelf.
As a matter of fact, I have caught myself wishing for an unconditional --append-rpath and --prepend-rpath more than, but you're right that usually there's a better way without them. In #225240 however I specifically want to ignore DT_NEEDED and append a path unconditionally. I'm still not sure if I'd rather solve this by introducing --append-rpath or --append-parent-dir

Won't we soon need another flag to prepend paths

Tbh, I feel like providing both

Why do you append these paths instead of prepending

This is a good question. I can provide only a few tangential arguments in favour of keeping the append way:

I'm trying to address a particular failure mode where cuda libraries try to locate their builtins and do not find anything at all. Lowest priority/--append-rpath is sufficient
I feel like dependencies explicitly defined in DT_NEEDED (if any) should have a higher priority

Do you need $ORIGIN added everywhere

Current intent is to patchelf all shared libraries

Do you really need an indirect, relative path like $ORIGIN

As admitted above, I don't really need it. I tried to find if there is a preferred way in nixpkgs and it seemed like there really isn't.

To speculate a little more on this, it seems to me that $ORIGIN is only dangerous when one copies the binaries. I didn't test this, but it seems like the dynamic linker should resolve symlinks before evaluating $ORIGIN. In other respects, $ORIGIN even has nice properties: e.g. fewer unique files in the store when using several nixpkgs revisions.

Do you really need $ORIGIN added to the libraries ? If you only need the binaries

I specifically want to address the libraries. This PR modifies the executables as well, but it's more of a side effect that I don't mind. We probably should do this to the executables as well, but they're a more complicated story we didn't really touch upon

This turned to be a pretty long reply, but I think it fills in most of the missing context. As you see there are indeed alternatives available. What do you think?

SomeoneSerge · 2023-04-14T01:26:26Z

To clarify, I think this PR is approximately the smallest code change that satisfies #225240: e.g. no --prepend-rpath, and no ORIGIN to absolute path resolution, because neither was required. It is not the smallest change in terms of side effects and debt. I'm somewhat OK with merging it and observing the potential fallout. I'm open to incorporating any suggestions

layus · 2023-04-14T09:27:50Z

Thanks you for replying in detail. I see the same problems surfacing over and over. I guess I am being pedantic here about forcing users in the "right" way to use the tool. Maybe they will prefer fast rather than precise, and that is valid too. I can only guess the amount of time you already spent on this.

About $ORIGIN, I think it makes a lot of sense. It is already used in several places in nixpkgs, too.

So, I am frustrated that there seems to be no perfect, clean way to do automatic patching, but that does not make this PR invalid. Let's merge :-).

SomeoneSerge · 2023-04-14T09:33:21Z

I actually came up with a argument for expanding ORIGIN over the night: if we do make a copy of libnvrtc.so and it contains $ORIGIN, we have to remember to copy libnvrtc-builtins.so as well. If we had expanded $ORIGIN into the respective absolute path, the copied .so's runpath would've signaled an explicit dependency on "something in the old directory"

SomeoneSerge · 2023-04-14T09:35:45Z

Oh, and also I didn't update appendRunpaths to autoPatchelfAppendRunpaths yet. I'll try to quickly open a follow-up when I get to the lab

SomeoneSerge · 2023-04-14T09:44:06Z

And more on the feature creep, another feature I still crave for is being able to pass an attrset with the extra needed/extra runpaths to a derivation. Sort of

mkDerivation {
  elfEnsureNeeded = {
    "$out/lib/libnvrtc.so" = "libnvrtc-builtins.so";
  };
  elfEnsureRunpath = {
    "**/*.so" = addOpenGLRunpath.driverLink;
    "$out/lib/*.so" = "${placeholder "out"}/lib";
  };
  ...
}

layus · 2023-04-14T10:27:07Z

I actually came up with a argument for expanding ORIGIN over the night: if we do make a copy of libnvrtc.so and it contains $ORIGIN, we have to remember to copy libnvrtc-builtins.so as well. If we had expanded $ORIGIN into the respective absolute path, the copied .so's runpath would've signaled an explicit dependency on "something in the old directory"

This is valid, but why would you copy that somewhere else ? And again, this PR only improves the packaging, it does not make it perfect. And that is fine. Thank you for fixing the dependencies, that is already a big improvement.

Oh, and also I didn't update appendRunpaths to autoPatchelfAppendRunpaths yet. I'll try to quickly open a follow-up when I get to the lab

Oops :-). Ping me on the updated PR. And if you do not find time to do it, it is also fine. It's not like this is worse than runtimeDependencies ;-).

layus · 2023-04-14T10:30:22Z

And more on the feature creep, another feature I still crave for is being able to pass an attrset with the extra needed/extra runpaths to a derivation. Sort of

Not sure why you would want that ? As assertions to ensure that autopatchelf did work as intended ? Or as an advanced api to autoPatchelf for extra dlopen dependencies ?

Do not get me wrong, more features can be a good thing, I just like to have a clear use-case with them.

SomeoneSerge · 2023-04-14T15:15:47Z

Not sure why you would want that ? As assertions to ensure that autopatchelf did work as intended ? Or as an advanced api to autoPatchelf for extra dlopen dependencies ?

As a declarative way to prepare instructions (and assertions) for autopatchelf (or a similar tool) to perform (and verify). This way I'd be able to inspect these instructions with a simple nix eval too

layus · 2023-04-14T18:35:59Z

While assertions kind of make sense, instructions seems like they would require a different tool. If your instructions are complete, there is no need for auto patchelf. It's just advanced patchelf instructions. If you want partial requirements and assertions, well, that indeed looks like a whole new feature for autoPatchelf 🤩

SomeoneSerge added 2 commits April 13, 2023 19:12

autoPatchelfHook: add appendRunpaths argument

b612fe3

cudaPackages: append $ORIGIN to runtime paths

97870fb

SomeoneSerge changed the base branch from master to staging April 13, 2023 16:14

SomeoneSerge marked this pull request as ready for review April 13, 2023 16:32

SomeoneSerge requested a review from layus as a code owner April 13, 2023 16:32

SomeoneSerge added the 6.topic: cuda Parallel computing platform and API label Apr 13, 2023

samuela approved these changes Apr 13, 2023

View reviewed changes

SomeoneSerge commented Apr 13, 2023

View reviewed changes

layus reviewed Apr 13, 2023

View reviewed changes

layus merged commit 46a39c4 into NixOS:staging Apr 14, 2023

SomeoneSerge mentioned this pull request Apr 21, 2023

Explicit DT_NEEDED NVIDIA/build-system-archive-import-examples#5

Open

This was referenced May 8, 2023

cudaPackages: keep $ORIGIN in Runpaths? #225240

Closed

cudaPackages.cudatoolkit: use nix-built dependencies to avoid spurious failures #224646

Closed

SomeoneSerge mentioned this pull request Jun 5, 2023

cudaPackages.cudatoolkit: mark libnvrtc-builtins needed for libnvrtc #235982

Merged

12 tasks

yorickvP mentioned this pull request Sep 21, 2023

/run/opengl-driver/lib not added to torch nix-community/dream2nix#698

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cudaPackages: append $ORIGIN to Runpaths #226038

cudaPackages: append $ORIGIN to Runpaths #226038

SomeoneSerge commented Apr 13, 2023

samuela commented Apr 13, 2023

samuela commented Apr 13, 2023

SomeoneSerge commented Apr 13, 2023

SomeoneSerge Apr 13, 2023

layus Apr 13, 2023

layus left a comment

SomeoneSerge commented Apr 14, 2023

SomeoneSerge commented Apr 14, 2023 •

edited

Loading

layus commented Apr 14, 2023

SomeoneSerge commented Apr 14, 2023 •

edited

Loading

SomeoneSerge commented Apr 14, 2023

SomeoneSerge commented Apr 14, 2023 •

edited

Loading

layus commented Apr 14, 2023

layus commented Apr 14, 2023

SomeoneSerge commented Apr 14, 2023

layus commented Apr 14, 2023

	# Without --add-needed autoPatchelf forgets $ORIGIN on cuda>=8.0.5.
	postFixup = strings.optionalString (strings.versionAtLeast versionTriple "8.0.5") ''
	patchelf $out/lib/libcudnn.so --add-needed libcudnn_cnn_infer.so
	'';

cudaPackages: append $ORIGIN to Runpaths #226038

cudaPackages: append $ORIGIN to Runpaths #226038

Conversation

SomeoneSerge commented Apr 13, 2023

Description of changes

Things done

samuela commented Apr 13, 2023

samuela commented Apr 13, 2023

SomeoneSerge commented Apr 13, 2023

SomeoneSerge Apr 13, 2023

Choose a reason for hiding this comment

layus Apr 13, 2023

Choose a reason for hiding this comment

layus left a comment

Choose a reason for hiding this comment

SomeoneSerge commented Apr 14, 2023

SomeoneSerge commented Apr 14, 2023 • edited Loading

layus commented Apr 14, 2023

SomeoneSerge commented Apr 14, 2023 • edited Loading

SomeoneSerge commented Apr 14, 2023

SomeoneSerge commented Apr 14, 2023 • edited Loading

layus commented Apr 14, 2023

layus commented Apr 14, 2023

SomeoneSerge commented Apr 14, 2023

layus commented Apr 14, 2023

SomeoneSerge commented Apr 14, 2023 •

edited

Loading

SomeoneSerge commented Apr 14, 2023 •

edited

Loading

SomeoneSerge commented Apr 14, 2023 •

edited

Loading