glibc: implement LD_FALLBACK_PATH environment variable #248547

Open
wants to merge 6 commits into
base: master

oxij
Member

@oxij oxij commented Aug 11, 2023

Description of changes

This adds LD_FALLBACK_PATH environment variable support to glibc. From the commit message:

Subject: [PATCH] Implement support for LD_FALLBACK_PATH envvar and related
 command line options

LD_FALLBACK_PATH has the same semantics as LD_LIBRARY_PATH but unlike
LD_LIBRARY_PATH, instead of having the highest priority, it has the
second-lowest (it gets used just before the default system directories).

Thus, LD_FALLBACK_PATH provides a way to specify fallback library paths.

In other words, LD_FALLBACK_PATH provides a way to override system libraries
without creating a chroot.

Consider the following use cases:

* Sometimes you just want to change a set of system libraries for a set of
  programs without changing the system default. For instance, if you have
  different GPUs requiring different libGL, CUDA, and/or OpenCL
  implementations. (Though, specifically for libGL there are better solutions
  for per-screen routing of OpenGL calls.)

  LD_FALLBACK_PATH is especially useful in a developer setting given that
  maintaining multiple (e.g. for each GPU type) per-project chroot
  environments is pretty hard.

* Running a closed-source program (like Steam and its games) on an unsupported
  distribution (this patch was originally created for SLNOS, a distribution
  based on NixOS): closed-source programs commonly override LD_LIBRARY_PATH to
  replace system libraries with their own bundled versions, but they
  frequently do it the wrong way by simply assigning LD_LIBRARY_PATH to point
  at their own /lib directory instead of prepending that /lib path to the
  current value of LD_LIBRARY_PATH, thus failing to pick up any non-system
  libraries they don't bundle but require specific versions of.

  From the perspective of a closed-source program's vendor, LD_FALLBACK_PATH
  --- being the low-priority option --- can't really be used to consistently
  achieve the same effect as overriding LD_LIBRARY_PATH, thus making it
  unlikely they would ever touch it. But it can be used by other tools to
  supply needed default libraries to the closed-source program in question.

With LD_FALLBACK_PATH, you can just drop all those libraries you want to
replace your defaults into a directory and put its path into LD_FALLBACK_PATH,
or use Nix or a similar package manager to do that for you.
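
The priority described in the commit message can be pictured with a small shell model. This is only a sketch under stated assumptions, not glibc's implementation: `find_library` is a hypothetical helper, it treats each argument after the library name as one colon-separated directory list searched in the order given, and it ignores DT_RPATH/DT_RUNPATH, ld.so.cache, and empty list elements.

```shell
# Simplified model, not glibc's code: search the colon-separated directory
# lists given after the library name, in priority order, and print the first
# existing match. Returns 1 when no list contains the library.
find_library() {
    lib=$1
    shift
    for list in "$@"; do
        rest=$list
        while [ -n "$rest" ]; do
            dir=${rest%%:*}                 # first element of the list
            case $rest in
                *:*) rest=${rest#*:} ;;     # drop it, keep the tail
                *)   rest= ;;               # that was the last element
            esac
            # Empty elements are skipped in this model.
            if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
                printf '%s\n' "$dir/$lib"
                return 0
            fi
        done
    done
    return 1
}

# Example call (the default directories are illustrative values):
#   find_library libGL.so.1 "${LD_LIBRARY_PATH:-}" "${LD_FALLBACK_PATH:-}" /lib:/usr/lib
```

In this model a library found via LD_LIBRARY_PATH always wins, while a directory in LD_FALLBACK_PATH is consulted only after the higher-priority lists miss and before the defaults.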

Why?

Basically, LD_FALLBACK_PATH provides a cheap ad-hoc alternative to generating app-specific chroots for making wrappers that want to change default libraries.
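
A wrapper along these lines might extend the variable rather than overwrite it, which is the mistake the commit message attributes to vendors who assign LD_LIBRARY_PATH outright. The sketch below is an illustration only: `prepend_path` is a hypothetical helper and the paths in the usage comment are placeholders. It avoids producing an empty list element, which ld.so(8) documents as meaning the current working directory for LD_LIBRARY_PATH (assumed here to carry over, since the patch states the semantics are the same).

```shell
# Hypothetical helper: print directory $2 prepended to the colon-separated
# list $1, without creating an empty element when $1 is empty.
prepend_path() {
    if [ -n "$1" ]; then
        printf '%s:%s\n' "$2" "$1"
    else
        printf '%s\n' "$2"
    fi
}

# Usage inside a wrapper script (placeholder paths):
#   LD_FALLBACK_PATH=$(prepend_path "${LD_FALLBACK_PATH:-}" /path/to/extra/libs)
#   export LD_FALLBACK_PATH
#   exec /path/to/real/program "$@"
```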

I made it as a better alternative to #31263 in #31263 (comment), and kept it updated since. If you look through that issue, you'll see that neither the original problem, nor any of the related ones mentioned there are solved, even though most of the issues are closed as stale.

I've been using this thing since Spring 2018 without issues.

Pings

ping @domenkozar and @Ericson2314 who wanted me to PR this and @timokau who also reported happily using my patch in #59595 (comment)

Contributor

@Atry Atry left a comment

Looks really nice

@Atry
Contributor

Atry commented Aug 13, 2023

By the way, I created a similar PR, #248777, and then realized you had already created this one.

@Atry
Contributor

Atry commented Aug 13, 2023

Will you try to submit this patch to the upstream glibc?

Member

@Ma27 Ma27 left a comment

Have you thrown this against upstream glibc already? Even though ldd(1) seems to have issues with that? Asking because every glibc patch causes an additional maintenance burden and I'd really like to know if there's a realistic chance of this ever getting upstreamed.

@ghost

ghost commented Aug 14, 2023

If upstream changes that block of code that was copy-pasted, are we going to notice and change our copy? What if they fix an LPE there?

Could we have this patch controlled by an off-by-default flag, build an extra copy glibc-with-ld-fallback, and pass --set-interpreter ${glibc-with-ld-fallback}/bin/ld.so when patchelfing said video games? It sounds like this feature is intended for relatively-leaf-like nodes of the package graph.

@oxij
Member Author

oxij commented Aug 14, 2023

@Atry @Ma27 Yes, I do plan to submit it upstream after I figure out the ldd issue.

@oxij
Member Author

oxij commented Aug 14, 2023

@amjoseph-nixpkgs

If upstream changes that block of code that was copy-pasted, are we going to notice and change our copy? What if they fix an LPE there?

Yes, I think I fixed that locally similarly to your suggestion above, re-building to check now.

Could we have this patch controlled by an off-by-default flag, build an extra copy glibc-with-ld-fallback, and pass --set-interpreter ${glibc-with-ld-fallback}/bin/ld.so when patchelfing said video games? It sounds like this feature is intended for relatively-leaf-like nodes of the package graph.

Sure, a separate glibc will solve Steam issues. But doing this to the default glibc also allows to cheaply replace libpulse with https://github.com/oxij/libcardiacarrest at build-time and then replace back with libpulse on per-app basis at runtime, for instance.

@jeff-hykin

jeff-hykin commented Aug 14, 2023

Will you try to submit this patch to the upstream glibc?

I agree, and think this is pretty important. The glibc maintainers will have a much better idea of what kind of complications/benefits such a change could cause.

@ghost

ghost commented Aug 16, 2023

Sure, a separate glibc will solve Steam issues.

👍

But doing this to the default glibc also allows to cheaply replace libpulse with https://github.com/oxij/libcardiacarrest at build-time

Meh, I already carry a (large) patch that simply disables libpulse throughout nixpkgs (and sets libpulse.meta.broken=true to be sure). It's not that hard. I just don't want to fight the people who seem religiously opposed to merging it; it isn't that important to me.

and then replace back with libpulse on per-app basis at runtime, for instance.

That is a pretty esoteric use case.

Wouldn't it make more sense for libcardiacarrest to have an environment variable that causes it to dlopen() libpulse and forward all calls to it?

I get really nervous about fiddling around with glibc's dynamic loader, nixpkgs-wide. There is a lot of voodoo in there, and a long history of exploits when environment variables are consulted during the dynamic loading process. I think this is a cool feature but for it to be on-by-default everywhere... I'm not sure about the cost/benefit there.

@ghost

ghost commented Aug 16, 2023

I agree, and think this is pretty important. The glibc maintainers will have a much better idea of what kind of complications/benefits such a change could cause.

OTOH @oxij may not get an entirely unbiased hearing there, since this feature is useful almost exclusively for closed-source software. I dunno.

I would support merging this as off-by-default (i.e. an extra glibc-with-ldfallback, opt-in) immediately while trying to upstream the patch in parallel. If that process bogs down we can revisit this, with the benefit of people (besides the author and a few others) having used it in the meantime.

@Atry
Contributor

Atry commented Aug 16, 2023 via email

@jeff-hykin

jeff-hykin commented Aug 16, 2023

off-by-default (i.e. an extra glibc-with-ldfallback, opt-in) immediately while trying to upstream the patch in parallel

Even though I'm mostly opposed to adding this fallback on something so core/critical, this^ approach sounds fair to me. I know getting a review for something major like glibc is probably going to take forever and have a lot of churn, so it feels bad to block what might be a quick fix for the Python manylinux ABI and Steam.

I'm going to have to defer to others though in terms of whether this adds too much complexity to the already massive nixpkgs.

@Atry If you have any more examples, or can find examples of packages that can be fixed with this change, I think that would make for a really strong argument for merging this PR. (basically showing this is a general technique/hack instead of just a hack that should be done inside of steam and python)

@Atry
Contributor

Atry commented Aug 16, 2023

@jeff-hykin see cachix/devenv#773 , #248777, and https://github.com/Atry/nix-ld-so-cache-test

@jeff-hykin

You're doing a pretty good job removing my reservations about this PR 😁

@Atry if you've got a fork or a patch file for the python change, I'll test it out with some of my projects to support/validate it.

@Atry
Contributor

Atry commented Aug 19, 2023

@jeff-hykin See https://github.com/Atry/nix-ld-so-cache-test

@oxij
Member Author

oxij commented Aug 26, 2023

I pushed 6960b056594bbf413d076ea60e0c08573b1efe91 into here. It, I think, fixes all the nitpicks. It also seems to be working for the last 12 days on my system so, it's probably ok.

@oxij
Member Author

oxij commented Mar 13, 2024 via email

@SemMulder

SemMulder commented Mar 13, 2024

Slightly off-topic, but posting it here because it seems to target the same use-case.

Just encountered Flox over at https://github.com/flox/flox and https://flox.dev/.

Which has something called ld-floxlib which uses LD_AUDIT to achieve similar semantics as what this PR is trying to achieve, but without patching glibc.

@SomeoneSerge
Contributor

On macOS, shared libraries are referenced via absolute path by nix built executables, therefore they would not be affected ...

So this isn't directly related to the current issue, which is that "nixpkgs with LD_LIBRARY_PATH" (on platforms which use it) are "broken" (in the sense that it breaks our mechanism for pinning versions of libraries, which for the moment is DT_RUNPATH).

Which has something called ld-floxlib which uses LD_AUDIT to achieve similar semantics as what this PR is trying to

Nice, thanks! Is this the same approach as described in https://hpc.guix.info/blog/2020/05/faster-relocatable-packs-with-fakechroot/?

@SemMulder

SemMulder commented Mar 13, 2024

Nice, thanks! Is this the same approach as described in https://hpc.guix.info/blog/2020/05/faster-relocatable-packs-with-fakechroot/?

From there:

Fortunately, the little-known audit interface of the GNU dynamic linker comes in handy: its la_objsearch hook allows you to alter the way ld.so looks for shared libraries. Thus, a few lines of C are all it takes to get ld.so to translate /gnu/store file names. Neat!

So, yes, that is indeed the same mechanism :).

EDIT: the use-case is slightly different though. The Guix folks use it to make closures relocatable by rewriting all search paths from /gnu/store to <relocated-store-dir>.

@tomberek
Contributor

tomberek commented Apr 3, 2024

Which has something called ld-floxlib which uses LD_AUDIT to achieve similar semantics as what this PR is trying to achieve, but without patching glibc.

Yes, we use LD_AUDIT as an alternative and less commonly used mechanism to guide dynamic loading. This also helps avoid invalidating the cache. Let me see if there are some portions of the approach and lessons learned along the way that might be helpful.

@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Apr 5, 2024
cadkin pushed a commit to mdfbaam/nixpkgs that referenced this pull request Apr 25, 2024
@SemMulder

@tomberek

Let me see if there are some portions of the approach and lessons learned along the way that might be helpful.

If you can find the time to do this: I would be very interested in this :). I'm pretty sure other people following this PR would also find it interesting!

@oxij oxij force-pushed the glibc/ld-fallback branch from 0355157 to 99d35e5 Compare June 1, 2024 17:24
@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Jun 1, 2024
@oxij
Member Author

oxij commented Jun 2, 2024

Rebased to fix conflicts.

I also left pkgsGlibcLdFallback of the previous iteration on top of https://github.com/oxij/nixpkgs/commits/glibc/ld-fallback-v5/ so that this would be a complete noop by default again.

Since people appear to be cherry-picking these commits to their own branches, maybe consider just merging this? It is, after all, completely optional and does nothing by default.

Contributor

@SomeoneSerge SomeoneSerge left a comment

This is far from the first time I'm reading this so my vision may have got blurry. I still hold that

  1. there's a benefit to persisting a variant of this patch - as a piece of knowledge - in Nixpkgs git history, and to making the patch more available for experimentation;
  2. there's no reason to hold the patch back. In particular:
    1. The change has no effect on the default Nixpkgs instantiation, and so has no security implications for users unless they explicitly opt in to the experimental functionality.
    2. We do have a deprecation mechanism for evalModules' options, so we will be able to phase the patch out if it becomes stale.

IMO neither this patch nor the LD_AUDIT-based approaches seem to solve the LD_LIBRARY_PATH/DT_RUNPATH priority issue in its entirety: we should pursue and explore both.


EDIT(2024-06-02 23:03 UTC), in reaction to Ma27's latest message: I think we've already shared all opinions, and I suppose it's time we acknowledged we can't reach a consensus. The change won't go through without a consensus. Reaching out to upstream is the next available action.

@Ma27
Member

Ma27 commented Jun 2, 2024

My stance on this is still the same. To reiterate: this alters the way libraries are resolved at a very central place, and I still see no intention of upstreaming this. Happy to be proven wrong though. I'd like to see feedback from upstream developers, then we'll see how to proceed.

Beforehand, this can of course be maintained in its own repo though.

@jeff-hykin

jeff-hykin commented Jun 10, 2024

neither this patch nor the LD_AUDIT-based approaches seem to solve the LD_LIBRARY_PATH/DT_RUNPATH priority issue in its entirety: we should pursue and explore both.

As far as I can tell, on Linux the LD_AUDIT approach seems to give total control over loading order, including fallback methods. For example, it could re-implement the lookup logic (runpath/rpath/LD_LIBRARY_PATH/system etc.) with a change only to the fallback case. Is there an edge case where this isn't sufficient, @SomeoneSerge?

My PR Review / Thoughts

Ignoring feasibility for a moment, I would prefer the core problem be fixed with a separate package, rather than adding an option to glibc.

For example, given a shell.nix, add one package (an LD audit shim), and set LD_AUDIT to that shim. This would avoid rebuilds, cache busting, and (important to me) not add complexity to glibc in nixpkgs. Again, OP did everything right for this PR, IMO. That said, Nixpkgs maintenance is a huge burden, a lot of which comes from the philosophy of "well it works right now, and people are using it, so let's tack on one more patch/option". At some point there are too many tacked-on options.

Going back to feasibility, AFAIK it should be possible to have the LD audit approach support the same fallback path ENV var as this PR. But in addition, in the future, the LD audit approach could add per-binary lookups (similar to modifying the runpath, but without actually modifying the binary). For example, if a Steam game cannot modify its own binary, but also does not want to affect the LD fallback of child processes, it could add itself to an env var mapping (bin-path to fallback-path mapping):

# using "/.." as a split-pattern
# newlines and comments added for clarity (would be removed in practice)
#  "/../=" as the map-assignment 
LD_TARGETED_FALLBACK="
$LD_TARGETED_FALLBACK

:/..:
/nix/store/1234-steamthing/bin/game1  # binary
:/../=: 
/nix/store/1234-gcc-thing/lib:/nix/store/1234-libpng/lib 
# ^ colon-separated LD fallback for this binary specifically

:/..: # next mapping
/nix/store/1234-steamthing/bin/game2  # binary
:/../=:
/nix/store/1234-gcc-thing/lib:/nix/store/1234-libpng/lib
"
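
To make the proposed format concrete, here is a minimal sketch of how a value in that shape could be parsed. Everything in it is an assumption for illustration: LD_TARGETED_FALLBACK is the hypothetical variable from this comment, `lookup_targeted_fallback` is a made-up helper name, and the input is expected with newlines and comments already stripped. A real audit shim would do this in C inside the loader hook.

```shell
# Hypothetical helper: print the fallback directory list registered for
# binary $2 in mapping string $1. Entries are separated by ":/..:" and each
# entry is "<binary>:/../=:<colon-separated dirs>". Returns 1 if not found.
lookup_targeted_fallback() {
    rest=$1
    binary=$2
    entry_sep=':/..:'
    assign_sep=':/../=:'
    while [ -n "$rest" ]; do
        # Peel off the text up to the next entry separator.
        case $rest in
            *"$entry_sep"*)
                entry=${rest%%"$entry_sep"*}
                rest=${rest#*"$entry_sep"}
                ;;
            *)
                entry=$rest
                rest=
                ;;
        esac
        # Split a well-formed entry into binary path and directory list.
        case $entry in
            *"$assign_sep"*)
                if [ "${entry%%"$assign_sep"*}" = "$binary" ]; then
                    printf '%s\n' "${entry#*"$assign_sep"}"
                    return 0
                fi
                ;;
        esac
    done
    return 1
}

# Example:
#   lookup_targeted_fallback "$LD_TARGETED_FALLBACK" /nix/store/1234-steamthing/bin/game1
```

The two separators cannot be confused with each other or with an ordinary path list, because neither `:/..:` nor `:/../=:` is a substring of the other and `/..` is not a useful directory entry on its own.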

Packages in nixpkgs (like the steam game) could also, in the future, individually add the LD audit shim as a runtime dependency. Which would be a nice way to gradually introduce this change into nixpkgs.

_

Since, at the moment, it seems to me like that is both more flexible and less intrusive, I'm in favor of that route over accepting this PR.

@SomeoneSerge
Contributor

@jeff-hykin I'd rather we moved the LD_AUDIT discussion into a separate issue, e.g. NixOS/bundlers#18 (or maybe a new more general one under NixOS/nixpkgs/). The present PR as I see it offers a prospective way to address an issue in glibc introduced by lowering the priority of DT_RUNPATH compared to DT_RPATH. This is something we should bring upstream for discussion, and IMO the discussion would be more promising with the option merged in Nixpkgs.

@oxij would you like to team up on preparing the email to upstream?..

@jeff-hykin

@jeff-hykin I'd rather we moved the LD_AUDIT discussion into a separate issue, e.g. NixOS/bundlers#18

I'm not sure what it has to do with bundlers. I only mention the LD audit approach as a reason for blocking this PR. I agree it should eventually have its own discussion.

@oxij would you like to team up on preparing the email to upstream?

Despite my stance on the PR, I do support you guys on emailing upstream. Let me know if you need more voices and I'll get some other people to chime in.

neither this patch nor the LD_AUDIT-based approaches seem to solve the LD_LIBRARY_PATH/DT_RUNPATH priority issue in its entirety: we should pursue and explore both.

@SomeoneSerge I still think this is important to know for this PR and for emailing upstream.
Which cases do they not solve, or not solve well?

@SomeoneSerge
Contributor

SomeoneSerge commented Jun 11, 2024

@jeff-hykin

Issue at hand (opinion): it is customary for other projects to break Nixpkgs software by abusing LD_LIBRARY_PATH. Even when we ourselves want to use environment variables to influence the library lookup, we end up implementing monstrously complex solutions like NixGL and nixglhost, so as not to accidentally break things (as the devenv issue shows, we still do). AFAIU [1], this was not an issue when Nix* first chose to patchelf search paths directly into the binaries (as opposed to any alternative, e.g. using a per-dso ld.so.cache like Guix currently does). AFAIU Nixpkgs got "broken" with the transition from DT_RPATH to DT_RUNPATH.

The fact that one could implement (and that Guix and flox have) an even more complex wrapper is not entirely orthogonal to this problem, but doesn't directly address it either. The ability to hijack ld.so's search mechanism in its entirety is very interesting, it's obvious a lot could be achieved by exploring that path (maybe we could even improve on the libc situation and create a wrapper that always chooses the "newest" libc/libstdc++), but I disagree that this is on-topic or that this is blocking. The only reason I see not to merge this is that this is likely to be removed or grow obsolete relatively fast - but I'm repeating myself and I promised to stop.

Footnotes

  1. Actually, I'm not familiar with the time line. Maybe it already was DT_RUNPATH, and maybe LD_LIBRARY_PATH has always been "breaking" for Nixpkgs. Was it?
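
For readers following the DT_RPATH versus DT_RUNPATH point, the ordering documented in ld.so(8) can be sketched as below. This is an illustration, not loader code: `search_order` is a hypothetical helper, the ld.so.cache step is left out, and the default directories shown are typical values rather than what any particular glibc build uses.

```shell
# Print the directory lists in the order ld.so(8) documents them.
# $1 = DT_RPATH, $2 = DT_RUNPATH, $3 = LD_LIBRARY_PATH (any may be empty).
search_order() {
    rpath=$1
    runpath=$2
    env_path=$3
    # DT_RPATH is used only when the object has no DT_RUNPATH,
    # and it is consulted before LD_LIBRARY_PATH.
    if [ -z "$runpath" ] && [ -n "$rpath" ]; then
        printf 'DT_RPATH=%s\n' "$rpath"
    fi
    if [ -n "$env_path" ]; then
        printf 'LD_LIBRARY_PATH=%s\n' "$env_path"
    fi
    # DT_RUNPATH is consulted after LD_LIBRARY_PATH.
    if [ -n "$runpath" ]; then
        printf 'DT_RUNPATH=%s\n' "$runpath"
    fi
    printf 'defaults=%s\n' '/lib:/usr/lib'
}

# Example:
#   search_order "" /nix/store/example/lib /vendor/lib
```

With only DT_RUNPATH set, the environment variable is listed first, which is why a stray LD_LIBRARY_PATH can override the store paths patched into a binary.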

@jeff-hykin

The only reason I see not to merge this is that this is likely to be removed or grow obsolete relatively fast - but I'm repeating myself and I promised to stop.

Thanks, that was helpful; now I understand your position.

I'll join you in saying, I think my side has been heard.

@cole-h cole-h removed the ofborg-internal-error Ofborg encountered an error label Jun 25, 2024
@wegank wegank added 12.approvals: 1 This PR was reviewed and approved by one reputable person 2.status: merge conflict This PR has merge conflicts with the target branch labels Sep 8, 2024
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 2, 2025
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux 12.approvals: 1 This PR was reviewed and approved by one reputable person