Actual static linking / Not linking to glibc #5643

Open
the-maldridge opened this issue May 2, 2019 · 14 comments

Comments

@the-maldridge

the-maldridge commented May 2, 2019

Per suggestion in #5537 and encouragement from @angrycub, I'm opening this issue for the purpose of collecting thumbs-ups for building Nomad against non-glibc C libraries. As the binaries currently provided on https://nomadproject.io/ are not statically linked they only work if the machine they're running on has the correct version of glibc available.
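One way to see the dependency described above is to check whether a binary asks for a dynamic loader, and which glibc symbol versions it imports. The newest one it imports is the oldest glibc it can run on. This is a sketch that assumes GNU binutils (readelf, objdump) and GNU sort are installed. The helper name describe_linkage is hypothetical.

```sh
# Hypothetical helper: report whether an ELF binary needs a dynamic loader and,
# if so, the newest glibc symbol version it references (the oldest glibc that
# can run it). Assumes GNU binutils (readelf, objdump) and GNU sort (-V).
describe_linkage() {
    bin=$1
    # A PT_INTERP program header names the dynamic loader the kernel must start,
    # e.g. /lib64/ld-linux-x86-64.so.2 for glibc on x86_64.
    interp=$(readelf -l "$bin" 2>/dev/null |
        sed -n 's/.*Requesting program interpreter: \(.*\)]/\1/p')
    if [ -z "$interp" ]; then
        echo "no program interpreter (statically linked)"
        return 0
    fi
    echo "dynamic (interpreter: $interp)"
    # Versioned imports such as memcpy@GLIBC_2.14 pin a minimum glibc release;
    # version sort (-V) puts GLIBC_2.14 after GLIBC_2.2.5.
    ver=$(objdump -T "$bin" 2>/dev/null | grep -o 'GLIBC_[0-9.]*' | sort -uV | tail -n 1)
    if [ -n "$ver" ]; then
        echo "highest glibc symbol version required: $ver"
    fi
}
```

For example, run describe_linkage ./nomad on a release binary. The helper also reports "statically linked" for a file that is not ELF at all, so point it at the binary you actually mean to check.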

Given that the Go way is to provide binaries that will run anywhere the CPU arch matches, I personally find this surprising. I find it even more surprising given that the effort to build in a container with an alternate C library is minimal.

Without getting into why one would run Nomad on a non-glibc platform (that is beyond the scope of this issue), here are some quick reasons that come to mind from my own environments, where muslc is the library of choice:

  • muslc is a smaller and more correct implementation of the C Standard Library. It's very difficult to do something so badly in Go that a hard dependency on glibc is introduced, so artificially introducing the dependency by policy is an odd choice, and one that is currently unjustified in writing.
  • Distribution provided packages aren't just for glibc (see the patches that @nilium is currently maintaining for Void Linux's builds)
  • Nomad is distributed as a "static" binary, not a distribution package. Either Hashicorp supports distributions or they don't, but distributing a binary which should run on any Linux distribution, and then claiming that those distributions aren't supported is deceptive.
  • muslc is the library of choice for building containers, and a number of distributions now bring the benefits of musl to bare metal. Given that the systems Nomad supervises tend to be minimal, it seems an obvious choice to run a minimal system on the metal, thus maximizing the MHz and RAM available to tasks.

If builds without an artificial dependency on glibc are something that is important to you, please thumbs-up this issue (or even just the ability to create these builds on your own, which you should be able to do already but can't with the current release). Don't reply with a +1, as that just clutters the thread and makes your support difficult to track.

@schmichael
Member

Sorry for the late reply. This is a complex issue that defies a quick and risk-free fix. I'll try to respond to each of your points:

Given that the Go way is to provide binaries that will run anywhere the CPU arch matches, I personally find this surprising.

While Go is renowned for producing binaries that can run on a wide variety of systems, it's far from a perfect solution. The Go toolchain distributed by golang.org does not run on Alpine. Alpine ships their own customized build of the Go toolchain and applies patches for compatibility with musl.

Nomad requires CGO (for libcontainer among other things) which complicates portability.

Its very difficult to do something so badly in Go that a hard dependency on glibc is introduced, so artificially introducing the dependency by policy is an odd, and currently unjustified in writing, choice.

Our Nvidia device driver implementation requires glibc due to differences in glibc's lazy binding.

Nomad is distributed as a "static" binary, not a distribution package.

If we shipped all of Nomad's plugins as external binaries, the agent itself could probably run on musl (perhaps even drop cgo entirely). This would require significant changes to our source layout, build system, distributed artifacts, documentation, and test infrastructure. We have no plans to produce "lite" portable agent binaries at this time.

If builds without an artificial dependency on glibc is something that is important to you

We would love more assistance in improving Nomad's portability and Alpine/musl support. I hope we have demonstrated in #5537 that we do respond to specific issues and put effort toward improving our portability story.

As far as I know making the prebuilt Nomad binaries work on Alpine/musl does not have a straightforward solution, but I'd love to be wrong or have assistance in working toward portability.

@the-maldridge
Author

@schmichael no worries at all on the slow reply, honestly I didn't expect to ever get a reply on this ticket.

As a Void maintainer, I'm well aware of the hoops one has to jump through to get a full-blown toolchain up and running. However, I'm not aware of any projects out there with dependencies so deep that they care which toolchain they're built with (well, maybe glibc, but that's another point entirely).

I live in hope that the libcontainer dependency will move out into a plugin. As that system becomes more robust, I'd really like to get to a point where nomad itself is running rootless.

I'm well acquainted with glibc's generally broken lazy binding system and the sad state of affairs that is GPU driver options on Linux. I'm very happy that there was a quick solution in adding the build tag to shut it off, though it would have been nice to get that into an actual release, rather than delaying it for a while since in the official binaries it was a noop change.

I'd actually really like to see Nomad shipping more plugins as external binaries that could be setgid or setuid as necessary. Running the whole nomad binary as root is a really interesting security story that I'm still trying to grapple with. I can see why this is nowhere on the roadmap, as it's a complete overhaul of a lot of the core parts of Nomad, but as a means of reducing the amount of code running with elevated privileges, I really hope this approach can be considered.

#5537 was a fantastic start that was unfortunately marred by the fix not making it into the release. What it demonstrated from my perspective was that getting the fixes into the codebase was something that Hashicorp was willing to do, but putting them into another release - and unbreaking a class of users in the process - was another.

The solution to running Nomad on Alpine is to compile from source. The nvidia integration has to be switched off to make it work, but as GPUs tend to be specialized accelerators that aren't present on the vast majority of the fleet, this is a cost I'm willing to pay.

Useful takeaway from this ticket: the single most useful thing right now for supporting Nomad on non-glibc systems would be a smoke-test build before every release. This would make sure that the build works at all, as there have now been 2 releases where nomad version did not work after a build. Of course this smoke-test build only has value if it can block a release, but small steps and all that.
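Here is a minimal sketch of such a gate. It assumes the binary has already been built, for example inside an Alpine container, and that nomad version prints a line starting with "Nomad v". The function name smoke_test is hypothetical. A CI job would fail the release whenever it returns non-zero.

```sh
# Hypothetical release gate: fail unless the freshly built binary can at least
# start and report its version. Meant to run in a musl-based build container.
smoke_test() {
    bin=${1:-bin/nomad}
    if [ ! -x "$bin" ]; then
        echo "smoke test: $bin missing or not executable" >&2
        return 1
    fi
    # A binary that cannot find its loader or libraries fails right here.
    if ! out=$("$bin" version 2>&1); then
        echo "smoke test: '$bin version' failed: $out" >&2
        return 1
    fi
    # Assumed output format: "Nomad v<version> ...".
    case $out in
        Nomad\ v*) echo "smoke test passed: $out" ;;
        *) echo "smoke test: unexpected output: $out" >&2; return 1 ;;
    esac
}
```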

@prologic

prologic commented Oct 3, 2019

(not tested myself) Can the Nomad binaries be compiled and linked statically today with the current source tree and layout?

@the-maldridge
Author

Yes, you can. It's very hard to build something that can't be statically linked. Here are some commands that will do it for you in Alpine:

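    apk add \
        bash \
        g++ \
        git \
        go \
        linux-headers \
        musl-dev

    # Release to build, e.g. version=0.10.2; stop here if it is not set.
    : "${version:?set version to a Nomad release, e.g. 0.10.2}"

    # Hashicorp Build (GOPATH layout: run this from the root of your GOPATH)
    mkdir -p src/github.com/hashicorp/nomad
    cd src/github.com/hashicorp/nomad || exit 1

    # Get source and apply any patches.
    git clone -b "v$version" https://github.com/hashicorp/nomad.git .

    echo "Building..."
    # nonvidia leaves out the Nvidia device plugin; "-linkmode external" hands
    # the final link to the C toolchain and -extldflags "-static" makes it static.
    go build -x \
       -o bin/nomad \
       -tags "nonvidia release ui" \
       -ldflags '-linkmode external -extldflags "-static"' \
       .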

That's yanked from some obsolete files from when I was linking statically, now I maintain an internal Alpine repo which has the binaries dynamically linked to musl.

@prologic

prologic commented Oct 3, 2019

The -tags "nonvidia" bit is probably the important one that drags in cgo :)

@the-maldridge
Copy link
Author

Not quite. cgo is pulled in by the DNS resolver. This is a fairly well understood gotcha at this point. The nvidia issue is that the bindings for that driver perform unchecked operations with no error handling, so if you don't have the shared objects available then you're SOL, since musl doesn't provide the same lazy bindings that glibc does (hence why this works on glibc if you don't have the soname, but not on musl).
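For the DNS part specifically, Go's net package can be told not to use the cgo resolver. The netgo build tag selects the pure-Go resolver at build time, and GODEBUG=netdns=go forces it at run time. This is a sketch of the earlier build with that tag added. The function name is hypothetical. The tag only removes the resolver's use of cgo; it does not remove Nomad's other cgo dependencies, such as libcontainer.

```sh
# Hypothetical variant of the Alpine build above: add Go's standard "netgo"
# build tag so name resolution uses the pure-Go resolver instead of calling
# getaddrinfo through cgo. Run from the Nomad source directory.
build_nomad_netgo() {
    out=${1:-bin/nomad}
    go build \
        -o "$out" \
        -tags "netgo nonvidia release ui" \
        -ldflags '-linkmode external -extldflags "-static"' \
        .
}
```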

@the-maldridge
Copy link
Author

the-maldridge commented Dec 3, 2019

After thinking long and hard about this issue, I'm not sure in its present state traction can be made. Too many of Nomad's drivers need to link system level code. I think the only way to make progress on this would be to factor out all the drivers from Nomad into go-plugin executables. As a practical upshot and something that might be easier to sell to the product owners, this would mean the main, network exposed nomad binary doesn't need to be running with euid 0.

I'm not sure what the overall desire within HashiCorp for this change would be, but it would be a pretty big win for security, since less and less code would need to run in a privileged mode. If the podman driver that's been talked about on gitter ever gets off the ground, it would be trivial to run a completely rootless Nomad, and that would make my security team very happy indeed.

@schmichael
Copy link
Member

schmichael commented Dec 20, 2019

After thinking long and hard about this issue, I'm not sure in its present state traction can be made. ... I think the only way to make progress on this would be to factor out all the drivers from Nomad into go-plugin executables.

👍 I'd love to ship a minimal Nomad binary (all plugins external) alongside our monolithic binary, but it's probably not going to be prioritized soon (not 0.11 and probably not 0.12) since it's nontrivial effort to support this in our build system.

There are plans underway to ship proper Linux packages alongside our zip files which may give us a route to drop our monolithic binary while still providing a "batteries included" package. That could make this effort substantially easier.

@the-maldridge
Author

That's a really cool idea. I use go-plugin in some of my own projects, but haven't yet noticed any machinery that would let me selectively bake in plugins to a monolith. Can you share any pointers to documentation where I can find how that works?

@schmichael
Copy link
Member

Unfortunately go-plugin itself doesn't have a notion of internal or builtin plugins. Builtin plugins define the same interface as external plugins and register themselves manually with the agent's plugin manager: https://github.com/hashicorp/nomad/blob/v0.10.2/helper/pluginutils/catalog/register.go#L11-L20

The PluginLoader discovers external plugins and merges them into the plugin catalog.

The PluginLoader's Dispense method is kind of the bridge between internal and external plugins as it executes the binary for external plugins.

Just brainstorming, but one approach to shipping both monolithic and minimal builds would be to create hidden Nomad subcommands for executing builtin plugins: e.g. nomad _plugin-exec <name> would start the go-plugin subprocess within the monolithic Nomad binary. A registration pattern could be used to statically build a list of builtin plugins at compile time (and therefore be empty for the minimal build).
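The hidden-subcommand idea is the same multi-call pattern BusyBox uses. Below is a toy model in shell that only illustrates the control flow. The _plugin-exec name comes from the comment above. The registry variable, the plugin functions, and their output are made up. In a Go implementation the registry would more likely be a map filled by init() functions in files selected by build tags.

```sh
# Toy model of a multi-call binary with a build-time plugin registry.
# BUILTIN_PLUGINS stands in for the list a monolithic build compiles in;
# a minimal build would leave it empty and rely on external plugin binaries.
BUILTIN_PLUGINS=""

register_builtin() {
    # Called once per builtin plugin when the "binary" is assembled.
    BUILTIN_PLUGINS="$BUILTIN_PLUGINS $1"
}

# Made-up plugin bodies; a real one would serve the go-plugin protocol.
plugin_docker() { echo "docker plugin serving"; }
plugin_exec()   { echo "exec plugin serving"; }
register_builtin docker
register_builtin exec

agent_main() {
    case ${1:-} in
        _plugin-exec)
            # Hidden subcommand: the agent re-executes itself with this
            # argument to run one builtin plugin as its own subprocess.
            name=${2:-}
            for p in $BUILTIN_PLUGINS; do
                if [ "$p" = "$name" ]; then
                    "plugin_$name"
                    return
                fi
            done
            echo "no builtin plugin named '$name'" >&2
            return 1
            ;;
        *)
            # Normal agent path: launch every registered builtin the same
            # way an external plugin binary would be launched.
            for p in $BUILTIN_PLUGINS; do
                echo "would spawn: nomad _plugin-exec $p"
            done
            ;;
    esac
}
```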

@the-maldridge
Copy link
Author

Hmm, I like the idea of the hidden subcommands. It seems really clean from an interfaces perspective, if not an implementation one.

@brodul

brodul commented Sep 3, 2020


I use an unusual Linux distribution that doesn't have the ELF interpreter, bash, or shared libraries in their standard locations.
The community uses a program called patchelf to modify the ELF headers of closed-source programs: https://github.com/NixOS/patchelf
In this case it can be used as a workaround.
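Roughly, the workaround rewrites the binary's PT_INTERP entry so it uses the loader the distribution actually ships. This is a sketch that assumes patchelf is installed. relink_interpreter is a hypothetical helper. patchelf edits the file in place, so it is worth working on a copy.

```sh
# Point a prebuilt binary at the dynamic loader this system actually provides,
# and optionally at the directories holding its shared libraries.
# Requires patchelf: https://github.com/NixOS/patchelf
relink_interpreter() {
    bin=$1
    loader=$2        # absolute path to the system's ld-linux / ld-musl loader
    rpath=${3:-}     # optional colon-separated library search path
    echo "old interpreter: $(patchelf --print-interpreter "$bin")"
    patchelf --set-interpreter "$loader" "$bin" || return 1
    if [ -n "$rpath" ]; then
        patchelf --set-rpath "$rpath" "$bin" || return 1
    fi
    echo "new interpreter: $(patchelf --print-interpreter "$bin")"
}
```

Inside a Nix build environment, the loader path is usually stored in $NIX_CC/nix-support/dynamic-linker, so a call might look like relink_interpreter ./nomad "$(cat "$NIX_CC/nix-support/dynamic-linker")".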


@the-maldridge
Author

Interesting use of patchelf; I was unaware it was safe to use on Go-compiled binaries. My use case has switched to running Nomad in a chroot that has the loader and libraries Nomad expects to see, but I still compile out the nvidia extensions, as they are more trouble than they're worth.
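The chroot approach needs the loader and every shared object at the same paths the binary expects. This is a sketch of filling such a chroot from ldd output. It assumes glibc-style ldd lines and an absolute path to the binary. Both helper names are hypothetical.

```sh
# Hypothetical helper: read `ldd` output on stdin and print the absolute path of
# each shared object (including the loader), one per line. Lines look like
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)
#   /lib64/ld-linux-x86-64.so.2 (0x...)
# while the vDSO line has no path and is skipped.
ldd_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }'
}

# Copy the binary and everything it loads into $root at identical paths.
# $bin must be absolute so it lands at the same location inside the chroot.
populate_chroot() {
    bin=$1
    root=$2
    for lib in $(ldd "$bin" | ldd_paths); do
        mkdir -p "$root$(dirname "$lib")"
        cp -L "$lib" "$root$lib"    # -L copies symlink targets' contents
    done
    mkdir -p "$root$(dirname "$bin")"
    cp "$bin" "$root$bin"
}
```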

@tgross
Copy link
Member

tgross commented Jun 21, 2021

While I don't think it entirely resolves this issue because we still have libcontainer in the mix, #10796 externalizes the nvidia device plugin (finally).


5 participants