Dependencies and installation size #193

Closed
bradleyharden opened this issue May 15, 2020 · 19 comments
Labels
discussion (forum-like question; not exactly a feature or a bug), such-is-nix ("Why Nix?", "Nix alternative", etc.)

Comments

@bradleyharden

bradleyharden commented May 15, 2020

Hi,

I recently came across this project and wanted to check it out. To my knowledge, this is probably the first piece of software written in Haskell that I've dealt with, so maybe I'm just ignorant. But frankly, I was blown away by the installation process, number of dependencies and install size. After installing nix and neuron, my /nix directory clocks in at 6.8 GB, which is more than my /usr and /lib directories combined.

Why does it need to install so much? Is this standard for Haskell projects? Why can't it make use of existing dependencies already installed on the system? For instance, I see that it installed glibc, Python 2.7, and other dependencies, even though they are already installed on my system. Moreover, it looks like it installed at least three different versions of glibc, 2.26, 2.27 and 2.30, along with some other files for glibc 2.28. Is all of that really necessary?

Disk space isn't a huge issue, but 7 GB is still a lot. I'm also installing it on a work VM where I have to explicitly request additional storage when I exceed 100 GB. That's a big dent in my storage.

I'm using Ubuntu 18.04 if that's relevant, and I installed from master.

@srid
Owner

srid commented May 15, 2020

Nix does not use your system dependencies. Nix packages rely on what's in the Nix store.

The problem here I think is that the final neuron executable derivation has a runtime dependency on the Haskell library derivations, which pulls in a whole slew of dependencies. But these should not have been runtime dependencies, but only build dependencies.
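To illustrate why a single stray runtime reference matters, here is a minimal Python sketch of a runtime closure as a transitive walk over references. The package names, the reference graph, and the sizes are all hypothetical, chosen only to show the effect; they are not measurements of neuron.

```python
# Toy model of a Nix runtime closure. Every name and size here is
# hypothetical and exists only to illustrate the effect described above.
from typing import Dict, List, Set

# Direct runtime references of each (hypothetical) store path.
REFERENCES: Dict[str, List[str]] = {
    "neuron-bin": ["glibc", "gmp"],
    "neuron-lib": ["ghc", "glibc"],
    "ghc": ["gcc", "glibc"],
    "gcc": ["glibc"],
    "glibc": [],
    "gmp": ["glibc"],
}

# Hypothetical on-disk sizes in MB.
SIZES_MB: Dict[str, int] = {
    "neuron-bin": 60,
    "neuron-lib": 300,
    "ghc": 1500,
    "gcc": 200,
    "glibc": 30,
    "gmp": 2,
}


def closure(root: str, refs: Dict[str, List[str]]) -> Set[str]:
    """Return every path reachable from root by following references."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        stack.extend(refs.get(path, []))
    return seen


def closure_size_mb(root: str, refs: Dict[str, List[str]],
                    sizes: Dict[str, int]) -> int:
    """Sum the sizes of all paths in the closure of root."""
    return sum(sizes[path] for path in closure(root, refs))


def with_extra_reference(refs: Dict[str, List[str]], src: str,
                         dst: str) -> Dict[str, List[str]]:
    """Return a copy of refs in which src also references dst."""
    updated = {path: list(deps) for path, deps in refs.items()}
    updated[src].append(dst)
    return updated


if __name__ == "__main__":
    print(closure_size_mb("neuron-bin", REFERENCES, SIZES_MB))  # 92
    leaky = with_extra_reference(REFERENCES, "neuron-bin", "neuron-lib")
    print(closure_size_mb("neuron-bin", leaky, SIZES_MB))  # 2092
```

In this toy graph, one extra edge from the executable to the library output drags the compiler and its toolchain into the closure, which is the kind of jump described above.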

nixpkgs's enableSeparateBinOutput is supposed to address this (if I understood @domenkozar correctly), but I had trouble getting it to work, so I've left a WIP note here:

neuron/project.nix

Lines 75 to 87 in 38ab3fb

# Strip off the library part, so we trim out dependencies to only those
# needed by the executable.
#
# FIXME: This causes cyclic references, with the binary depending on the
# library derivation (`strings` on the executable reports
# "lib/ghc-8.6.5/x86_64-linux-ghc-8.6.5" etc).
#
# Might also be related to this:
# https://github.com/NixOS/cabal2nix/issues/433
makeExecutable = x: overrideCabal x (drv: {
  enableSeparateBinOutput = true;
  enableSeparateDataOutput = true;
});
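The `strings` check mentioned in the FIXME works because Nix finds runtime dependencies by scanning build outputs for store path hashes. A rough Python sketch of that kind of scan follows. It is a simplification: it matches whole `/nix/store/<hash>-<name>` strings with an approximate pattern rather than reproducing Nix's actual scanner, and the hash in the usage example is made up.

```python
# Simplified scan for Nix store paths embedded in a binary blob.
# Assumption: a store path looks like /nix/store/<32 chars>-<name>; the
# character classes below are an approximation, not Nix's exact rules.
import re
from typing import Set

STORE_PATH = re.compile(rb"/nix/store/[0-9a-z]{32}-[0-9A-Za-z+._?=-]+")


def find_store_references(blob: bytes) -> Set[str]:
    """Return the distinct store paths that appear anywhere in blob."""
    return {m.group(0).decode("ascii") for m in STORE_PATH.finditer(blob)}


if __name__ == "__main__":
    fake_hash = "a" * 32  # hypothetical hash, for illustration only
    embedded = f"/nix/store/{fake_hash}-ghc-8.6.5/lib/ghc-8.6.5/x86_64-linux-ghc-8.6.5"
    blob = b"\x7fELF\x00" + embedded.encode("ascii") + b"\x00"
    for ref in sorted(find_store_references(blob)):
        print(ref)
```

Any path found this way counts as a runtime reference, so a library directory string baked into the executable is enough to keep the whole library derivation in the closure.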


@bradleyharden What happens if you run gc on your nix store after installing neuron? Try the following command and see how much disk space it frees up:

nix-collect-garbage -d

There is also #183.

@srid added the discussion label on May 15, 2020
@bradleyharden
Author

I'm a bit confused now. I ran garbage collection, and it says it freed 575 MB. But it appears to have freed more than that, because I'm down to 5.4 GB. Still quite large though.

Have you figured out a static build yet? I would be interested in trying it out. How big is it/will it be?

Also, just out of curiosity, why doesn't nix attempt to integrate at all with the system? That seems like a big ask to make of users. You end up duplicating a lot of dependencies.

@cprussin

cprussin commented May 16, 2020

@bradleyharden Nix solves many problems around dependency management, and the way it does so is through tightly controlling the environment it builds so it can guarantee the environment is the same across wildly different systems consuming it. It cannot use your system environment because it cannot make guarantees about your system environment that it needs to make to be able to solve these problems.

For example, if you've ever gotten into a situation where your system python is python3 but a project requires python2--it's a huge pain, and nix prevents you from ever having to deal with it in your system config by ensuring the packages exposed to the environment are the ones you need. But it can't do that with system dependencies, because it can't guarantee that you have a python installed on your system or that it's the right version, and you wouldn't want it to go messing with your system config underneath you.

In some ways nix is similar to using a docker container to control your dev/build environment. Even if docker were not virtualized in any way, you probably wouldn't want docker to use your system dependencies, because then your development/build environment would no longer be reproducible, which is the whole reason why you're using docker in the first place. The same logic applies to nix, although nix takes a somewhat different approach from docker (and fully avoids virtualization).

Generally a large cache is shared between many projects, so in terms of disk space nix gets more reasonable when you use it across many projects. However, in general, we choose to solve for reproducibility and predictability at the expense of disk space (e.g. I would always choose to duplicate a program's install if it means I don't have to deal with dependency hell & system configuration to get up and running).

@cprussin

By the way, we have a very friendly and welcoming community in the #nix channel in the functional programming Slack. Feel free to come by and we'll be happy to discuss these things, we're all very passionate about them!

@srid
Owner

srid commented May 24, 2020

In recent master the installation size has gone down to ~4GB (not counting the nix/cachix install). EDIT: Oh, and also - it avoids compiling neuron itself (gets it from cache).

That's not much, but I imagine we can whittle it down further to under ~200MB once those ghost nix dependencies are resolved (I suspect this is something to do with cabal's behaviour).

@bradleyharden
Author

Sorry for the late reply. Trying Neuron sort of fell off my radar. 200 MB seems much more reasonable, if you can get there.

Now that I know of Nix, I've been seeing it pop up in various articles, and I've read bits and pieces about it. It seems like a very powerful concept.

I do wonder, though, what happens when you have absolutely no incentive to share libraries, or rather, no disincentive to multiple library versions. In principle, if a library strictly adheres to semantic versioning, it shouldn't be a problem if multiple dependents share the most up-to-date minor version of a given major version. But if you always specify an exact build dependency, then it seems like users will inevitably end up with multiple, mostly-redundant copies of the same library. Or maybe I'm missing something about the concept? I suppose storage space is the only thing you sacrifice. But it still rubs me as inherently wasteful, even if it does solve a significant problem. But maybe I'm just stuck in an old way of thinking.

Anyway, this issue has probably gotten off topic. Please feel free to close it.

@cprussin

> I suppose storage space is the only thing you sacrifice. But it still rubs me as inherently wasteful, even if it does solve a significant problem.

@bradleyharden that's exactly, intentionally, and explicitly correct. When using nix as a dev tool, the selected tradeoff is storage space for (basically) guaranteed reproducible dev environments and builds. It's impossible to generally say if two nearly identical copies of something are close enough to make your system work, so nix doesn't try, and stores both. Storage space is cheap, time is expensive--which would you rather waste?
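As a rough illustration of why both copies end up in the store, here is a small Python sketch in which a store path is derived from a hash of the package's exact inputs. The hashing scheme is an assumption made for illustration (Nix's real scheme differs), and the `neuron` version number is hypothetical.

```python
# Toy model: a store path keyed on the exact inputs of a package.
# Assumption: this sha256-over-inputs scheme is illustrative only and is
# not how Nix actually computes store path hashes.
import hashlib
from typing import Iterable


def store_path(name: str, version: str, deps: Iterable[str]) -> str:
    """Derive a path from the name, version, and dependency paths."""
    digest = hashlib.sha256()
    digest.update(f"{name}-{version}".encode("utf-8"))
    for dep in sorted(deps):
        # Separate entries with a NUL byte so inputs cannot run together.
        digest.update(b"\0" + dep.encode("utf-8"))
    return f"/nix/store/{digest.hexdigest()[:32]}-{name}-{version}"


if __name__ == "__main__":
    glibc_227 = store_path("glibc", "2.27", [])
    glibc_230 = store_path("glibc", "2.30", [])
    # The same program built against different glibc versions gets two
    # distinct paths, so both builds can coexist without conflict.
    print(store_path("neuron", "1.0", [glibc_227]))
    print(store_path("neuron", "1.0", [glibc_230]))
```

Identical inputs always map to the same path, so sharing happens automatically when two projects pin exactly the same dependency, and never when they differ.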

@cprussin

Also worth noting that when using nixos (as opposed to nix for projects) it's much more common to use packages out of a package set that only exposes one (major) version for each package (nixpkgs), so you don't end up with duplication as much. Using nix for projects and locking dependency versions is much more akin to using lockfiles (npm/yarn/bundler/etc etc) to ensure dependencies don't move out from under your projects and break your builds randomly.

@bradleyharden
Author

@cprussin, the decision is certainly rational. I think my aversion is probably rooted in the fact that my very first experience with Nix was a fairly small tool that ended up using 7 GB of storage. As a first impression, I think it established my mental model of Nix as being incredibly wasteful with storage. "If it takes 7 GB just for Neuron, how much space will larger tools require?" Maybe that's not actually representative or fair, but I think that is what's going on in my head.

@srid
Owner

srid commented Jun 18, 2020

> "If it takes 7 GB just for Neuron, how much space will larger tools require?"

Just to be clear. Normally it should be taking ~200MB for neuron. Why does it take 7 GB? Because of an unnecessary runtime dependency (on the Haskell packages/compiler), as explained here: #193 (comment)

I don't know how to remove this unnecessary dependency. But doing it will fix the disk space issue.

@bradleyharden
Author

@srid, yes, I was just commenting that my first impression was 7 GB. I think it takes some time to deconstruct first impressions, even if there's a clear justification. I also don't have any other examples to compare to, since I haven't yet used Nix for anything else. I'll keep an eye out though. It seems like a useful tool.

@anka-213

@srid The closure of neuron (on my Mac) is just 59 MB, so it should definitely be possible to reduce the size, if we can make the installation just download the closure instead of all the build dependencies.

$ nix-store -q --size $(which neuron)
61836368
$ nix-store --dump $(which neuron) > neuron.nar && ls -l neuron.nar
-rw-r--r--  1 user  staff    59M 18 Jun 21:28 neuron.nar

I'm not sure what the easiest way to do that would be, but we could always provide the nar file for download under releases and ask the user to install it with something like nix-store --restore plus some other command, but I assume there is an easier way.

I am currently using neuron on CI, so this would reduce the build times significantly (from currently around 4 minutes, so it's not that bad, but it could be less than a minute).

@srid
Owner

srid commented Jun 18, 2020

I improved the situation substantially in #240. After that PR gets merged, neuron install should take ~1GB of space (total space including cachix/nix is: 1.8G).

It is still not perfect. In particular I wonder why cabal2nix is evaluating the various Haskell derivations instead of reusing the original derivation from cache.

@srid added the such-is-nix label on Jun 19, 2020
@srid
Owner

srid commented Jun 27, 2020

> I am currently using neuron on CI, so this would reduce the build times significantly (from currently around 4 minutes, so it's not that bad, but it could be less than a minute).

@anka-213 Check out https://github.com/srid/neuron-template - the CI build time is around 2 minutes. It could still be improved of course (see previous comment), but it is not that bad I think.

@anka-213

@srid Thanks, that's great!

I wonder if we could prevent the dependency on cabal2nix, but that is mostly an academic exercise. 😃 It should be possible in theory, since the result of it should be predictable and not dependent on the current system. Not sure if it would be worth it though.

But the thing that would help the most for adoption (I've tried to make my colleagues use this) would be to provide a static binary that does not depend on nix. On the other hand, it was an excuse to make them try nix. 😉

@srid
Owner

srid commented Jun 30, 2020

#260 produces a 58M size static binary (no dependencies) on Linux.

@anka-213

Sweet!

@srid closed this as completed in 6cb20ef on Jul 18, 2020
@domenkozar

Nice!

@Shados

Shados commented Dec 27, 2021

> The problem here I think is that the final neuron executable derivation has a runtime dependency on the Haskell library derivations, which pulls in a whole slew of dependencies. But these should not have been runtime dependencies, but only build dependencies.

As a NixOS user, having installed this via Nix, this aspect remains a problem. The closure size of pkgs.neuron-notes in a recent nixpkgs is ~4.6GB, and because I don't actually use much else in the way of Haskell-based packages, about 4.3GB of that is unique to Neuron. That's pretty damn bloaty.

6 participants