Intensional store model #296

Mathnerd314 · 2014-07-16T02:06:31Z

Currently, if any change is made to a package's build script (e.g. adding an extra newline to installPhase), then the package and everything that depends on it will be rebuilt, because the derivation hash changed. With an intensional store model, only the package itself will be rebuilt; as long as its output is unchanged, its dependents keep the same inputs and do not need to be rebuilt, reducing build times.
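A minimal sketch of the difference, assuming a toy hashing scheme (a truncated SHA-256 over joined strings, not Nix's real store-path computation). The helper names `short_hash`, `input_addressed_path` and `content_addressed_path` are hypothetical. A whitespace-only change to foo's build script changes bar's input-addressed path, but not bar's build key when that key only sees foo's output contents.

```python
import hashlib


def short_hash(*parts: str) -> str:
    """Toy digest over several strings (not Nix's real path hashing)."""
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()[:12]


def input_addressed_path(name: str, build_script: str, dep_paths: list) -> str:
    # Extensional model: the path depends on the recipe and on the *paths* of dependencies.
    return f"/nix/store/{short_hash(build_script, *dep_paths)}-{name}"


def content_addressed_path(name: str, output: str) -> str:
    # Intensional model: the path depends only on what the build produced.
    return f"/nix/store/{short_hash(output)}-{name}"


def compare_models():
    foo_output = "libfoo.so bytes"  # assume both recipes produce identical output
    foo_scripts = ["installPhase: make install", "installPhase: make install\n"]
    bar_paths, bar_keys = set(), set()
    for script in foo_scripts:
        foo_ia = input_addressed_path("foo", script, [])
        bar_paths.add(input_addressed_path("bar", "build bar", [foo_ia]))
        foo_ca = content_addressed_path("foo", foo_output)
        # bar's build key only sees foo's content-addressed path
        bar_keys.add(short_hash("build bar", foo_ca))
    return len(bar_paths), len(bar_keys)


print(compare_models())  # (2, 1): bar needs a rebuild only in the extensional model
```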

http://nixos.org/~eelco/pubs/phd-thesis.pdf refers to a "prototype implementation" of the intensional store, which appears to be in https://github.com/NixOS/nix/tree/secure; maybe that could be resurrected and merged?

Mathnerd314 · 2014-07-16T19:36:52Z

Another interesting possibility is to use OSTree as the underlying store (which already hashes and deduplicates) and then turn /nix/store/* into hardlinks. So then we'd have two levels of store, the intensional and the extensional, which would mostly coexist.

lucabrunox · 2014-07-24T20:11:54Z

Does OSTree hardcode paths in libraries with rpath? As far as I understood when I looked at OSTree, it was less granular than nix, as it switches a whole file system. I can't imagine OSTree being used with nix.
For example, after switching tree, it triggers an ldconfig. Now you lost the ability to run service X with library L1.0, and service Y with library L1.0-custom. Back to the classic global state problems. You only changed the global state.

Mathnerd314 · 2014-07-24T21:28:04Z

OSTree is just a store of read-only files with extended attributes identified by the hash of their contents, together with code for hardlinking those files into a directory tree. RPM-OSTree is the software that manages and switches the filesystem; I'm not proposing we use that, since our activation scripts are about as featureful and are easier to work with.
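To make that description concrete, here is a toy sketch of the idea. It is not OSTree's actual API or on-disk layout; `ObjectStore`, `put` and `checkout` are hypothetical names. Files are stored once under their content hash and hardlinked into per-package trees, so two packages shipping the same library share one inode.

```python
import hashlib
import os
import tempfile


class ObjectStore:
    """Toy OSTree-like store: read-only files named by the hash of their contents."""

    def __init__(self, root: str):
        self.objects = os.path.join(root, "objects")
        os.makedirs(self.objects, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.objects, digest)
        if not os.path.exists(path):  # identical content is stored only once
            with open(path, "wb") as f:
                f.write(data)
            os.chmod(path, 0o444)  # objects are never modified in place
        return digest

    def checkout(self, tree: dict, dest: str) -> None:
        """Hardlink objects into dest; tree maps relative file names to digests."""
        for rel_name, digest in tree.items():
            target = os.path.join(dest, rel_name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            os.link(os.path.join(self.objects, digest), target)


def demo():
    with tempfile.TemporaryDirectory() as root:
        store = ObjectStore(root)
        lib = store.put(b"shared library bytes\n")
        for pkg in ("aaaa-serviceX", "bbbb-serviceY"):
            store.checkout({"lib/libL.so": lib}, os.path.join(root, "store", pkg))
        x = os.stat(os.path.join(root, "store", "aaaa-serviceX", "lib/libL.so"))
        y = os.stat(os.path.join(root, "store", "bbbb-serviceY", "lib/libL.so"))
        return x.st_ino == y.st_ino, x.st_nlink


print(demo())  # (True, 3) on a POSIX filesystem with hardlink support
```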

lucabrunox · 2014-07-24T21:55:04Z

You didn't answer how it should be applied to nix. Should the whole /nix/store tree change whenever a new derivation is stored, or what?

Mathnerd314 · 2014-07-24T23:11:27Z

See the API. Maybe you can see the similarity to nix-store's internal API. My plan was to store each derivation as a commit and check it out to /nix/store/whatever. The runtime dependencies can be parent commits or we can just ignore that part and do the GC ourselves (the SQL database is not going away).

tomberek · 2014-07-24T23:45:25Z

Any thoughts about the "intensional store"?

Mathnerd314 · 2014-07-25T02:14:31Z

@tomberek I'm not certain who you were speaking to, but my thoughts are that it should be implemented ASAP.

vcunat · 2014-07-25T07:32:17Z

I thought OSTree doesn't allow accessing multiple versions at once, just as git doesn't. Anyway, we already do have the bare store-part that we need, with always-on top-level-path deduplication and optional file-level deduplication. That's IMHO the easy part.

I thought about the intensional store a lot many months ago, and we certainly do want it at some point. After delving deeper, I was very surprised to find that the consequences are not at all as straightforward as they first appeared. IIRC derivation handling is the main stumbling block and can't be as straightforward as it is now. I have no idea whether or how all of this is dealt with in that prototype code.

Also, using the intensional store will put much more pressure on real binary determinism of the outputs. Our nixpkgs is most likely far from ready ATM. Currently, if there's some slight semantics-preserving impurity (like programs wanting to print the build date), we don't even notice it, just as in any usual distro. With an intensional store, these packages would change their output path on every build, and so would anything that depends on them (transitive runtime dependents).

Mathnerd314 · 2014-07-26T17:24:14Z

So, here is the intensional model as I understand it.

Data types (all can be hashed and/or stored on disk and/or streamed if necessary)

Checkouts contain arbitrary data with no pointers
Storeballs (NARs) contain arbitrary data and pointers to tokens
Builders contain input lists of tokens and of other builder outputs, a list of outputs, and arbitrary metadata
Tokens refer to ephemeral resources such as "the world as it was at a specific time" (arbitrary symbols) or to an output token of a builder together with a mapping from its requirements to specific storeballs.

Containers

Systems contain checkouts, generated manually by NixOps
Stores contain storeballs, generated strictly by NixOS (note that storeballs are rarely used)
Databases contain output tokens, generated automatically by Hydra.
Programs contain builders, generated lazily by Nixpkgs.

Operations

Parsing transforms checkouts into builders and is done by nix-instantiate
Hashing transforms builders into tokens and is done by nix-hash (this part needs work, because it is currently not reversible)
Realizing transforms tokens into storeballs and is done by nix-store and a checkout with the required tokens (this part is signed since it can be hijacked)
Configuring transforms storeballs into checkouts and is done by nix-env

From this, I can see 4 things:

Determinism is a nice-to-have, as it allows multiple signatures for the same storeball and thus encourages security, but is not at all necessary for the model to function.
Derivations, substitutions, and sources are simply a subset of the main building functionality.
Hash-rewriting is a subset of configuring. Unlike the extensional model, where the file system does most of the configuring, the intensional model allows (and requires) complete control over this process.
Nixpkgs can be a lot cleaner than it is now.
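A minimal sketch of two of these pieces, assuming builders are plain records and tokens are SHA-256 digests of a canonical JSON serialization. `Builder`, `hash_builder`, `Realizations` and the `"world@2014-07-26"` token are hypothetical names, not Nix code. It illustrates the first point: a token may be realized to several different storeballs and the model still works; determinism only shows up as those reports agreeing.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Builder:
    inputs: tuple  # tokens or other builders' outputs, as plain strings here
    outputs: tuple
    metadata: tuple = ()


def hash_builder(builder: Builder) -> str:
    """'Hashing': turn a builder into a token via a canonical serialization."""
    blob = json.dumps([builder.inputs, builder.outputs, builder.metadata])
    return hashlib.sha256(blob.encode()).hexdigest()


class Realizations:
    """'Realizing' results: token -> set of storeball hashes reported for it."""

    def __init__(self):
        self.table = {}

    def record(self, token: str, storeball: bytes) -> None:
        digest = hashlib.sha256(storeball).hexdigest()
        self.table.setdefault(token, set()).add(digest)

    def is_deterministic(self, token: str) -> bool:
        # Several signers can vouch for one storeball only if the build is deterministic.
        return len(self.table.get(token, set())) == 1


hello = Builder(inputs=("world@2014-07-26",), outputs=("out",))
token = hash_builder(hello)
realizations = Realizations()
realizations.record(token, b"hello binary built on machine A")
realizations.record(token, b"hello binary built on machine B")
print(realizations.is_deterministic(token))  # False: two different storeballs for one token
```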

I've started by rebasing secure onto master, but unfortunately most of the changes were just commenting things out and the rest referred to things that don't exist anymore, so it was mostly useful for learning my way around the code. OSTree can only store checkouts, so it is a feature rather than part of the design.

copumpkin · 2014-10-06T01:10:31Z

Nixpkgs can be a lot cleaner than it is now.

Can you elaborate on that?

Mathnerd314 · 2014-10-08T19:59:47Z

The main one is keeping the checksums out of Nixpkgs; they're already stored in the token-storeball mapping. I was also thinking about omitting the version numbers, but I have concluded that's better dealt with in Nixpkgs.

Ericson2314 · 2015-02-01T01:56:22Z

I believe this has some interesting interactions with recursive nix (#13).

First of all, once nix exprs can be developed upstream, it will be even more useful to have an easy way to keep HEAD packages up to date. This implies three phases:

Query the repo for the master tip, potentially pre-fetching other hashes -- non-deterministic.
Download srcs and build dependencies (due to the prefetch, the package may already be downloaded) -- deterministic.
Eval and build the package's nix expr -- deterministic, provided proper hygiene.

The building of intentionally non-deterministic pkgs seems a lot safer with an intensional store. Whereas most builds would automatically extend the user's trusted build mapping (the one inducing an equivalence set over output paths), intentionally non-deterministic builds such as the repo pre-fetch could create a new mapping which the user could optionally subscribe to.
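One way to picture this, as a sketch only: keep several named mappings from derivation hashes to output hashes and consult only the ones the user subscribes to. The class, the mapping names and the hashes below are hypothetical; Nix has no such interface.

```python
class TrustedMappings:
    """Hypothetical: named derivation-hash -> output-hash mappings a user subscribes to."""

    def __init__(self):
        # Ordinary deterministic builds extend this mapping automatically.
        self.mappings = {"builds": {}}
        self.subscriptions = ["builds"]

    def record(self, mapping: str, drv_hash: str, out_hash: str) -> None:
        self.mappings.setdefault(mapping, {})[drv_hash] = out_hash

    def subscribe(self, mapping: str) -> None:
        if mapping not in self.subscriptions:
            self.subscriptions.append(mapping)

    def resolve(self, drv_hash: str):
        """Return the output hash from the first subscribed mapping that knows drv_hash."""
        for name in self.subscriptions:
            out_hash = self.mappings.get(name, {}).get(drv_hash)
            if out_hash is not None:
                return out_hash
        return None


trust = TrustedMappings()
# A non-deterministic "query master tip" build publishes its result in its own mapping.
trust.record("head-prefetch", "drv-query-master", "out-0f3a")
print(trust.resolve("drv-query-master"))  # None: not subscribed yet
trust.subscribe("head-prefetch")
print(trust.resolve("drv-query-master"))  # out-0f3a
```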

This makes me wonder if even the actions relating to nix-channels could be conceived of as installing non-deterministic packages.

Ericson2314 · 2015-02-01T02:03:39Z

On another note, I don't know about OSTree, but http://ipfs.io/, once it is ready, would make a fantastic intensional store for Nix; we could really be its killer app. I mentioned it on IRC, but thought I should here too. [Disclaimer: I am not associated with IPFS in any way, nor have I tried it. I just read its paper once and immediately thought it perfect for Nix.]

CMCDragonkai · 2015-02-06T03:48:55Z

Just to clarify, the intensional model is explained on page 143 of the thesis: http://nixos.org/~eelco/pubs/phd-thesis.pdf

I was wondering why it was called "intensional"?

shlevy · 2015-02-06T12:53:58Z

http://en.wikipedia.org/wiki/Intensional_definition

CMCDragonkai · 2015-02-06T13:03:25Z

I have read that before. Could you elaborate as to how this applies to Nix?

shlevy · 2015-02-06T13:45:46Z

The idea is that the store path name reflects the entirety of the properties of the path by containing a hash of its contents.

zimbatm · 2016-03-15T19:38:31Z

When I started using nix I was confused about why we need to calculate the checksum of git repos, since the git sha is relatively unique. Now I know the distinction, but if we could store git checkouts by their sha it would be really nice and remove a lot of boilerplate.
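For reference, git already content-addresses its objects: a blob's ID is the SHA-1 of a short header plus the file contents, and tree and commit IDs are built the same way on top of those, so a commit SHA transitively pins the whole checkout. The sketch below reproduces the blob case only (`git_blob_sha1` is a hypothetical helper). This is SHA-1 over git's object format, not the hash Nix records over a fetched source tree, so a git SHA is not directly interchangeable with it.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    # git hashes "blob <size>\0" followed by the raw contents.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

This matches what `echo hello | git hash-object --stdin` prints.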

ehmry · 2016-03-16T10:04:05Z

A cheap and easy thing to do could be to store each store path at a content addressable hard link, and then make a symlink from the input hash to the hard location. Multiple hard links can be made for different hashing schemes, and multiple input symlinks can point to a single output.

I don't know how complicated it would be to perform the switch after build jobs complete, and how costly it would be to dereference a symlink for each package referenced by inputs.
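A toy sketch of that layout, assuming each store path is a single regular file (real store paths are directories, which cannot be hardlinked, so a real version would have to work per file or rename directories). `publish` and `demo` are hypothetical names, and the CAS name is a truncated SHA-256 rather than Nix's own hash format. Two derivations with different input hashes that build identical bits end up as two symlinks to one content-addressed file.

```python
import hashlib
import os
import tempfile


def publish(store: str, input_hash: str, name: str, content: bytes) -> str:
    """Store content under a content-addressed name, then link the input-hash name to it."""
    cas_hash = hashlib.sha256(content).hexdigest()[:32]
    cas_path = os.path.join(store, f"{cas_hash}-{name}")
    if not os.path.exists(cas_path):  # identical outputs are stored once
        with open(cas_path, "wb") as f:
            f.write(content)
    input_path = os.path.join(store, f"{input_hash}-{name}")
    os.symlink(cas_path, input_path)  # the extra indirection paid on every lookup
    return input_path


def demo():
    with tempfile.TemporaryDirectory() as store:
        # Two derivations with different input hashes that happen to build identical bits.
        p1 = publish(store, "1" * 32, "hello", b"hello binary")
        p2 = publish(store, "2" * 32, "hello", b"hello binary")
        return os.path.realpath(p1) == os.path.realpath(p2)


print(demo())  # True
```

Extra hardlinks to the same CAS file could be added under names from other hashing schemes, as suggested above.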

jbenet · 2016-03-24T01:07:00Z

The IPFS community would love to help with this! Let us know how we can.

cc @whyrusleeping @lgierth @diasdavid @noffle @davidar

Ericson2314 · 2016-03-24T01:30:34Z

@jbenet Glad to hear it! The PhD thesis is still probably the best resource on the idea itself. #378, while superficially not about this at all, actually serves as a good resource on the quirks of the current system and the use case where it is most wanting.

I'm not any sort of official Nix developer, but happy to answer any questions you may have.

vcunat · 2016-03-24T10:11:27Z

@jbenet: actually, I strongly believe that IPFS has much greater potential for nix than the intensional store itself (i.e. forcing the use of hashes from content instead of derivations for path references). Let's split that thread off into #859.

wmertens · 2017-08-07T17:15:45Z

@ehmry I just had the same idea as you :) https://groups.google.com/forum/#!topic/nix-devel/m8Rrv3VpdBo

The difference is that I propose that the build step gets the CAS entries as inputs, not the input hashes. The input hashes would only be used in case the build product needs to refer to itself.

Obviously, this means that build outputs that need to access themselves will have a different $cas for different input hashes, even if the build output is otherwise the same.

Perhaps builds should be done in /nix/store/build-$randomstring, then build-$randomstring should be replaced with zeroes before calculating the output hash $cas, and then replaced again by $cas. The $cas will be slightly more work to calculate, but still unique and predictable.
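A minimal sketch of that procedure, simplified to a 32-character random placeholder rather than the literal `build-$randomstring`. `finalize` and `fake_build` are hypothetical names, and the $cas here is a truncated SHA-256. The placeholder and the final hash have the same length so that byte offsets inside binaries do not shift. Two builds that differ only in their random temporary hash end up with the same $cas and identical bytes.

```python
import hashlib
import secrets

HASH_LEN = 32  # placeholder and final hash must have equal length so byte offsets don't move


def finalize(output: bytes, placeholder: bytes):
    """Hash the output with its self-references zeroed, then substitute the real hash."""
    assert len(placeholder) == HASH_LEN
    zeroed = output.replace(placeholder, b"0" * HASH_LEN)
    cas = hashlib.sha256(zeroed).hexdigest()[:HASH_LEN]
    return cas, output.replace(placeholder, cas.encode())


def fake_build(tmp_hash: bytes) -> bytes:
    # A build whose output refers to its own (temporary) location.
    return b"#!/bin/sh\nexec /nix/store/" + tmp_hash + b"-hello/bin/.hello-wrapped\n"


p1, p2 = secrets.token_hex(16).encode(), secrets.token_hex(16).encode()
cas1, out1 = finalize(fake_build(p1), p1)
cas2, out2 = finalize(fake_build(p2), p2)
print(cas1 == cas2, out1 == out2)  # True True
```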

ehmry · 2017-08-09T13:07:46Z

@wmertens Yes, I had considered supporting multiple hashing schemes, but I no longer think that is worth the effort, so replacing the input hashes seems practical. I had a system like this running; I don't remember any specific problems with CAS entries, but the whole thing eventually collapsed from making too many changes to Nix.

edolstra · 2018-03-29T23:12:31Z

Some progress on this: edolstra@236e87c

wmertens · 2018-03-30T08:02:31Z

@edolstra wonderful! I just read the Intensional Store section of your thesis, I now wish I did that long ago ;)

I see that there is still quite a bit of work to do to get to the Intensional Store you laid out there. One thing that stands out is storing the equivalent hashes (refClasses) in the database.

I'm particularly curious about how this will play out with Hydra, how the refClasses will be provided over the network.

It would also be interesting to have a crowd-sourced refClasses database, where many builds by somewhat-trusted users show that a certain input hash leads to some CAS hash.
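A sketch of what such a table could look like, assuming a fixed set of somewhat-trusted reporters and an agreement threshold. `RefClassVotes` and all hashes and user names are hypothetical. An input hash resolves to a CAS hash only when enough trusted users agree and no competing result also reaches the threshold.

```python
from collections import defaultdict


class RefClassVotes:
    """Hypothetical crowd-sourced table: input hash -> CAS hash, accepted on agreement."""

    def __init__(self, trusted_users, threshold: int):
        self.trusted = set(trusted_users)
        self.threshold = threshold
        # input hash -> CAS hash -> set of users who reported that result
        self.reports = defaultdict(lambda: defaultdict(set))

    def report(self, user: str, input_hash: str, cas_hash: str) -> None:
        if user in self.trusted:  # reports from unknown users are ignored
            self.reports[input_hash][cas_hash].add(user)

    def resolve(self, input_hash: str):
        """Return the CAS hash enough users agree on, or None if unknown or disputed."""
        agreed = [cas for cas, users in self.reports.get(input_hash, {}).items()
                  if len(users) >= self.threshold]
        return agreed[0] if len(agreed) == 1 else None


votes = RefClassVotes({"alice", "bob", "carol"}, threshold=2)
votes.report("alice", "in-abc", "cas-111")
votes.report("bob", "in-abc", "cas-111")
votes.report("carol", "in-abc", "cas-222")    # a non-deterministic or bad build
votes.report("mallory", "in-abc", "cas-222")  # not trusted, ignored
print(votes.resolve("in-abc"))  # cas-111
```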

wmertens · 2018-03-30T09:53:00Z

I just realized that this initial progress is already enough to rewrite the entire store into CAS equivalents with a script: move+link all the outputs to CAS paths and then rewrite all the hash references to their CAS hash.

The refClasses "database table" is then simply the set of symlinks that point from original to CAS.

Enough to already play with it :) I'll see if I can cook something up this weekend, but I will be happy if someone beats me to it ;)

EDIT: the CAS linking + rewriting should happen depth-first and rewrite first, otherwise the CAS hash changes. So if a depends on b depends on c, first calculate c', then rewrite b to use c' into b', then calculate b'', then rewrite a to a' with b'', then calculate a''.
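The order matters because a package's CAS hash depends on the already-rewritten references it contains. A minimal sketch of the post-order walk under toy assumptions: packages are strings, hashes are 8 characters so rewriting keeps lengths equal, the graph is acyclic, and self-references are ignored. `make_content_addressed` and `cas_of` are hypothetical names, not the code in the commit above.

```python
import hashlib


def cas_of(content: str) -> str:
    # Same length (8 chars) as the toy input hashes, so rewriting keeps offsets stable.
    return hashlib.sha256(content.encode()).hexdigest()[:8]


def make_content_addressed(contents: dict, deps: dict) -> dict:
    """contents: input hash -> package contents that mention dependency input hashes.
    deps: input hash -> dependency input hashes. Returns input hash -> CAS hash."""
    cas = {}

    def visit(h: str) -> str:
        if h not in cas:
            content = contents[h]
            for dep in deps.get(h, []):
                content = content.replace(dep, visit(dep))  # rewrite references first...
            cas[h] = cas_of(content)  # ...then hash the rewritten contents
        return cas[h]

    for h in contents:
        visit(h)
    return cas


old = make_content_addressed(
    {"aaaaaaaa": "a uses bbbbbbbb", "bbbbbbbb": "b uses cccccccc", "cccccccc": "c"},
    {"aaaaaaaa": ["bbbbbbbb"], "bbbbbbbb": ["cccccccc"]},
)
# c and b re-instantiated under new input hashes, with identical contents
new = make_content_addressed(
    {"eeeeeeee": "b uses dddddddd", "dddddddd": "c"},
    {"eeeeeeee": ["dddddddd"]},
)
print(old["bbbbbbbb"] == new["eeeeeeee"])  # True
```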

rrnewton · 2018-06-01T13:38:31Z

I read the intensional store thesis chapter and I think it will be a big improvement, but at the same time there seems to be a small conflict with strict determinism. The hash rewriting policy (sec. 6.3.2) allows the derivation to build with its own temporary output location in its environment, and a random hash is suggested for such a purpose. But such a random hash puts entropy into that build that it could use to create a different output.

If I'm understanding correctly, the minimum example of recompilation avoidance is something like this:

bar-1.2 depends on foo-3.4
foo-3.4 builds to bits XYZ
bar-1.2 builds to bits ABC

Then foo-3.4 receives a trivial tweak (e.g. README file), that changes its derivation hash, but not its output bits (still XYZ). As a result, bar-1.2 now has its derivation hash change as well, but we want a rock-solid guarantee that bar-1.2 need not be rebuilt, because it only really depends on bits XYZ, and it would just compute bits ABC once again.

That is, the "from scratch guarantee" (like in any incremental computing) is that the full rebuild would have created the same expected bits if it were run. This is why making bar-1.2's own output path visible during the build (whether random or based on its derivation hash) is a problem: it injects entropy that could break this from-scratch guarantee.

Simple solution: Why not just set the output path to a constant? As long as the bar-1.2 rebuild only sees a constant output path, plus the (intensional) content-based paths of its dependencies, it should have no way to produce a separate output, assuming proper determinism enforcement (CC @RyanGlScott).
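A small illustration of the entropy point, assuming a toy build function that embeds its own output path (`build_bar`, `random_out` and `constant_out` are hypothetical). Two from-scratch builds of bar-1.2 against the same foo bits XYZ only agree byte-for-byte when the visible $out is constant.

```python
import hashlib
import secrets


def build_bar(foo_bits: bytes, out_path: str) -> bytes:
    # Hypothetical build that embeds its own $out, e.g. in an rpath or wrapper script.
    foo_ref = hashlib.sha256(foo_bits).hexdigest().encode()
    return b"bar-1.2 linked against " + foo_ref + b", installed at " + out_path.encode()


def random_out() -> str:
    return "/nix/store/" + secrets.token_hex(16) + "-bar-1.2"


def constant_out() -> str:
    return "/nix/store/bar-1.2"


XYZ = b"foo-3.4 output bits"
# With a random $out, two from-scratch builds of identical inputs disagree.
print(build_bar(XYZ, random_out()) == build_bar(XYZ, random_out()))  # False (barring a 128-bit collision)
# With a constant $out, the only remaining inputs are the content-addressed dependencies.
print(build_bar(XYZ, constant_out()) == build_bar(XYZ, constant_out()))  # True
```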

As a variation on that, it could be the determinism-enforcement sandbox itself that simulates a directory-rename of the $out path. That is, there could be some random destination on the real file system, /nix/store/2c8d367ae0c4...-bar-1.2, but the build process thinks it is mapped simply to /nix/store/bar-1.2 or something. (Directory "renaming" via syscall rewriting.)

P.S. The current nix make-content-addressable patch linked above seems to use a post-facto mechanism for rewriting pointers from derivation paths to content paths. But post-facto rewriting would put more paths in the environment that the build should treat as opaque to guarantee from-scratch consistency. (Perhaps this is fine, as it's generally an assumption made by Nix: store paths should be treated as opaque symbols, even if that is unenforceable. But again, some rewriting tricks could guarantee that these bits never make their way into any build process's memory.)

stale · 2021-02-13T20:39:13Z

I marked this as stale due to inactivity.

siraben · 2021-03-23T13:50:40Z

Still important to me.

stale · 2021-09-19T19:01:22Z

I marked this as stale due to inactivity.

tomberek · 2021-09-19T19:28:01Z

Still important, but perhaps this specific issue can be closed. It seems to be well underway with the CA effort. Well done, @regnat!

stale · 2022-04-16T08:27:37Z

I marked this as stale due to inactivity.

Ericson2314 · 2022-04-16T14:06:14Z

Yes, we do have this now!

edolstra added this to the nix-2.0 milestone Jul 16, 2014

edolstra added the feature label Jul 16, 2014

vcunat mentioned this issue Mar 5, 2015: "Cycle detected in the references of '/nix/store/...'" message could be more useful #481 (Open)

Ericson2314 mentioned this issue Oct 28, 2015: Integrate nix with deterministic threading library to get closer to determinism? #669 (Open)

zimbatm mentioned this issue Mar 15, 2016: Output flag to save separate generated nix for overridding in default.nix kamilchm/go2nix#10 (Closed)

vcunat mentioned this issue Mar 24, 2016: Nix and IPFS #859 (Open)

abbradar mentioned this issue Apr 3, 2017: Derivation inputs determined by another derivation? (metaprogramming question) NixOS/nixpkgs#24590 (Closed)

Mathnerd314 mentioned this issue Oct 8, 2017: Changes to make Nix output more informative errors #612 (Open)

shlevy added the backlog label Apr 1, 2018

shlevy assigned edolstra Apr 1, 2018

CMCDragonkai mentioned this issue Apr 15, 2018: Artifact Specification MatrixAI/Architect#8 (Open)

edolstra mentioned this issue May 7, 2018: Separate trust from caching (feature suggestion) #2122 (Closed)

domenkozar removed the backlog label Apr 30, 2020

domenkozar removed this from the nix-2.0 milestone Apr 30, 2020

stale bot added the stale label Feb 13, 2021

stale bot removed the stale label Mar 23, 2021

stale bot added the stale label Sep 19, 2021

stale bot removed the stale label Sep 19, 2021

stale bot added the stale label Apr 16, 2022

Ericson2314 closed this as completed Apr 16, 2022
