Assertion quasi-derivations #11955

Ericson2314 · 2024-11-25T19:06:28Z

A fixed-output derivation is nothing but a regular "floating" content-addressing derivation, along with an assertion that the output is in fact a certain content-address.

Other "output checks" can likewise be turned into assertions that the input satisfies the checks.

It would be conceptually cleaner to instead of making these checks part of a derivation (and thus have more knobs on what is a derivation) instead decompose them to special assertion nodes / in the derivation dependency graph --- or equivalently, special quasi derivations.

I had long thought this would be conceptually more elegant, but hadn't yet found a real tangible concrete use-case they would make things better from the user's perspective. But in #11954 I believe I finally found one. The short version of that is that for CA derivations, it is impossible simultaneously satisfy all of these:

Keep our current notation of unconditional immediate dependencies that always must be downloaded
Only do derivation substitutions (of placeholders today, or generalized versions of this one might imagine)
Support allowed/disallowed dependencies of things we might not end up having in our input runtime closure at all
Don't unnecessarily download stuff we don't actually need at build-time

Assertion quasi-derivations however provide a way out of that:

Dependencies can still be considered regular and unconditional (we have more types of nodes in our drv graph, but the dependency edge structure of each node is the same for all node types)
Rewriting nodes is still done exclusively for incoming immediate edges --- no "non-local" rewriting for transitive deps is ever required
Support arbitrary allowed/disallowed dependencies based on inputs
Since we can/must special-case these new node types, simply don't download anything at all based on them, we just need the rewrites to end up with the concrete store paths we need to scan for.

This nicely gets us all 4 desiderata, plus the separation of concerns (building vs linting), with minimal extra complexity.

Ericson2314 · 2024-11-25T19:31:44Z

@emilazy tells me about https://ninja-build.org/manual.html#validations, great prior art!

roberth · 2024-11-26T11:58:07Z

This would be an complement or alternative to #7662, which proposes to track validation using the string context, i.e. something handled above the store+build layer.

It would only be a complete alternative if it provides speculative building for increased concurrency, as mentioned in the issue, but also echoed in the ninja docs:

Marking the static analysis rule as an implicit input of the main build rule of the source files or of the rules that depend on the main build rule would slow down the critical path of the build, but using a validation would allow the build to proceed in parallel with the static analysis rule once the main build rule is complete.

A small amount of extra concurrency could be extracted by allowing to evaluate these validation nodes after evaluating the main derivation and after its dependents.

An combination of validation nodes and dynamic derivations may be of interest as well, as previously raised by amjoseph-nixpkgs:

Opt-in for running installCheckPhase on cross builds nixpkgs#273110 (comment)

(This would be independent of aforementioned evaluation optimization, unless we perform Nixpkgs evaluation as part of a set of (generated) dynamic derivations, which would require making Nixpkgs available to the builder (ie the store))

roberth · 2024-11-26T12:05:35Z

How exactly do these quasi-derivations fit into a build graph, and does that allow for speculative execution of dependent builds, i.e. assuming that the validation succeeds?
If the validation nodes are not between the dependency and dependent, this could still be compensated for by complicating the execution rules (i.e. the scheduling, or the graph-based "evaluation").

Ericson2314 · 2024-11-26T16:08:05Z

@roberth I initially wasn't thinking of fancier checks, and not speculative execution either. But it would also work for that.

(Indeed, doing a speculative build of an unverified impure floating CA derivation is a bit scary, but we can simply say there is no speculation if the input is impure.)

I would still have the validation nodes between the dependency and dependent, because otherwise we are scooping validations globally (which I don't think makes sense store-wide). But yes the scheduler can simply "rewrite" the graph so that the validations don't block anything, but it doesn't declare things finished until they pass.

roberth · 2024-12-03T13:17:31Z

A fixed-output derivation is nothing but a regular "floating" content-addressing derivation, along with an assertion that the output is in fact a certain content-address.

While that is true in a mechanical sense, Nix does impose extra restrictions on builders that don't have such a strong validation, notably isolating the builder from the network.
This is a good policy that I assume we keep.

Also the uniqueness of store paths that can pass a hash check lets us optimize away the FOD derivation, and perform a lookup instead.

roberth · 2024-12-03T13:36:30Z

In order to be able to test fetchers (FOD-producing functions), I have previously added a function to Nixpkgs to work around this missing feature. invalidateFetcherByDrvHash employs a scheme that repurposes FODs as checks, which permit access to the network, as is often needed for fetchers.

It calls the fetcher for the first time to get a hash that is unique for the fetcher implementation (i.e. taken from drvPath instead of the non-unique output path), and then calls it again to produce a unique desired output path by embedding that hash into the store path name.

This requires the current Nix implementation to perform the fetch uniquely for each "version" of the implementation, but this is not guaranteed. A more clever store could detect that the required hash is already available somewhere based on the content address instead of the whole path that includes a name and this would be a natural way to implement this necessary optimization if the store implementation is layered atop a purely content addressed data store (e.g. how tvix does it).
A similar trick can be applied to checks that don't seek to reuse Nix's output check, in which case the store path content can be used to make the "proof" unique instead of the name, but this still relies on convention, which isn't good enough.

Ericson2314 added the ca-derivations Derivations with content addressed outputs label Nov 25, 2024

This was referenced Nov 25, 2024

Better allowed/disallowed references, sound, and not causing spurious deps #11954

Open

stdenv: don't discard string context from ContentAddressed derivations NixOS/nixpkgs#214044

Closed

roberth mentioned this issue Dec 16, 2024

packaging: move tests to passthru #12063

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assertion quasi-derivations #11955

Assertion quasi-derivations #11955

Ericson2314 commented Nov 25, 2024

Ericson2314 commented Nov 25, 2024

roberth commented Nov 26, 2024

roberth commented Nov 26, 2024

Ericson2314 commented Nov 26, 2024

roberth commented Dec 3, 2024 •

edited

Loading

roberth commented Dec 3, 2024

Assertion quasi-derivations #11955

Assertion quasi-derivations #11955

Comments

Ericson2314 commented Nov 25, 2024

Ericson2314 commented Nov 25, 2024

roberth commented Nov 26, 2024

roberth commented Nov 26, 2024

Ericson2314 commented Nov 26, 2024

roberth commented Dec 3, 2024 • edited Loading

roberth commented Dec 3, 2024

roberth commented Dec 3, 2024 •

edited

Loading