Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion quasi-derivations #11955

Open
Ericson2314 opened this issue Nov 25, 2024 · 6 comments
Open

Assertion quasi-derivations #11955

Ericson2314 opened this issue Nov 25, 2024 · 6 comments
Labels
ca-derivations Derivations with content addressed outputs

Comments

@Ericson2314
Copy link
Member

A fixed-output derivation is nothing but a regular "floating" content-addressing derivation, along with an assertion that the output is in fact a certain content-address.

Other "output checks" can likewise be turned into assertions that the input satisfies the checks.

It would be conceptually cleaner to instead of making these checks part of a derivation (and thus have more knobs on what is a derivation) instead decompose them to special assertion nodes / in the derivation dependency graph --- or equivalently, special quasi derivations.

I had long thought this would be conceptually more elegant, but hadn't yet found a real tangible concrete use-case they would make things better from the user's perspective. But in #11954 I believe I finally found one. The short version of that is that for CA derivations, it is impossible simultaneously satisfy all of these:

  • Keep our current notation of unconditional immediate dependencies that always must be downloaded
  • Only do derivation substitutions (of placeholders today, or generalized versions of this one might imagine)
  • Support allowed/disallowed dependencies of things we might not end up having in our input runtime closure at all
  • Don't unnecessarily download stuff we don't actually need at build-time

Assertion quasi-derivations however provide a way out of that:

  • Dependencies can still be considered regular and unconditional (we have more types of nodes in our drv graph, but the dependency edge structure of each node is the same for all node types)
  • Rewriting nodes is still done exclusively for incoming immediate edges --- no "non-local" rewriting for transitive deps is ever required
  • Support arbitrary allowed/disallowed dependencies based on inputs
  • Since we can/must special-case these new node types, simply don't download anything at all based on them, we just need the rewrites to end up with the concrete store paths we need to scan for.

This nicely gets us all 4 desiderata, plus the separation of concerns (building vs linting), with minimal extra complexity.

@Ericson2314
Copy link
Member Author

@emilazy tells me about https://ninja-build.org/manual.html#validations, great prior art!

@roberth
Copy link
Member

roberth commented Nov 26, 2024

This would be an complement or alternative to #7662, which proposes to track validation using the string context, i.e. something handled above the store+build layer.

It would only be a complete alternative if it provides speculative building for increased concurrency, as mentioned in the issue, but also echoed in the ninja docs:

Marking the static analysis rule as an implicit input of the main build rule of the source files or of the rules that depend on the main build rule would slow down the critical path of the build, but using a validation would allow the build to proceed in parallel with the static analysis rule once the main build rule is complete.

A small amount of extra concurrency could be extracted by allowing to evaluate these validation nodes after evaluating the main derivation and after its dependents.

An combination of validation nodes and dynamic derivations may be of interest as well, as previously raised by amjoseph-nixpkgs:

(This would be independent of aforementioned evaluation optimization, unless we perform Nixpkgs evaluation as part of a set of (generated) dynamic derivations, which would require making Nixpkgs available to the builder (ie the store))

@roberth
Copy link
Member

roberth commented Nov 26, 2024

How exactly do these quasi-derivations fit into a build graph, and does that allow for speculative execution of dependent builds, i.e. assuming that the validation succeeds?
If the validation nodes are not between the dependency and dependent, this could still be compensated for by complicating the execution rules (i.e. the scheduling, or the graph-based "evaluation").

@Ericson2314
Copy link
Member Author

@roberth I initially wasn't thinking of fancier checks, and not speculative execution either. But it would also work for that.

(Indeed, doing a speculative build of an unverified impure floating CA derivation is a bit scary, but we can simply say there is no speculation if the input is impure.)

I would still have the validation nodes between the dependency and dependent, because otherwise we are scooping validations globally (which I don't think makes sense store-wide). But yes the scheduler can simply "rewrite" the graph so that the validations don't block anything, but it doesn't declare things finished until they pass.

@roberth
Copy link
Member

roberth commented Dec 3, 2024

A fixed-output derivation is nothing but a regular "floating" content-addressing derivation, along with an assertion that the output is in fact a certain content-address.

While that is true in a mechanical sense, Nix does impose extra restrictions on builders that don't have such a strong validation, notably isolating the builder from the network.
This is a good policy that I assume we keep.

Also the uniqueness of store paths that can pass a hash check lets us optimize away the FOD derivation, and perform a lookup instead.

@roberth
Copy link
Member

roberth commented Dec 3, 2024

In order to be able to test fetchers (FOD-producing functions), I have previously added a function to Nixpkgs to work around this missing feature. invalidateFetcherByDrvHash employs a scheme that repurposes FODs as checks, which permit access to the network, as is often needed for fetchers.

It calls the fetcher for the first time to get a hash that is unique for the fetcher implementation (i.e. taken from drvPath instead of the non-unique output path), and then calls it again to produce a unique desired output path by embedding that hash into the store path name.

This requires the current Nix implementation to perform the fetch uniquely for each "version" of the implementation, but this is not guaranteed. A more clever store could detect that the required hash is already available somewhere based on the content address instead of the whole path that includes a name and this would be a natural way to implement this necessary optimization if the store implementation is layered atop a purely content addressed data store (e.g. how tvix does it).
A similar trick can be applied to checks that don't seek to reuse Nix's output check, in which case the store path content can be used to make the "proof" unique instead of the name, but this still relies on convention, which isn't good enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ca-derivations Derivations with content addressed outputs
Projects
None yet
Development

No branches or pull requests

2 participants