Path library design #200718
For path values, having a single representation was a great design choice. It reduces the number of possibilities that users need to consider. It reduces the intrinsic complexity of the path operations. It reduces the extrinsic complexity by avoiding the possibility of doing something slightly surprising when a path happens to be relative instead of absolute. It simplifies code that deals with path values. Of course a relative path pops up every now and then, but not to the point where I've felt the need for more than single small operations. What are the use cases? Do they really need to work with relative paths? Small braindump:
|
@roberth Relative paths are needed on a very basic level of just appending path components, e.g.
should work, the second element is a relative path. And the opposite:
outputs relative paths. Another case is for your source combinators, where you want to pass a subpath via e.g.
That's like your Regarding |
Boy do I have some path routines for you 😁 |
@aakropotkin I'd rather discuss the design of such a library beforehand, that's the hard part to get right before implementing it :) Please take a look at the document and what you think of it. I'm also very interested in use cases for such a library, let me know about yours! We should have justification for each part of this library |
Gotcha. One I'm a fan of for relative paths in cases where I'm loading several files is:
Mostly this cuts down boilerplate, but I also like being able to override basedir later with a simple |
This is not a use case.
You're putting an absolute path in a list of path components? Again, this is not a use case. Why would I need this?
Exactly. I didn't need a relative path library for this. Why would you make it stringly typed? That's not something we should encourage.
If we throw it out, path components are all we need, and those are just lists of strings. Please, please, show me a use case where users need a relative path library. Relative paths are a mess, with tons of corner cases and silly representations. You've shown that very clearly in the design document. By engaging with bad data types or representations, you'll only make code more confusing and error prone. If everyone jumps off a cliff, that's not a reason for us to jump off a cliff. |
What was the boilerplate? |
Alright, alternative proposition: Use Nix-provided paths first and foremost. Avoid the concept of relative paths, because it comes with a lot of baggage that we'd rather avoid. Instead, work with path differences represented as lists of path component strings. Prefer to keep this as an internal representation only, returning Nix paths instead. This way we don't create a split between functions that take paths as usual and functions that take relative paths, and we aren't tempted to add a ton of dynamic type matching and half-baked implicit conversion code to each and every filesystem related function. (We already have half a ton of this in lib.sources, and the interactions between every path-like representation will make this quadratically worse) |
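To make the "path differences as lists of path component strings" idea concrete, here is a minimal sketch. `appendComponents` is a hypothetical name used only for illustration, not an existing Nixpkgs function, and the sketch assumes that no component contains a `/`.

```nix
let
  # Hypothetical helper: apply a path difference (a list of component
  # strings) to a Nix path value. The result is again a Nix path.
  appendComponents = base: components:
    builtins.foldl' (acc: component: acc + ("/" + component)) base components;
in
appendComponents /foo [ "bar" "baz" ]
# evaluates to the path /foo/bar/baz
```

The caller only ever sees Nix paths going in and coming out; the list stays an intermediate representation.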
In practice I usually pack a number of processing routines in the base functor. For example simplifying Having that stuff centered around a common structure was helpful to me. Having said that those things might not be useful in Nixpkgs. |
@aakropotkin You're talking about your functions, but you didn't show me any boilerplate. Should I assume that you just use the term very liberally?
These are already equivalent path values. Why would they be strings?
What does this mean?
I have a well tested
What kind of preprocessing? |
While "joining a path" is not a use-case on its own, such an operation is needed for a lot of things. These are ones I can think of right now:
Note that such a
This is mainly for
It's not a use case on its own no, but it's mainly there to allow people to implement their own path functions without having to use strings for that. Ideally the path library provides everything that's needed though. Can't think of a use case right now though. The design document talks about perhaps making this function internal because of that.
Even if it's represented as a list, it's still a relative path. This is definitely a use case. Note that in your source combinators PR there's a "manual" |
Edit for clarity: This comment is about whether Nix paths or strings should be used for a path library. It's not related to the above arguments about support for relative paths.
Here's some arguments against Nix paths (should be added to the document for sure):
Here's the arguments I can think of for Nix paths:
|
Already supported without a relative path library.
No, because that would let you add an absolute path after a relative path, which is not a valid operation, but with the air of a legitimate operation because we have a function for it, obscuring the nastiness that's going on. You could forbid paths starting with
It's the devil we know.
Useful. Where it's not desirable, you're not going to fix that with a relative path library.
Often not a big deal, especially since flakes filter out at least the minimum, which is the gitignored stuff. A relative path library does not fix this user error.
Ok, show me a real world example.
I don't think I've encountered these. Could you explain?
Not a problem that a relative path library solves.
Where would you use these? In IFD? Does it get complicated enough to warrant a library?
Or it may be correct. The behavior you call correct is still very surprising, even if POSIX(?) suggests it. Most people expect to operate on file system trees, not graphs. In a tree (no symlinks), the normalisation always holds up.
Relative paths would be strings, according to your proposal, so this is a reason not to implement a relative path library.
This is unfortunate, but with a relative path library, you'll have the opposite problem that users may append a redundant slash. Created NixOS/nix#7301
|
@roberth Please note that my arguments in this comment are not related to the discussion of supporting relative paths. This is about a path library in general, irrespective of whether relative paths are supported in any way. I'm not replying to all your arguments where that's the counter.
Yes I simplified it.. The
Nix can't assume Flakes is used, but yes they do make this a bit better.
This would happen if a path is accidentally interpolated when not meant to
No just a simple
Or it may be correct, but the Nix path type has no business assuming it doesn't matter. Yes,
Such as? |
Yeah, we're discussing multiple things at once. Maybe I was too focused on relative paths.
That seems like a sensible function then.
True. I don't think there's much we can do on the library side to improve this unless we reject Nix paths altogether, turn them into a library implementation detail and forbid path expressions in user code. This seems excessive.
Seems fixable: NixOS/nix#7303
Build outputs can not be converted to path values. This would tend to cause IFD for no good reason.
I'm ok with a path join function, but I am not convinced that we need an extensive path library.
So symlinks are a leaky abstraction. This is more reason to avoid over-abstraction in the form of a path library.
This is a general observation. If your (total) functions support one type, all functions can be combined arbitrarily. This is useful. Often this is not quite achievable, but an approximation of this ideal is already pleasant to work with. This is the "algebraic" or "combinator" flavor of libraries. On the other end of the spectrum, you have libraries with many types, and few functions that operate on each particular type. For a library that is about a "single" type, paths, the latter would be a disaster. A concrete example would be |
I like the progress. If we keep this up, we can solve our problems without creating a mess for ourselves. |
There's no problem with
This proposal is actually kept fairly minimal imo, which functions do you think are unnecessary?
Perhaps symlinks are leaky, but people are still using them and expect them to work in a certain way. There's no abstraction introduced by this current proposal, it's really just the same absolute and relative paths strings used all throughout Linux, so not sure what you mean by over-abstracted.
That is an argument. Specifically something like This does give me an idea though: How about making it such that this path library returns the same data type as it gets as an input. So
Then |
I almost forgot about this, since recently, Nix supports paths with interpolation:
Only works when starting with a
Not sure about this, it's not very pretty, but a
|
One example of boilerplate is this example. Here I have wrappers around any
Nothing too special, but this would be tedious to write in multiple routines, so often times I was wrapping things in a path construct. Again though, probably not useful in Nixpkgs, just wanted to provide an example of what I was talking about. |
The current path problems related to Looking at
But I'm also just realizing that lazy trees in its current state essentially breaks such a string-based path library, since it makes it such that
This notably also breaks the normalization function you're using in the source combinators PR:
Why is it broken? Because the lazy trees PR deprecates anything related to the
And while you seem to be able to access a subdirectory, it's deprecated:
(Oh also the error is wrong, I'm applying I can't see why this should be deprecated though, I'll mention this in the lazy trees PR: NixOS/nix#6530 (comment) |
Earlier in the thread I was asked why I always immediately I simply didn't trust Nix to keep my path as a regular FS path outside of the store, but if I I'm pretty glad I went that way now that I see how lazy paths are going to blow up for any "late" attempts to stringize. EDIT: I read the other PR thread. If I'm understanding it correctly it looks like they are going to flood my log with deprecation warnings for my approach... This is all well and good as long as the brand new replacement they just took out of the oven is well documented, backwards compatible, and bug free 😅. It's been a while since I locked into an old release but I might let the dust settle on this one for a few months. Hopefully I misunderstood the PR. |
No, they're examples of why Nix needs to be backwards compatible.
I do see a couple of uses for
The idea of "absoluteness" is only relevant to file system paths, not lazy trees.
This introduces a lot of uncertainty. |
That too. Something can be an example of multiple issues. These PRs also show that it is hard to reason about which paths get imported into the store. E.g. this diff turns the result from a string into a path, which can cause it to be imported into the store when it wasn't previously, and that is entirely non-obvious from the code:
- gitRepo = "${toString ./..}/.git";
+ gitRepo = ./.. + "/.git";
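The type change behind that `gitRepo` diff can be checked directly. This is a small sketch; `/tmp/example` is a made-up directory that does not need to exist, since nothing here reads from the filesystem.

```nix
let
  base = /tmp/example; # hypothetical directory, used only for illustration
in
{
  # toString turns the path into a plain string before anything is appended
  viaString = builtins.typeOf "${toString base}/.git"; # "string"
  # path + string stays a path value
  viaPath = builtins.typeOf (base + "/.git"); # "path"
}
```

Only the second form is a path value, and a path value is what ends up copied into the store if it is later interpolated into a string.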
diff --git a/lib/sources.nix b/lib/sources.nix
index 3ad7dc63355..d8d15a9333b 100644
--- a/lib/sources.nix
+++ b/lib/sources.nix
@@ -185,12 +185,12 @@ let
# not exported, used for commitIdFromGitRepo
_commitIdFromGitRepoOrError =
let readCommitFromFile = file: path:
- let fileName = path + "/${file}";
- packedRefsName = path + "/packed-refs";
+ let fileName = join [ path file ];
+ packedRefsName = join [ path "packed-refs" ];
absolutePath = base: path:
- if lib.hasPrefix "/" path
+ if isAbsolute path
then path
- else toString (/. + "${base}/${path}");
+ else join [ base path ];
in if pathIsRegularFile path
# Resolve git worktrees. See gitrepository-layout(5)
then
@@ -199,12 +199,11 @@ let
then { error = "File contains no gitdir reference: " + path; }
else
let gitDir = absolutePath (dirOf path) (lib.head m);
- commonDir'' = if pathIsRegularFile "${gitDir}/commondir"
- then lib.fileContents "${gitDir}/commondir"
+ commonDir' = if pathIsRegularFile "${gitDir}/commondir"
+ then normalise (lib.fileContents "${gitDir}/commondir")
else gitDir;
- commonDir' = lib.removeSuffix "/" commonDir'';
commonDir = absolutePath gitDir commonDir';
- refFile = lib.removePrefix "${commonDir}/" "${gitDir}/${file}";
+ refFile = relativeTo commonDir (join [ gitDir file ]);
in readCommitFromFile refFile commonDir
else if pathIsRegularFile fileName |
Adding more code isn't going to make this problem go away.
Conversion to a string actually is a problem in this function, but the fix wasn't complete, causing trouble. |
I paired with @fricklerhandwerk today and made some changes:
- Changes the agreed-upon design slightly to make types of functions clearer:
  - Previously `path.join` worked on a list of paths, but required all but the first component to be relative. This is now split into two functions:
    - `path.append <path> <string>` takes care of appending a relative path to an absolute path.
    - `path.relative.join [ <string> ]` takes care of joining relative paths together
  - `path.normalise` -> `path.relative.normalise`, because we don't need normalisation on absolute paths, Nix already takes care of that, and we use the `path.relative` namespace for anything only relating to relative paths
- Some more bikeshedding for the `relativeTo` name. I think `relativeTo` is pretty good, but @fricklerhandwerk likes other suggestions more
- Adds some suggestions for partial ordering checks on paths
- Adds a `difference` function, which can take care of common prefix and subpath calculations between any number of paths.
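A rough sketch of how the `path.append` / `path.relative.join` split mentioned in this comment could look. The bodies below are assumptions made for illustration, not the implementation from the PR, and they skip normalisation entirely.

```nix
let
  # Assumed behaviour: absolute Nix path + relative string -> Nix path
  append = path: subpath:
    assert builtins.isPath path;
    assert builtins.isString subpath;
    # An absolute string on the right-hand side is not a valid operand
    assert builtins.substring 0 1 subpath != "/";
    path + ("/" + subpath);

  # Assumed behaviour: list of relative strings -> relative string
  relativeJoin = parts: builtins.concatStringsSep "/" parts;
in
{
  appended = append /foo "bar/baz"; # the path /foo/bar/baz
  joined = relativeJoin [ "a" "b/c" ]; # the string "a/b/c"
}
```

With the split, each function has one input type per argument, so there is no need for dynamic checks on "which kind of path is this".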
|
Yeah, @infinisil argued this opens potential for confusion with string functions, but that's what namespaces are for. Getting the local scope consistent and unsurprising is more important for me. I also like his idea of structuring the namespace by purpose in a fine-grained manner, such that the symbols are self-explanatory. That is really helpful reading such code, given we don't have a type system to assist us. |
difference { left = /foo; right = /foo; } = { commonPrefix = /foo; suffix = { left = "."; right = "."; }; }

# Requires at least one path
difference {} = <error>
This function reminds me more of the greatest common divisor than difference.
The word difference applied to paths makes me think of set difference.
tangent
..., which is something I've considered implementing but purposefully haven't
(noting that trees are quite different from sets of paths)
Set difference is not very different from sources.filter
and might suggest a usage that may not perform well.
```nix
difference { path = /foo/bar; } = { commonPrefix = /foo/bar; suffix = { path = "."; }; }
difference { left = /foo/bar; right = /foo/baz; } = { commonPrefix = /foo; suffix = { left = "bar"; right = "baz"; }; }
difference { left = /foo; right = /foo/bar; } = { commonPrefix = /foo; suffix = { left = "."; right = "bar"; }; }
```
difference { left = /foo; right = /foo/bar; } = { commonPrefix = /foo; suffix = { left = "."; right = "bar"; }; }
difference { a = /foo; b = /foo/bar; c = /foo/baz; } = { commonPrefix = /foo; suffix = { a = "."; b = "bar"; c = "baz"; }; }
Illustrate that the names are up to the caller, for those who don't normally read types.
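For readers who prefer running code over examples, here is one way such a function could be written using only `dirOf`, `baseNameOf` and `path + string`. This is a sketch, not the PR's implementation: the helper names are made up, the `prefix`/`suffix` result names are just one of the spellings floated in this thread, and it assumes all inputs share the same filesystem root.

```nix
let
  inherit (builtins) attrValues mapAttrs head length elemAt foldl' genList concatStringsSep;

  # Split a path into its root and its list of components
  split = path:
    let parent = dirOf path;
    in if parent == path
      then { root = path; components = [ ]; }
      else
        let rest = split parent;
        in { inherit (rest) root; components = rest.components ++ [ (baseNameOf path) ]; };

  # Length of the longest common prefix of two lists
  commonLength = a: b:
    let
      n = if length a < length b then length a else length b;
      go = i: if i < n && elemAt a i == elemAt b i then go (i + 1) else i;
    in go 0;

  commonPrefix = paths:
    let
      parts = mapAttrs (_: p: (split p).components) paths;
      lists = attrValues parts;
      first = head lists; # fails on an empty attrset: at least one path is required
      # The prefix common to all lists is the shortest prefix shared with the first one
      k = foldl' (acc: l: let c = commonLength first l; in if c < acc then c else acc) (length first) lists;
      root = (split (head (attrValues paths))).root;
      suffixOf = l:
        if length l == k then "."
        else concatStringsSep "/" (genList (i: elemAt l (k + i)) (length l - k));
    in {
      prefix = foldl' (acc: c: acc + ("/" + c)) root (genList (elemAt first) k);
      suffix = mapAttrs (_: suffixOf) parts;
    };
in
commonPrefix { a = /foo; b = /foo/bar; c = /foo/baz; }
# evaluates to { prefix = /foo; suffix = { a = "."; b = "bar"; c = "baz"; }; }
```

The attribute names `a`, `b`, `c` are chosen by the caller and are carried through unchanged into `suffix`.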
### `difference`

```haskell
difference :: AttrsOf Path -> { commonPrefix :: Path; suffix :: AttrsOf String; }
```
Ideas:
commonAncestry :: f Path -> { commonAncestor :: Path; relativePaths :: f String; }
where f is one of
attrsOf
listOf
Since a path is a sequence of nodes in a tree, the correct term for the object of interest is "prefix", not "ancestor". The latter applies to the relations of nodes, not elements in a path.
commonPrefix :: AttrsOf Path -> { prefix :: Path; suffix :: AttrsOf String }
Using `suffix` not only saves us characters, but also reads more naturally when accessing it:
(commonPrefix { a = /abc/def; b = /abc/xzy; }).suffix.b
I'm not the biggest fan of taking a list. With @infinisil we crystallised a few key ideas about the library, also based on your feedback (specifically: keeping the type surface small), and one of them was that it should be as general as possible such that one can implement convenience functions on top of it in a straightforward manner.
That way we can keep the library itself slim, since consumers can just make their own thing easily, and we can adopt the convenience functions into the library once they start to proliferate.
A list has strictly less information than an attrset, and can always be projected down from the attrset, but not the other way around.
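{
  commonPrefixList = xs:
    let
      # listToAttrs expects { name, value } pairs; the list index serves as the name
      intermediate = commonPrefix (listToAttrs (imap0 (i: x: { name = toString i; value = x; }) xs));
    in {
      prefix = intermediate.prefix;
      # Look suffixes up by index: attrValues sorts names as strings ("10" before "2")
      suffix = imap0 (i: _: intermediate.suffix.${toString i}) xs;
    };
}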
Sure that's slow, but right now I don't even know if we have a use case for that interface.
Also lays down the assumptions we're making about paths, assumptions which notably also make the library work with the lazy trees Nix PR (without relying or interfering with any of its bugs)
This library makes only these assumptions about paths and no others:
- `dirOf path` returns the path to the parent directory of `path`, unless `path` is the filesystem root, in which case `path` is returned
  - There can be multiple filesystem roots: `p == dirOf p` and `q == dirOf q` does not imply `p == q`
    - While there's only a single filesystem root in stable Nix, the [lazy trees PR](https://github.com/NixOS/nix/pull/6530) introduces [additional filesystem roots](https://github.com/NixOS/nix/pull/6530#discussion_r1041442173)
- `path + ("/" + string)` returns the path to the `string` subdirectory in `path`
  - If `string` contains no `/` characters, then `dirOf (path + ("/" + string)) == path`
  - If `string` contains no `/` characters, then `baseNameOf (path + ("/" + string)) == string`
- `path1 == path2` returns true only if `path1` points to the same filesystem path as `path2`

Notably we do not make the assumption that we can turn paths into strings using `toString path`.
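These assumptions can be spot-checked in stable Nix with plain assertions. The values `/foo` and `"bar"` are arbitrary examples, and nothing here touches the filesystem or calls `toString` on a path.

```nix
let
  path = /foo;
  string = "bar"; # contains no "/" characters
in
# dirOf returns the root itself when applied to the root
assert dirOf /. == /.;
# path + ("/" + string) points to the `string` entry inside `path`
assert path + ("/" + string) == /foo/bar;
assert dirOf (path + ("/" + string)) == path;
assert baseNameOf (path + ("/" + string)) == string;
true
```

If every assertion holds, the expression evaluates to `true`; a violated assumption aborts evaluation at the failing `assert`.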
I updated the PR with such a `commonAncestry` function now (just picked @roberth's recent suggestion for now, I don't have a strong opinion on the name yet).
I also updated the PR to lay down the assumptions about the path type that the library is based on. Notably also including one small assumption which makes it harmonize with the eventual lazy trees PR.
As a next step I'll split this PR into smaller parts
I created a smaller PR for just the |
Adds initial work towards a `lib.path` library Originally proposed in #200718, but has since gone through some revisions Co-Authored-By: Valentin Gagarin <valentin.gagarin@tweag.io>
Adds initial work towards a `lib.path` library Originally proposed in #200718, but has since gone through some revisions Co-Authored-By: Valentin Gagarin <valentin.gagarin@tweag.io> Co-Authored-By: Robert Hensing <robert@roberthensing.nl>
#205190 is now merged with a much better design document and initial I might use some API descriptions and implementations from this draft PR, but this one can be closed. |
Adds initial work towards a `lib.path` library Originally proposed in NixOS/nixpkgs#200718, but has since gone through some revisions Co-Authored-By: Valentin Gagarin <valentin.gagarin@tweag.io> Co-Authored-By: Robert Hensing <robert@roberthensing.nl>
Description of changes
There is a need for path operations in nixpkgs and third-party tools. This PR drafts a design document for what such a path library should look like.
Also ping @aakropotkin
This work is sponsored by Antithesis ✨