-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NAR parser/generator #7
Conversation
@shlevy This is ready for review. Not doing anything yet with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, thank you! Had some comments in line, biggest ones are about ensuring we're lazy enough and testing, but a great step!
hnix-store-core/README.md
Outdated
@@ -6,4 +6,12 @@ Core effects for interacting with the Nix store. | |||
See `StoreEffects` in [System.Nix.Store] for the available operations | |||
on the store. | |||
|
|||
[System.Nix.Nar]: ./src/System/Nix/Nar.hs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these aren't referenced anywhere in the README no need to add them.
text, regex-base, regex-tdfa-text, | ||
hashable, unordered-containers, bytestring | ||
hashable, unordered-containers, bytestring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing space?
, tasty-hunit | ||
, tasty-hspec | ||
, tasty-discover | ||
default-language: Haskell2010 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing newline
data NarEffects (m :: * -> *) = NarEffets { | ||
readFile :: FilePath -> m BSL.ByteString | ||
, listDir :: FilePath -> m [FileSystemObject] | ||
, narFromFileBytes :: BSL.ByteString -> m Nar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this as an effect? It's just a pure function right?
readFile :: FilePath -> m BSL.ByteString | ||
, listDir :: FilePath -> m [FileSystemObject] | ||
, narFromFileBytes :: BSL.ByteString -> m Nar | ||
, narFromDirectory :: FilePath -> m Nar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we define this with readFile and listDir?
import qualified Data.Text.Encoding as E | ||
import GHC.Int (Int64) | ||
|
||
import System.Nix.Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you actually use this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, in the Directory
constructor for FileSystemObject
I use PathName
.
Description : Allowed effects for interacting with Nar files. | ||
Maintainer : Shea Levy <shea@shealevy.com> | ||
|-} | ||
module System.Nix.Nar where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have an explicit export list? Or do we want to export padLen
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make the export list. padLen
is shared between put
and get
but otherwise is an internal implementation detail.
@@ -1,3 +1,5 @@ | |||
{-# LANGUAGE PackageImports #-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover, will remove.
@@ -110,6 +66,7 @@ data StoreEffects rootedPath validPath m = | |||
derivationOutputNames :: !(validPath -> m (HashSet Text)) | |||
, -- | Get a full 'Path' corresponding to a given 'Digest'. | |||
pathFromHashPart :: !(Digest PathHashAlgo -> m Path) | |||
, narEffects :: NarEffects m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move this around later, but I'm not sure it necessarily needs to be here. The addNarToStore
effect can just take a Nar
. We'll see.
hnix-store-core/tests/Nar.hs
Outdated
@@ -0,0 +1,150 @@ | |||
{-# LANGUAGE OverloadedStrings #-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I feel like nar serialization/deserialization is a perfect case for property testing (quickcheck or whatever). Would you mind giving that a spin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in addition to tests derived from seeing what the nix cli does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually maybe we can just for now assume that the CLI is available and run it during the tests, instead of pre-generating fixtures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add an Arbitrary
sized instance and some tests that shell out to nix-store
. Will keep the hand-copied fixtures for when "we win" and there is no more reference nix-store
to use for tests :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm stumped about how you could ever encode a string's length, followed by the string itself, without forcing the full string....
But looking at the source for ByteString
, things look promising! Strict bytestrings give length
in O(1), and it doesn't look like it forces the underlying bytestring. https://hackage.haskell.org/package/bytestring-0.10.8.2/docs/src/Data.ByteString.html#length ... and in the lazy case, BSL.length bsl
is defined with a foldl' over the chunks ( https://hackage.haskell.org/package/bytestring-0.10.8.2/docs/src/Data.ByteString.Lazy.html#length ), which is strict in the Int64
being accumulated, but shouldn't force the whole of bsl
.... I'm not really sure I get the laziness going on here correctly, need to do an experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imalsogreg You can do it by trusting the filesystem to correctly report the size of the file being read, and hoping the file doesn't change in the mean time (if it does, then you need to just error out in a streaming context)
Couldn't this be a completely standalone package? It doesn't depend on any Nix related haskell libs :) |
hnix-store-core is pretty independent already... |
|
||
data FileSystemObject = | ||
Regular IsExecutable BSL.ByteString | ||
| Directory (Map.Map PathName FileSystemObject) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PathName
is incorrect here. PathName
is limited to just what Nix allows in the top-level name of an output, but files within that can be named anything. The real rule here is an arbitrary byte-string with no nulls or /
s.
|
||
|
||
-- | Unpack a FileSystemObject into a non-nix-store directory (e.g. for testing) | ||
localUnpackNar :: FilePath -> Nar -> IO () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, but I'm not sure it belongs here, as we're trying to be effect-agnostic. Can this be implemented in terms of a coherent set of filesystem effects? Or if not maybe we just move this to a module giving us the plain POSIX filesystem implementation of NarEffects
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
-- Create regular files and directories, | ||
-- But save link creation until the end, by writing link commands out | ||
-- to a queue, since otherwise we may try to create a link before | ||
-- its target exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, why does this matter? symlinks don't need to be valid, you can absolutely have a nar with foo -> bar
but nothing at bar
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nar may contain invalid links, but we have to sequence creation of those links on the disk, since the trying to create a link will fail if the target does not exist.
Or so I thought!! TIL that ln -s path1 path2
does not check that path1
exists! So I'll write localUnpackNar
without the WriterT
part.
@imalsogreg ping me when this is ready for fresh review or if you have any questions! |
@shlevy Ping! New Arbitrary instance and shell-out-to-
I would very much like to test/optimize the laziness, but I think that may be a good thing for a second PR. But, up to your preference! |
@shlevy actually there are a couple things in your prior comments left for me to do. I'll update the TODO list. Feel free to edit it if you would like go grow or shrink the scope for this PR :) |
@imalsogreg For scope cutting, I think we can move actually computing a store hash and IO instances out of this initial PR... For me the main thing at this stage is just avoiding the memory explosion seen in C++ Nix 😄 What is the binary instance used for again? |
Note that "computing the store hash" is in principle a |
import System.Nix.Path | ||
|
||
|
||
data NarEffects (m :: * -> *) = NarEffects { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a TODO for separating out read effects from write effects. Certainly not necessary to start though.
data NarEffects (m :: * -> *) = NarEffects { | ||
narReadFile :: FilePath -> m BSL.ByteString | ||
, narWriteFile :: FilePath -> BSL.ByteString -> m () | ||
, narListDir :: FilePath -> m [FilePath] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically these can't be just any file paths, as they can't contain directory separators, and really it's a set not a list. Also, on most filesystems, readdir will return whether the file is a directory or not in the same call, which may be worth exposing at some point.
@shlevy So there is enough code in place now to decode/encode NARs in at least the naive way. I'd like to get some more help figuring out how to test the main use cases. So what do you think about the sequence: (1) merge this PR as is, (2) open issues about perf/laziness testing (3) open issues about changing the API (either to make it fit the domain better, or changing it for performance reasons)?
|
Will give this another round of review this weekend. |
When I wrote the previous comment, I had missed one of the |
Now, tests can be run with an environment variable that gives large sizes to temporary files. Combined with RTS options that limit heap size, we can prove that
This produces some 1G single files and directories with 1G files in the test suite, while locking the heap size to 100M. With this combination of parameters, and the recent change in encoding/decoding strategy (being more careful not to force the intermediate lazy bytestrings) the tests pass. The tests can be made to fail by shrinking heap size below 100M. |
Small tip if it's helpful how to check max memory within a single test: https://github.com/plow-technologies/servant-streaming/blob/master/servant-streaming-server/test/Servant/Streaming/ServerSpec.hs#L83-L84 |
@domenkozar Cool trick, thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome! Sorry there's so much going on here... I definitely should have broken this down more from the get-go
hnix-store-core/LICENSE
Outdated
@@ -1 +0,0 @@ | |||
../LICENSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When LICENSE was a symlink, I had trouble bringing hnix-store-core
in as an override within hnix
. Short-term stopgap. Want me to revert to the symlink?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, unless this is too difficult to work around
Tests | ||
====== | ||
|
||
- `ghcid --command "cabal repl test-suite:format-tests" --test="Main.main"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mightybyte @jwiegley What's the best way to set up CI on PRs?
hnix-store-core/default.nix
Outdated
@@ -0,0 +1,24 @@ | |||
{ mkDerivation, base, base64-bytestring, basement, binary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like not to check in generated code. Can we use developPackage
or some such?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just drop this file. If we check in a develop.nix, we'll do that in a different PR.
|
||
import System.Nix.Path | ||
|
||
data NarEffects (m :: * -> *) = NarEffects { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something feels a bit off about the interface we're exposing here... Conflating packing and unpacking, doing everything with absolute paths, separating out permissions from filesystem type... I think we may be able to do something more succinct here. Let's have a call about this? Maybe makes sense to move the NarEffects
stuff to a later PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interface design is definitely going to be up to your taste. I'd rather get something committed and defer experimentation to later PRs. Sure, we can discuss further in a call.
|
||
-- Directly taken from Eelco thesis | ||
-- https://nixos.org/%7Eeelco/pubs/phd-thesis.pdf | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get some haddocks on the types and fields/constructors defined in this file?
} | ||
|
||
-- | A valid filename or directory name | ||
newtype FilePathPart = FilePathPart { unFilePathPart :: Text } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike PathName
, which is actually intended to be human-readable text, I think FilePathPart
semantically (and representationally) is more akin to ByteString
than Text
. It's just an arbitrary, well, string of bytes, with two disallowed bytes.
|
||
------------------------------------------------------------------------------ | ||
-- | Deserialize a Nar from lazy ByteString | ||
getNar :: B.Get Nar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be interesting at some point to have some way to actually stream out entries as they come in (e.g. have some kind of handlers for each kind of FSO, or usesome higher-level streaming API), but this seems good for now.
mname <- E.decodeUtf8 . BSL.toStrict <$> str | ||
assertStr "node" | ||
file <- parens getFile | ||
maybe (fail $ "Bad FilePathPart: " ++ show mname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, there are some errors that are just us taking advantage of Alternative
for backtracking, and some that are actually fatal... We should make sure our messages reflect this.
assertStr "entry" | ||
parens $ do | ||
assertStr "name" | ||
mname <- E.decodeUtf8 . BSL.toStrict <$> str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why m
name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used to be a Maybe
. Fixed name no reflect pureness.
-> FilePath -- ^ Link | ||
-> FileSystemObject -- ^ FileSystemObject to add link to | ||
-> FileSystemObject | ||
mkLink = undefined -- TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't referenced anywhere, should we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather keep it as a TODO. It's a reminder to come up with a sensible way to add symlinks to the arbitrary instance.
…ow-add-Nixpkgs-GHCJS GitHub CI: workflow: add Nixpkgs GHCJS
NAR parser and generator. Moved
Path
types and functions intoPaths.hs
module.Todo
binary
instancelocalUnpackNar
- useNarEffects
monad insteadNarEffects
Testing Done
Our encoder and decoder, filesystem packer and filesystem unpacker are all checked against each other and against the output of
nix-store --dump
.Some of our tests use generated files about 8MB in size, and a realistic directory (the
src
directory of this project) for head-to-head comparisons withnix-store --dump
.