NAR parser/generator #7

Merged
merged 3 commits into from
May 15, 2018

Conversation

imalsogreg
Collaborator

@imalsogreg imalsogreg commented Apr 30, 2018

NAR parser and generator. Moved Path types and functions into Paths.hs module.

Todo

  • cleanup
  • test the generator manually
  • create tests (parse . gen === id)
  • finish the binary instance
  • test encoding/decoding for large NARs (tried 10 and 12 MB; found problems at 12 MB)
    • can we stream in constant space? (moved to its own ticket)
    • what do we do with NARs beyond using them as an intermediate form between download and unpack for computing the store path hash? Provide an API for doing these things (moved to its own ticket)
  • factor IO out of localUnpackNar - use NarEffects monad instead
  • Write an IO instance for NarEffects

Testing Done

Our encoder and decoder, filesystem packer and filesystem unpacker are all checked against each other and against the output of nix-store --dump.

Some of our tests use generated files about 8MB in size, and a realistic directory (the src directory of this project) for head-to-head comparisons with nix-store --dump.

*NarFormat Main> main
tests/Driver.hs
  narEncoding
    parser-roundtrip
      roundtrips regular:    OK
      roundtrips regular 2:  OK
      roundtrips executable: OK
      roundtrips symlink:    OK
      roundtrips directory:  OK
    matches-nix-store fixture
      matches regular:       OK
      matches regular':      OK
      matches executable:    OK
      matches symlink:       OK
      matches directory:     OK
  nixStoreRegular:           OK (0.03s)
  nixStoreDirectory:         OK (0.03s)
  nixStoreDirectory':        OK (0.03s)
  nixStoreBigFile:           OK (0.15s)
  nixStoreBigDir:            OK (0.34s)
  narEncodingArbitrary:      OK (0.18s)
    +++ OK, passed 100 tests.
  packSelfSrcDir:            OK (0.03s)

All 17 tests passed (0.80s)

@imalsogreg
Collaborator Author

imalsogreg commented May 1, 2018

@shlevy This is ready for review. It doesn't do anything yet with NarEffects m; this PR is just about the NAR type and a Binary encoding for NARs. But we could start fleshing out the effects to load NARs to/from files in an IO instance for NarEffects.

@imalsogreg imalsogreg changed the title WIP NAR parser/generator NAR parser/generator May 1, 2018
Member

@shlevy shlevy left a comment

This is awesome, thank you! Had some comments in line, biggest ones are about ensuring we're lazy enough and testing, but a great step!

@@ -6,4 +6,12 @@ Core effects for interacting with the Nix store.
See `StoreEffects` in [System.Nix.Store] for the available operations
on the store.

[System.Nix.Nar]: ./src/System/Nix/Nar.hs
Member

If these aren't referenced anywhere in the README no need to add them.

text, regex-base, regex-tdfa-text,
hashable, unordered-containers, bytestring
hashable, unordered-containers, bytestring
Member

Trailing space?

, tasty-hunit
, tasty-hspec
, tasty-discover
default-language: Haskell2010
Member

Trailing newline

data NarEffects (m :: * -> *) = NarEffects {
readFile :: FilePath -> m BSL.ByteString
, listDir :: FilePath -> m [FileSystemObject]
, narFromFileBytes :: BSL.ByteString -> m Nar
Member

Why do we need this as an effect? It's just a pure function right?

readFile :: FilePath -> m BSL.ByteString
, listDir :: FilePath -> m [FileSystemObject]
, narFromFileBytes :: BSL.ByteString -> m Nar
, narFromDirectory :: FilePath -> m Nar
Member

Can't we define this with readFile and listDir?
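
To illustrate the point about derived operations, here is a minimal sketch of a tree builder written only in terms of primitive filesystem effects. The names `Tree`, `FsEffects`, `fsIsDir`, `treeFromPath` and `mockFs` are hypothetical simplifications, not the PR's actual types, and symlinks and executable bits are left out.

```haskell
import qualified Data.ByteString.Lazy as BSL
import Data.Functor.Identity (Identity (..))
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified stand-in for the PR's FileSystemObject.
data Tree
  = File BSL.ByteString
  | Dir (Map.Map FilePath Tree)
  deriving (Eq, Show)

-- Hypothetical effect record: just enough primitives to walk a file tree.
data FsEffects m = FsEffects
  { fsIsDir    :: FilePath -> m Bool
  , fsReadFile :: FilePath -> m BSL.ByteString
  , fsListDir  :: FilePath -> m [FilePath] -- entry names, not full paths
  }

-- Derived operation: built from the primitives, so it needs no effect of its own.
treeFromPath :: Monad m => FsEffects m -> FilePath -> m Tree
treeFromPath fx path = do
  isDir <- fsIsDir fx path
  if isDir
    then do
      names <- fsListDir fx path
      children <- mapM (\n -> treeFromPath fx (path ++ "/" ++ n)) names
      pure (Dir (Map.fromList (zip names children)))
    else File <$> fsReadFile fx path

-- Split a path on '/' and drop empty components.
components :: FilePath -> [String]
components = filter (not . null) . go
  where
    go s = case break (== '/') s of
      (c, [])       -> [c]
      (c, _ : rest) -> c : go rest

-- Pure in-memory instance backed by a Tree, handy for tests.
mockFs :: Tree -> FsEffects Identity
mockFs root = FsEffects
  { fsIsDir    = \p -> pure (isDirAt p)
  , fsReadFile = \p -> pure (bytesAt p)
  , fsListDir  = \p -> pure (namesAt p)
  }
  where
    lookupPath p = foldl step (Just root) (components p)
    step (Just (Dir m)) c = Map.lookup c m
    step _ _              = Nothing
    isDirAt p = case lookupPath p of
      Just (Dir _) -> True
      _            -> False
    bytesAt p = case lookupPath p of
      Just (File b) -> b
      _             -> BSL.empty
    namesAt p = case lookupPath p of
      Just (Dir m) -> Map.keys m
      _            -> []
```

Running `runIdentity (treeFromPath (mockFs t) "")` on an in-memory tree `t` should reproduce `t`, which is one way to check the derived operation without touching a real filesystem.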

import qualified Data.Text.Encoding as E
import GHC.Int (Int64)

import System.Nix.Path
Member

Do you actually use this?

Collaborator Author

Yep, in the Directory constructor for FileSystemObject I use PathName.

Description : Allowed effects for interacting with Nar files.
Maintainer : Shea Levy <shea@shealevy.com>
|-}
module System.Nix.Nar where
Member

Should we have an explicit export list? Or do we want to export padLen?

Collaborator Author

I'll make the export list. padLen is shared between put and get but otherwise is an internal implementation detail.
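
For readers unfamiliar with why a padding helper is shared between put and get: NAR strings are written as a 64-bit little-endian length, the bytes, then zero padding up to the next multiple of 8. The sketch below shows that shape with the `binary` package. The type of `padLen` and the names `putNarStr` and `getNarStr` are assumptions for illustration; the PR's own definitions may differ.

```haskell
import qualified Data.Binary.Get as G
import qualified Data.Binary.Put as P
import qualified Data.ByteString.Lazy as BSL
import Data.Int (Int64)

-- Number of zero bytes needed to round n up to a multiple of 8.
padLen :: Int64 -> Int64
padLen n = (8 - n `mod` 8) `mod` 8

-- Write length, payload, then padding.
-- Note: BSL.length walks every chunk, so this naive version forces the
-- whole payload before anything is written.
putNarStr :: BSL.ByteString -> P.Put
putNarStr bs = do
  let len = BSL.length bs
  P.putWord64le (fromIntegral len)
  P.putLazyByteString bs
  P.putLazyByteString (BSL.replicate (padLen len) 0)

-- Read length, payload, then skip the same amount of padding.
getNarStr :: G.Get BSL.ByteString
getNarStr = do
  len <- fromIntegral <$> G.getWord64le
  bs <- G.getLazyByteString len
  G.skip (fromIntegral (padLen len))
  pure bs
```

Because both directions call the same `padLen`, the encoder and decoder cannot disagree about alignment, which is the reason to keep it internal but shared.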

@@ -1,3 +1,5 @@
{-# LANGUAGE PackageImports #-}
Member

Why?

Collaborator Author

Leftover, will remove.

@@ -110,6 +66,7 @@ data StoreEffects rootedPath validPath m =
derivationOutputNames :: !(validPath -> m (HashSet Text))
, -- | Get a full 'Path' corresponding to a given 'Digest'.
pathFromHashPart :: !(Digest PathHashAlgo -> m Path)
, narEffects :: NarEffects m
Member

We can move this around later, but I'm not sure it necessarily needs to be here. The addNarToStore effect can just take a Nar. We'll see.

@@ -0,0 +1,150 @@
{-# LANGUAGE OverloadedStrings #-}
Member

Hmm I feel like nar serialization/deserialization is a perfect case for property testing (quickcheck or whatever). Would you mind giving that a spin?

Member

Maybe in addition to tests derived from seeing what the nix cli does.

Member

Actually maybe we can just for now assume that the CLI is available and run it during the tests, instead of pre-generating fixtures.

Collaborator Author

I'll add an Arbitrary sized instance and some tests that shell out to nix-store. Will keep the hand-copied fixtures for when "we win" and there is no more reference nix-store to use for tests :P

Collaborator Author

Actually, I'm stumped about how you could ever encode a string's length, followed by the string itself, without forcing the full string....

But looking at the source for ByteString, things look promising! Strict bytestrings give length in O(1), and it doesn't look like it forces the underlying bytestring. https://hackage.haskell.org/package/bytestring-0.10.8.2/docs/src/Data.ByteString.html#length ... and in the lazy case, BSL.length bsl is defined with a foldl' over the chunks ( https://hackage.haskell.org/package/bytestring-0.10.8.2/docs/src/Data.ByteString.Lazy.html#length ), which is strict in the Int64 being accumulated, but shouldn't force the whole of bsl.... I'm not really sure I get the laziness going on here correctly, need to do an experiment.

Member

@imalsogreg You can do it by trusting the filesystem to correctly report the size of the file being read, and hoping the file doesn't change in the meantime (if it does, then you need to just error out in a streaming context).
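
A minimal sketch of that idea, assuming the size is supplied by the caller (for a real file it could come from something like `System.IO.hFileSize`). The function name `encodeWithDeclaredSize` is hypothetical. The length prefix and padding are computed from the declared size alone, so the contents are never traversed up front; this sketch does not detect a source that turns out shorter than declared, which a real streaming encoder would need to treat as an error.

```haskell
import qualified Data.Binary.Put as P
import qualified Data.ByteString.Lazy as BSL
import Data.Int (Int64)

-- Length-prefixed, 8-byte-padded encoding that trusts a declared size
-- instead of calling BSL.length on the contents.
encodeWithDeclaredSize :: Int64 -> BSL.ByteString -> BSL.ByteString
encodeWithDeclaredSize size contents =
  BSL.concat
    [ P.runPut (P.putWord64le (fromIntegral size)) -- header from declared size
    , BSL.take size contents                       -- consumed chunk by chunk
    , BSL.replicate ((8 - size `mod` 8) `mod` 8) 0 -- padding from declared size
    ]
```

Since nothing here inspects the contents beyond `BSL.take`, the function even works on an infinite lazy bytestring such as `BSL.repeat 7`, which is a quick way to confirm that the encoder is not forcing its input.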

@domenkozar
Contributor

Couldn't this be a completely standalone package? It doesn't depend on any Nix related haskell libs :)

@shlevy
Member

shlevy commented May 1, 2018

hnix-store-core is pretty independent already...


data FileSystemObject =
Regular IsExecutable BSL.ByteString
| Directory (Map.Map PathName FileSystemObject)
Member

PathName is incorrect here. PathName is limited to just what Nix allows in the top-level name of an output, but files within that can be named anything. The real rule here is an arbitrary byte-string with no nulls or /s.



-- | Unpack a FileSystemObject into a non-nix-store directory (e.g. for testing)
localUnpackNar :: FilePath -> Nar -> IO ()
Member

This is awesome, but I'm not sure it belongs here, as we're trying to be effect-agnostic. Can this be implemented in terms of a coherent set of filesystem effects? Or if not maybe we just move this to a module giving us the plain POSIX filesystem implementation of NarEffects?

Collaborator Author

Done

-- Create regular files and directories,
-- But save link creation until the end, by writing link commands out
-- to a queue, since otherwise we may try to create a link before
-- its target exists
Member

Wait, why does this matter? symlinks don't need to be valid, you can absolutely have a nar with foo -> bar but nothing at bar.

Collaborator Author

The nar may contain invalid links, but we have to sequence creation of those links on the disk, since trying to create a link will fail if the target does not exist.

Or so I thought!! TIL that ln -s path1 path2 does not check that path1 exists! So I'll write localUnpackNar without the WriterT part.

@imalsogreg imalsogreg mentioned this pull request May 3, 2018
@shlevy
Member

shlevy commented May 5, 2018

@imalsogreg ping me when this is ready for fresh review or if you have any questions!

@imalsogreg
Collaborator Author

@shlevy Ping!

The new Arbitrary instance and the tests that shell out to nix-store --dump provide a couple more test cases:

[nix-shell:~/code/hnix-store/hnix-store-core]$ dist/build/format-tests/format-tests
tests/Driver.hs
  narEncoding
    parser-roundtrip
      roundtrips regular:    OK
      roundtrips regular 2:  OK
      roundtrips executable: OK
      roundtrips symlink:    OK
      roundtrips directory:  OK
    matches-nix-store fixture
      matches regular:       OK
      matches regular':      OK
      matches executable:    OK
      matches symlink:       OK
      matches directory:     OK
  nixStoreRegular:           OK (0.04s)
  nixStoreDirectory:         OK (0.03s)
  nixStoreDirectory':        OK (0.03s)
  narEncodingArbitrary:      OK (0.04s)
    +++ OK, passed 100 tests.

All 14 tests passed (0.16s)

I would very much like to test/optimize the laziness, but I think that may be a good thing for a second PR. But, up to your preference!

@imalsogreg
Collaborator Author

@shlevy Actually, there are a couple of things in your prior comments left for me to do. I'll update the TODO list. Feel free to edit it if you would like to grow or shrink the scope for this PR :)

@shlevy
Member

shlevy commented May 6, 2018

@imalsogreg For scope cutting, I think we can move actually computing a store hash and IO instances out of this initial PR... For me the main thing at this stage is just avoiding the memory explosion seen in C++ Nix 😄

What is the binary instance used for again?

@shlevy
Member

shlevy commented May 6, 2018

Note that "computing the store hash" is in principle a StoreEffect, as when we talk to the daemon there's no real need to compute it in advance, we can just stream the nar to the daemon and it will give us a path back. We'll need a haskell implementation though for readonly and mock mode.

import System.Nix.Path


data NarEffects (m :: * -> *) = NarEffects {
Member

Maybe add a TODO for separating out read effects from write effects. Certainly not necessary to start though.

data NarEffects (m :: * -> *) = NarEffects {
narReadFile :: FilePath -> m BSL.ByteString
, narWriteFile :: FilePath -> BSL.ByteString -> m ()
, narListDir :: FilePath -> m [FilePath]
Member

Technically these can't be just any file paths, as they can't contain directory separators, and really it's a set not a list. Also, on most filesystems, readdir will return whether the file is a directory or not in the same call, which may be worth exposing at some point.

@imalsogreg
Collaborator Author

@shlevy So there is enough code in place now to decode/encode NARs in at least the naive way. I'd like to get some more help figuring out how to test the main use cases. So what do you think about the sequence: (1) merge this PR as is, (2) open issues about perf/laziness testing (3) open issues about changing the API (either to make it fit the domain better, or changing it for performance reasons)?

The Binary instance here just connects the names getNar and putNar to the instance methods get and put of the class Binary. So finishing that definition was only a matter of: instance Binary Nar where; get = getNar; put = putNar.

@shlevy
Member

shlevy commented May 9, 2018

Will give this another round of review this weekend.

@imalsogreg
Collaborator Author

imalsogreg commented May 9, 2018

OK, even after replacing the BSL.length bs <> str bs style (probably-not-lazy-enough) encoding/decoding with a version that should not force the strings, a call to packNar >>= writeBSL will load the whole file into memory. I'll try to rewrite with streaming.

When I wrote the previous comment, I had missed one of the length bs <> str bs occurrences. After fixing that, we can localPackNar on a 10 GB file and stream the nar to disk in constant space. 🎉

@imalsogreg
Collaborator Author

imalsogreg commented May 10, 2018

Now, tests can be run with an environment variable that gives large sizes to temporary files. Combined with RTS options that limit heap size, we can show that Nars can serve as intermediate data structures when streaming NAR-encoded bytestrings to unpacked filesystem objects and vice versa.

~/code/hnix-store/hnix-store-core]$ HNIX_BIG_FILE_SIZE=1000000000 dist/build/format-tests/format-tests +RTS -M100M

This produces some 1G single files and directories with 1G files in the test suite, while locking the heap size to 100M. With this combination of parameters, and the recent change in encoding/decoding strategy (being more careful not to force the intermediate lazy bytestrings) the tests pass. The tests can be made to fail by shrinking heap size below 100M.
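
As a rough illustration of how a test suite might pick up such a size from the environment, here is a small sketch. Only the variable name `HNIX_BIG_FILE_SIZE` comes from the command above; the function names and the default value are hypothetical and not taken from the PR.

```haskell
import Data.Int (Int64)
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)
import Text.Read (readMaybe)

-- Hypothetical fallback size in bytes, used when the variable is unset
-- or cannot be parsed as a number.
defaultBigFileSize :: Int64
defaultBigFileSize = 1000000

-- Pure parsing step, kept separate so it can be tested without IO.
parseBigFileSize :: Maybe String -> Int64
parseBigFileSize raw = fromMaybe defaultBigFileSize (raw >>= readMaybe)

-- Read the size (in bytes) that the big-file tests should generate.
getBigFileSize :: IO Int64
getBigFileSize = parseBigFileSize <$> lookupEnv "HNIX_BIG_FILE_SIZE"
```

Keeping the default small means an ordinary test run stays fast, while the constant-space claim can still be exercised on demand by exporting a large value and capping the heap with `+RTS -M`.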

@domenkozar
Contributor

@imalsogreg
Collaborator Author

@domenkozar Cool trick, thank you!

Member

@shlevy shlevy left a comment

This looks awesome! Sorry there's so much going on here... I definitely should have broken this down more from the get-go

@@ -1 +0,0 @@
../LICENSE
Member

Why?

Collaborator Author

When LICENSE was a symlink, I had trouble bringing hnix-store-core in as an override within hnix. Short-term stopgap. Want me to revert to the symlink?

Member

Please, unless this is too difficult to work around

Tests
======

- `ghcid --command "cabal repl test-suite:format-tests" --test="Main.main"`
Member

@mightybyte @jwiegley What's the best way to set up CI on PRs?

@@ -0,0 +1,24 @@
{ mkDerivation, base, base64-bytestring, basement, binary
Member

I'd really like not to check in generated code. Can we use developPackage or some such?

Collaborator Author

I'll just drop this file. If we check in a develop.nix, we'll do that in a different PR.


import System.Nix.Path

data NarEffects (m :: * -> *) = NarEffects {
Member

Something feels a bit off about the interface we're exposing here... Conflating packing and unpacking, doing everything with absolute paths, separating out permissions from filesystem type... I think we may be able to do something more succinct here. Let's have a call about this? Maybe makes sense to move the NarEffects stuff to a later PR?

Collaborator Author

Interface design is definitely going to be up to your taste. I'd rather get something committed and defer experimentation to later PRs. Sure, we can discuss further in a call.


-- Directly taken from Eelco thesis
-- https://nixos.org/%7Eeelco/pubs/phd-thesis.pdf

Member

Can we get some haddocks on the types and fields/constructors defined in this file?

}

-- | A valid filename or directory name
newtype FilePathPart = FilePathPart { unFilePathPart :: Text }
Member

Unlike PathName, which is actually intended to be human-readable text, I think FilePathPart semantically (and representationally) is more akin to ByteString than Text. It's just an arbitrary, well, string of bytes, with two disallowed bytes.
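
One way to read this suggestion is a newtype over strict bytes with a smart constructor that rejects only the two disallowed bytes. The sketch below is an assumption about what such a type could look like, not the definition that was eventually committed; `mkFilePathPart` is a hypothetical name, and any further restrictions (for example on empty names) are deliberately left out.

```haskell
import qualified Data.ByteString as BS

-- A single file or directory name: arbitrary bytes, not necessarily UTF-8.
newtype FilePathPart = FilePathPart { unFilePathPart :: BS.ByteString }
  deriving (Eq, Ord, Show)

-- Smart constructor: reject names containing NUL (0) or '/' (47).
mkFilePathPart :: BS.ByteString -> Maybe FilePathPart
mkFilePathPart bs
  | BS.any (\b -> b == 0 || b == 47) bs = Nothing
  | otherwise                           = Just (FilePathPart bs)
```

With this representation a name such as the bytes `[255, 254]`, which is not valid UTF-8, is accepted, whereas a `Text`-based type would need a lossy or failing decode step first.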


------------------------------------------------------------------------------
-- | Deserialize a Nar from lazy ByteString
getNar :: B.Get Nar
Member

It might be interesting at some point to have some way to actually stream out entries as they come in (e.g. have some kind of handlers for each kind of FSO, or use some higher-level streaming API), but this seems good for now.

mname <- E.decodeUtf8 . BSL.toStrict <$> str
assertStr "node"
file <- parens getFile
maybe (fail $ "Bad FilePathPart: " ++ show mname)
Member

☹️ Not necessary for now, but we should figure out a really solid error handling story here. Bad error messages will be the death of us.

Member

In particular, there are some errors that are just us taking advantage of Alternative for backtracking, and some that are actually fatal... We should make sure our messages reflect this.
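
To make the distinction concrete, here is a toy sketch using `Data.Binary.Get`, whose `Alternative` instance backtracks on failure. The one-byte tags and the names `Tag`, `tagByte`, `getTag` and `parseTag` are invented for illustration and are not the NAR format, which uses string tokens. Failures inside each branch are only backtracking signals; the `label` on the whole choice and the byte offset from `runGetOrFail` are what a user would see if every branch fails.

```haskell
import Control.Applicative ((<|>))
import qualified Data.Binary.Get as G
import qualified Data.ByteString.Lazy as BSL
import Data.Word (Word8)

-- Toy node kinds for the sketch.
data Tag = TagRegular | TagSymlink
  deriving (Eq, Show)

-- Succeeds only on the expected byte; its failure is used for backtracking.
tagByte :: Word8 -> Tag -> G.Get Tag
tagByte expected tag = do
  b <- G.getWord8
  if b == expected
    then pure tag
    else fail ("expected tag byte " ++ show expected ++ ", got " ++ show b)

-- Try each alternative in turn; label the choice so a fatal error names
-- what was being parsed rather than only the last branch tried.
getTag :: G.Get Tag
getTag = G.label "node tag" (tagByte 0 TagRegular <|> tagByte 1 TagSymlink)

-- Surface the failure offset together with the message.
parseTag :: BSL.ByteString -> Either String Tag
parseTag input = case G.runGetOrFail getTag input of
  Left (_, offset, msg) -> Left ("at byte " ++ show offset ++ ": " ++ msg)
  Right (_, _, tag)     -> Right tag
```

A fuller design might use a dedicated error type that separates "this branch did not match" from "the input is malformed", but even labels and offsets go some way toward readable messages.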

assertStr "entry"
parens $ do
assertStr "name"
mname <- E.decodeUtf8 . BSL.toStrict <$> str
Member

Why mname?

Collaborator Author

Used to be a Maybe. Fixed the name to reflect pureness.

-> FilePath -- ^ Link
-> FileSystemObject -- ^ FileSystemObject to add link to
-> FileSystemObject
mkLink = undefined -- TODO
Member

This isn't referenced anywhere, should we remove it?

Collaborator Author

Rather keep it as a TODO. It's a reminder to come up with a sensible way to add symlinks to the arbitrary instance.

@imalsogreg imalsogreg merged commit 2075aae into haskell-nix:master May 15, 2018
Anton-Latukha added a commit to Anton-Latukha/hnix-store that referenced this pull request Jul 10, 2020
…ow-add-Nixpkgs-GHCJS

GitHub CI: workflow: add Nixpkgs GHCJS