
Use hnix-0.7 #1695

Merged 1 commit into master on Mar 15, 2020
Conversation

sjakobi (Collaborator) commented Mar 3, 2020

No description provided.

sjakobi (Collaborator, Author) commented Mar 3, 2020

Why does Hydra complain about

Setup: Encountered missing dependencies:
contravariant >=1.4.1 && <1.6

…when we have a contravariant.nix with v1.5?

Gabriella439 (Collaborator) commented:

@sjakobi: I just pushed a change that should fix the aeson build failure on Nix

The issue is that when using cabal2nix to generate the ./nix/*.nix files, it has to guess which GHC version to use when translating the .cabal file to Nix code. This is because the .cabal file may have conditional logic gated on the compiler version, and cabal2nix is not set up to generate Nix code for multiple GHC versions.

In this particular case, the issue was that aeson.cabal has this logic:

  if !impl(ghc >= 8.6)
    build-depends:
      contravariant >=1.4.1    && <1.6

... so GHC versions older than 8.6 depend on contravariant. However, when cabal2nix was run it defaulted to a newer GHC version, so it assumed that the contravariant dependency was not necessary and didn't supply it via Nix.

So the issue wasn't that contravariant was the wrong version. Rather, the problem was that the contravariant dependency was missing entirely. You can see this if you look at the old ./nix/aeson.nix, which didn't reference contravariant at all.

The way I fixed it for the change I pushed was to run:

$ cabal2nix --compiler ghc-8.4 cabal://aeson > ./nix/aeson.nix

... which then caused cabal2nix to correctly pick up the contravariant dependency.

If we were using a newer version of Nixpkgs there would be a more reliable way to fix this problem: pkgs.haskellPackages.callHackage, which you give a package name and version, and which essentially calls cabal2nix for you with the right --compiler option. There's also a pkgs.haskell.lib.packageSourceOverrides utility which provides a higher-level interface to the same mechanism.

In general, this trick only works for packages that are older than the Nixpkgs revision you are using, so we'd need a Nixpkgs revision that already knows about aeson-1.4.6.0 to be able to use this for aeson specifically. The advantage of invoking cabal2nix explicitly is that you can use package versions newer than your Nixpkgs revision.
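
For illustration, a callHackage override might look something like this (just a sketch, not what the repository currently does; it assumes a Nixpkgs revision whose Hackage snapshot already includes aeson-1.4.6.0):

let
  pkgs = import <nixpkgs> { };

  haskellPackages = pkgs.haskellPackages.override {
    overrides = haskellPackagesNew: haskellPackagesOld: {
      # callHackage runs cabal2nix for us, matching the package set's compiler
      aeson = haskellPackagesNew.callHackage "aeson" "1.4.6.0" { };
    };
  };

in
  haskellPackages.aeson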

sjakobi (Collaborator, Author) commented Mar 6, 2020

Many thanks for the fix and the explanation, @Gabriel439! Next time I'll take a closer look at the .nix file of the package that fails to build!

Those recent Nixpkgs features sound nice! I hope we can update our version of Nixpkgs once support for GHC 8.8.3 has stabilised!

sjakobi (Collaborator, Author) commented Mar 6, 2020

The latest aeson builds on Hydra failed with exit code 137, which indicates OOM. The problem seems to be the .p_o (profiled object) file for Data.Aeson.Types.ToJSON. I'm not sure how best to proceed. We could:

  • Stop building profiling versions of dependencies. (What do we need those for anyway?)
  • Try picking a different (older) aeson version which requires less memory to compile.
  • Enable aeson's fast Cabal flag, which disables optimizations.

Gabriella439 (Collaborator) commented:

@sjakobi: We can stop doing profiled builds. I think all you need to do is add this line:

enableLibraryProfiling = false;

... right below here:

doCheck = false;
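
That is, the override would end up looking roughly like this (a sketch, using the names from the mkDerivation override in nix/shared.nix quoted later in this thread):

mkDerivation =
  args: haskellPackagesOld.mkDerivation (args // {
      # skip test suites for all Haskell packages
      doCheck = false;

      # don't build profiled libraries, which roughly doubles compile cost
      enableLibraryProfiling = false;
    }
  );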

sjakobi force-pushed the sjakobi/hnix-0.7 branch from 010ce25 to edaedb8 on March 6, 2020
sjakobi (Collaborator, Author) commented Mar 6, 2020

Somehow that last commit fails to trigger a new job at https://hydra.dhall-lang.org/jobset/dhall-haskell/1695…

Gabriella439 (Collaborator) commented:

@sjakobi: It's working on it. Nix has two main phases when building things:

  • Instantiation: Converting Nix code to .drv files
  • Building: Converting .drv files to build products

The first phase takes longer when you make large changes at the Nix level (like disabling profiling for all Haskell packages), but on subsequent runs it will speed up again.
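
For example (illustrative commands, not the exact invocations Hydra runs):

$ nix-instantiate release.nix        # instantiation: Nix code -> .drv files
$ nix-store --realise <drv-path>     # building: a .drv file -> build products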

sjakobi (Collaborator, Author) commented Mar 9, 2020

Thanks for the explanation, @Gabriel439! :)

Instantiation seems to have failed due to an OOM situation while compiling Cabal: https://hydra.dhall-lang.org/jobset/dhall-haskell/1695#tabs-errors

Does the build machine have unexpectedly little available memory for some reason?

Gabriella439 (Collaborator) commented:

@sjakobi: It has 8 GB of memory. It's the $40 / month option here: https://www.linode.com/pricing/

Another thing you can do is disable profiling just for the aeson build. You can do that by adding something like this:

aeson =
  pkgsNew.haskell.lib.disableLibraryProfiling haskellPackagesOld.aeson;

... right below where mkDerivation is overridden:

mkDerivation =
  args: haskellPackagesOld.mkDerivation (args // {
      doCheck = false;
    }
  );

sjakobi (Collaborator, Author) commented Mar 10, 2020

The Cabal build OOM'd again: https://hydra.dhall-lang.org/jobset/dhall-haskell/1695#tabs-errors

…and so did the dhall build for #1702: https://hydra.dhall-lang.org/build/53308/nixlog/1

When there are multiple builds running at the same time, can they starve each other of memory?!

Gabriella439 (Collaborator) commented:

@sjakobi: I think the newer evaluation had not yet kicked in, because once you reverted that change it should no longer be trying to rebuild Cabal. Indeed, Hydra is now failing on aeson-yaml.

sjakobi (Collaborator, Author) commented Mar 10, 2020

I think the newer evaluation had not yet kicked in, because once you reverted that change it should no longer be trying to rebuild Cabal.

Wow, I could have sworn that the error log was from today. This is so confusing! :(

The new problem is that when building aeson-pretty, we can't find the profiling libraries for aeson (which we've just disabled): https://hydra.dhall-lang.org/build/53347/nixlog/2

Disabling profiling for all reverse dependencies of aeson seems rather onerous, and like a maintenance hassle whenever we add dependencies in the future.

Disabling it categorically sounds much better to me. So what can we do to make that Cabal re-build pass?

I saw that Cabal was built with -j4. Can we override that with -j1, so hopefully there will be more memory available for building each Cabal module?
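
Maybe something like this would work (an untested sketch; I'm assuming pkgs.haskell.lib.appendConfigureFlag and the pkgsNew/haskellPackagesOld names used elsewhere in nix/shared.nix):

Cabal =
  # pass -j1 through to GHC so the Cabal build compiles one module at a time
  pkgsNew.haskell.lib.appendConfigureFlag
    haskellPackagesOld.Cabal
    "--ghc-options=-j1";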

sjakobi (Collaborator, Author) commented Mar 10, 2020

Sorry for this endless back-and-forth with me being so clueless about Nix, BTW. Can you recommend some resources that would help me get up to speed?

Gabriella439 (Collaborator) commented:

@sjakobi: Are you able to run the Nix build locally yourself? Once you set that up you can iterate much more quickly than using CI

In principle, all you need to reproduce what CI does is to run:

$ nix build --file ./release.nix

... and you can build specific top-level attributes by adding them as:

$ nix build --file ./release.nix linux-dhall

sjakobi (Collaborator, Author) commented Mar 10, 2020

Are you able to run the Nix build locally yourself?

I think I had it working before, but last time I tried, I aborted the build as it just seemed to take forever.

$ nix build --file ./release.nix linux-dhall

I just tried this from a fresh nix install but got a test failure in a Python dependency:

$ nix build --file ./release.nix linux-dhall
builder for '/nix/store/a455k2ald8l9qw8jbmyylk4gpjprchxi-python2.7-pyOpenSSL-18.0.0.drv' failed with exit code 1; last 10 log lines:
  
  /nix/store/44f3hqpyi391pq9srzkfdzjd9kzh3g4f-python2.7-pyOpenSSL-18.0.0/lib/python2.7/site-packages/OpenSSL/_util.py:54: Error
  ===Flaky Test Report===
  
  test_gmtime_adj_notBefore passed 1 out of the required 1 times. Success!
  test_gmtime_adj_notAfter passed 1 out of the required 1 times. Success!
  test_export_text passed 1 out of the required 1 times. Success!
  
  ===End Flaky Test Report===
  ===================== 3 failed, 493 passed in 7.30 seconds =====================
cannot build derivation '/nix/store/sckx9dxryn3h7v5h3g87sf8s1sp04193-python2.7-urllib3-1.23.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/k5xglp1ws70r21idgiilz789qxkr778s-python2.7-dulwich-0.19.3.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/30wndpj9qcdp91as2p5ngl8zjhb5lrqs-mercurial-4.5.2.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/fkbm6ia4m8ssbiy9f54dzm9nxgdg93xz-nix-prefetch-hg.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/kbr84lyin1g2z6pxv7z82xs13rqzkrim-nix-prefetch-scripts.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/0mrj34k1475q3qyn4kpbda7jbq1pz6qr-cabal2nix-2.9.3.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/g5ik75jrxrfl1ah04kfaz876qm8s8qz1-cabal2nix-dhall.drv': 1 dependencies couldn't be built
[1 built (1 failed), 0.0 MiB DL]
error: build of '/nix/store/g5ik75jrxrfl1ah04kfaz876qm8s8qz1-cabal2nix-dhall.drv' failed
(use '--show-trace' to show detailed location information)

The issue was reported in pyca/pyopenssl#791 and there's a merged fix for nixpkgs: NixOS/nixpkgs#46072.

Gabriella439 (Collaborator) commented:

@sjakobi: Have you added cache.dhall-lang.org as a cache? You can do so by following these instructions:

https://github.com/dhall-lang/dhall-haskell#nix

sjakobi (Collaborator, Author) commented Mar 11, 2020

@Gabriel439 My nix.conf looks like this now:

$ cat /etc/nix/nix.conf 
trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= cache.dhall-lang.org:I9/H18WHd60olG5GsIjolp7CtepSgJmM2CsO813VTmM= dhall.cachix.org-1:8laGciue2JBwD49ICFtg+cIF8ddDaW7OFBjDb/dHEAo=
substituters = https://cache.nixos.org https://cache.dhall-lang.org https://dhall.cachix.org

But the caches don't really seem to help. There's still a lot of rebuilding going on and in the end I get the same failure as above.

Gabriella439 (Collaborator) commented:

@sjakobi: What version of Nix are you currently using?

To answer your original question, the relevant resource is the Nixpkgs manual: http://nixos.org/nixpkgs/manual/

I'm currently looking into this, but at the moment I'm not able to reproduce the mass rebuild you're seeing

Gabriella439 (Collaborator) commented Mar 12, 2020

@sjakobi: There is also a short-term workaround that we can use to get around the build constraints of the CI machine: build on a machine that has more resources available (e.g. either of our own development machines) and copy the built closure to hydra.dhall-lang.org. Then it will remain cached until we next need to bump the aeson dependency.
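
Roughly like this (a sketch, assuming SSH access to the Hydra machine and a local ./result symlink produced by the build):

$ nix build --file ./release.nix                       # build the closure locally
$ nix copy --to ssh://hydra.dhall-lang.org ./result    # push it into Hydra's store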

sjakobi (Collaborator, Author) commented Mar 12, 2020

What version of Nix are you currently using?

$ nix --version
nix (Nix) 2.3.3

Gabriella439 (Collaborator) commented:

@sjakobi: There's actually another potential workaround that may be more robust: drop support for GHC 8.0. This problem appears to affect only the older compiler, due to it being less memory-efficient. Newer versions are building successfully.

sjakobi (Collaborator, Author) commented Mar 12, 2020

@Gabriel439 Oh, I had totally missed that the OOMs were with GHC 8.0! In that case let's indeed drop support for 8.0.

Should we add CI for GHC 8.2 instead, or simply drop support for 8.2 too? (A local test install indicates that 8.2 requires much less memory to build aeson than 8.0, so it should be able to handle the aeson upgrade in this PR.)

Gabriella439 (Collaborator) commented:

@sjakobi: Let's try replacing GHC 8.0 with GHC 8.2 and see if that works. If it's still a problem then we can try dropping it.

sjakobi (Collaborator, Author) commented Mar 14, 2020

So I managed to resolve my setup/caching issue by properly cleaning my Nix install before starting afresh.

But nix build and nix-shell still fail with the same error:

$ nix build --file ./release.nix
builder for '/nix/store/5gijr315pd13gpqa4lkg2ncfpnfqpryx-dhall-1.30.0.drv' failed with exit code 1; last 10 log lines:
   but got: "*** Exception: "
            "\ESC[1;31mError\ESC[0m: Remote host not found"
            ""
            "URL: https://raw.githubusercontent.com/dhall-lang/dhall-haskell/18e4e9a18dc53271146df3ccf5b4177c3552236b/examples/True"
            ""
  
  Examples: 204  Tried: 198  Errors: 0  Failures: 1
  Test suite doctest: FAIL
  Test suite logged to: dist/test/dhall-1.30.0-doctest.log
  0 of 2 test suites (0 of 2 test cases) passed.
[0 built (1 failed), 0.0 MiB DL]
error: build of '/nix/store/1s8lxxyjg89dyf4bw1syv8qapqnk13qy-dhall.drv', '/nix/store/5gijr315pd13gpqa4lkg2ncfpnfqpryx-dhall-1.30.0.drv', '/nix/store/5y7fbw3v44w7m5kyk2pqf09anrdhjixq-dhall-json-1.6.2.drv', '/nix/store/6vmhcrdjavl2ffrl3aqxdg4pnp7c7xfc-image-dhall.drv', '/nix/store/7m2px3k7xwq349qy2jm2h7sq4ahvg38b-binary-tarball-1.0.28.drv', '/nix/store/aa7v9v84f0hqnpma4gxqs33qgbiqwkv6-image-dhall-lsp-server.drv', '/nix/store/bidwnm3wlj7bdqyp23q78sxll6cwz0wb-binary-tarball-1.1.12.drv', '/nix/store/by3jalv3j1sw784p7736m82972rviqhg-image-dhall-json.drv', '/nix/store/c733bcjj3a1z3qn02wqbygxvmy7s3zhf-image-dhall-bash.drv', '/nix/store/ds38bgmc41y7vgzjd5w9d093yrmci7zz-binary-tarball-1.6.2.drv', '/nix/store/f5q90i6216ppvdcf2nmsgj0r1ma4675a-image-dhall-yaml.drv', '/nix/store/lqjpzjmxmf0pl0jikn991srb24hs0l58-dhall-bash-1.0.28.drv', '/nix/store/m4psj3703bkd5bps1aib4vmgkqx5x4bc-binary-tarball-1.30.0.drv', '/nix/store/qz6sqv0pf6fz0mc6w5cp2v05zkf1g2z4-binary-tarball-1.0.5.drv', '/nix/store/s1l0a6bnyrd6xpgv39mm2xqcdspf3gw2-dhall-lsp-server-1.0.5.drv', '/nix/store/skigcfg8kmiszcr84rf5csildwyrnqgy-dhall-1.30.0.drv', '/nix/store/wbpnn5hg73spq0wjb5p6ksqndk75mh2z-image-dhall-nix.drv', '/nix/store/x1vlww1hyrwwr1ac79qss2q8azp9yh53-binary-tarball-1.0.2.drv', '/nix/store/z1465vgms7dg07050lyyblxvbvgzqdy6-dhall-yaml-1.0.2.drv', '/nix/store/zswpf8ir1vnahg5yy104ira2lca5s7rp-dhall-nix-1.1.12.drv' failed

What can I do about this?

EDIT: This is on master BTW.

Gabriella439 (Collaborator) commented:

@sjakobi: I believe you need to add this to /etc/nix/nix.conf, since that doctest fetches a remote URL and the build sandbox blocks network access:

sandbox = false

@@ -208,13 +209,13 @@ Right x: y: x + y
 dhallToNix :: Expr s Void -> Either CompileError (Fix NExprF)
 dhallToNix e = loop (Dhall.Core.normalize e)
   where
-    loop (Const _) = return (Fix (NSet []))
+    loop (Const _) = return (Fix (NSet NNonRecursive []))
sjakobi (Collaborator, Author) commented:

So my understanding is that these empty sets are a bit like a default value for things that I can't translate. Should I make a name for this to avoid repetition? Something like

stub = Fix (NSet NNonRecursive [])

?

Gabriella439 (Collaborator) replied:

Yeah, currently all types are translated to {} in Nix. You can define a reusable term to reduce repetition

sjakobi (Collaborator, Author) replied:

I went with untranslatable for the name.

@@ -218,7 +212,7 @@ let
     );
   };
 
-  overlayGHC802 = pkgsNew: pkgsOld: {
+  overlayGHC822 = pkgsNew: pkgsOld: {
sjakobi (Collaborator, Author) commented:

I haven't actually checked whether the overlay is still necessary in this form. It doesn't seem to hurt though…

sjakobi changed the title from "WIP: Use hnix-0.7" to "Use hnix-0.7" on Mar 14, 2020
sjakobi (Collaborator, Author) commented Mar 14, 2020

CI is finally green. Could you review, @Gabriel439? Many thanks for all your help so far! :)

sjakobi added a commit that referenced this pull request Mar 14, 2020

mergify bot pushed a commit that referenced this pull request Mar 15, 2020
sjakobi (Collaborator, Author) commented Mar 15, 2020

This time the compilation of Dhall.Test.QuickCheck OOM'd with GHC 8.2.2: https://hydra.dhall-lang.org/build/53834/nixlog/1

I'll try again.

sjakobi (Collaborator, Author) commented Mar 15, 2020

Unfortunately Hydra caches the failure – it can't tell that the failure is due to the environment (insufficient memory) rather than the code.

How do I tell Nix not to build and run dhall's tasty testsuite with GHC 8.2.2?

Gabriella439 (Collaborator) commented:

@sjakobi: In this section:

https://github.com/dhall-lang/dhall-haskell/blob/sjakobi/hnix-0.7/nix/shared.nix#L222-L227

... you can add:

dhall = pkgsNew.haskell.lib.dontCheck haskellPackagesOld.dhall;
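
In context, that section would then look roughly like this (a sketch; I'm assuming it is an ordinary haskellPackages override set like the mkDerivation one quoted earlier):

haskellPackagesNew: haskellPackagesOld: {
  # Skip dhall's test suite on GHC 8.2.2 so that an environment-induced
  # OOM can't be cached by Hydra as a permanent failure
  dhall = pkgsNew.haskell.lib.dontCheck haskellPackagesOld.dhall;
}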

This also removes support for GHC 8.0.2 in the Nix utilities,
since recent aeson versions were OOM'ing in CI with 8.0.2.
Instead we add support for GHC 8.2.2.

This also removes the old and no longer used dhall-nix/.travis.yml.
mergify bot merged commit f715348 into master on Mar 15, 2020
mergify bot deleted the sjakobi/hnix-0.7 branch on March 15, 2020 at 18:45