python3Packages.jaxlib-build: share fetch derivation between different build derivations #221390

Merged: 1 commit, Apr 10, 2023

uri-canva (Contributor) commented Mar 15, 2023

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

uri-canva (Contributor, Author)

@ofborg build python3Packages.jaxlib-build

uri-canva (Contributor, Author)

Found it: it uses go_sdk transitively, which we have to remove like we've already done in envoy and bazel watcher. This one's a bit trickier because it's not a direct dependency but a transitive one.
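A transitive dependency is harder to strip because it is declared by some intermediate repository rather than by jaxlib's own workspace. As a toy illustration only, the sketch below runs a breadth-first search over a hypothetical dependency graph to find the chain of repositories through which `go_sdk` is pulled in. Every name except `go_sdk` is made up, and this is not how Bazel or nixpkgs actually resolves repositories. Each intermediate link in the chain is a candidate place to patch.

```python
from collections import deque

# Hypothetical dependency graph: each repository maps to the repositories it
# declares directly. Names other than "go_sdk" are made up for illustration.
DEPS = {
    "jaxlib": ["repo_a", "repo_b"],
    "repo_a": ["repo_c"],
    "repo_b": [],
    "repo_c": ["go_sdk"],
    "go_sdk": [],
}


def dependency_path(graph, root, target):
    """Return the shortest chain of repositories from root to target, or None."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the recorded parents to rebuild the chain.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for dep in graph.get(node, []):
            if dep not in parents:
                parents[dep] = node
                queue.append(dep)
    return None


if __name__ == "__main__":
    print(dependency_path(DEPS, "jaxlib", "go_sdk"))
    # → ['jaxlib', 'repo_a', 'repo_c', 'go_sdk']
```

A direct dependency would show up as a two-element path, such as `['jaxlib', 'repo_b']`. The longer path for `go_sdk` is what makes it the trickier case.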

uri-canva (Contributor, Author) commented Apr 3, 2023

Opened an issue to discuss what we can do in general for these kinds of issues: #224446.

uri-canva (Contributor, Author)

Got it down to one last issue:

On darwin:

$ ln -s broken broken
$ ls -l broken
lrwxr-xr-x@ 1 uri  staff  6  5 Apr 23:11 broken@ -> broken

On linux:

$ ln -s broken broken
$ ls -l broken
lrwxrwxrwx 1 uri uri 6 Apr  5 13:11 broken -> broken

Ignore the different group and the slightly different ls display. The key difference is that the mode differs (lrwxr-xr-x on Darwin vs. lrwxrwxrwx on Linux), and that mode is what gets stored in the archive. Luckily, the solution is quite simple: on Linux the mode of a symlink cannot be changed, but on Darwin it can, so if we make the mode match on Darwin, it should all be good.
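The archive records each symlink's mode, so the same tree archived on Darwin and on Linux produces different bytes and therefore a different fixed-output hash. The sketch below uses Python's standard `tarfile` module to show the effect and one alternative way to neutralize it. A hypothetical `normalize_symlink_mode` callback rewrites every symlink entry to 0o777 before it is written. This is an illustration only, not the mechanism used in this PR.

```python
import hashlib
import io
import tarfile


def normalize_symlink_mode(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Give every symlink entry the mode Linux always reports (lrwxrwxrwx)."""
    if info.issym():
        info.mode = 0o777
    return info


def symlink_archive(link_mode: int, normalize: bool = True) -> bytes:
    """Build an in-memory tar holding one self-referencing symlink, like
    the `ln -s broken broken` example, with the given on-disk mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("broken")
        info.type = tarfile.SYMTYPE
        info.linkname = "broken"
        info.mode = link_mode  # 0o755 as seen on Darwin, 0o777 on Linux
        tar.addfile(normalize_symlink_mode(info) if normalize else info)
    return buf.getvalue()


if __name__ == "__main__":
    darwin = hashlib.sha256(symlink_archive(0o755)).hexdigest()
    linux = hashlib.sha256(symlink_archive(0o777)).hexdigest()
    print(darwin == linux)  # → True
```

For a real directory tree, the same callback can be passed as `TarFile.add(path, filter=normalize_symlink_mode)`. Normalizing in the archiver sidesteps the platform difference entirely, whereas the fix proposed above changes the symlink mode on disk on Darwin. Either way, the goal is identical archive bytes and therefore an identical hash.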

samuela (Member) commented Apr 5, 2023

Do you have a sense of why it's failing in this PR, but not when I tested it in #219778? AFAIU I used the same code as you, although I did have to update the hash value.

uri-canva (Contributor, Author)

It's not? I can't find ofborg builds of the derivation on the different systems; it looks like it failed eval?

samuela (Member) commented Apr 6, 2023

AFAICT it only failed eval for jaxlib-bin (#219778 (comment)). The source builds worked for me with cudaSupport = true and cudaSupport = false on x86_64-linux.

uri-canva (Contributor, Author)

Oh right, sorry: sharing it between different configurations works; it's sharing it between Linux and Darwin that isn't working. Unfortunately, since CUDA doesn't work on Darwin, we can't have a single hash. We need to split it either by OS (Linux with CUDA / macOS without CUDA) or by whether CUDA support is enabled. At least that's where I'm at now. I'm rebasing the PR shortly.

uri-canva (Contributor, Author)

Pulled out part of the fix here: #224917

samuela (Member) commented Apr 6, 2023

Ah gotcha, yeah unfortunately Darwin support is broken for the jaxlib source build atm. aarch64-darwin is not working and I don't think any of us have x86_64-darwin machines to test that build anymore :(

uri-canva force-pushed the uri/jax branch 2 times, most recently from 9cb1430 to 072eacd, on April 7, 2023 06:22
uri-canva (Contributor, Author)

@ofborg build python3Packages.jaxlib-build

uri-canva (Contributor, Author)

@ofborg build python3Packages.jaxlib-build

}.${stdenv.system} or (throw "unsupported system ${stdenv.system}");
"sha256:0pzy18i2pj2hf5rav9x35r44c95ww64cmzgcd0g0ljdhhk5vhaz3"
else
"sha256-CyRfPfJc600M7VzR3/SQX/EAyeaXRJwDQWot5h2XnFU=";
uri-canva (Contributor, Author), review comment on the lines above:

This hash matches all the systems on ofborg. Success!

"--config=avx_posix"
] ++ lib.optionals cudaSupport [
# ideally we'd add this unconditionally too, but it doesn't work on darwin
# we make this conditional instead of the system, so that the hash for both the cuda and the non-cuda deps
samuela (Member), review comment on the lines above:

Suggested change:
- # we make this conditional instead of the system, so that the hash for both the cuda and the non-cuda deps
+ # we make this conditional on `cudaSupport` instead of the system, so that the hash for both the cuda and the non-cuda deps
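The following toy model in Python shows why conditioning on `cudaSupport` rather than on the system lets one hash cover several platforms. It treats a fixed-output hash as a hash of the fetched content. It also assumes, hypothetically, that after the fixes discussed above the fetched set depends only on `cudaSupport`. The repository names and the `fetched_repos` helper are made up. CUDA configurations are modelled only on Linux, since CUDA doesn't work on Darwin.

```python
import hashlib

# Hypothetical external repositories; the real jaxlib fetch is much larger.
BASE_REPOS = ["repo_common_1", "repo_common_2"]
CUDA_REPOS = ["repo_cuda"]

SYSTEMS = ["x86_64-linux", "aarch64-linux", "x86_64-darwin", "aarch64-darwin"]


def fetched_repos(system, cuda_support):
    """Toy model of what the fetch step downloads for one configuration.

    `system` is intentionally ignored: the assumption is that the fetched
    content no longer depends on the platform.
    """
    return sorted(BASE_REPOS + (CUDA_REPOS if cuda_support else []))


def content_hash(repos):
    """Stand-in for a fixed-output hash: sha256 over the fetched content."""
    return hashlib.sha256("\n".join(repos).encode()).hexdigest()


def hashes_by_config():
    """Hash every supported (system, cudaSupport) pair; CUDA only on Linux."""
    return {
        (system, cuda): content_hash(fetched_repos(system, cuda))
        for system in SYSTEMS
        for cuda in (False, True)
        if not cuda or system.endswith("-linux")
    }


if __name__ == "__main__":
    hashes = hashes_by_config()
    print(len(hashes), len(set(hashes.values())))  # → 6 2
```

In this model, six supported configurations collapse to two distinct hashes, one per `cudaSupport` value. That is why two hash values can suffice when the condition is `cudaSupport` rather than the system.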

samuela (Member) commented Apr 10, 2023

Result of nixpkgs-review pr 221390 run on x86_64-linux

18 packages marked as broken and skipped:
  • python310Packages.dalle-mini
  • python310Packages.dalle-mini.dist
  • python310Packages.distrax
  • python310Packages.distrax.dist
  • python310Packages.dm-sonnet
  • python310Packages.dm-sonnet.dist
  • python310Packages.elegy
  • python310Packages.elegy.dist
  • python310Packages.flax
  • python310Packages.flax.dist
  • python310Packages.rlax
  • python310Packages.rlax.dist
  • python310Packages.tensorflow-datasets
  • python310Packages.tensorflow-datasets.dist
  • python310Packages.treex
  • python310Packages.treex.dist
  • python310Packages.vqgan-jax
  • python310Packages.vqgan-jax.dist
25 packages failed to build:
  • python310Packages.arviz
  • python310Packages.arviz.dist
  • python310Packages.bambi
  • python310Packages.bambi.dist
  • python310Packages.blackjax
  • python310Packages.blackjax.dist
  • python310Packages.jaxopt
  • python310Packages.jaxopt.dist
  • python310Packages.numpyro
  • python310Packages.numpyro.dist
  • python310Packages.pymc
  • python310Packages.pymc.dist
  • python310Packages.pytensor
  • python310Packages.pytensor.dist
  • python311Packages.blackjax
  • python311Packages.blackjax.dist
  • python311Packages.dm-haiku
  • python311Packages.dm-haiku.dist
  • python311Packages.dm-haiku.testsout
  • python311Packages.jaxopt
  • python311Packages.jaxopt.dist
  • python311Packages.jmp
  • python311Packages.jmp.dist
  • python311Packages.numpyro
  • python311Packages.numpyro.dist
41 packages built:
  • python310Packages.aeppl
  • python310Packages.aeppl.dist
  • python310Packages.aesara
  • python310Packages.aesara.dist
  • python310Packages.augmax
  • python310Packages.augmax.dist
  • python310Packages.chex
  • python310Packages.chex.dist
  • python310Packages.dm-haiku
  • python310Packages.dm-haiku.dist
  • python310Packages.dm-haiku.testsout
  • python310Packages.jax
  • python310Packages.jax.dist
  • python310Packages.jaxlib (python310Packages.jaxlib-build, python310Packages.jaxlibWithoutCuda)
  • python310Packages.jaxlib.dist (python310Packages.jaxlib-build.dist, python310Packages.jaxlibWithoutCuda.dist)
  • python310Packages.jaxlibWithCuda
  • python310Packages.jaxlibWithCuda.dist
  • python310Packages.jmp
  • python310Packages.jmp.dist
  • python310Packages.objax
  • python310Packages.objax.dist
  • python310Packages.optax
  • python310Packages.optax.dist
  • python310Packages.optax.testsout
  • python310Packages.treeo
  • python310Packages.treeo.dist
  • python311Packages.augmax
  • python311Packages.augmax.dist
  • python311Packages.chex
  • python311Packages.chex.dist
  • python311Packages.jax
  • python311Packages.jax.dist
  • python311Packages.jaxlib (python311Packages.jaxlib-build, python311Packages.jaxlibWithoutCuda)
  • python311Packages.jaxlib.dist (python311Packages.jaxlib-build.dist, python311Packages.jaxlibWithoutCuda.dist)
  • python311Packages.jaxlibWithCuda
  • python311Packages.jaxlibWithCuda.dist
  • python311Packages.optax
  • python311Packages.optax.dist
  • python311Packages.optax.testsout
  • python311Packages.treeo
  • python311Packages.treeo.dist

samuela (Member) commented Apr 10, 2023

Errors appear to be unrelated:

error: builder for '/nix/store/2kxa419f2dqajmcig8x3yw89h4clq9hm-python3.10-numpyro-0.11.0.drv' failed with exit code 1;
       last 10 log lines:
       > adding 'numpyro-0.11.0.dist-info/RECORD'
       > removing build/bdist.linux-x86_64/wheel
       > Finished executing setuptoolsBuildPhase
       > installing
       > Executing pipInstallPhase
       > /build/numpyro-0.11.0/dist /build/numpyro-0.11.0
       > Processing ./numpyro-0.11.0-py3-none-any.whl
       > ERROR: Could not find a version that satisfies the requirement jaxlib>=0.4 (from numpyro) (from versions: none)
       > ERROR: No matching distribution found for jaxlib>=0.4
       >
       For full logs, run 'nix log /nix/store/2kxa419f2dqajmcig8x3yw89h4clq9hm-python3.10-numpyro-0.11.0.drv'.
error: 1 dependencies of derivation '/nix/store/if8fdiykkkfwznlh4qd3jdza1r72a88i-python3.10-arviz-0.15.0.drv' failed to build
error: builder for '/nix/store/ymk3i85v7s1a2000z2i64yp4hms6b3nb-python3.11-numpyro-0.11.0.drv' failed with exit code 1;
       last 10 log lines:
       > removing build/bdist.linux-x86_64/wheel
       > Finished executing setuptoolsBuildPhase
       > installing
       > Executing pipInstallPhase
       > /build/numpyro-0.11.0/dist /build/numpyro-0.11.0
       > Processing ./numpyro-0.11.0-py3-none-any.whl
       > Requirement already satisfied: jax>=0.4 in /nix/store/4liq7fd7w8sq7z6h5qjcnxg6nl9q1cxh-python3.11-jax-0.4.1/lib/python3.11/site-packages (from numpyro==0.11.0) (0.4.1)
       > ERROR: Could not find a version that satisfies the requirement jaxlib>=0.4 (from numpyro) (from versions: none)
       > ERROR: No matching distribution found for jaxlib>=0.4
       >
       For full logs, run 'nix log /nix/store/ymk3i85v7s1a2000z2i64yp4hms6b3nb-python3.11-numpyro-0.11.0.drv'.
error: builder for '/nix/store/gp49caxnzj4gkyz2j07g10rx44y6909s-python3.11-jmp-unstable-2021-10-03.drv' failed with exit code 1;
       last 10 log lines:
       >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       >   File "/nix/store/3hcvx2ac8ls2b1zmf28697d4iksgn90s-python3-3.11.2/lib/python3.11/dataclasses.py", line 1210, in wrap
       >     return _process_class(cls, init, repr, eq, order, unsafe_hash,
       >            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       >   File "/nix/store/3hcvx2ac8ls2b1zmf28697d4iksgn90s-python3-3.11.2/lib/python3.11/dataclasses.py", line 958, in _process_class
       >     cls_fields.append(_get_field(cls, name, type, kw_only))
       >                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       >   File "/nix/store/3hcvx2ac8ls2b1zmf28697d4iksgn90s-python3-3.11.2/lib/python3.11/dataclasses.py", line 815, in _get_field
       >     raise ValueError(f'mutable default {type(f.default)} for field '
       > ValueError: mutable default <class 'numpy.ndarray'> for field counter is not allowed: use default_factory
       For full logs, run 'nix log /nix/store/gp49caxnzj4gkyz2j07g10rx44y6909s-python3.11-jmp-unstable-2021-10-03.drv'.
error: 1 dependencies of derivation '/nix/store/vhmi5m95khp33wlbsihfzbpx4g79sjfl-python3.11-dm-haiku-0.0.9.drv' failed to build
error: builder for '/nix/store/nf3lda0jyb1i6nk0gprj4jzxgyz49mx0-python3.10-pytensor-2.10.1.drv' failed with exit code 1;
       last 10 log lines:
       >
       > -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
       > =========================== short test summary info ============================
       > ERROR tests/link/numba/test_elemwise.py::test_fused_elemwise_benchmark
       > = 1334 passed, 39 skipped, 14 deselected, 17 xfailed, 2 xpassed, 246 warnings, 1 error in 1295.55s (0:21:35) =
       > 0.7123582192147246
       > 0.7123582192147246
       > 0.7123582192147246
       > 0.7123582192147246
       > /nix/store/sw36plhp82916wwg6i6097rkzza7d950-stdenv-linux/setup: line 1594: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/nf3lda0jyb1i6nk0gprj4jzxgyz49mx0-python3.10-pytensor-2.10.1.drv'.
error: 2 dependencies of derivation '/nix/store/ds0w79shb4wjvkbm8glqh7ga15dgp917-python3.10-pymc-5.0.2.drv' failed to build
error: builder for '/nix/store/d9sphr454bjs30b015r8pkpbldpdj9cp-python3.10-cvxpy-1.3.0.drv' failed with exit code 1;
       last 10 log lines:
       > 11  +7.389e+00  +7.389e+00  +4e-07  3e-06  6e-07  2e-05  1e-07  0.7833  9e-03   1  0  0 |  1  1
       > 12  +7.389e+00  +7.389e+00  +1e-07  7e-07  1e-07  5e-06  3e-08  0.7814  1e-02   1  0  0 |  1  1
       > 13  +7.389e+00  +7.389e+00  +2e-08  2e-07  3e-08  1e-06  6e-09  0.7833  5e-02   1  0  0 |  2  1
       > 14  +7.389e+00  +7.389e+00  +6e-09  4e-08  7e-09  3e-07  1e-09  0.7833  1e-04   0  0  0 |  0  1
       > 15  +7.389e+00  +7.389e+00  +1e-09  9e-09  2e-09  6e-08  3e-10  0.7833  9e-03   1  0  0 |  1  1
       >
       > OPTIMAL (within feastol=8.9e-09, reltol=1.7e-10, abstol=1.2e-09).
       > Runtime: 0.000135 seconds.
       >
       > /nix/store/sw36plhp82916wwg6i6097rkzza7d950-stdenv-linux/setup: line 1594: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/d9sphr454bjs30b015r8pkpbldpdj9cp-python3.10-cvxpy-1.3.0.drv'.
error: 1 dependencies of derivation '/nix/store/gf57nhj6nvd695zi3rf9w5kingswicmz-python3.10-jaxopt-0.5.5.drv' failed to build
error: 1 dependencies of derivation '/nix/store/vjrjbp99d30b57zgicqa2xhag22hb9bg-python3.10-blackjax-0.9.6.drv' failed to build
error: 4 dependencies of derivation '/nix/store/ph3dxi015vigq9v85sg3n13wsz8s1gbh-python3.10-bambi-0.10.0.drv' failed to build
error: builder for '/nix/store/p22wcrlz7v9dp0vmsvki3gra6sz1fyjg-python3.11-cvxpy-1.3.0.drv' failed with exit code 1;
       last 10 log lines:
       > 11  +7.389e+00  +7.389e+00  +4e-07  3e-06  6e-07  2e-05  1e-07  0.7833  9e-03   1  0  0 |  1  1
       > 12  +7.389e+00  +7.389e+00  +1e-07  7e-07  1e-07  5e-06  3e-08  0.7814  1e-02   1  0  0 |  1  1
       > 13  +7.389e+00  +7.389e+00  +2e-08  2e-07  3e-08  1e-06  6e-09  0.7833  5e-02   1  0  0 |  2  1
       > 14  +7.389e+00  +7.389e+00  +6e-09  4e-08  7e-09  3e-07  1e-09  0.7833  1e-04   0  0  0 |  0  1
       > 15  +7.389e+00  +7.389e+00  +1e-09  9e-09  2e-09  6e-08  3e-10  0.7833  9e-03   1  0  0 |  1  1
       >
       > OPTIMAL (within feastol=8.9e-09, reltol=1.7e-10, abstol=1.2e-09).
       > Runtime: 0.000353 seconds.
       >
       > /nix/store/sw36plhp82916wwg6i6097rkzza7d950-stdenv-linux/setup: line 1594: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/p22wcrlz7v9dp0vmsvki3gra6sz1fyjg-python3.11-cvxpy-1.3.0.drv'.
error: 1 dependencies of derivation '/nix/store/inivgn4vjh08ms0bwda32x593mc9dc9i-python3.11-jaxopt-0.5.5.drv' failed to build
error: 1 dependencies of derivation '/nix/store/y03pnl4dwqdrpc9nshgcjh42za96cac0-python3.11-blackjax-0.9.6.drv' failed to build
error: 12 dependencies of derivation '/nix/store/hfgfwq1lvm801pgyjdsphmlbsh7cibmi-env.drv' failed to build
error: 1 dependencies of derivation '/nix/store/bwmi4zplg5kg69gfwib1iqfdggs1h2gy-review-shell.drv' failed to build

samuela (Member) commented Apr 10, 2023

Shockingly, python3Packages.jax is now building fine... I expected it to fail due to #221739.

samuela merged commit 2d79f0c into master on Apr 10, 2023
samuela deleted the uri/jax branch on April 10, 2023 23:14
samuela (Member) commented Apr 10, 2023

Thanks for putting this together, @uri-canva! This change will definitely improve the DX when working with the jaxlib derivation.


3 participants