BLAS/LAPACK switching mechanism does not work #93977

Closed
arnoldfarkas opened this issue Jul 27, 2020 · 3 comments
Labels
0.kind: bug Something is broken

Comments

@arnoldfarkas
Contributor

Describe the bug
A feature introduced in #83888 allows selecting the BLAS/LAPACK implementation globally. According to the documentation (https://github.com/NixOS/nixpkgs/blob/master/doc/using/overlays.xml#L188), the implementation can be selected with the following overlay:

self: super:
{
  blas = super.blas.override {
    blasProvider = self.mkl;
  };
  lapack = super.lapack.override {
    lapackProvider = self.mkl;
  };
}

With MKL selected as the implementation, the default implementation appears to be treated as MKL and linked into armadillo (I did not check other packages that use BLAS/LAPACK):

$ nix-shell -p armadillo

these derivations will be built:
  /nix/store/wfx9fn6dc0chf5lz5yr5s1ha2dxjz7lj-armadillo-9.900.1.drv
building '/nix/store/wfx9fn6dc0chf5lz5yr5s1ha2dxjz7lj-armadillo-9.900.1.drv'...
...
-- Found MKL libraries: /nix/store/7rj6swqgcs280j8g0hg24ggnm6skw46z-blas-3/lib/libmkl_rt.so
...
checking for references to /tmp/nix-build-armadillo-9.900.1.drv-0/ in /nix/store/v2zif8wfhh4dvh6y99qa7p34yhpkbbvg-armadillo-9.900.1...

[nix-shell:~/work/nixpkgs]$ ldd /nix/store/v2zif8wfhh4dvh6y99qa7p34yhpkbbvg-armadillo-9.900.1/lib/libarmadillo.so
        linux-vdso.so.1 (0x00007ffff7fd2000)
        libblas.so.3 => /nix/store/7rj6swqgcs280j8g0hg24ggnm6skw46z-blas-3/lib/libblas.so.3 (0x00007ffff77f2000)
        libhdf5.so.103 => /nix/store/fk34l3hyi3mnw077xmhx4scc65x6ap7a-hdf5-1.10.6/lib/libhdf5.so.103 (0x00007ffff742c000)
        libz.so.1 => /nix/store/5x6l9xm5dp6v113dpfv673qvhwjyb7p5-zlib-1.2.11/lib/libz.so.1 (0x00007ffff740f000)
        libdl.so.2 => /nix/store/bqbg6hb2jsl3kvf6jgmgfdqy06fpjrrn-glibc-2.30/lib/libdl.so.2 (0x00007ffff740a000)
        libsuperlu.so.5 => /nix/store/yp5nkx522v85sa5bj6ynrpi8jgdzhhh3-superlu-5.2.1/lib/libsuperlu.so.5 (0x00007ffff7385000)
        libstdc++.so.6 => /export/home/release/devtools/devtools-4fdcdea5/lib64/libstdc++.so.6 (0x00007ffff6fa9000)
        libm.so.6 => /nix/store/bqbg6hb2jsl3kvf6jgmgfdqy06fpjrrn-glibc-2.30/lib/libm.so.6 (0x00007ffff6e69000)
        libgcc_s.so.1 => /export/home/release/devtools/devtools-4fdcdea5/lib64/libgcc_s.so.1 (0x00007ffff6c51000)
        libc.so.6 => /nix/store/bqbg6hb2jsl3kvf6jgmgfdqy06fpjrrn-glibc-2.30/lib/libc.so.6 (0x00007ffff6a92000)
        /nix/store/bqbg6hb2jsl3kvf6jgmgfdqy06fpjrrn-glibc-2.30/lib64/ld-linux-x86-64.so.2 (0x00007ffff7fd4000)
        libpthread.so.0 => /nix/store/bqbg6hb2jsl3kvf6jgmgfdqy06fpjrrn-glibc-2.30/lib/libpthread.so.0 (0x00007ffff6a71000)

It looks like the blas package is responsible for this:

  ln -s $out/lib/libblas${canonicalExtension} $out/lib/libmkl_rt${stdenv.hostPlatform.extensions.sharedLibrary}
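
To make the direction of that link concrete, here is a minimal sketch in a throwaway directory. It assumes `mktemp -d` and `readlink -f` are available (as with GNU coreutils), and the file it creates is a placeholder, not a real library. Because `ln -s` takes the target first and the link name second, `libmkl_rt.so` is the link and `libblas.so.3` is the file it points at, so a build system that "finds" `libmkl_rt.so` inside the blas-3 output is really being handed the wrapper's libblas.

```shell
# Throwaway directory standing in for the blas-3 output ($out); all paths are illustrative.
out="$(mktemp -d)"
mkdir -p "$out/lib"

# Placeholder for the wrapper's library (a real build puts the provider's code here).
printf 'placeholder\n' > "$out/lib/libblas.so.3"

# Same argument order as the line quoted above: target first, link name second.
ln -s "$out/lib/libblas.so.3" "$out/lib/libmkl_rt.so"

# Resolve the link to see which file a "found" libmkl_rt.so really is.
resolved="$(readlink -f "$out/lib/libmkl_rt.so")"
echo "libmkl_rt.so -> $(basename "$resolved")"
```

Running `readlink -f` on the real `libmkl_rt.so` in the blas-3 store path should show the same relationship.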

To Reproduce
Steps to reproduce the behavior:

  1. Set up an overlay that references MKL (make sure unfree packages are allowed)
self: super:
{
  blas = super.blas.override {
    blasProvider = self.mkl;
  };
  lapack = super.lapack.override {
    lapackProvider = self.mkl;
  };
}
  2. Build e.g. the armadillo package (or spawn a nix-shell with that package)
$ nix-shell -p armadillo
  3. Observe that blas-3 is detected as MKL and that this library is linked
$ ldd <armadillo out>/lib/libarmadillo.so

Expected behavior
The default BLAS/LAPACK implementation should not hijack the desired libraries (or, if the implementation was selected in an incorrect way, the documentation could be updated).

Notify maintainers

@matthewbauer

Metadata

  • system: "x86_64-linux"
  • host os: Linux 4.4.110-1.el7.elrepo.x86_64, CentOS Linux, 7 (Core)
  • multi-user?: no
  • sandbox: no
  • version: nix-env (Nix) 2.3.2
  • channels(afarkas): "nixpkgs"
  • nixpkgs: /home/afarkas/work/nixpkgs
@arnoldfarkas arnoldfarkas added the 0.kind: bug Something is broken label Jul 27, 2020
@FRidh
Member

FRidh commented Jul 28, 2020

I don't quite understand your issue. The new blas package corresponds to the requested blasProvider, where the libraries are named e.g. libblas.so.3 but are in fact a copy of the provider you asked for, so libopenblasp-r0.3.10.so in case of openblas. Is this not happening in case of MKL? Or is the issue that you cannot see from the file what blas it is?

Note I did notice that numpy depends on the openblas derivation (through lapack) as well as the blas wrapper derivation, which is bad for the closure size.

@arnoldfarkas
Contributor Author

Actually, I just did a deep dive into pkgs/build-support/alternatives/blas/default.nix and pkgs/build-support/alternatives/lapack/default.nix, and realized that the derivation picks up the library from the provider, so it may be fine after all.

When the package was first built, I got rather confused, because in the configuration phase libmkl_rt, libblas and liblapack are all "found" (I must admit that at first I did not even realize that this libmkl_rt was not from the MKL package; I only noticed it later):

-- Found MKL libraries: /nix/store/7rj6swqgcs280j8g0hg24ggnm6skw46z-blas-3/lib/libmkl_rt.so
-- Found BLAS: /nix/store/7rj6swqgcs280j8g0hg24ggnm6skw46z-blas-3/lib/libblas.so
-- Found LAPACK: /nix/store/rrdvhm1kq61pr0b462arwf0g0v4dllyi-lapack-3/lib/liblapack.so

When I saw libblas.so linked instead of libmkl_rt.so, I expected it to be something different. Even though I saw that libmkl_rt is symlinked to libblas, etc., the size was different (I assume this is because of patchelf, which, to my understanding, is actually not allowed for MKL; or do I misunderstand something here?). I was probably also confused by the name blas-3, since the netlib implementation uses this as its library name, so I assumed it was the netlib implementation (that one is actually called blas-reference, which is again a bit confusing).

Unfortunately, quite a few packages that we use depend on blas/lapack, and switching to MKL caused segfaults and very weird behavior. On top of that, the build logs of e.g. armadillo gave me the false impression that it was not MKL that was linked.

So it looks like there is no real issue here. I wonder if it would be possible to make it clearer which implementation was used when a certain package was built.
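
Until something like that exists, one manual check is to list the runtime references of the blas-3 store path and look for the provider among them. The sketch below is only a filter for that list. It assumes the wrapper keeps a runtime reference to the provider it was built from, and the helper name `provider_refs` and its default pattern (`mkl|openblas`) are hypothetical choices for illustration. The demo input uses made-up names rather than real store paths.

```shell
# Keep only the lines whose name suggests a BLAS/LAPACK provider.
# Reads one path per line on stdin; $1 optionally overrides the pattern.
provider_refs() {
  grep -i -E "${1:-mkl|openblas}" || true
}

# Demo with made-up names (not real store paths).
printf '%s\n' 'example-mkl-0' 'example-glibc-0' | provider_refs

# With Nix available, the input would come from the real wrapper output, e.g.:
#   nix-store --query --references <blas-3 store path> | provider_refs
```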

@FRidh
Member

FRidh commented Aug 4, 2020

I agree it's confusing. In other scenarios it has been suggested to put a text file in e.g. nix-support describing how it was built, without using store paths. Alternatively, if there is a dev output, I suppose it makes it easier to find out from the filesystem.
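
A minimal sketch of the nix-support idea, written as the kind of shell fragment a build phase might run. The file name `blas-provider`, the variable `provider_name` and its value are hypothetical. In a real derivation Nix sets `$out`; here a temporary directory stands in for it when it is unset.

```shell
# In a real derivation Nix sets $out; a temp dir stands in for it here.
out="${out:-$(mktemp -d)}"

# Hypothetical value; a real wrapper would take this from the chosen provider.
provider_name="mkl"

mkdir -p "$out/nix-support"

# Record the provider by name only, with no store paths, as suggested above.
printf 'blas-provider: %s\n' "$provider_name" > "$out/nix-support/blas-provider"

cat "$out/nix-support/blas-provider"
```

Anyone inspecting the output could then read that file to see which implementation the wrapper was built with.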
