
Conference call notes 20210414

ocaisa edited this page Apr 14, 2021 · 12 revisions


Notes on the 170th EasyBuild conference call, Wednesday April 14th 2021 (08:00 UTC)

Attendees

Alphabetical list of attendees (9):

  • Mikael Öhman (Chalmers University of Technology, Sweden)
  • Alan O'Cais (Jülich Supercomputing Centre, Germany)
  • Sebastian Achilles (Jülich Supercomputing Centre, Germany)
  • Jörg Saßmannshausen (NIHR Biomedical Research Centre, UK)
  • Terje Kvernes (University of Oslo, Norway)
  • Kurt Lust (Univ. of Antwerp, Belgium + LUMI User Support Team)
  • Robert Mijakovic (LuxProvide)
  • Alex Domingo (Vrije Universiteit Brussel, Belgium)
  • Alexander Grund (TU Dresden, Germany)

Agenda

  • update on recent developments
  • 2021a update of common toolchains
    • outlook to component versions
    • BLAS/LAPACK component: OpenBLAS vs BLIS, maybe FlexiBLAS?
    • collapsing foss and fosscuda toolchains
  • bintray
  • Q&A

Recent developments

  • next release: in a month or so, since we just released last week
    • project for next release: (not created yet)
      • to maintainers: add issues/PRs you consider important there!
  • recent changes
    • framework
      • bug fixes
        • Catch problems early on if --github-user is not specified for --new-pr & co (PR #3644)
      • enhancements
        • Avoid module call for unuse() for Lmod and set $MODULEPATH directly (PR #3633)
          • Setting MODULEPATH directly is a lot faster
          • Behaviour is not exactly the same when priorities are used
            • Another PR removes the use of priorities to enable this in more cases where possible
          • Do the same for module unuse
        • update validate_github_token function to accept GitHub token in new format (PR #3632)
        • mention easyblocks PR in gist when uploading test report for it + fix clean_gists.py script (PR #3622)
        • add templates for architecture independent Python wheels (PR #3618)
      • changes
        • test bootstrap script in separate workflow, and limit test configurations a bit (PR #3646)
        • deprecate --accept-eula, rename to --accept-eula-for + also accept regular expression value (PR #3630)
    • easyblocks
      • bug fixes
        • fix permission on MATLAB installer config file so it can be written to (PR #2385)
        • Improve Python package version check and add unversioned_packages EC param (PR #2377)
        • Make the CUDA stub libs take preference over system libs when linking (PR #2373)
          • If you install CUDA 11 on a system with only CUDA 10 drivers, using stub libraries allows you to link for use on another system
          • The problem was that the link order gave priority to system paths, which led to missing symbols
        • also set $TORCH_CUDA_ARCH_LIST for PyTorch tests (PR #2363)
      • enhancements
        • add support for post install commands for python extensions (PR #2381)
        • set $R_LIBS_SITE rather than $R_LIBS when installing R packages (PR #2326)
        • don't unpack *.whl files by default in generic PythonPackage easyblock (PR #2366)
        • enhance cuDNN, CUDA, and Java easyblocks to support aarch64 (PR #2356)
      • changes
        • (nothing major)
    • easyconfigs
      • over 50 merged easyconfig PRs since last conf call
      • bug fixes
        • (nothing major)
      • enhancements
        • (nothing major)
      • new software
      • noteworthy software updates
        • GCCcore 10.3.0
          • Will things break?...have to wait and see
          • PRs in preparation
          • the idea of automating version updates was raised
            • this has come up many times in the past but no-one has really tried to tackle it
        • Started work on OpenMPI 4.1.1 (probably foss/2021a), waiting on release
      • noteworthy changes
        • (none)
  • to merge/fix/tackle soon
    • framework
      • bug fixes
        • performance improvements for easyconfig parsing (PR #3555)
        • Re-enable write permissions when installing with read-only-installdir (PR #3649)
          • The only downside would be for a group responsible for installations of things like Python/R: they would lose the ability to install additional modules
          • JSC had a case where a software package actually silently used pip to install additional packages, which caused the pip check to fail for any other package. This would have prevented that.
      • enhancements
        • support additional features in easystack files
          • support for filtering via labels (PR #3620)
        • Allow for overriding rpath with higher priority paths (PR #3650)
        • Lots of tweaks related to Lmod (PRs #3634, #3636, #3637)
      • changes
        • (nothing major)
    • easyblocks
      • bug fixes
        • treat files/directories of unpacked sources equally in PackedBinary (PR #2306)
      • enhancements
        • enhance CUDA support in CP2K easyblock (WIP) (PR #2349)
          • this could use a review
          • currently requires a single value in --cuda-compute-capabilities EasyBuild configuration option or cuda_compute_capabilities easyconfig parameter
          • do we need custom easyconfig parameters to easily enable/disable the different GPU capabilities supported by CP2K?
        • add Java wrapper support to OpenMPI (PR #2360)
          • still missing a matching easyconfig PR that leverages this?
        • enable installation of samples for CUDA > 10.1 (PR #2374)
      • changes
        • (nothing major)
      • new software
        • new easyblock for NCCL (built from source) (PR #2337)
        • custom easyblock for FlexiBLAS (PR #2369)
    • easyconfigs
      • bug fixes
        • (nothing major)
      • enhancements
        • (nothing major)
      • new software
      • software updates
        • PyTorch 1.8.0 (PR #12347)
          • PyTorch is only tested against MKL upstream, which is probably connected to failures in their test suite
          • should probably be bumped to 1.8.1
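
The $MODULEPATH change noted under the framework enhancements (PR #3633) can be pictured with a minimal sketch. This is not the code from that PR: the function names are hypothetical, and the sketch deliberately ignores Lmod priorities, which the notes flag as the case where direct manipulation does not behave the same way as a module call.

```python
import os


def prepend_modulepath(path, env=None):
    """Put `path` first in $MODULEPATH without running a module command.

    Hypothetical helper for illustration; Lmod priorities are not handled.
    """
    env = os.environ if env is None else env
    current = [p for p in env.get("MODULEPATH", "").split(os.pathsep) if p]
    # Drop any existing occurrence so the path appears exactly once, in front
    entries = [path] + [p for p in current if p != path]
    env["MODULEPATH"] = os.pathsep.join(entries)
    return entries


def remove_from_modulepath(path, env=None):
    """Remove `path` from $MODULEPATH without running a module command."""
    env = os.environ if env is None else env
    current = [p for p in env.get("MODULEPATH", "").split(os.pathsep) if p]
    entries = [p for p in current if p != path]
    env["MODULEPATH"] = os.pathsep.join(entries)
    return entries
```

Passing a plain dictionary as `env` makes the helpers easy to try out without touching the real environment.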

2021a update of common toolchains

  • outlook to component versions
    • GCC 10.3 (ready to go?)
      • Merged
    • OpenMPI 4.1.1 (out soon)
      • Waiting for the release, but in progress
    • Intel oneAPI versions of compilers, MPI, MKL?
      • Still need to look at this, 2021.2 is out
      • What is the GCC compatibility?
    • Python version?
      • 3.9 has issues (latest is 3.9.4), should we stick with 3.8? Might be worth some more experiments?
      • Problems will only really show up with complex builds like TensorFlow or PyTorch
        • Python 3.9 is marked for support in TF 2.5 (currently a release candidate)
        • Looks like support may be merged in PyTorch (in next release)
          • Next release of v2 and v1 will probably support this
  • BLAS/LAPACK component: OpenBLAS vs BLIS, maybe FlexiBLAS?
    • It's complicated, no clear answer
    • What is best depends on arch
      • On Intel, MKL is best
      • AMD best with BLIS for BLAS 3, for LAPACK not so clear (especially with threads)
        • Idea would be to use FlexiBLAS to choose the best backend
          • Not sure if this is realistic, it would also be a moving target
          • Maybe you could write an auto-tuner
          • There are a few variables: size of matrix, and number of cores
          • Would need hooks to choose per function
          • Could lead to toolchains that look the same but are not the same under the hood
            • Could lead to bugs that are hard to resolve
          • If we want to give users the power to change libraries themselves, we need to teach them how to do that
          • We could go conservative and ship a static configuration and document how that can be tuned by the site
  • collapsing foss and fosscuda toolchains
    • see https://github.com/easybuilders/easybuild-easyconfigs/issues/12484
      • Will need to enhance MPI library, can't change UCX
    • OMPI_MCA_mca_component_path approach looks promising, maybe we should check with Jeff Squyres
      • Build UCX as normal, also UCX+CUDA (different name so we can have both at same time)
      • Need 2 OpenMPI builds, build additional MCA components and set envvar to point to them in second OpenMPI build
        • Second build only enriches first installation
        • For a module hierarchy, the second build would sit in the tree of the first
    • Don't want to force CUDA on people
    • Keeping CUDA as a versionsuffix would allow us to bump a version of the software to an updated CUDA easily
  • HMNS should be updated to be aware of intel-compilers component
    • Pretty trivial, we should do this, need to open an issue
  • If people have stuff that must go in, add it to the project page or ask a maintainer to: Project - 2021a common toolchains
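
The two-build OpenMPI idea discussed under "collapsing foss and fosscuda toolchains" could look roughly like the sketch below. Everything here is an assumption for illustration: the helper name and the directory paths are hypothetical, and it assumes that setting the variable replaces the default component search path, so the base component directory has to be listed as well. That behaviour would need to be confirmed with the Open MPI developers, as suggested in the notes.

```python
import os


def mca_component_env(base_components, extra_components, env=None):
    """Return a copy of `env` that also points OpenMPI at extra MCA components.

    base_components: component directory of the plain OpenMPI build
    extra_components: directories with additionally built components,
                      for example ones linked against a CUDA-enabled UCX
    """
    new_env = dict(os.environ if env is None else env)
    # Extra components are listed first so they are found before the base ones
    paths = list(extra_components) + [base_components]
    new_env["OMPI_MCA_mca_component_path"] = ":".join(paths)
    return new_env


# Hypothetical install locations, for illustration only
example = mca_component_env(
    "/apps/OpenMPI/4.1.1/lib/openmpi",
    ["/apps/OpenMPI-CUDA/4.1.1/lib/openmpi"],
    env={},
)
```

In a module hierarchy, the module of the second build would be the natural place to set this variable, since that build only enriches the first installation.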

New download URLs for bintray

  • Tracking Issue #12099
    • Boost is the biggest one
    • They give some examples of what you need to do
    • Maybe we should be more aggressive in making copies of sources and making them available
      • If we know the licences we can automate this
      • Number of licences recognised by the EB parameter is very limited and there is no way to introduce a new one
        • What we have could be improved here
      • What happens if a licence changes?
      • Could have a CI job to check sources for us
        • Perhaps more suited to a regression test for releases
          • Might need some work to only
      • Should we have an easier way of making a ticket if a download fails?
        • Could give some advice on the command line on how to trigger this
        • Could we have a webhook?
        • Could have a callback, but that would need to be toggled for privacy

Q&A

  • Perl easyconfigs install the same package multiple times; a PR has been opened (PR #12575) to make this behave similarly to R packages, which should cut down on installation times.
  • Feature request open to also set group permissions on build and temporary directories
  • For disaster recovery, can there be an automatic way to create an easystack file?
    • This is hard, as you need the easyblock, the hooks, the EB version,...
    • The reprod directory of each install does store this information but there is no automated way of picking this information up and using it
    • Need to design your infra to make this possible
  • Running EB unit tests using Lmod 6 (available from Ubuntu/Debian) fails because Lmod 6 is deprecated
    • We should update the docs to reflect this
    • Some test failures can happen due to the language environment
      • If this is not in the test suite, we should fix that
  • AOCC needs to specify Clang version
    • Why is this necessary?
    • Detection mechanism does not work for 3.0 (on CentOS and EB 4.3.4)
      • There is a mapping in the easyblock
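
On the disaster-recovery question above, a first step towards picking up the information stored in the reprod directories might be a scan like the one sketched here. The function name is hypothetical, and the layout `<install root>/<name>/<version>/easybuild/reprod/*.eb` is an assumption about how the site organises its installations. The sketch only lists the archived easyconfigs; it does not capture easyblocks, hooks or the EasyBuild version, which the discussion identified as the hard part.

```python
from pathlib import Path


def find_reprod_easyconfigs(install_root):
    """List easyconfig files archived in the reprod directory of each install.

    Assumes <install_root>/<name>/<version>/easybuild/reprod/<file>.eb
    """
    root = Path(install_root)
    return sorted(root.glob("*/*/easybuild/reprod/*.eb"))
```

The resulting list could serve as input for generating an easystack file or for simply rebuilding from the archived easyconfigs.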