CI for testing if EESSI stack is available is only checking single architecture #349

Closed
trz42 opened this issue Oct 3, 2023 · 3 comments · Fixed by #389
Labels
bug Something isn't working tests Related to software testing

Comments

@trz42
Collaborator

trz42 commented Oct 3, 2023

The CI defined in https://github.com/EESSI/software-layer/blob/2023.06/.github/workflows/test_eessi.yml is not working as intended. It should test whether all packages defined by the easystack files are available for all supported architectures. Instead, it seems to always run the check against the software directory that most closely matches the CPU microarchitecture of the host running the CI script.

For example, for the EESSI pilot 2023.06, tests are run against the x86_64/intel/haswell builds regardless of whether the software installation to be checked is for another x86_64 architecture or even for aarch64. See screenshot below.

This seems to have two causes: the workflow sets EESSI_SOFTWARE_SUBDIR from the matrix variable (and not EESSI_SOFTWARE_SUBDIR_OVERRIDE), and EESSI_SOFTWARE_SUBDIR_OVERRIDE is not set at the time the init script is sourced (source /cvmfs/pilot.eessi-hpc.org/versions/${{matrix.EESSI_VERSION}}/init/bash).

The relevant part of the test_eessi.yml file is:

        - name: Test check_missing_installations.sh script
          run: |
              source /cvmfs/pilot.eessi-hpc.org/versions/${{matrix.EESSI_VERSION}}/init/bash
              module load EasyBuild
              eb --version
              export EESSI_PREFIX=/cvmfs/pilot.eessi-hpc.org/versions/${{matrix.EESSI_VERSION}}
              export EESSI_OS_TYPE=linux
              export EESSI_SOFTWARE_SUBDIR=${{matrix.EESSI_SOFTWARE_SUBDIR}}
              env | grep ^EESSI | sort
              echo "just run check_missing_installations.sh (should use eessi-${{matrix.EESSI_VERSION}}.yml)"
              ./check_missing_installations.sh ${{matrix.EASYSTACK_FILE}}

The result can be seen in the screenshot below.

[Image: Screenshot 2023-10-03 at 13 34 05]

The marked rectangles show the following:

  1. The test should be run against the software directory for aarch64/generic.
  2. However, archspec reports intel/haswell as the best-matching software directory for the CPU microarchitecture of the host where the CI is run. That directory is then used to set up the environment.
  3. EESSI_SOFTWARE_SUBDIR is set according to the definition of the test (but it's the wrong variable and/or it is set too late).
  4. The message shows that this test is run for aarch64/generic.
  5. However, the log message from EasyBuild shows that the test is actually run for intel/haswell.

Adding the line

              export EESSI_SOFTWARE_SUBDIR_OVERRIDE=${{matrix.EESSI_SOFTWARE_SUBDIR}}

before the init script is sourced fixes the issue. See the changes applied in https://github.com/NorESSI/software-layer/pull/167/files#diff-39e6e5e8c8c229d5ef64936a450e3e0a162dba5c72427a25e6d9e918e0a7d699 and the results in https://github.com/NorESSI/software-layer/actions/runs/6338385822/job/17215379514#logs.
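To illustrate why the order matters, here is a minimal sketch that does not need CVMFS. The function `mock_init` and the variable `MOCK_INSTALLPATH` are hypothetical stand-ins for the real init script and the paths it derives. The sketch assumes only what is described above: the init script honours EESSI_SOFTWARE_SUBDIR_OVERRIDE if it is set and otherwise falls back to the detected host architecture.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for sourcing the EESSI init script.
# Assumption: the real script uses EESSI_SOFTWARE_SUBDIR_OVERRIDE when set,
# otherwise the subdirectory detected for the host (hard-coded here).
mock_init() {
    local detected="x86_64/intel/haswell"
    export EESSI_SOFTWARE_SUBDIR="${EESSI_SOFTWARE_SUBDIR_OVERRIDE:-$detected}"
    # Paths are derived once, at init time (MOCK_INSTALLPATH is illustrative).
    export MOCK_INSTALLPATH="/mock/software/linux/${EESSI_SOFTWARE_SUBDIR}"
}

target="aarch64/generic"   # example matrix value

# Original order: init first, then export EESSI_SOFTWARE_SUBDIR.
unset EESSI_SOFTWARE_SUBDIR_OVERRIDE
mock_init
export EESSI_SOFTWARE_SUBDIR="$target"   # too late: paths are already derived
late_path="$MOCK_INSTALLPATH"

# Fixed order: export the override before init.
export EESSI_SOFTWARE_SUBDIR_OVERRIDE="$target"
mock_init
early_path="$MOCK_INSTALLPATH"

echo "variable set after init:  $late_path"
echo "override set before init: $early_path"
```

In the first case the derived path still points at the host's architecture even though EESSI_SOFTWARE_SUBDIR itself shows the target value, which matches the mismatch between rectangles 3 to 5 in the screenshot.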

Apparently, the test then uses the compat layer for x86_64 (because uname -m is used to set the EESSI_CPU_FAMILY environment variable in https://github.com/EESSI/software-layer/blob/2023.06/init/minimal_eessi_env#L20) together with the EasyBuild installation from the software layer (as defined in the matrix).
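A small sketch of how a CI step could make that mix visible. The variable names `target_subdir`, `host_family`, and `target_family` are made up for illustration, and the target value is just an example of a matrix entry.

```shell
#!/usr/bin/env bash
# Example target, as it might come from the workflow matrix (assumption).
target_subdir="aarch64/generic"

# CPU family of the host, obtained the same way as EESSI_CPU_FAMILY.
host_family="$(uname -m)"

# CPU family of the target: first path component of the software subdir.
target_family="${target_subdir%%/*}"

if [ "$host_family" != "$target_family" ]; then
    echo "note: compat layer follows the host ($host_family)," \
         "software layer targets $target_family"
else
    echo "host and target CPU family agree: $host_family"
fi
```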

@boegel
Contributor

boegel commented Oct 5, 2023

@trz42 With the additional output generated in CI by using eb --show-config (see #358), I'm now less sure that this bug report is correct, since I see /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/* being used as installpath by EasyBuild:

 >> Active EasyBuild configuration when checking for missing installations:
#
# Current EasyBuild configuration
# (C: command line argument, D: default value, E: environment variable, F: configuration file)
#
...
installpath          (E) = /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1
...
>> Checking for missing installations in /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1...
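A check along these lines could be automated rather than read off the log. The following is only a sketch: it parses a sample installpath line copied from the output above, and `expected_subdir` is a hypothetical value standing in for the matrix entry of the job.

```shell
#!/usr/bin/env bash
# Sample line copied from the 'eb --show-config' output in the CI log.
config_line='installpath          (E) = /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1'

# Hypothetical: the subdirectory this CI job is supposed to check.
expected_subdir="aarch64/neoverse_n1"

# Keep everything after the last "= ", i.e. the path itself.
installpath="${config_line##*= }"

# Strip the prefix up to and including "/software/linux/".
actual_subdir="${installpath#*/software/linux/}"

if [ "$actual_subdir" = "$expected_subdir" ]; then
    echo "OK: installpath targets $actual_subdir"
else
    echo "MISMATCH: expected $expected_subdir, got $actual_subdir" >&2
fi
```

In a real job the line would have to be taken from the actual command output instead of a hard-coded string.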

@trz42
Collaborator Author

trz42 commented Oct 5, 2023

Could we make a test with the next ingest?

  1. First, approve a build for non-haswell architecture.
  2. Rerun the CI for that architecture to verify if it turns green.
  3. Approve build for haswell architecture.
  4. Rerun the CI for all failed CI jobs to verify if all turn green.

@trz42
Collaborator Author

trz42 commented Oct 6, 2023

> Could we make a test with the next ingest?
>
>   1. First, approve a build for non-haswell architecture.
>   2. Rerun the CI for that architecture to verify if it turns green.
>   3. Approve build for haswell architecture.
>   4. Rerun the CI for all failed CI jobs to verify if all turn green.

I essentially did that.

Then I adjusted the CI workflow so it runs two steps in each job: the original one without setting EESSI_SOFTWARE_SUBDIR_OVERRIDE before sourcing the init script and an updated one where the environment variable is set.

I think the run at

https://github.com/NorESSI/software-layer/actions/runs/6431790645

illustrates the issue.

trz42 pushed a commit to trz42/software-layer that referenced this issue May 4, 2024