CI for testing if EESSI stack is available is only checking single architecture #349
Comments
@trz42 With the additional output generated in CI by using
Could we make a test with the next ingest?
I essentially did that. Then I adjusted the CI workflow so it runs two steps in each job: the original one without setting I think, the run at https://github.com/NorESSI/software-layer/actions/runs/6431790645 illustrates the issue.
{2023.06}[system] CUDA v12.1.1
The CI defined in https://github.com/EESSI/software-layer/blob/2023.06/.github/workflows/test_eessi.yml is not working as intended. It should test if all packages defined by easystack files are available for all supported architectures. It seems it always runs the check for the software directory that is as close as possible to the CPU microarchitecture of the host running the CI script.
For example, for the EESSI pilot 2023.06, tests are run against the `x86_64/intel/haswell` builds regardless of whether the software installation being checked is for another `x86_64` architecture or even for `aarch64`. See screenshot below.
It seems this is caused by setting the `matrix` variable `EESSI_SOFTWARE_SUBDIR` (and not `EESSI_SOFTWARE_SUBDIR_OVERRIDE`), plus that variable (`EESSI_SOFTWARE_SUBDIR_OVERRIDE`) not being set when the init script is sourced (`source /cvmfs/pilot.eessi-hpc.org/versions/${{matrix.EESSI_VERSION}}/init/bash`).
The relevant part is in the `test_eessi.yml` file. The result can be seen in the screenshot below.
Marked rectangles are as follows:
- `aarch64/generic`.
- `archspec` reports that the best matching software directory for the CPU microarchitecture where the CI is run is `intel/haswell`. That is then used to set up the environment.
- `EESSI_SOFTWARE_SUBDIR` is set according to the definition of the test (but it's the wrong variable and/or too late).
- `aarch64/generic`.
- `intel/haswell`
Adding the line
export EESSI_SOFTWARE_SUBDIR_OVERRIDE=${{matrix.EESSI_SOFTWARE_SUBDIR}}
before the init script is sourced fixes the issue. See the changes applied in https://github.com/NorESSI/software-layer/pull/167/files#diff-39e6e5e8c8c229d5ef64936a450e3e0a162dba5c72427a25e6d9e918e0a7d699 and https://github.com/NorESSI/software-layer/actions/runs/6338385822/job/17215379514#logs for the results.
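To illustrate why the ordering matters, here is a minimal sketch that can be run without access to CVMFS. `mock_init` is a hypothetical stand-in for the real init script, and the assumption that the init script prefers `EESSI_SOFTWARE_SUBDIR_OVERRIDE` over the detected directory is inferred from the behaviour described above, not copied from the EESSI sources. The detected value is hard-coded for illustration.

```shell
# Hypothetical stand-in for the EESSI init script (assumption): use the
# override if it is set, otherwise fall back to the directory detected
# for the host CPU. The detected value is hard-coded here.
mock_init() {
    detected_subdir="x86_64/intel/haswell"
    export EESSI_SOFTWARE_SUBDIR="${EESSI_SOFTWARE_SUBDIR_OVERRIDE:-$detected_subdir}"
}

# Case 1: only EESSI_SOFTWARE_SUBDIR is set, as in the current workflow.
# The init step overwrites it with the host's best match.
unset EESSI_SOFTWARE_SUBDIR_OVERRIDE
export EESSI_SOFTWARE_SUBDIR="aarch64/generic"
mock_init
without_override="$EESSI_SOFTWARE_SUBDIR"

# Case 2: the override is exported before the init step runs.
export EESSI_SOFTWARE_SUBDIR_OVERRIDE="aarch64/generic"
mock_init
with_override="$EESSI_SOFTWARE_SUBDIR"

echo "without override: $without_override"
echo "with override:    $with_override"
```

In the first case the value from the test definition is lost. In the second case it survives the init step, which matches what the adjusted workflow shows.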
Apparently, the test then uses the compat layer for `x86_64` (due to using `uname -m` when setting the `EESSI_CPU_FAMILY` environment variable in https://github.com/EESSI/software-layer/blob/2023.06/init/minimal_eessi_env#L20), and the EasyBuild installation from the software layer (as defined in the matrix).
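The mismatch can be made visible with a small check like the one below. This is only an illustration of the symptom, not a proposed fix. The names `host_family`, `target_family` and `family_status` are made up for this sketch, and the assumption is that the first path component of the software subdirectory names the CPU family.

```shell
# CPU family as derived from the host, which is what uname -m reports.
host_family="$(uname -m)"

# CPU family implied by the software subdirectory under test
# (assumption: the part before the first slash is the family).
EESSI_SOFTWARE_SUBDIR_OVERRIDE="aarch64/generic"
target_family="${EESSI_SOFTWARE_SUBDIR_OVERRIDE%%/*}"

if [ "$host_family" = "$target_family" ]; then
    family_status="match"
else
    family_status="mismatch"
fi

echo "host family:   $host_family"
echo "target family: $target_family"
echo "status:        $family_status"
```

On an `x86_64` CI runner this reports a mismatch for the `aarch64/generic` job, which is consistent with the compat layer for `x86_64` being combined with a software layer for a different family.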