
[WIP] Various updates in RADIUSS Spack Configs #307

Open: wants to merge 22 commits into develop


adrienbernede (Member) commented Nov 26, 2024

Changes to RSC:

  • Split Spack environments into separate files so that users can access packages.yaml and compilers.yaml directly
  • Update externals and add new externals to support building other projects (axom)
  • Do not reuse already-installed specs, so that concretization is not influenced by what has already been built
  • Update the ROCm toolchain on tioga

Changes to Shared CI:

  • Update with improved job command handling

Other changes:

  • Export host config as artifacts
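The first and third RSC items correspond to standard Spack environment features: `include` pulls configuration in from separate files, and `concretizer:reuse` controls whether installed specs are reused. A minimal sketch, assuming a spack.yaml sitting next to split-out packages.yaml and compilers.yaml files; the layout and file names are illustrative, not the actual RSC layout:

```yaml
# spack.yaml (hypothetical layout; file names are illustrative)
spack:
  # Machine-specific settings live in separate files, which users can
  # also read directly or include from their own environments.
  include:
  - ./packages.yaml
  - ./compilers.yaml
  # Concretize from scratch instead of reusing specs that are already
  # installed, so existing installs do not influence the result.
  concretizer:
    reuse: false
```

Turning reuse off trades fewer rebuilds for reproducibility: the concretized specs depend on the configuration rather than on whatever happens to be installed already.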

adrienbernede (Member Author)

@adayton1 @davidbeckingsale Any idea what could be causing the error we see everywhere in the CI?

adayton1 (Member) commented Jan 8, 2025

The error below should be fixed by setting CARE_ENABLE_IMPLICIT_CONVERSIONS and ENABLE_IMPLICIT_CONVERSIONS to OFF (or the corresponding variant in Spack: https://github.com/LLNL/radiuss-spack-configs/blob/main/packages/care/package.py#L75). Those options have been removed in the develop branch of CHAI and CARE.

There was also an option in the last released version of CHAI, though I don't see it in the Spack package: https://github.com/LLNL/CHAI/blob/v2024.07.0/cmake/SetupChaiOptions.cmake#L10. That option needs to be disabled. It has been removed in the develop branch of CHAI.

/g/g17/dayton8/.jacamar-ci/builds/PxDL3V6B/000/gitlab/care/CARE/test/TestAlgorithm.cpp:1788:36: error: call to 'ArrayDup' is ambiguous
1788 | care::host_device_ptr dup = care::ArrayDup(from, size);
| ^~~~~~~~~~~~~~~~~~~
/g/g17/dayton8/.jacamar-ci/builds/PxDL3V6B/000/gitlab/care/CARE/src/care/algorithm_impl.h:1606:38: note: candidate function [with T = int]
1606 | CARE_INLINE care::host_device_ptr ArrayDup(care::host_device_ptr from, int len)
| ^
/g/g17/dayton8/.jacamar-ci/builds/PxDL3V6B/000/gitlab/care/CARE/src/care/algorithm_impl.h:1617:38: note: candidate function [with T = int]
1617 | CARE_INLINE care::host_device_ptr ArrayDup(const T* from, int len)
| ^

adayton1 (Member) commented Jan 8, 2025

I can't reproduce this error locally:

unknown file: Failure
C++ exception with description "! Umpire runtime_error [/dev/shm/tioga15-2334734/spack-user-cache/stage/dayton8/spack-stage-umpire-2024.07.0-h6ho4tozqksmvz4ca3ademytasnzyhw7/spack-src/src/umpire/ResourceManager.cpp:246]: Allocator 1970238309 not found. Available allocators: [HOST,2]

Is there a way to save build artifacts? It would be helpful to see the CMake cache files for CARE and its dependencies to check whether the build configuration matches what I'd expect.

Or can you walk me through doing the spack build manually?

@@ -19,7 +19,7 @@ variables:
 # Note: We repeat the reservation, necessary when jobs are manually re-triggered.
 RUBY_JOB_ALLOC: "--overlap --reservation=ci --nodes=1"
 # Project specific variants for ruby
-PROJECT_RUBY_VARIANTS: ""
+PROJECT_RUBY_VARIANTS: "+tests"
Review comment (Member):

It might be worth enabling the tests in a separate branch.

adrienbernede (Member Author) commented Jan 9, 2025

Hello @adayton1,
The errors we see here essentially come from the fact that I enabled the tests.
I agree this is mixed with a lot of changes to the build configs, but those changes alone would not explain failures in every spec.
So yes, I can revert that change; I just would not want this to become a reason to postpone fixing those bugs :).

adrienbernede (Member Author)

@adayton1 I can confirm that the CI passes once the tests are switched off.

… single compiler) + fix intel->oneapi missing migration
adrienbernede (Member Author) commented Jan 10, 2025

Replying to @adayton1's comment above (the Umpire "Allocator 1970238309 not found" error that could not be reproduced locally, and the question about saving build artifacts or running the Spack build manually):

Each CI job prints a reproducer early in the log. For the clang 14 job on ruby it is:

working_dir="/usr/workspace/${USER}/CARE/2338584-$(date +%s)" 
mkdir -p ${working_dir} && cd ${working_dir} 
git clone https://github.com/LLNL/CARE.git --single-branch --depth=1 
cd CARE 
git fetch origin --depth=1 7591ade8a0e5e982f09564c5ed90afcb7636dad6 
git checkout 7591ade8a0e5e982f09564c5ed90afcb7636dad6 
git submodule update --init --recursive 
# Required variables 
export MODULE_LIST="" 
export SPEC="+tests %clang@=14.0.6 ^umpire~c~shared ^raja~examples~exercises~tests ^chai~shared~examples " 
# Allow to set job script for debugging (only this differs from CI) 
export DEBUG_MODE=true 
srun --overlap --reservation=ci --nodes=1 ./scripts/gitlab/build_and_test.sh

Running those commands on ruby does reproduce the issue. Incidentally, you will find the generated CMake cache file right where the script leaves you (at the root of the CARE clone). Here, it was named "ruby-toss_4_x86_64_ib-clang@14.0.6-riu3g64z.cmake".

Note: I did configure the CI to export the .cmake file as an artifact in the GitLab UI. The problem is that artifacts are only exported when the job succeeds, which is a shame. I haven't looked for a workaround yet; there may be one.
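GitLab CI does provide a setting for this case: `artifacts:when` defaults to `on_success`, and setting it to `always` uploads the artifacts for failed jobs as well. A minimal sketch, assuming the host-config file is written to the root of the clone as in the reproducer above; the job name is a placeholder, and the real Shared CI job templates may be structured differently:

```yaml
# Hypothetical job excerpt: "build-and-test" is a placeholder job name.
build-and-test:
  script:
    - ./scripts/gitlab/build_and_test.sh
  artifacts:
    # Upload even when the job fails (the default, on_success, skips it).
    when: always
    # Spack-generated CMake cache (host-config) files at the clone root.
    paths:
      - "*.cmake"
    expire_in: 1 week
```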

adrienbernede (Member Author) commented Jan 10, 2025

Replying to @adayton1's comment above about the ambiguous ArrayDup call and the CARE_ENABLE_IMPLICIT_CONVERSIONS / ENABLE_IMPLICIT_CONVERSIONS options:

The CI does build the develop version: it builds the local source, but using a CMake cache file generated by Spack. CARE_ENABLE_IMPLICIT_CONVERSIONS is indeed set to OFF in the .cmake file (the variant is off by default), but this has no effect since that variable was removed from the CARE source.

If that variable was removed, shouldn't that error already be solved?

If options or variants need to be added, I can do it. Note that I added the CARE 0.14 release to the Spack package; I'll work on pushing that upstream to Spack.
