
Add Memory Selection to SST, and prototype a mechanism for running standard BP file tests in staging-common with SST #3823

Merged (3 commits) on Sep 27, 2023

eisenhauer (Member) opened this pull request.

scottwittenburg (Collaborator) commented Sep 26, 2023

@eisenhauer Here is what I use to reproduce GitHub Actions builds locally. Please feel free to reach out if you run into trouble.

Steps

Make a root directory for the source trees and the builds. It will be mounted as /builds when you run docker.

mkdir <path-to-some-working-dir>

Check out the code you want to test

Source is checked out twice in CI: once from your PR changes and once from master. Most of the time, I just check out the same branch twice.

cd <path-to-some-working-dir>
git clone <path-to-your-adios2-source> gha
git clone <path-to-your-adios2-source> source
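The two setup steps above can be wrapped in a small helper that is safe to re-run. This is a minimal sketch, not part of the ADIOS2 scripts: `prepare_checkouts` is a hypothetical name, and it assumes you want the same source cloned into both `gha` and `source`, as described above.

```shell
# Hypothetical helper (not part of ADIOS2): create the working directory that
# will be mounted as /builds and clone the same source tree into gha/ and source/.
# Usage: prepare_checkouts <path-to-some-working-dir> <path-to-your-adios2-source>
prepare_checkouts() {
  workdir="$1"
  src="$2"
  mkdir -p "$workdir" || return 1
  for name in gha source; do
    if [ -d "$workdir/$name" ]; then
      # Leave existing checkouts alone so the helper can be re-run safely.
      echo "skipping $name: $workdir/$name already exists"
    else
      git clone "$src" "$workdir/$name" || return 1
    fi
  done
}
```

Skipping existing directories means a second run will not clobber a checkout you have already switched to another branch.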

Run the docker container

Run the container whose name matches the compiler you want to test. Note that the images for the different gcc/clang compilers are built on the same underlying image, in which all three Spack environments are available. So with the gcc10 image, you could run serial, ompi, or mpich tests.

docker run --rm -v <path-to-some-working-dir>:/builds -ti adios2:ci-spack-ubuntu20.04-gcc10
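If you switch compilers often, the image tag can be derived from the compiler name. The sketch below assumes every gcc/clang image follows the `adios2:ci-spack-ubuntu20.04-<compiler>` pattern of the gcc10 example; that pattern is an assumption for any compiler other than gcc10, so check the tags you actually have (for example with `docker images adios2`). Both function names are hypothetical.

```shell
# Hypothetical helper: derive the CI image tag from a compiler name, assuming
# all gcc/clang images share the tag pattern of the gcc10 example.
ci_image_for() {
  case "$1" in
    gcc*|clang*) echo "adios2:ci-spack-ubuntu20.04-$1" ;;
    *) echo "no known image pattern for compiler '$1'" >&2; return 1 ;;
  esac
}

# Print (rather than execute) the docker command for a working dir and compiler.
ci_docker_cmd() {
  image=$(ci_image_for "$2") || return 1
  echo "docker run --rm -v $1:/builds -ti $image"
}
```

Once the printed command looks right, run it directly, e.g. `docker run --rm -v "$PWD":/builds -ti "$(ci_image_for gcc10)"`.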

Once you're inside the container with the compiler you want to test, set the variables that GitHub Actions would set from the workflow YAML.

Set the parallel value

Pick one of these based on what you want to test:

export GH_YML_MATRIX_PARALLEL=serial
export GH_YML_MATRIX_PARALLEL=ompi
export GH_YML_MATRIX_PARALLEL=mpich

Set the compiler

Choose one of these, making sure it matches the container you're running:

export GH_YML_MATRIX_COMPILER=gcc8
export GH_YML_MATRIX_COMPILER=gcc9
export GH_YML_MATRIX_COMPILER=gcc10
export GH_YML_MATRIX_COMPILER=gcc11
export GH_YML_MATRIX_COMPILER=clang6
export GH_YML_MATRIX_COMPILER=clang10
export GH_YML_MATRIX_COMPILER=oneapi
export GH_YML_MATRIX_COMPILER=icc
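A quick guard can catch a typo in this value before the build starts. A minimal sketch; `check_ci_compiler` is a hypothetical name, and the accepted list is copied from the values above.

```shell
# Hypothetical guard: accept only the compiler values listed above.
check_ci_compiler() {
  case "$1" in
    gcc8|gcc9|gcc10|gcc11|clang6|clang10|oneapi|icc) return 0 ;;
    *) echo "unknown GH_YML_MATRIX_COMPILER '$1'" >&2; return 1 ;;
  esac
}
```

For example: `check_ci_compiler "$GH_YML_MATRIX_COMPILER" || echo "fix the export above"`.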

Set a branch name

This build/test will be reported to CDash, so pick a branch name that will help you identify what you were testing when you look at the results there, e.g.:

export GITHUB_REF_NAME="mpich_perf_testing_ch4_ofi"

Do the build/test

The remaining steps set the rest of the variables and then run the update/configure/build/test steps as they would run in GitHub Actions:

cd /builds
export GITHUB_WORKSPACE=/builds
export GITHUB_PATH=/dev/null
export GITHUB_JOB=test
export GITHUB_EVENT_NAME="test_pull_request"
export GH_YML_BASE_OS=ubuntu
export RUNNER_TEMP="/"
export GH_YML_MATRIX_OS=ubuntu20.04
export GH_YML_JOBNAME=${GH_YML_MATRIX_OS}-${GH_YML_MATRIX_COMPILER}-${GH_YML_MATRIX_PARALLEL}
gha/scripts/ci/gh-actions/linux-setup.sh
cp /.local/bin/ninja /usr/bin/ninja
gha/scripts/ci/gh-actions/run.sh update
gha/scripts/ci/gh-actions/run.sh configure
gha/scripts/ci/gh-actions/run.sh build
gha/scripts/ci/gh-actions/run.sh test
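The four `run.sh` invocations above can be chained so that the sequence stops at the first failing stage and reports which one it was. This is a sketch, not part of the ADIOS2 CI scripts; `run_ci_stages` is a hypothetical name, and it takes the runner as an argument so you can dry-run it with `echo` first.

```shell
# Hypothetical wrapper: run the CI stages in the same order as above and stop
# at the first one that fails.
# Usage: run_ci_stages gha/scripts/ci/gh-actions/run.sh
run_ci_stages() {
  runner="$1"
  for stage in update configure build test; do
    echo "=== $stage ($GH_YML_JOBNAME) ==="
    if ! "$runner" "$stage"; then
      echo "stage '$stage' failed" >&2
      return 1
    fi
  done
}
```

`run_ci_stages echo` prints the stage order without building anything.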

eisenhauer (Member, Author)

@scottwittenburg So, we've sorted out the problem and I'm going to merge this PR, but the nature of the problem may point to other issues. When we have an MPI-enabled build, we build both serial and MPI versions of the ADIOS library (and of many of the tests in testing/engines/bp). However, we only ever build one version of the SST runtime (the bulk of the SST engine), which is its own library, largely for historical reasons. So even if we have a serial application, built and linked with the serial version of the ADIOS library, SST doesn't know that. Specifically, in this circumstance SST is built with and depends upon MPI, and it also considers @vicentebolea's MPI data plane a viable option. The deadlock I was seeing happened because the MPI data plane, having noticed that MPI hadn't been initialized at the application level, tried to initialize it deep inside the data transport (this didn't go well). The fix in this PR is to always use the MPI version of a test if it has been built, even when we're only doing a 1-to-1 SST test.
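The workaround described above (use the MPI build of a test whenever one exists, even for a 1-to-1 SST run) amounts to a simple selection rule. The sketch below only illustrates that rule and is not the PR's actual implementation; the `_mpi` suffix and the `pick_test_binary` name are hypothetical.

```shell
# Illustration of the selection rule only: prefer the MPI build of a test
# executable when it exists and is executable, otherwise fall back to the
# serial one. The "_mpi" suffix is a hypothetical naming convention.
pick_test_binary() {
  serial="$1"
  if [ -x "${serial}_mpi" ]; then
    echo "${serial}_mpi"
  else
    echo "$serial"
  fi
}
```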

But I'm wondering whether this is sufficient. Because ADIOS always depends upon SST (unless SST is disabled), and SST always depends upon MPI if MPI is present, building a non-MPI version of the higher-level ADIOS library to avoid an MPI dependency seems moot: it will inherit that dependency from SST. Should we be building serial and MPI versions of SST as well? Is it time to abandon the SST-is-a-separate-library arrangement (there's no good reason to keep it that way)? Or is there some advantage to the current situation, because a one-rank application might use the MPI data plane to connect to an MPI application? (If so, we should test this and make sure it works rather than deadlocking on some platforms.) Anyhow, this is something to discuss, perhaps when @vicentebolea is back. I believe Chuck did a lot of the work behind building serial and MPI versions of ADIOS, and I'm not sure of the reasoning behind not extending that to SST (the MPI data plane was not a wrinkle that existed at the time).
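One way to see the inherited dependency described above is to inspect a built library's dynamic dependencies, since `ldd` lists transitively loaded libraries too. A minimal sketch, assuming a Linux build where `ldd` is available; the helper name and the library path in the usage line are hypothetical, so point it at whichever ADIOS2 or SST shared library your build actually produced.

```shell
# Hypothetical helper: succeed if ldd output on stdin lists an MPI runtime
# library (e.g. Open MPI's libmpi.so or MPICH's libmpich.so).
mentions_mpi() {
  grep -Eq 'lib(mpi|mpich)[^ ]*\.so'
}
```

For example: `ldd /path/to/libadios2_core.so | mentions_mpi && echo "pulls in MPI"`.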

caitlinross (Collaborator)

@vicentebolea this is the PR we're talking about

vicentebolea pushed a commit to vicentebolea/ADIOS2 that referenced this pull request Oct 24, 2023
Add Memory Selection to SST, and prototype a mechanism for running st…

(cherry picked from commit c503940)
vicentebolea added a commit that referenced this pull request Oct 30, 2023
Merge pull request #3823 from eisenhauer/SstMemSel
vicentebolea added a commit that referenced this pull request Nov 1, 2023
* release_29: (29 commits)
  Bump version to v2.9.2
  ci: update number of task for mpich build
  clang-format: Correct format to old style
  Merge pull request #3878 from anagainaru/test-null-blocks
  Merge pull request #3588 from vicentebolea/fix-mpi-dp
  bp5: make RecMap an static anon namespaced var
  Replace LookupWriterRec's linear search on RecList with an unordered_map. For 250k variables, time goes from 21sec to ~1sec in WSL. The order of entries in RecList was not necessary for the serializer to work correctly. (#3877)
  Fix data length calculation for hash (#3875)
  Merge pull request #3823 from eisenhauer/SstMemSel
  gha,ci: update checkout to v4
  Blosc2 USE ON: Fix Module Fallback
  cmake: correct prefer_shared_blosc behavior
  cmake: correct info.h installation path
  ci: disable MGARD static build
  operators: fix module library
  ci: add downloads readthedocs
  cmake: Add Blosc2 2.10.1 compatibility.
  Fix destdir install test (#3850)
  cmake: update minimum cmake to 3.12 (#3849)
  MPI: add timeout for conf test for MPI_DP (#3848)
  ...
pnorbert added a commit to pnorbert/ADIOS2 that referenced this pull request Nov 20, 2023
* master: (126 commits)
  ReadMe.md: Mention 2.9.2 release
  Cleanup server output a bit (ornladios#3914)
  ci: set openmpi and openmp params
  Example using Kokkos buffers with SST
  Changes to MallocV to take into consideration the memory space of a variable
  Change install directory of Gray scott files again
  ci,crusher: increase supported num branches
  ci: add shellcheck coverage to source and testing
  Change install directory of Gray scott files
  Only rank 0 should print the initialization message in perfstub
  Defining and computing derived variables (ornladios#3816)
  Add Remote "-status" command to see if a server is running and where (ornladios#3911)
  examples,hip: use find_package(hip) once in proj
  Add Steps Tutorial
  Add Operators Tutorial
  Add Attributes Tutorial
  Add Variables Tutorial
  Add Hello World Tutorial
  Add Tutorials' Download and Build section
  Add Tutorials' Overview section
  Improve bpStepsWriteRead* examples
  Rename bpSZ to bpOperatorSZWriter
  Convert bpAttributeWriter to bpAttributeWriteRead
  Improve bpWriter/bpReader examples
  Close file after reading for hello-world.py
  Fix names of functions in engine
  Fix formatting warnings
  Add dataspaces.rst in the list of engines
  Add query.rst
  cmake: find threads package first
  docs: update new_release.md
  Bump version to v2.9.2
  ci: update number of task for mpich build
  clang-format: Correct format to old style
  Merge pull request ornladios#3878 from anagainaru/test-null-blocks
  Merge pull request ornladios#3588 from vicentebolea/fix-mpi-dp
  Adding tests for writing null blocks with and without compression
  bp5: make RecMap an static anon namespaced var
  Replace LookupWriterRec's linear search on RecList with an unordered_map. For 250k variables, time goes from 21sec to ~1sec in WSL. The order of entries in RecList was not necessary for the serializer to work correctly.
  Replace LookupWriterRec's linear search on RecList with an unordered_map. For 250k variables, time goes from 21sec to ~1sec in WSL. The order of entries in RecList was not necessary for the serializer to work correctly. (ornladios#3877)
  Fix data length calculation for hash (ornladios#3875)
  Merge pull request ornladios#3823 from eisenhauer/SstMemSel
  Merge pull request ornladios#3805 from pnorbert/fix-bpls-string-scalar
  Merge pull request ornladios#3804 from pnorbert/fix-aws-version
  Merge pull request ornladios#3759 from pnorbert/bp5dbg-metadata
  new attempt to commit query support of local array. (ornladios#3868)
  MPI::MPI_Fortran should be INTERFACE not PUBLIC
  Fix hip example compilation error (ornladios#3865)
  Server Improvements (ornladios#3862)
  ascent,ci: remove unshallow flag
  Remove Slack as a contact mechanism (ornladios#3866)
  bug fix:  syntax error in json  output (ornladios#3857)
  Update the bpWriterReadHip example's cmake to run on crusher
  Examples: Use BPFile instead of BP3/4/5 for future-proof
  inlineMWE example: Close files at the end
  Examples: Add BeginStep/EndStep wherever it was missing
  BP5Serializer: handle local variables that use operators (ornladios#3859)
  gha,ci: update checkout to v4
  Blosc2 USE ON: Fix Module Fallback
  cmake: correct prefer_shared_blosc behavior
  cmake: correct info.h installation path
  ci: disable MGARD static build
  operators: fix module library
  ci: add downloads readthedocs
  cmake: Add Blosc2 2.10.1 compatibility.
  Blosc2 USE ON: Fix Module Fallback (ornladios#3774)
  Fix destdir install test (ornladios#3850)
  cmake: update minimum cmake to 3.12 (ornladios#3849)
  MPI: add timeout for conf test for MPI_DP (ornladios#3848)
  MPI_DP: do not call MPI_Init (ornladios#3847)
  install: export adios2 device variables (ornladios#3819)
  Merge pull request ornladios#3799 from vicentebolea/support-new-yaml-cpp
  Merge pull request ornladios#3737 from vicentebolea/fix-evpath-plugins-path
  SST,MPI,DP: soft handle peer error
  SST,MPI,DP: improve uniq identifier
  Fix destdir install test (ornladios#3850)
  cmake: include ctest before detectoptions
  ci: enable tau check
  Add/Improve the ReadMe.md files in examples directory
  Disable BUILD_TESTING and ADIOS2_BUILD_EXAMPLES by default
  Remove testing based on ADIOS2-examples
  Fix formatting issue in DetectOptions.cmake
  Add examples from ADIOS2-Examples
  Improve existing examples
  MPI_DP: do not call MPI_Init (ornladios#3847)
  cmake: update minimum cmake to 3.12 (ornladios#3849)
  MPI: add timeout for conf test for MPI_DP (ornladios#3848)
  Tweak Remote class and test multi-threaded file remote access (ornladios#3834)
  Add prototype testing of remote functionality (ornladios#3830)
  Try always using the MPI version
  Try always using the MPI version
  Import tests from bp to staging common, implement memory selection in SST
  ci: fix codeql ignore path (ornladios#3772)
  install: export adios2 device variables (ornladios#3819)
  added support to query BP5 files (ornladios#3809)
  Partial FFS Upstream, only changes to type_id
  ffs 2023-09-19 (67e411c0)
  Fix abs/rel step in BP5 DoCount
  fix dummy Win build
  Pass Array Order of reader to remote server for proper Get() operation
  ...
dmitry-ganyushin added a commit to dmitry-ganyushin/ADIOS2 that referenced this pull request Dec 7, 2023
* master:
  Update readme for heat transfer example with new location and build instructions
  Ignore tests with defects for now
  Adapt libfabric dataplane of SST to Cray CXI provider (ornladios#3672)
  ci: fix path to lsan suppressions, fix broken gh status post
  Use adios2_mode_readRandomAccess in matlab open to make it work for BP5 (ornladios#3956)
  Add Global Array Capabilities and Limitations
  Add Section for Anatomy of an ADIOS Program
  Enable Shell-Check for gh-actions scripts
  Enable Shell-Check for circle CI scripts
  Enable Shell-Check for tau contract scripts
  Enable Shell-Check for scorpio contract scripts
  Enable Shell-Check for lammps contract scripts
  Delete VTK code in examples
  Fix MATLAB bindings for MacOS (ornladios#3950)
  Set the compiler for the Kokkos DataMan example to what is used to build Kokkos
  Fix the HIP architecture CMAKE variable (ornladios#3931)
  perfstubs 2023-11-27 (845d0702) (ornladios#3944)
  Revert "Only rank 0 should print the initialization message in perfstub"
  CI Contract: Build examples with external ADIOS
  Example using DataMan with Kokkos buffers
  Propagating the GPU logic inside the DataMan engine
  ci: Use mpich built with ch3:sock:tp for faster tests
  ReadMe.md: Mention 2.9.2 release
  Cleanup server output a bit (ornladios#3914)
  ci: set openmpi and openmp params
  Example using Kokkos buffers with SST
  Changes to MallocV to take into consideration the memory space of a variable
  Change install directory of Gray scott files again
  ci,crusher: increase supported num branches
  ci: add shellcheck coverage to source and testing
  Change install directory of Gray scott files
  Only rank 0 should print the initialization message in perfstub
  Defining and computing derived variables (ornladios#3816)
  Add Remote "-status" command to see if a server is running and where (ornladios#3911)
  examples,hip: use find_package(hip) once in proj
  Add Steps Tutorial
  Add Operators Tutorial
  Add Attributes Tutorial
  Add Variables Tutorial
  Add Hello World Tutorial
  Add Tutorials' Download and Build section
  Add Tutorials' Overview section
  Improve bpStepsWriteRead* examples
  Rename bpSZ to bpOperatorSZWriter
  Convert bpAttributeWriter to bpAttributeWriteRead
  Improve bpWriter/bpReader examples
  Close file after reading for hello-world.py
  Fix names of functions in engine
  Fix formatting warnings
  Add dataspaces.rst in the list of engines
  Add query.rst
  cmake: find threads package first
  docs: update new_release.md
  Bump version to v2.9.2
  ci: update number of task for mpich build
  clang-format: Correct format to old style
  Merge pull request ornladios#3878 from anagainaru/test-null-blocks
  Merge pull request ornladios#3588 from vicentebolea/fix-mpi-dp
  bp5: make RecMap an static anon namespaced var
  Replace LookupWriterRec's linear search on RecList with an unordered_map. For 250k variables, time goes from 21sec to ~1sec in WSL. The order of entries in RecList was not necessary for the serializer to work correctly. (ornladios#3877)
  Fix data length calculation for hash (ornladios#3875)
  Merge pull request ornladios#3823 from eisenhauer/SstMemSel
  gha,ci: update checkout to v4
  Blosc2 USE ON: Fix Module Fallback
  cmake: correct prefer_shared_blosc behavior
  cmake: correct info.h installation path
  ci: disable MGARD static build
  operators: fix module library
  ci: add downloads readthedocs
  cmake: Add Blosc2 2.10.1 compatibility.
  Fix destdir install test (ornladios#3850)
  cmake: update minimum cmake to 3.12 (ornladios#3849)
  MPI: add timeout for conf test for MPI_DP (ornladios#3848)
  MPI_DP: do not call MPI_Init (ornladios#3847)
  install: export adios2 device variables (ornladios#3819)
  Merge pull request ornladios#3799 from vicentebolea/support-new-yaml-cpp
  Merge pull request ornladios#3737 from vicentebolea/fix-evpath-plugins-path
  Partial FFS Upstream, only changes to type_id
  bpls -l  with scalar string variable: print the value (since min/max is empty). This changes the code for all types using Engine.Get() to get the value now.
  Set AWS version requirement to 1.10.15 and also turn it OFF by default as it is not a stable feature of ADIOS just yet.
  Fix local values block reading
  docs,ci: backport fixes for readthedocs
5 participants