Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AdiosView object that allows Get/Put to receive Kokkos::View #3320

Merged
merged 9 commits into from
Sep 14, 2022

Conversation

anagainaru
Copy link
Contributor

@anagainaru anagainaru commented Aug 19, 2022

Alternative to #3271 and #3254

This will allow ADIOS to receive Kokkos views without having to be build with Kokkos (which was the problem of #3254) and without loosing the context of the View in case in the future we want to pass it down the stream to ADIOS core functions (which will be needed when we compute the metadata using Kokkos).

The interface remains the same (the only change is that codes will require to include the KokkosView.h header from adios):

#include <adios2/cxx11/KokkosView.h>

Kokkos::View<float*, Kokkos::CudaSpace> gpuSimData("gpuBuffer", N);
Kokkos::View<float*, Kokkos::HostSpace> hostSimData("simBuffer", N);

bpWriter.Put(data, hostSimData);
bpWriter.Get(data, gpuSimData);

The memory space is extracted from the View object so any data.SetMemorySpace(some_space) will be overwritten to correspond to the View object.

Limitations

  • The Variable and Kokkos::View must have the same type, otherwise building will throw an error no matching function for call to 'Put'
  • Put/Get will only accept Variables with Kokkos:Views and not variable names.

Example to test the implementation

@anagainaru anagainaru changed the title Kokkos adios view AdiosView object that allows Get/Put to receive Kokkos::View Aug 19, 2022
@vicentebolea
Copy link
Collaborator

@anagainaru Great work there ⚡ can you give perms to make changes to the branch of this MR, I want to propose some changes

Copy link
Member

@eisenhauer eisenhauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I think the limitation that it requires a variable rather than a variable name is a feature. The whole "give me a variable name so I have to do a string lookup on it" should be a second-class API on the writer side...

@anagainaru
Copy link
Contributor Author

@anagainaru Great work there ⚡ can you give perms to make changes to the branch of this MR, I want to propose some changes

How can I do this? Google didn't help :)

@vicentebolea
Copy link
Collaborator

@anagainaru Great work there ⚡ can you give perms to make changes to the branch of this MR, I want to propose some changes

How can I do this? Google didn't help :)

https://github.blog/2016-09-07-improving-collaboration-with-forks/

@anagainaru
Copy link
Contributor Author

The checkbox was already checked. I thought you meant something else. You should be able to edit or propose changes.

https://github.blog/2016-09-07-improving-collaboration-with-forks/

In the mean time I am still working on the ICEs

@vicentebolea
Copy link
Collaborator

Thanks @anagainaru I think we need to add an unit test or at least an example for it. I can generate new CI images with Kokkos.

@vicentebolea
Copy link
Collaborator

Notes about the last push:

I added an alternative that relies on runtime polymorphism over duck-typing using templates.

  • It allows Inversion-of-dependencies since you can pass around this base pointer without revealing its specific type. This is important since the user code that includes engine.h might not have access to KokkosView rather than a pointer of AdiosView which can be anything.
  • It is simpler and safer (it specifies the interface of the View).
  • Bad side is that we add one more level of indirection using the Virtual Table.

@vicentebolea
Copy link
Collaborator

Please feel free to challenge and push your alternative. This is a proposal :)

Copy link
Contributor Author

@anagainaru anagainaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestions. There is an example (link in the PR description), but for now it's outside ADIOS2 just to make sure everything works before I add the example. I don't like code that is in copy pasted in multiple places so I will create the unit test with both CUDA and Kokkos (with Host and Cuda backends) into one test, but later.

Comment on lines 56 to 59
if(ADIOS2_HAVE_KOKKOS)
list(APPEND adios2_cxx11 adios2/cxx11/KokkosView.h)
endif()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this and decided against having a ADIOS2_HAVE_KOKKOS because I want to be able to pass a Kokkos::View without having to re-build ADIOS.

bindings/CXX11/CMakeLists.txt Show resolved Hide resolved
cmake/DetectOptions.cmake Outdated Show resolved Hide resolved
bindings/CXX11/adios2/cxx11/KokkosView.h Outdated Show resolved Hide resolved
@vicentebolea
Copy link
Collaborator

@anagainaru I am pushing the new CI CUDA image with Kokkos. As for now it has kokkos without CUDA, later on I will upload the one with kokkos+cuda. The image will be available in like 20 mins from now.

@vicentebolea
Copy link
Collaborator

@anagainaru the latest push of the CI image includes Kokkos+cuda

@anagainaru
Copy link
Contributor Author

@anagainaru the latest push of the CI image includes Kokkos+cuda

Great, I'm pushing the latest changes and will merge this tomorrow morning so everyone can take one final look

@anagainaru anagainaru force-pushed the kokkos-adios-view branch 2 times, most recently from 38bb230 to 199cdf9 Compare August 25, 2022 16:05
examples/CMakeLists.txt Outdated Show resolved Hide resolved
@vicentebolea
Copy link
Collaborator

We got a false positive in the CUDA build, it actually fails with:

   [ 47%] Building CXX object examples/heatTransfer/write/CMakeFiles/heatTransfer_write_h5mixer.dir/HeatTransfer.cpp.o
  /__w/ADIOS2/ADIOS2/source/examples/kokkos/kokkosWriteRead.cpp(84): warning #550-D: variable "size" was set but never used
  
  [ 47%] Building CXX object examples/heatTransfer/write/CMakeFiles/heatTransfer_write_h5mixer.dir/Settings.cpp.o
  [ 47%] Building CXX object examples/heatTransfer/write/CMakeFiles/heatTransfer_write_adios2.dir/Settings.cpp.o
  /opt/spack/opt/spack/linux-rocky8-x86_64/gcc-8.5.0/kokkos-3.6.01-25cpvecrlb2tdkofbh3ehh2j2enjr2mm/include/Cuda/Kokkos_Cuda_Parallel.hpp(466): error: calling a __host__ function("BPWrite(   ::std::__cxx11::basic_string<char,     ::std::char_traits<char> , ::std::allocator<char> > , unsigned long, int,    ::std::__cxx11::basic_string<char,     ::std::char_traits<char> , ::std::allocator<char> > )::[lambda(int) (instance 1)]::operator ()(int) const") from a __device__ function("Kokkos::Impl::ParallelFor<    ::BPWrite(   ::std::__cxx11::basic_string<char,     ::std::char_traits<char> , ::std::allocator<char> > , unsigned long, int,    ::std::__cxx11::basic_string<char,     ::std::char_traits<char> , ::std::allocator<char> > )   ::[lambda(int) (instance 1)],  ::Kokkos::RangePolicy< ::Kokkos::Cuda > ,  ::Kokkos::Cuda> ::exec_range<void>  const") is not allowed
  
  /opt/spack/opt/spack/linux-rocky8-x86_64/gcc-8.5.0/kokkos-3.6.01-25cpvecrlb2tdkofbh3ehh2j2enjr2mm/include/Cuda/Kokkos_Cuda_Parallel.hpp(466): error: identifier "BPWrite(   ::std::__cxx11::basic_string<char,     ::std::char_traits<char> , ::std::allocator<char> > , unsigned long, int,    ::std::__cxx11::basic_string<char,     ::std::char_traits<char> , ::std::allocator<char> > )::[lambda(int) (instance 1)]::operator () const" is undefined in device code

@anagainaru
Copy link
Contributor Author

anagainaru commented Aug 25, 2022

We got a false positive in the CUDA build, it actually fails with:

This is so weird, I will try to reproduce this on Summit. On our local machine I did not get any error.

@vicentebolea
Copy link
Collaborator

We got a false positive in the CUDA build, it actually fails with:

This is so weird, I will try to reproduce this on Summit. On our local machine I did not get any error.

I lost perms to push here, apply this patch for hopefully avoiding this false negative error:

--- scripts/ci/cmake/ci-common.cmake
+++ scripts/ci/cmake/ci-common.cmake
@@ -15,7 +15,7 @@ if(NOT CTEST_BUILD_FLAGS)
   if(CTEST_CMAKE_GENERATOR STREQUAL "Unix Makefiles")
     set(CTEST_BUILD_FLAGS "-k -j${N2CPUS}")
   elseif(CTEST_CMAKE_GENERATOR STREQUAL "Ninja")
-    set(CTEST_BUILD_FLAGS "-k0 -j${N2CPUS}")
+    set(CTEST_BUILD_FLAGS "-k -j${N2CPUS}")
   endif()
 endif()
 if(NOT PARALLEL_LEVEL IN_LIST CTEST_TEST_ARGS)

@anagainaru
Copy link
Contributor Author

I tested the code on Summit and change the example code to be more general so it can be executed on CUDA or Host (with any backend). I was able to run it from all the systems I tested (with threads, serial, openmp and cuda).

Should the patch be applied in this PR or a separate one?

@vicentebolea
Copy link
Collaborator

I tested the code on Summit and change the example code to be more general so it can be executed on CUDA or Host (with any backend). I was able to run it from all the systems I tested (with threads, serial, openmp and cuda).

Should the patch be applied in this PR or a separate one?

here is best

@vicentebolea
Copy link
Collaborator

The build output can be seen at: Build->Execute job step, for a strange reason CDASH does not consider it an error but it is since the example binary fails to build with an error message. I think we need to first make it work in the CI

@vicentebolea
Copy link
Collaborator

@anagainaru I am not sure about the error but it seems that you need to build your example with CUDA.

@anagainaru
Copy link
Contributor Author

@vicentebolea I am not an expert but I really don't think there is a problem with the code. It looks like there is a configuration issue, maybe building or linking Kokkos.

For an example that uses CUDA, Kokkos needs to be linked with ADIOS2 only to be able to run the Kokkos example.
But right now, once I receive a Kokkos::View I am stripping the buffer from Kokkos and passing the CUDA pointer to ADIOS2 so for sure ADIOS2 will need to be build with cuda support.

@vicentebolea
Copy link
Collaborator

@anagainaru please apply remove the previous patch and apply this patch

1.zip

@vicentebolea
Copy link
Collaborator

The patch installs Ninja that correctly parses the ouptut of nvcc and correctly reports the error

@anagainaru anagainaru self-assigned this Aug 31, 2022
@vicentebolea
Copy link
Collaborator

@anagainaru I updated most of the CI images, including the CUDA one. Got ahead drop my commit from your branch and rebase your branch onto the latest master commit.

As for the nvcc_handler of Kokkos while it is not needed for this case it needs to be installed since it is a requirement for KokkosCuda.

I suspect that the compiling errors will be solved if you set the language of the example code as CUDA when ADIOS2_HAVE_CUDA=ON

@vicentebolea
Copy link
Collaborator

@anagainaru the issue seems to be resolved after I install kokkos with the cuda_lambda variant. I will respin your builds

@anagainaru
Copy link
Contributor Author

Great ! I am back from vacation so I will have time to rebase this today if it's still needed.
Please lift the request for changes unless you need some other changes.

@vicentebolea
Copy link
Collaborator

@anagainaru still rebuilding the CI images, it will take about one hour to push them to Github. I will let you know. Welcome back!

Copy link
Collaborator

@vicentebolea vicentebolea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the comments, please squash all the changes into a single commit co-authoring the people who contribute to this. The reason why we want a single commit is that is good to ensure that every commit builds in our CI for the purpose of revert or for debugging.

examples/kokkos/CMakeLists.txt Show resolved Hide resolved
@anagainaru
Copy link
Contributor Author

Other than the comments, please squash all the changes into a single commit co-authoring the people who contribute to this. The reason why we want a single commit is that is good to ensure that every commit builds in our CI for the purpose of revert or for debugging.

What other comments were not addressed?

I disagree with merging everything in a single commit, commits are supposed to be small and solve or change one item. We have PRs so we can revert.

@vicentebolea
Copy link
Collaborator

Other than the comments, please squash all the changes into a single commit co-authoring the people who contribute to this. The reason why we want a single commit is that is good to ensure that every commit builds in our CI for the purpose of revert or for debugging.

What other comments were not addressed?

I disagree with merging everything in a single commit, commits are supposed to be small and solve or change one item. We have PRs so we can revert.

The idea about each commit to be a consistent and CI tested snapshot is an industry standard practice.

Imagine this: Let say I git checkout to one of your middle commits in your branch, the code will be inconsistency since that is an intermediate stage, it might not even compile (has each of those commits being tested?). For you they make sense now as you had recently created them but picture another contributor a year from now.

I think this will be a long conversation and there are different points of view on this, so I will not block your PR on this, but lets discuss later about it.

@vicentebolea
Copy link
Collaborator

@anagainaru last thing is to rebase your branch to latest master to trigger the CI with the fixed CUDA build

@anagainaru
Copy link
Contributor Author

@vicentebolea There are many things to discuss about good practices with PR that we don't really follow. We should have a discussion when there are no other topics during one of our weekly meetings.

I agree that I should clean the commit history and merge some of the small fix commits, but I def do not agree this PR has only one commit. The CI will pass with the stump function for example so the AdiosView is not needed in the same commit. The example code should also def be separate from the internal changes, etc. Let's define a standard during one of the meetings for future PRs.

@vicentebolea
Copy link
Collaborator

Kokkos-CUDA CI passes 🎉

@anagainaru anagainaru merged commit 583a85f into ornladios:master Sep 14, 2022
This was referenced Sep 14, 2022
@anagainaru anagainaru deleted the kokkos-adios-view branch September 15, 2022 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants