
Simplify vecmem/alpaka interaction #670

Merged
merged 13 commits into from
Nov 11, 2024

Conversation

@StewMH StewMH commented Aug 12, 2024

Move all ifdefs to a single file. This might need some further thought (naming, what we do for SYCL), but it already makes things better for now.

@krasznaa (Member) left a comment

Remind me Stewart, how does Alpaka handle the platform selection during its build itself? How can you query Alpaka for the current platform that the compilation is happening for?

Because we should integrate better with that system. I would hope that we could create something like a "trait class" that would provide us with the appropriate VecMem types, based on some "Alpaka type".

But I know very little about the implementation of Alpaka. So you'd have to tell me how that could or couldn't work in practice... 🤔

* Mozilla Public License Version 2.0
*/

#pragma once
Member:
I could accept such a header as a "private header". Something under https://github.com/acts-project/traccc/tree/main/device/alpaka/src. But we should not expose any of these macros in a "public header".

@StewMH (Contributor, Author) commented Aug 12, 2024

At compile time you can query what accelerator you're using via alpaka::accMatchesTags. So we could do e.g.

if constexpr (alpaka::accMatchesTags<TAcc, ::alpaka::TagGpuCudaRt>) {
  using device_memory_resource = vecmem::cuda::device_memory_resource;
}

but I don't know if this runs into problems with types. Not immediately sure how we avoid ifdef'ed includes either.

EDIT: the above now reads compile-time, not runtime.

@krasznaa (Member)

🤔 Again, with very little actual knowledge about Alpaka, it seems to me that the exact type of the "Alpaka queue" would allow you at compile time to decide what type of backend you're running on, no?

In https://github.com/acts-project/traccc/blob/main/device/alpaka/src/utils/utils.hpp the queue type is set up in a pretty simple way at the moment. But from that, I see that there are a bunch of different "accelerator types" that the queue could be created with.

We should find a way to decide about the correct VecMem types based on which of this template type is used in alpaka::Queue. 🤔

Generally, if we want to use Alpaka seriously, we'll need some kind of plugin system in our code: one that could determine at runtime which backends the various algorithms were compiled for, and instantiate the algorithm compiled for the appropriate backend based on some job configuration option. (This is roughly what CMS does, as I understand it.)

For now it's okay to only compile our Alpaka code for a single backend at a time. But even then, the selection of the VecMem types should be done in "some templated way", so that we would only have to choose the appropriate Alpaka types for the build in a single place, and everything else would be derived from those.

So: Do you have experience with using "traits"? I could point you to some examples of what sort of thing I have in mind.

@StewMH (Contributor, Author) commented Aug 13, 2024

Not familiar with traits: please do send me examples of what you have in mind.

@krasznaa (Member)

Forgot about this one... So, a "trait" is something that behaves like:

struct CUDACopy {};
struct SYCLCopy {};

struct CUDAQueue {};
struct SYCLQueue {};

template <typename T>
struct CopySelector {};

template <>
struct CopySelector<CUDAQueue> {
   using copy_type = CUDACopy;
};

template <>
struct CopySelector<SYCLQueue> {
   using copy_type = SYCLCopy;
};

With such a piece of code, you could do something like:

template <typename QUEUE>
void useQueue(QUEUE& q) {
   // Create a copy object that can handle the queue.
   typename CopySelector<QUEUE>::copy_type copy{};
   ...
}

You can find a lot of complicated examples for "traits" for instance in:

Yepp, templating can get pretty complicated pretty quickly... 😦

@StewMH (Contributor, Author) commented Aug 14, 2024

Thanks Attila. I can look at this. In the meantime can I go ahead with this version after moving it into a private header?

@krasznaa (Member)

Yes, a "private header version" would be okay for the time being. 👍

Stewart Martin-Haugh stewart.martin-haugh@stfc.ac.uk added 3 commits November 4, 2024 17:17
@StewMH (Contributor, Author) commented Nov 4, 2024

Getting towards a fully type-traits version of this. I think there will still have to be either conditional vecmem header includes or some kind of CMake magic.

Stewart Martin-Haugh stewart.martin-haugh@stfc.ac.uk added 2 commits November 5, 2024 16:12
@krasznaa (Member) left a comment

I like the general direction though. So the final tuning of how the trait works would not have to happen right away...

@@ -0,0 +1,103 @@
/**
Member:

Not absolutely necessary to do it in this PR, but this header should eventually be in the src/ directory. Generally we'll have to avoid any public dependence on the Alpaka headers in this library. Just like how we avoid any public CUDA or SYCL dependence in those libraries. (Not counting some coding mishaps, that still appear in some places...)

#pragma once

// VecMem include(s).
#ifdef ALPAKA_ACC_GPU_CUDA_ENABLED
Member:

I'm not sure what the exact logic should be here... 🤔

Generally, I think I'd try to organise this code like:

// define the generic host_device_types trait

#ifdef TRACCC_HAVE_VECMEM_CUDA
#include <vecmem/memory/cuda/device_memory_resource.hpp>
...
template <>
struct host_device_types<::alpaka::TagGpuCudaRt> {
    using device_memory_resource = ::vecmem::cuda::device_memory_resource;
...
#endif
#ifdef TRACCC_HAVE_VECMEM_SYCL
#include <vecmem/memory/sycl/device_memory_resource.hpp>
...
#endif

I.e. we should just depend on all the headers/libraries in this header that are available to us during the build. And declare the trait specializations for all the available types / environments.

The decision about which environment is being chosen for the Alpaka build, should be a separate one. That choice should be able to select between all the available options, presented by this header.

And yes, the TRACCC_HAVE_VECMEM_CUDA, etc. definitions would have to come from CMake in some clever way.

@StewMH (Contributor, Author) commented Nov 11, 2024

Thanks Attila, if we can go ahead with this I will try to take the next steps soon.


@krasznaa krasznaa enabled auto-merge (squash) November 11, 2024 16:31
@krasznaa krasznaa merged commit 29ca228 into acts-project:main Nov 11, 2024
20 of 26 checks passed
flg pushed a commit to flg/traccc that referenced this pull request Nov 26, 2024
* Move vecmem ifdefs to separate file

* fix ifdefs

* Use host memory as device memory in host-only mode

* Move seq and seed examples over to typedefs

* Formatting

* Move to type traits

* Fix formatting

* Complete vecmem types implementation

* Formatting

---------

Co-authored-by: Stewart Martin-Haugh stewart.martin-haugh@stfc.ac.uk <smh@cern.ch>