Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYCL Tweaks, main branch (2024.12.07.) #799

Merged
merged 4 commits into from
Dec 8, 2024

Conversation

krasznaa
Copy link
Member

@krasznaa krasznaa commented Dec 7, 2024

Continuing with trying to make #773 work...

That PR by now compiles ~successfully for all platforms, but produces weird runtime issues on anything beside an NVIDIA backend. Since I wasn't sure how to best tackle that issue, decided to rather do some cleanup that's been long overdue. But as it happens, I may have found at least part of the problem.

One of the commits of the PR is just taken from #773 as is, simplifying the sanity checking code a little.

For the rest, I set out to move the non-public parts of the SYCL code under device/sycl/src, so that unsuspecting users would not be exposed to <sycl/sycl.hpp>. (At the same time I also made traccc::sycl publicly depend on traccc::device_common. Since it does.)

While doing that, at first I just wanted to tweak traccc::sycl::thread_id1 and traccc::sycl::barrier a little, so that I'd like them more. But I actually found some implementation issues with both of them, at which point I decided to fully re-design them. 🤔

  • They now both have a template parameter for the dimensionality of the grid/kernel that they should operate on. Set to 1 by default.
  • Moved both of them into the traccc::sycl::details:: namespace, just because I decided I liked them better there. 😛
  • Modified the member functions of traccc::sycl::details::thread_id (the artist formerly known as traccc::sycl::thread_id1) to (hopefully) return correct values finally. As some of the member functions didn't so far. 😦
    • While doing this, I noticed that the traccc::device::concepts::thread_id1 concept didn't require the member functions of these types to be const. Even though our code itself assumed it. So I modified the concept to require constness on the functions.
  • The issue with traccc::sycl:details::barrier (formerly traccc::sycl::barrier) was even weirder. It currently uses a function called ::sycl::nd_item<...>::barrier(). But that function is not actually part of the SYCL2020 standard! (https://github.khronos.org/SYCL_Reference/iface/nd_item.html)
    • It's a function that is there in oneAPI, looking like: https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/nd_item.hpp#L201-L206
    • Notice that it has a default argument, very much suggesting that a global synchronization would be attempted by default.
    • To be SYCL2020 compatible, I rather switched to using sycl::group_barrier.
    • At the same time, I had to realize that the member functions of this class didn't need to be non-const. So I made them const, and updated the concept to require these functions to be const.

I had to tweak some common and test code to account for these changes, but I think this just made those a bit easier to use.

Unfortunately I'm not sure whether these changes will fix my observed issues in #773 or not. (Since these changes ended up diverging a bit too much from that PR.) But I do believe that they are good fixes in either case. And fingers crossed that they'll help with the track finding code as well...

Stopped it from using traccc::sycl::queue_wrapper, since that code
anyway can only be used by a SYCL compiler. So it might as well
just use ::sycl::queue directly.
At the same time use this opportunity to re-design the
thread_id and barrier types into something a bit more
generic, and hopefully more correct.

Update the common device code to work with the modified
(now const) SYCL utility types.
@krasznaa krasznaa added bug Something isn't working cleanup Makes the code all clean and tidy sycl Changes related to SYCL labels Dec 7, 2024
Comment on lines -17 to +25
void blockBarrier() { __syncthreads(); }
__device__ inline void blockBarrier() const { __syncthreads(); }

TRACCC_DEVICE
bool blockAnd(bool predicate) { return __syncthreads_and(predicate); }
__device__ inline bool blockAnd(bool predicate) const {
return __syncthreads_and(predicate);
}

TRACCC_DEVICE
bool blockOr(bool predicate) { return __syncthreads_or(predicate); }
__device__ inline bool blockOr(bool predicate) const {
return __syncthreads_or(predicate);
}

TRACCC_DEVICE
int blockCount(bool predicate) { return __syncthreads_count(predicate); }
__device__ inline int blockCount(bool predicate) const {
return __syncthreads_count(predicate);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why? The previous version felt more consistent with the rest of the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we use a macro in a file that can only be used in CUDA device code? Much like how I removed TRACCC_DEVICE from the SYCL code (as it served really no purpose there), I thought this would also be a good simplification.

The macros are really only needed in the shared code. CUDA specific code should be free to use __device__, __global__, etc.

device/common/include/traccc/device/concepts/barrier.hpp Outdated Show resolved Hide resolved
device/common/include/traccc/device/concepts/thread_id.hpp Outdated Show resolved Hide resolved
@@ -19,7 +19,7 @@ namespace traccc::device::concepts {
* @tparam T The thread identifier-like type.
*/
template <typename T>
concept thread_id1 = requires(T& i) {
concept thread_id1 = requires(const T& i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're making the thread_id objects templated on the dimensionality, why not the thread_id concept too? 😉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that. But decided that I like the non-templated concept better. 🤔

We use very specific function names in the concept. In a higher dimensionality concept we would probably replace X in a couple of the function names with Y.

I think I'd rather just add thread_id2 as a concept once we need it. (I imagine it would require the thread_id1 concept, and add some more required functions.) That would still allow us to be pretty free with how we provide the implementation classes.

device/sycl/src/utils/barrier.hpp Outdated Show resolved Hide resolved
device/sycl/src/utils/thread_id.hpp Outdated Show resolved Hide resolved
device/sycl/src/utils/barrier.hpp Show resolved Hide resolved
device/sycl/src/utils/thread_id.hpp Outdated Show resolved Hide resolved
device/sycl/src/utils/barrier.hpp Show resolved Hide resolved
@stephenswat
Copy link
Member

Modified the member functions of traccc::sycl::details::thread_id (the artist formerly known as traccc::sycl::thread_id1) to (hopefully) return correct values finally. As some of the member functions didn't so far. 😦

Where? I cant say I see what you mean.

@krasznaa
Copy link
Member Author

krasznaa commented Dec 7, 2024

Modified the member functions of traccc::sycl::details::thread_id (the artist formerly known as traccc::sycl::thread_id1) to (hopefully) return correct values finally. As some of the member functions didn't so far. 😦

Where? I cant say I see what you mean.

Let me give one example, you can then look at the rest. See:

https://github.com/acts-project/traccc/blob/main/device/sycl/include/traccc/sycl/utils/thread_id.hpp#L22

It doesn't return the "X component" of the global thread ID, but rather the "linearized global thread ID".

Since we don't use multi-dimensional kernels actively, I don't think it would've hurt us yet. But these should still do what they advertise.

@stephenswat
Copy link
Member

Modified the member functions of traccc::sycl::details::thread_id (the artist formerly known as traccc::sycl::thread_id1) to (hopefully) return correct values finally. As some of the member functions didn't so far. 😦

Where? I cant say I see what you mean.

Let me give one example, you can then look at the rest. See:

https://github.com/acts-project/traccc/blob/main/device/sycl/include/traccc/sycl/utils/thread_id.hpp#L22

It doesn't return the "X component" of the global thread ID, but rather the "linearized global thread ID".

Since we don't use multi-dimensional kernels actively, I don't think it would've hurt us yet. But these should still do what they advertise.

I see your point but the implementation is fully correct in the context of a 1D thread identifier lookup. Just trying to understand if this is related to your SYCL woes, but I am guessing not then.

@krasznaa
Copy link
Member Author

krasznaa commented Dec 7, 2024

I see your point but the implementation is fully correct in the context of a 1D thread identifier lookup. Just trying to understand if this is related to your SYCL woes, but I am guessing not then.

I suspect / hope that the changes in the barrier class will have an effect on track finding. The ID class indeed probably won't change the behaviour.

@stephenswat
Copy link
Member

Let's hope 🤞

Copy link

sonarqubecloud bot commented Dec 7, 2024

@stephenswat stephenswat merged commit a587bc8 into acts-project:main Dec 8, 2024
27 checks passed
@krasznaa krasznaa deleted the SYCLTweaks-main-20241207 branch December 8, 2024 11:12
@krasznaa
Copy link
Member Author

krasznaa commented Dec 8, 2024

Unfortunately I already tried this morning, that these changes by themselves are not doing much to what I'm observing in #773 with the Intel and AMD backends. 😦

Still, these were good cleanups to have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cleanup Makes the code all clean and tidy sycl Changes related to SYCL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants