-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SYCL Tweaks, main branch (2024.12.07.) #799
SYCL Tweaks, main branch (2024.12.07.) #799
Conversation
Stopped it from using traccc::sycl::queue_wrapper, since that code anyway can only be used by a SYCL compiler. So it might as well just use ::sycl::queue directly.
At the same time use this opportunity to re-design the thread_id and barrier types into something a bit more generic, and hopefully more correct. Update the common device code to work with the modified (now const) SYCL utility types.
void blockBarrier() { __syncthreads(); } | ||
__device__ inline void blockBarrier() const { __syncthreads(); } | ||
|
||
TRACCC_DEVICE | ||
bool blockAnd(bool predicate) { return __syncthreads_and(predicate); } | ||
__device__ inline bool blockAnd(bool predicate) const { | ||
return __syncthreads_and(predicate); | ||
} | ||
|
||
TRACCC_DEVICE | ||
bool blockOr(bool predicate) { return __syncthreads_or(predicate); } | ||
__device__ inline bool blockOr(bool predicate) const { | ||
return __syncthreads_or(predicate); | ||
} | ||
|
||
TRACCC_DEVICE | ||
int blockCount(bool predicate) { return __syncthreads_count(predicate); } | ||
__device__ inline int blockCount(bool predicate) const { | ||
return __syncthreads_count(predicate); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, why? The previous version felt more consistent with the rest of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should we use a macro in a file that can only be used in CUDA device code? Much like how I removed TRACCC_DEVICE
from the SYCL code (as it served really no purpose there), I thought this would also be a good simplification.
The macros are really only needed in the shared code. CUDA specific code should be free to use __device__
, __global__
, etc.
@@ -19,7 +19,7 @@ namespace traccc::device::concepts { | |||
* @tparam T The thread identifier-like type. | |||
*/ | |||
template <typename T> | |||
concept thread_id1 = requires(T& i) { | |||
concept thread_id1 = requires(const T& i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're making the thread_id
objects templated on the dimensionality, why not the thread_id
concept too? 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that. But decided that I like the non-templated concept better. 🤔
We use very specific function names in the concept. In a higher dimensionality concept we would probably replace X
in a couple of the function names with Y
.
I think I'd rather just add thread_id2
as a concept once we need it. (I imagine it would require the thread_id1
concept, and add some more required functions.) That would still allow us to be pretty free with how we provide the implementation classes.
Where? I cant say I see what you mean. |
Let me give one example, you can then look at the rest. See: It doesn't return the "X component" of the global thread ID, but rather the "linearized global thread ID". Since we don't use multi-dimensional kernels actively, I don't think it would've hurt us yet. But these should still do what they advertise. |
I see your point but the implementation is fully correct in the context of a 1D thread identifier lookup. Just trying to understand if this is related to your SYCL woes, but I am guessing not then. |
I suspect / hope that the changes in the barrier class will have an effect on track finding. The ID class indeed probably won't change the behaviour. |
Let's hope 🤞 |
Quality Gate passedIssues Measures |
Unfortunately I already tried this morning, that these changes by themselves are not doing much to what I'm observing in #773 with the Intel and AMD backends. 😦 Still, these were good cleanups to have. |
Continuing with trying to make #773 work...
That PR by now compiles ~successfully for all platforms, but produces weird runtime issues on anything beside an NVIDIA backend. Since I wasn't sure how to best tackle that issue, decided to rather do some cleanup that's been long overdue. But as it happens, I may have found at least part of the problem.
One of the commits of the PR is just taken from #773 as is, simplifying the sanity checking code a little.
For the rest, I set out to move the non-public parts of the SYCL code under
device/sycl/src
, so that unsuspecting users would not be exposed to<sycl/sycl.hpp>
. (At the same time I also madetraccc::sycl
publicly depend ontraccc::device_common
. Since it does.)While doing that, at first I just wanted to tweak
traccc::sycl::thread_id1
andtraccc::sycl::barrier
a little, so that I'd like them more. But I actually found some implementation issues with both of them, at which point I decided to fully re-design them. 🤔1
by default.traccc::sycl::details::
namespace, just because I decided I liked them better there. 😛traccc::sycl::details::thread_id
(the artist formerly known astraccc::sycl::thread_id1
) to (hopefully) return correct values finally. As some of the member functions didn't so far. 😦traccc::device::concepts::thread_id1
concept didn't require the member functions of these types to beconst
. Even though our code itself assumed it. So I modified the concept to require constness on the functions.traccc::sycl:details::barrier
(formerlytraccc::sycl::barrier
) was even weirder. It currently uses a function called::sycl::nd_item<...>::barrier()
. But that function is not actually part of the SYCL2020 standard! (https://github.khronos.org/SYCL_Reference/iface/nd_item.html)I had to tweak some common and test code to account for these changes, but I think this just made those a bit easier to use.
Unfortunately I'm not sure whether these changes will fix my observed issues in #773 or not. (Since these changes ended up diverging a bit too much from that PR.) But I do believe that they are good fixes in either case. And fingers crossed that they'll help with the track finding code as well...