[SYCL2020] Add experimental support for optional lambda kernel naming #281
This (re-)introduces support for optional lambda kernel naming when using clang >= 10.
About kernel naming in hipSYCL
Because the HIP/CUDA programming models do not require kernel naming (even when instantiating a __global__ kernel with a lambda), it may be surprising that hipSYCL requires lambda kernel naming at all. Indeed, early versions of hipSYCL did not require it. Then came issue #49, where we noticed that clang's HIP/CUDA frontend did not enumerate kernel lambdas consistently across the host and device passes, which can cause the mangled kernel names to differ between host and device. This in turn can lead to sporadic, silent kernel launch failures that are very hard to reproduce and very frustrating to debug. To circumvent this issue, mandatory kernel naming as described in the SYCL specification was introduced in hipSYCL.
How this PR works
Originally, my intention was to borrow code from DPC++, which has support for generating unique names for lambda functions. However, due to the way the clang name mangling code is structured, we would have needed to pull in several thousand lines of code from Intel's clang fork, which would be impractical to maintain and keep working across the multiple clang versions we support.
Thankfully, it seems that in the meantime the good folk working on clang CUDA/HIP have addressed the original issue from clang 10 onwards, so this turned out to be easier than expected:
https://reviews.llvm.org/D68818
This means that if clang is sufficiently new (version 10 or later), we can simply accept a missing kernel name and rely on clang to mangle the kernel consistently between the host and device passes.
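As a rough illustration of the version gate described above, here is a minimal sketch. The names `unnamed_kernels_reliable` and `allow_unnamed_kernels` are hypothetical and are not taken from the hipSYCL sources; only the predefined `__clang__` and `__clang_major__` macros are real.

```cpp
// Hypothetical helper (not hipSYCL's actual code): returns true if the given
// clang major version is one where we assume D68818 has landed and lambda
// numbering is consistent across host and device passes.
constexpr bool unnamed_kernels_reliable(int clang_major) {
  return clang_major >= 10;
}

#if defined(__clang__)
// __clang_major__ is predefined by clang.
constexpr bool allow_unnamed_kernels = unnamed_kernels_reliable(__clang_major__);
#else
// Assumption for this sketch: other compilers still need explicit kernel names.
constexpr bool allow_unnamed_kernels = false;
#endif
```

With a flag like this, the check that rejects unnamed kernels could be skipped whenever `allow_unnamed_kernels` is true and kept as a hard error otherwise.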
Because the original issue #49 shows up only sporadically and is hard to reproduce, we should still consider this support somewhat experimental, as I might simply have been lucky in my testing.
If it turns out that the issue is still present in upstream clang, we will have to resort to plan B. Because of how name mangling works in clang, we cannot just swap out the lambda part that we are interested in. Instead, we would need to partially demangle the clang-generated kernel name, identify all lambda functions, and replace their enumerations with a combination of the line/column numbers of the definitions of all involved lambda functions. Sounds annoying, and probably is, so let's cross our fingers that this works :)
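To make plan B a bit more concrete, here is a heavily simplified sketch of the renaming step. It relies on the Itanium C++ ABI encoding of closure types, `Ul <lambda-sig> E [<number>] _`, where the optional number is the per-scope lambda enumeration. Everything else is an assumption for illustration: the function name `replace_lambda_discriminators`, the `L<line>C<column>` replacement format, and the restriction to lambdas whose parameters are builtin types (single lowercase type codes). It replaces each enumeration with that lambda's own location instead of one combined value, and the result is not meant to be a valid mangled name. A real implementation would need a proper partial demangler.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Sketch only: replaces the enumeration of each closure type in `mangled`
// with a tag built from (line, column) pairs, in order of appearance.
// Lambdas beyond the end of `locations` are left untouched.
std::string replace_lambda_discriminators(
    const std::string& mangled,
    const std::vector<std::pair<int, int>>& locations) {
  // Matches "Ul" + builtin-type signature + "E" + optional number + "_".
  static const std::regex lambda_re("Ul([a-z]+)E([0-9]*)_");

  std::string result;
  std::size_t next_location = 0;
  auto copied_until = mangled.cbegin();

  for (std::sregex_iterator it(mangled.cbegin(), mangled.cend(), lambda_re), end;
       it != end && next_location < locations.size(); ++it) {
    const std::smatch& m = *it;
    // Copy everything between the previous match and this one unchanged.
    result.append(copied_until, m[0].first);
    const auto& loc = locations[next_location++];
    // Keep the signature, swap the enumeration for the source location tag.
    result += "Ul" + m[1].str() + "E" + "L" + std::to_string(loc.first) +
              "C" + std::to_string(loc.second) + "_";
    copied_until = m[0].second;
  }
  result.append(copied_until, mangled.cend());
  return result;
}
```

For example, passing `_ZZ4mainENKUlvE_clEv` (the call operator of the first lambda in `main`) with the location `{12, 5}` yields `_ZZ4mainENKUlvEL12C5_clEv`, which no longer depends on the order in which the frontend enumerated the lambdas.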