Instead of hard-coding the PTX ISA version to 6.3, select the highest version available. We now also differentiate between what LLVM supports, which is part of the `CompilerTarget`, and what the CUDA toolkit supports, which we now store in the `CUDACompilerParams`. For the compute capability, that means emitting code for e.g. `sm_89` and passing `-arch=sm_90` to `ptxas`. For the PTX ISA, that's not possible, so we string-replace the `.version` directive in the generated assembly. Feels icky, but I think it should work (on the condition that we don't use instructions that were deprecated between the PTX ISA used by LLVM and the one we replace it with, but that's generally a very small window).

One annoying aspect is that the `compute_version()` and `ptx_isa()` getters for kernel code currently return the LLVM-level compatibility, so we might not generate the best code. However, I don't think we can bump this to the CUDA-level compatibility, as that may risk running into LLVM selection errors. And it doesn't seem worth splitting into `llvm_compute_capability` and `cuda_compute_capability`, where the latter can only use inline assembly.

Fixes #2080
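
For illustration only, the `.version` string replacement mentioned above could look roughly like the sketch below. It is written in Python rather than taken from this PR, and the helper name `retarget_ptx_version` and the version numbers in the usage note are hypothetical. It assumes the generated assembly contains a `.version major.minor` directive on a line of its own.

```python
import re

# Matches a PTX `.version major.minor` directive at the start of a line.
_VERSION_RE = re.compile(r"^([ \t]*\.version[ \t]+)\d+\.\d+", re.MULTILINE)


def retarget_ptx_version(ptx: str, major: int, minor: int) -> str:
    """Rewrite the first `.version` directive to `major.minor`.

    Hypothetical helper: it only swaps the version number and leaves
    `.target` and the rest of the assembly untouched.
    """
    new_ptx, count = _VERSION_RE.subn(
        lambda m: f"{m.group(1)}{major}.{minor}", ptx, count=1
    )
    if count == 0:
        # Fail loudly instead of silently handing unmodified PTX onwards.
        raise ValueError("no .version directive found in PTX")
    return new_ptx
```

Calling `retarget_ptx_version(asm, 8, 5)` on assembly that starts with `.version 6.3` would yield the same assembly with `.version 8.5`; the numbers here are placeholders, not the versions this PR selects.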