Improve PTX ISA selection #2088

maleadt · 2023-09-20T09:20:11Z

Instead of hard-coding the PTX ISA to 6.3, select the highest version available. We now also differentiate between what LLVM supports, which is part of the CompilerTarget, and what the CUDA toolkit supports, which we now store in the CUDACompilerParams. For the compute capability, that means emitting code for e.g. sm_89 and passing -arch=sm_90 to ptxas. For the PTX ISA, that's not possible, so we string-replace the .version directive in the generated assembly. Feels icky, but I think it should work, on the condition that we don't use instructions that were deprecated between the PTX ISA used by LLVM and the one we replace it with; that's generally a very small window.
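To make the two-level selection and the `.version` rewrite concrete, here is a minimal sketch in Python. It is not the code from this PR: the function names (`highest_at_most`, `select_targets`, `rewrite_ptx_version`) are hypothetical, and the version lists are made-up placeholders rather than the real compatibility database. It only illustrates the idea of picking the highest supported capability per toolchain and then patching the directive in the generated assembly.

```python
import re

# Placeholder support tables, invented for illustration only. A real
# implementation would query a compatibility database instead.
LLVM_CAPS = [(7, 5), (8, 0), (8, 6), (8, 9)]
CUDA_CAPS = [(7, 5), (8, 0), (8, 6), (8, 9), (9, 0)]
LLVM_PTX = [(6, 3), (7, 0), (7, 8)]
CUDA_PTX = [(6, 3), (7, 0), (7, 8), (8, 0)]


def highest_at_most(supported, limit):
    """Return the highest supported version that does not exceed `limit`."""
    candidates = [v for v in supported if v <= limit]
    if not candidates:
        raise ValueError(f"no supported version at or below {limit}")
    return max(candidates)


def select_targets(device_cap):
    """Pick per-toolchain targets for a device (hypothetical helper)."""
    return {
        # What the code generator is asked to emit.
        "llvm_cap": highest_at_most(LLVM_CAPS, device_cap),
        "llvm_ptx": max(LLVM_PTX),
        # What the assembler is asked to target.
        "cuda_cap": highest_at_most(CUDA_CAPS, device_cap),
        "cuda_ptx": max(CUDA_PTX),
    }


# Matches a `.version X.Y` directive at the start of a line.
_VERSION_RE = re.compile(r"^([ \t]*\.version[ \t]+)\d+\.\d+", re.MULTILINE)


def rewrite_ptx_version(asm, ptx):
    """Replace the first `.version` directive with the given (major, minor)."""
    new_asm, count = _VERSION_RE.subn(
        lambda m: f"{m.group(1)}{ptx[0]}.{ptx[1]}", asm, count=1
    )
    if count != 1:
        raise ValueError("no .version directive found")
    return new_asm


if __name__ == "__main__":
    targets = select_targets((9, 0))
    asm = ".version 7.8\n.target sm_89\n.address_size 64\n"
    print(targets)
    print(rewrite_ptx_version(asm, targets["cuda_ptx"]))
```

With these placeholder tables, a device of capability 9.0 gets code emitted for 8.9 while the assembler targets 9.0, which mirrors the sm_89 / sm_90 example above. The rewrite touches only the `.version` line, so the caveat about instructions that changed between the two ISA versions still applies.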

One annoying aspect is that the compute_version() and ptx_isa() getters for kernel code currently return the LLVM-level compatibility, so we might not generate the best code. However, I don't think we can bump this to the CUDA-level compatibility, as that may risk running into LLVM selection errors. And it doesn't seem worth splitting into llvm_compute_capability and cuda_compute_capability, where the latter can only use inline assembly.

Fixes #2080

codecov · 2023-09-20T10:32:21Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.10% 🎉

Comparison is base (a0dfc35) 71.59% compared to head (ce6b8cc) 71.70%.
Report is 2 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2088      +/-   ##
==========================================
+ Coverage   71.59%   71.70%   +0.10%     
==========================================
  Files         157      157              
  Lines       13812    13834      +22     
==========================================
+ Hits         9889     9919      +30     
+ Misses       3923     3915       -8

| Files Changed | Coverage Δ | |
|---|---|---|
| src/compiler/execution.jl | `84.00% <ø> (ø)` | |
| src/utilities.jl | `83.01% <ø> (-0.32%)` | ⬇️ |
| src/compatibility.jl | `98.14% <100.00%> (+1.85%)` | ⬆️ |
| src/compiler/compilation.jl | `95.37% <100.00%> (+5.44%)` | ⬆️ |


maleadt added 6 commits September 20, 2023 10:20

Allow overriding the architecture and ISA from the kernel macro.

f149850

Update compatibility database.

875f8e5

Bump the PTX ISA to the latest available by CUDA.

3525797

Adapt runtime generation helper.

8373f79

Fix PTX logic (should be higher or equal).

1288724

Fix runtime precompilation logic.

243a950

maleadt added the enhancement and cuda kernels labels Sep 20, 2023

Bound the chosen capability by the PTX ISA.

ce6b8cc

maleadt merged commit 3398c05 into master Sep 20, 2023
1 check passed

maleadt deleted the tb/ptx_isa branch September 20, 2023 12:47
