Compile using the toolkit, not the driver. #892
Not too much slower:
Whereas before:
12795c7 to 2df1a9f
CI times seem to suggest this does hurt compile-time performance...
Codecov Report
@@            Coverage Diff             @@
##           master     #892      +/-   ##
==========================================
- Coverage   77.00%   76.32%   -0.68%
==========================================
  Files         121      121
  Lines        7706     7811     +105
==========================================
+ Hits         5934     5962      +28
- Misses       1772     1849      +77
There still looks to be some difference, about 10% in compile time, but that seems to be caused by |
I'm still bothered by the performance hit... I'm first going to add some timers to |
0b7331c to 62cb88e
8bca8f9 to 69cf7ec
Avoiding cmd-in-cmd interpolation seems to make inference happy, and with it I think this PR is basically at performance parity 🎉 Maybe some context why this is exciting: if we can use |
In response to the driver's compiler bug in #891, which doesn't manifest on 11.3's `ptxas`. Should also enable #832 without having to use `libnvptxcompiler`. Might also help towards fixing #812, because if we additionally expose `libnvvm`, we can do LTO with `nvlink` (this is not exposed by the linker API).

Currently stuck on https://forums.developer.nvidia.com/t/manually-perform-separate-compilation-with-ptxas-and-nvlink/177183, I can't find how to use `nvlink` to link `libcudadevrt`.

Artifacts have not been adapted, so this only works using `JULIA_CUDA_USE_BINARYBUILDER=false`. But it does pass all tests, except for the ones requiring the device runtime.