Support for 'delayed kernels' #569
Also, I don't particularly like the |
Or maybe it should be |
Codecov Report
@@ Coverage Diff @@
## master #569 +/- ##
==========================================
- Coverage 80.34% 80.28% -0.07%
==========================================
Files 116 116
Lines 6889 6883 -6
==========================================
- Hits 5535 5526 -9
- Misses 1354 1357 +3
GPUifyLoops used to, but for KA I switched back to the simpler one. But yes, this is great.
Changed it from |
When you need to introspect the compiled kernel, e.g. to determine a launch configuration, you either have to do the whole `cudaconvert` and `cufunction` dance manually, or use the hacky `config=callback` argument to `@cuda`. Both are pretty cumbersome, so here I introduce an alternative: `@cuda delayed=true kernel(args...)`, returning a callable object you can then just introspect and finally call using `kernel_object(args...; threads=..., blocks=..., shmem=...)`.

cc @vchuravy, as KA probably uses the lower-level interface.
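
For illustration, a minimal sketch of the compile, introspect, then launch workflow described above. Assumptions: it needs a CUDA-capable GPU; the `vadd!` kernel and array sizes are made up for the example; and the keyword is written as `launch=false`, which as far as I know is how released CUDA.jl versions spell the option this description calls `delayed=true`. The `kernel.fun` field and `launch_configuration` are used as I understand them from CUDA.jl, so treat this as a sketch rather than the PR's definitive usage.

```julia
using CUDA

# Hypothetical example kernel: element-wise addition.
function vadd!(c, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

N = 2^20
a = CUDA.rand(Float32, N)
b = CUDA.rand(Float32, N)
c = similar(a)

# Compile the kernel without launching it; this returns a callable kernel object.
# (Assumed spelling: `launch=false`; the description above calls it `delayed=true`.)
kernel = @cuda launch=false vadd!(c, a, b)

# Introspect the compiled function to pick a launch configuration.
config = launch_configuration(kernel.fun)
threads = min(N, config.threads)
blocks = cld(N, threads)

# Finally launch by calling the kernel object with the launch parameters.
kernel(c, a, b; threads=threads, blocks=blocks)
synchronize()
```

Compared to the `config=callback` route, the launch parameters here are computed in ordinary host code between compilation and launch, which is the point of returning a kernel object.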