Fine-grained fast-math flags #1991
Comments
I agree, so better to file this on the Julia repository?
Looks like there is already at least one: JuliaLang/julia#49890.
I think we can close this issue then?
Sure.
I was thinking we could do
I like that idea. So all of the code in the kernel (even within function calls) would use contract and reassoc?
Yeah, kind of inspired by JuliaLang/julia#50239. I think we could solve this with stacked OverlayMethodTables.
I think it would be better to prototype this in an external package, and have CUDA.jl use that package's overlay table. That way the functionality wouldn't be locked into the CUDA.jl ecosystem either.
Something akin to https://github.com/JuliaSIMD/LLVMLoopInfo.jl? That would be great. Is the idea to use CassetteOverlay to create some standard passes for each fast-math flag and then use those in the kernels via macros? I am not sure how to stack these to combine fast-math flags.
No, more like https://github.com/vchuravy/FastmathOverlay.jl. I don't have a good solution for combining flags... yet.
Okay, #2037 is a prototype of that idea. Now that we know it is feasible, we have to decide if we like it. Composition of certain things is possible, and for other things it is tedious. As an example, say you want to opt into We sadly can't use the same method for composing Right now the only idea I have for
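To make the overlay-table idea from the comments above concrete, here is a minimal sketch, not the code from #2037 or FastmathOverlay.jl. It assumes the experimental `Base.Experimental.@MethodTable` and `Base.Experimental.@overlay` macros; the table name `FastMulTable` is hypothetical, and `Base.FastMath.mul_fast` (which sets all fast-math flags) is used only as a stand-in for a per-flag intrinsic.

```julia
# Hypothetical overlay table; a GPU compiler would normally own its own table.
Base.Experimental.@MethodTable FastMulTable

# Replace `*` for Float32, but only for code compiled against this table.
# `mul_fast` enables every fast-math flag, so it is a coarse stand-in here;
# a fine-grained version would emit only `contract` or `reassoc`.
Base.Experimental.@overlay FastMulTable function Base.:*(a::Float32, b::Float32)
    return Base.FastMath.mul_fast(a, b)
end

# Ordinary host code never consults the overlay table and is unaffected.
host_product = 2.0f0 * 3.0f0
```

The overlay only takes effect when a compiler is configured to look methods up in `FastMulTable`, which is why the replacement would reach code inside function calls as well. How several such tables would be stacked to combine flags is the open question raised in the thread.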
Is your feature request related to a problem? Please describe.
To get kernel performance matching clang, we have had to add fast-math flags such as contract (which clang and nvcc do by default). Currently, we do this with an ugly hack; see for example CUDA.jl/perf/volumerhs.jl, lines 21 to 57 in bb37b50.
Describe the solution you'd like
I would like a macro like @fastmath that had fine-grained control over the fast-math flags.
Describe alternatives you've considered
KernelAbstractions used to do this with https://github.com/JuliaLabs/Cassette.jl, and other people use macros (although that opens up fewer optimizations and is thus not desired) https://github.com/JuliaLabs/Cassette.jl. I don't know if https://github.com/JuliaDebug/CassetteOverlay.jl can be used with kernels, but it might be a possible way to implement this.
It would be nice if this functionality eventually got added to Base Julia.
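As an illustration of the macro alternative mentioned above, here is a rough sketch of what a single-flag variant could look like. Everything named here (`contract_add`, `contract_mul`, `rewrite_contract`, `@contract`) is hypothetical and is not the code in volumerhs.jl; the sketch assumes `Base.llvmcall` with a string of LLVM IR and handles only binary `+` and `*` on Float64.

```julia
# Hypothetical helpers: one LLVM instruction each, carrying only `contract`.
for (name, op) in ((:contract_add, "fadd"), (:contract_mul, "fmul"))
    ir = "%x = $op contract double %0, %1\nret double %x"
    @eval begin
        @inline function $name(a::Float64, b::Float64)
            return Base.llvmcall($ir, Float64, Tuple{Float64, Float64}, a, b)
        end
    end
end

# Map each supported operator to its flagged replacement.
const CONTRACT_OPS = Dict(:+ => :contract_add, :* => :contract_mul)

# Walk the expression and rewrite binary calls to `+` and `*`.
rewrite_contract(x) = x
function rewrite_contract(ex::Expr)
    args = map(rewrite_contract, ex.args)
    if ex.head === :call && length(args) == 3 && haskey(CONTRACT_OPS, args[1])
        return Expr(:call, CONTRACT_OPS[args[1]], args[2], args[3])
    end
    return Expr(ex.head, args...)
end

# Hypothetical macro: apply `contract` to the operators written in `ex`.
macro contract(ex)
    return esc(rewrite_contract(ex))
end

# With `contract` set, LLVM is allowed to fuse this into a single fma.
axpy(a, x, y) = @contract a * x + y
```

Because the macro only rewrites the operators that appear syntactically inside it, functions called from that expression keep their default semantics. That limitation is the reason the issue describes macros as opening up less optimization than an overlay-based approach.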