Make DemoteFloat16 a conditional pass #43327
This needs to integrate with multiversioning so that we can create a sysimage in which this is both enabled and disabled, and then loaded conditionally. #40216 (comment)
The multiversioning stuff is a bit out of my depth, so what I did was mostly pattern matching.
Bump :)
src/aotcompile.cpp
if (optlevel > 1)
    PM->add(createGVNPass());
auto feat_string = TM->getTargetFeatureString();
if(feat_string.find("+fp16fml") == llvm::StringRef::npos||feat_string.find("+fp16fml") == llvm::StringRef::npos){
So instead of doing this check here, I think we need to do it on a per-function basis, within the pass.
The way this works is that the multi-versioning pass will clone the function if it contains Float16 ops and then we have two copies of the function, one that should have a target feature set and one without it. On the one that is lacking the target-feature we still want to run the DemoteFloat16 pass.
For the JIT context there is likely a similar issue that @vtjnash, @JeffBezanson and I discussed yesterday.
See #43085 (comment) for some more context
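The per-function decision described above can be sketched in plain C++ without any LLVM dependency. This is only an illustration under stated assumptions: `FunctionInfo`, `has_feature`, and `should_demote_float16` are hypothetical names, the feature string is modelled as a comma-separated list of tokens, and `+fp16fml` is used only because it is the feature the diff checks for.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for one function clone produced by multiversioning:
// a name plus that clone's own feature string, e.g. "+neon,+fp16fml".
struct FunctionInfo {
    std::string name;
    std::string target_features;
};

// Returns true if `features` (comma-separated tokens) contains exactly `wanted`.
// Comparing whole tokens avoids matching a longer feature name by accident.
bool has_feature(std::string_view features, std::string_view wanted) {
    std::size_t start = 0;
    while (start <= features.size()) {
        std::size_t end = features.find(',', start);
        if (end == std::string_view::npos)
            end = features.size();
        if (features.substr(start, end - start) == wanted)
            return true;
        start = end + 1;
    }
    return false;
}

// The decision is made per clone: only a clone that lacks the native
// Float16 feature still needs the demotion pass.
bool should_demote_float16(const FunctionInfo &f, std::string_view native_feature) {
    return !has_feature(f.target_features, native_feature);
}
```

With two clones of the same function, one carrying `+neon,+fp16fml` and one carrying only `+neon`, `should_demote_float16` returns false for the first and true for the second, which is the behavior the review comment asks for.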
I moved the check to inside the pass, but I don't think the way the check is done is the best
The multiversioning check part seems correct (per-function). The issue is that jl_ExecutionEngine->getTargetFeatureString can currently be looking at the wrong target for many people (e.g. GPUs, static compilation, sysimg building), which is what https://reviews.llvm.org/D120585 was hoping to solve (edit: fixed link).
Yeah, I just did it similarly to how we do FMA. Could we get the feature flags from TTI, or does that need your change?
Also, it seems the backend already does what DemoteFloat16 would do, but I guess it would then miss some possible optimizations.
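To make concrete what demotion changes numerically, here is a small sketch. It assumes the pass's effect is to perform each Float16 operation in Float32 and round the result back to Float16 precision after every operation. The helper names are hypothetical, and `round_to_half_precision` models only Float16's 11-bit significand, not its exponent range (no overflow or subnormal handling).

```cpp
#include <cmath>

// Round a float to 11 significant bits, the significand precision of Float16.
// Uses the current rounding mode (round-to-nearest-even by default).
// Simplification: Float16's limited exponent range is not modelled.
float round_to_half_precision(float x) {
    if (x == 0.0f || !std::isfinite(x))
        return x;
    int exponent = 0;
    float mantissa = std::frexp(x, &exponent);   // |mantissa| in [0.5, 1)
    float scaled = std::ldexp(mantissa, 11);     // |scaled| in [1024, 2048)
    return std::ldexp(std::nearbyint(scaled), exponent - 11);
}

// One demoted addition: add in Float32, then round back to Float16 precision.
float demoted_add(float a, float b) {
    return round_to_half_precision(a + b);
}

// Rounds after every operation, as per-operation demotion would.
float sum_rounding_each_step(float a, float b, float c) {
    return demoted_add(demoted_add(a, b), c);
}

// Keeps the intermediate in Float32 and rounds only the final result.
float sum_rounding_once(float a, float b, float c) {
    return round_to_half_precision((a + b) + c);
}
```

For the inputs 2048, 1, 1, `sum_rounding_each_step` gives 2048 because each intermediate 2049 is a tie that rounds to the even neighbor, while `sum_rounding_once` gives 2050. Where the rounding happens is therefore observable, which is why it matters whether the pass or the backend inserts it.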
Right now, only builtin LLVM passes are allowed to access that data; external passes are forbidden from doing so. That PR would make it available to all passes, but it is possible that LLVM devs will conceptually dislike the idea of fixing this bug.
I guess the "correct" thing would be to add a hasFloat16 function or something like it. But that sounds like a very roundabout way to solve something that seems quite obvious.
Do you have any other idea of how this could be done? Also isn't it an issue if the TargetMachine disagrees with TargetFeatureString? What does machine code generation use as the truth?
Superficially looks OK, but I'm really not that familiar with the multiversioning pass.
return true;
}
#else
if (FS.find("+avx512fp16") != llvm::StringRef::npos){
Note https://reviews.llvm.org/D107082: we are not there yet, but LLVM will support _Float16 correctly on SSE2 and above.
Note that the LLVM PR also changes the ABI to match GCC 12 and thus is going to break us in fun ways. I haven't found how -fexcess-precision=16 is going to be implemented in LLVM.
I just added this because GCC was complaining about FS being unused. That part of the branch doesn't matter for now, since x86 is currently treated as never having Float16.
SSE2 has F16C instructions, which are just fast conversions that we might already use; I know we use them on aarch64 at least. The first native operations on Float16 are the AVX-512 ones.
738798f to 8eff5d1
8eff5d1 to 893b1a1
* add TargetMachine check
* Add initial float16 multiversioning stuff
* make check more robust and remove x86 check
* move check to inside the pass
* C++ is hard
* Comment out the check because it won't work inside the pass
* whitespace in the comment
* Change the logic not to depend on a TM
* Add preliminary support for x86 test
* Cosmetic changes
(cherry picked from commit d18fd47)
Attempt at #40216
For now it's just an if statement, which might be enough. I wasn't sure if the check should be inside the pass or if the pass should be conditional. For now it's outside it.
For this to work on the M1, #41924 needs to be merged.
Fix #40216.