Add BFloat16 runtime intrinsics. #51790
Merged
After switching to LLVM for BFloat16 in #51470 (i.e., relying on `Intrinsics.sub_float` etc. instead of hand-rolling bit-twiddling implementations), we also need to provide fallback runtime implementations for these intrinsics. This is unfortunate; I had hoped to keep as much BFloat16-related functionality as possible in BFloat16s.jl. It also required modifying the unary-operator preprocessor macros in order to differentiate between Float16 and BFloat16; I didn't generalize that to all intrinsics, as the code is hairy enough already (and it's currently only useful for `fptrunc`/`fpext`). A sketch of what such a fallback looks like is below.
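For illustration, here is a minimal sketch (not the PR's actual code) of how such a runtime fallback can be written in C: widen the BFloat16 to Float32, perform the operation there, and narrow back with round-to-nearest-even. The names `bfloat16_t` and `jl_bfloat16_sub` are hypothetical; the real runtime presumably stamps out one such function per intrinsic via the preprocessor macros mentioned above.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bfloat16_t; /* raw BFloat16 bit pattern */

/* fpext fallback: BFloat16 is the high 16 bits of an IEEE Float32. */
static float bfloat16_to_float(bfloat16_t x)
{
    uint32_t bits = (uint32_t)x << 16;
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* fptrunc fallback: narrow to BFloat16 with round-to-nearest-even. */
static bfloat16_t float_to_bfloat16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    if ((bits & 0x7fffffff) > 0x7f800000) /* NaN: quiet it, keep the sign */
        return (bfloat16_t)((bits >> 16) | 0x0040);
    /* bias of 0x7fff, plus 1 if the bit we keep is odd (ties-to-even) */
    uint32_t rounding_bias = 0x00007fff + ((bits >> 16) & 1);
    return (bfloat16_t)((bits + rounding_bias) >> 16);
}

/* Fallback for e.g. sub_float on BFloat16: compute in Float32 and narrow. */
bfloat16_t jl_bfloat16_sub(bfloat16_t a, bfloat16_t b)
{
    return float_to_bfloat16(bfloat16_to_float(a) - bfloat16_to_float(b));
}
```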
@vtjnash @Keno Any suggestions for an alternative approach that keeps more of BFloat16 out of Base? Ideally we'd implement these runtime fallbacks in Julia, as part of BFloat16s.jl (in fact, most of them already have an implementation over there), but that seems hard. Alternatively, we could require codegen for these intrinsics, i.e., not provide runtime fallbacks at all.