Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison #120718
r? @cuviper (rustbot has picked a reviewer for you, use r? to override)
I'd like to support this (I mean, I suggested it to @saethlin). It seems that with just the current safe LLVM flags available, a lot of optimizations are already possible (and more algebraically justified optimizations could be added in the future). Most notably, autovectorization and FMA conversion both work out of the box. From a quick test of this PR,

    fn sum(arr: &[f32]) -> f32 {
        arr.iter().fold(0.0, |a, b| fadd_algebraic(a, *b))
    }

generated excellent autovectorized code, completely safely. The current
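To make the vectorization claim concrete, here is a minimal sketch in stable Rust of what reassociation permits. It does not use `fadd_algebraic` (which needs nightly); instead `sum_four_lanes` is a hypothetical, hand-written stand-in for the shape of code a vectorizer may emit once it is allowed to reorder additions. The function names and the sample data are assumptions chosen for illustration.

```rust
/// Strict left-to-right sum: the evaluation order plain IEEE semantics require.
fn sum_sequential(arr: &[f32]) -> f32 {
    arr.iter().fold(0.0, |acc, &x| acc + x)
}

/// Hand-reassociated sum with four independent accumulators (hypothetical
/// stand-in for a 4-wide vectorized loop), followed by a scalar tail.
fn sum_four_lanes(arr: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 4];
    let chunks = arr.chunks_exact(4);
    let tail = chunks.remainder();
    for chunk in chunks {
        for (lane, &x) in lanes.iter_mut().zip(chunk) {
            *lane += x;
        }
    }
    // Horizontal reduction of the four partial sums.
    let mut total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for &x in tail {
        total += x;
    }
    total
}

fn main() {
    // Large and small magnitudes mixed, so the summation order is visible.
    let data = [1.0e8_f32, 1.0, -1.0e8, 1.0];
    println!("sequential: {}", sum_sequential(&data));
    println!("four lanes: {}", sum_four_lanes(&data));
}
```

The two functions can return different values for the same input, which is why the compiler may not turn one into the other without a flag that allows reassociation. Neither result is undefined; they are just different roundings of the same mathematical sum.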
Some changes occurred in compiler/rustc_codegen_gcc
Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3
Zulip thread with more context: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/.22algebraic.22.20fast-math.20intrinsics
Interestingly, the IEEE standard does actually say that a language standard should have functionality like this:
It says this should be per 'block' rather than per operation, but that's just syntax.
I think this needs more compiler review than libs...
r? compiler
☔ The latest upstream changes (presumably #120500) made this pull request unmergeable. Please resolve the merge conflicts.
I'm on vacation.
I->setHasAllowReassoc(true);
I->setHasAllowContract(true);
I->setHasAllowReciprocal(true);
I->setHasNoSignedZeros(true);
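Two of the flags set above, `contract` and `nsz`, can be illustrated in stable Rust. This is a sketch under stated assumptions: the helper names are hypothetical, and `f64::mul_add` is used as a stand-in for the fused form that contraction may produce. It only shows that each rewrite can change a result, not what the compiler actually emits for the new intrinsics.

```rust
/// Multiply then add, with a rounding step after the multiplication.
fn unfused(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// Fused multiply-add: a single rounding at the end. Contraction allows the
/// compiler to replace `unfused` with this form.
fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

/// `x + 0.0` may only be simplified to `x` when the sign of zero is ignored.
fn add_zero(x: f64) -> f64 {
    x + 0.0
}

fn main() {
    // e = 2^-30, exactly representable.
    let e = 1.0 / (1u64 << 30) as f64;
    let x = 1.0 + e;
    // x * x is exactly 1 + 2e + e^2; the e^2 term is lost when the product
    // is rounded to f64 before the addition.
    let c = -(1.0 + 2.0 * e);
    println!("unfused: {:e}", unfused(x, x, c));
    println!("fused:   {:e}", fused(x, x, c));

    // IEEE addition gives (-0.0) + 0.0 = +0.0, so the sign changes.
    let z = add_zero(-0.0);
    println!("sign of (-0.0 + 0.0) is positive: {}", z.is_sign_positive());
}
```

In both cases the rewritten expression is a valid result for some rounding of the real-number computation, which is the sense in which these flags are "safe": they can change the answer, but they do not introduce poison.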
What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.
Does the word "algebraic" have a specific meaning here?
Never mind, I see from other places that it does.
What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.
Sure, why not. I'll add it.
Does the word "algebraic" have a specific meaning here?
I didn't want to just defang the existing intrinsics because there are some optimizations which rely on assuming that NaN/Inf do not occur. So I needed to come up with a new name, and the way I think about these intrinsics is that unlike IEEE float operations, they permit the usual algebraic transformations. Things like a + (b + c) = (a + b) + c and a / b = a * (1 / b).
I don't think that the name is perfect, and I'd be happy to see someone suggest a better name, but @orlp seems perfectly happy calling them "algebraic".
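The two identities mentioned in this comment hold for real numbers but not for IEEE floats, which is why the compiler needs explicit permission to apply them. A minimal sketch in stable Rust, with hypothetical helper names and sample values chosen for illustration:

```rust
/// (a + b) + c, evaluated left to right.
fn assoc_left(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// a + (b + c), the reassociated form.
fn assoc_right(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}

/// Direct division.
fn div_direct(a: f64, b: f64) -> f64 {
    a / b
}

/// Division rewritten as multiplication by the reciprocal.
fn div_via_reciprocal(a: f64, b: f64) -> f64 {
    a * (1.0 / b)
}

fn main() {
    // Reassociation changes the rounding of intermediate sums.
    println!("{:?} vs {:?}", assoc_left(0.1, 0.2, 0.3), assoc_right(0.1, 0.2, 0.3));

    // 1/49 is not exactly representable, so the rewritten form rounds twice.
    println!("{:?} vs {:?}", div_direct(49.0, 49.0), div_via_reciprocal(49.0, 49.0));
}
```

Each pair differs only in the last bits. An operation that permits these rewrites therefore trades bit-exact reproducibility for optimization freedom, without making any input undefined.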
they permit the usual algebraic transformations. Things like a + (b + c) = (a + b) + c and a / b = a * (1 / b)
That would be a great explanation to put in a comment somewhere :)
What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.
Sure, why not. I'll add it.
Please don't. Replacing functions by their approximations isn't an algebraically justified optimization.
🤔 And in any case, I don't think it would matter for these intrinsics.
@@ -1882,6 +1882,46 @@ extern "rust-intrinsic" {
    #[rustc_nounwind]
    pub fn frem_fast<T: Copy>(a: T, b: T) -> T;

    /// Float addition that allows optimizations based on algebraic rules.
    ///
    /// This intrinsic does not have a stable counterpart.
Which meaning of "stable" does this use?
The only way to call this intrinsic is to use the core_intrinsics feature. We do not have a wrapper for these like the atomic intrinsics.
@@ -417,8 +417,7 @@ extern "C" LLVMAttributeRef LLVMRustCreateMemoryEffectsAttr(LLVMContextRef C,
     report_fatal_error("bad MemoryEffects.");
   }
 }
-
-// Enable a fast-math flag
+// Enable all fast-math flags
 //
 // https://llvm.org/docs/LangRef.html#fast-math-flags
 extern "C" void LLVMRustSetFastMath(LLVMValueRef V) {
Do we still need the fast intrinsics?
I'd prefer to keep them for now and let people play with both variants. I hope that based on experience we can make an argument that the unsafety of the unsafe ones is not worth the optimizations that they unlock, but I have no data to back that up.
👍
I am very much the opposite of a floating point expert, but you've been bounced around various reviewers so I'll do my best to review this and allow progress.
In general, adding safer variants of FP intrinsics seems fine, as does converting some existing intrinsics to use them when there are known bugs (#120720) with the less safe variants. I've asked some questions above just to give myself a bit more certainty about this change.
My final questions here are about the exact meaning of "intrinsics". I think these new algebraic intrinsics are internal only? And they are used to implement
✌️ @saethlin, you can now approve this pull request! If @nnethercote told you to " |
@bors r=nnethercote
…nethercote Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison. And this uses the algebraic float ops to fix rust-lang#120720 cc `@orlp`
Rollup of 8 pull requests
Successful merges:
- rust-lang#120718 (Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison)
- rust-lang#121195 (unstable-book: Separate testing and production sanitizers)
- rust-lang#121205 (Merge `CompilerError::CompilationFailed` and `CompilerError::ICE`.)
- rust-lang#121233 (Move the extra directives for `Mode::CoverageRun` into `iter_header`)
- rust-lang#121256 (Allow AST and HIR visitors to return `ControlFlow`)
- rust-lang#121307 (Drive-by `DUMMY_SP` -> `Span` and fmt changes)
- rust-lang#121310 (Remove an old hack for rustdoc)
- rust-lang#121311 (Make `is_nonoverlapping` `#[inline]`)
r? `@ghost`
`@rustbot` modify labels: rollup
@bors r-
@bors r=nnethercote
☀️ Test successful - checks-actions
Finished benchmarking commit (bb8b11e): comparison URL.
Overall result: ❌✅ regressions and improvements - no action needed
@rustbot label: -perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
This benchmark run did not return any relevant results for this metric.
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 651.363s -> 651.642s (0.04%)
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the LangRef, only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison.
And this uses the algebraic float ops to fix #120720
cc @orlp
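The precondition that `nnan` and `ninf` encode can be written out as an ordinary check. The sketch below is in stable Rust and uses a hypothetical helper, `satisfies_fast_precondition`, which is not part of this PR. It only illustrates which additions would fall outside the assumption made by the fully fast-math variants, on the reading that both the arguments and the result must be finite.

```rust
/// Hypothetical helper (not part of the PR): true when `a + b` involves no
/// NaN and no infinity, in either the inputs or the result.
fn satisfies_fast_precondition(a: f32, b: f32) -> bool {
    a.is_finite() && b.is_finite() && (a + b).is_finite()
}

fn main() {
    // Ordinary finite values are fine.
    println!("{}", satisfies_fast_precondition(1.0, 2.0));
    // An infinite input violates the assumption.
    println!("{}", satisfies_fast_precondition(f32::INFINITY, 1.0));
    // Finite inputs can still overflow to infinity.
    println!("{}", satisfies_fast_precondition(f32::MAX, f32::MAX));
    // Plain IEEE addition stays well-defined on these inputs.
    println!("{}", (f32::INFINITY + f32::NEG_INFINITY).is_nan());
}
```

The algebraic variants drop exactly this precondition: every input has a defined result, which is why they can be exposed without the hazard described above.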