Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison #120718

Merged
merged 1 commit into from
Feb 21, 2024

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Feb 6, 2024

Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the LangRef, only the flags nnan (no nans) and ninf (no infs) can produce poison.

And this uses the algebraic float ops to fix #120720

cc @orlp

@rustbot
Copy link
Collaborator

rustbot commented Feb 6, 2024

r? @cuviper

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 6, 2024
@saethlin saethlin added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Feb 6, 2024
@saethlin saethlin force-pushed the reasonable-fast-math branch from 03e0aa5 to 744641f Compare February 6, 2024 21:56
@saethlin saethlin changed the title Set only fast-math flags which don't produce poison Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison Feb 6, 2024
@saethlin saethlin force-pushed the reasonable-fast-math branch from 744641f to 6342116 Compare February 6, 2024 22:41
@orlp
Copy link
Contributor

orlp commented Feb 7, 2024

I'd like to support this (I mean, I suggested it to @saethlin), it seems that with just the current safe LLVM flags available a lot of optimizations are already possible (and more algebraically justified optimizations could be added in the future). Most notably, autovectorization and FMA conversion both work out of the box. From a quick test of this PR,

fn sum(arr: &[f32]) -> f32 {
    arr.iter().fold(0.0, |a, b| fadd_algebraic(a, *b))
}

generated excellent autovectorized code, completely safely.

The current f*_fast intrinsics are essentially unusable in generic code because they are instant UB on infinities/NaNs. It is for any non-trivial algorithm completely infeasible to verify that no infinities/NaNs exist in the input, let alone are produced as (intermediate) results. In a quick search through Github I've found every place that used f*_fast to be unsound. The safe algebraically optimizeable float operations are desperately needed.

@saethlin saethlin marked this pull request as ready for review February 7, 2024 02:09
@rustbot
Copy link
Collaborator

rustbot commented Feb 7, 2024

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

@saethlin saethlin added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Feb 7, 2024
@saethlin saethlin force-pushed the reasonable-fast-math branch from 6342116 to 27db5bc Compare February 7, 2024 17:03
@rustbot
Copy link
Collaborator

rustbot commented Feb 7, 2024

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

@orlp
Copy link
Contributor

orlp commented Feb 8, 2024

Zulip thread with more context: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/.22algebraic.22.20fast-math.20intrinsics

Interestingly, the IEEE standard does actually say that a language standard should have functionality like this:

image

It says this should be per 'block' rather than per operation, but that's just syntax.

@cuviper
Copy link
Member

cuviper commented Feb 15, 2024

I think this needs more compiler review than libs...

r? compiler

@rustbot rustbot assigned petrochenkov and unassigned cuviper Feb 15, 2024
@saethlin saethlin force-pushed the reasonable-fast-math branch from 27db5bc to 740338c Compare February 15, 2024 22:40
@bors
Copy link
Contributor

bors commented Feb 16, 2024

☔ The latest upstream changes (presumably #120500) made this pull request unmergeable. Please resolve the merge conflicts.

@saethlin saethlin force-pushed the reasonable-fast-math branch from 740338c to d932173 Compare February 16, 2024 15:43
@petrochenkov
Copy link
Contributor

I'm on vacation.
r? compiler

@rustbot rustbot assigned nnethercote and unassigned petrochenkov Feb 17, 2024
I->setHasAllowReassoc(true);
I->setHasAllowContract(true);
I->setHasAllowReciprocal(true);
I->setHasNoSignedZeros(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.

Does the word "algebraic" have a specific meaning here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I see from other places that it does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.

Sure, why not. I'll add it.

Does the word "algebraic" have a specific meaning here?

I didn't want to just defang the existing intrinsics because there are some optimizations which rely on assuming that NaN/Inf do not occur. So I needed to come up with a new name, and the way I think about these intrinsics is that unlike IEEE float operations, they permit the usual algebraic transformations. Things like a + (b + b) = (a + b) + c and a / b = a * (1 / b).

I don't think that the name is perfect, and I'd be happy to see someone suggest a better name, but @orlp seems perfectly happy calling them "algebraic".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they permit the usual algebraic transformations. Things like a + (b + b) = (a + b) + c and a / b = a * (1 / b)

That would be a great explanation to put in a comment somewhere :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.

Sure, why not. I'll add it.

Please don't. Replacing functions by their approximations isn't an algebraically justified optimization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 And in any case, I don't think it would matter for these intrinsics.

@@ -1882,6 +1882,46 @@ extern "rust-intrinsic" {
#[rustc_nounwind]
pub fn frem_fast<T: Copy>(a: T, b: T) -> T;

/// Float addition that allows optimizations based on algebraic rules.
///
/// This intrinsic does not have a stable counterpart.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which meaning of "stable" does this use?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way to call this intrinsic is to use the core_intrinsics feature. We do not have a wrapper for these like the atomic intrinsics.

@@ -417,8 +417,7 @@ extern "C" LLVMAttributeRef LLVMRustCreateMemoryEffectsAttr(LLVMContextRef C,
report_fatal_error("bad MemoryEffects.");
}
}

// Enable a fast-math flag
// Enable all fast-math flags
//
// https://llvm.org/docs/LangRef.html#fast-math-flags
extern "C" void LLVMRustSetFastMath(LLVMValueRef V) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the fast intrinsics?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep them for now and let people play with both variants. I hope that based on experience we can make an argument that the unsafety of the unsafe ones is not worth the optimizations that they unlock, but I have no data to back that up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nnethercote
Copy link
Contributor

I am very much the opposite of a floating point expert, but you've been bounced around various reviewers so I'll do my best to review this and allow progress.

In general, adding safer variants of FP intrinsics seems fine, as does converting some existing intrinsics to use them when there are known bugs (#120720) with the less safe variants. I've asked some questions above just to give myself a bit more certainty about this change.

My final questions here are about the exact meaning of "intrinsics". I think these new algebraic intrinsics are internal only? And they are used to implement simd_reduce_{add,mul}_unordered, which are also internal-only intrinsics? And they are used to implement the externally visible _mm512_reduce_add_pd intrinsic mentioned in #120720? Though I see no mention of _mm512_reduce_add_pd anywhere within the rust-lang/rust repository, so I'm confused by that.

@saethlin saethlin force-pushed the reasonable-fast-math branch from e27d738 to 41fddb5 Compare February 19, 2024 21:02
@saethlin
Copy link
Member Author

@bors r=nnethercote

@bors
Copy link
Contributor

bors commented Feb 19, 2024

📌 Commit 41fddb5 has been approved by nnethercote

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 19, 2024
saethlin added a commit to saethlin/rust that referenced this pull request Feb 20, 2024
…nethercote

Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison

Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison.

And this uses the algebraic float ops to fix rust-lang#120720

cc `@orlp`
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 20, 2024
Rollup of 8 pull requests

Successful merges:

 - rust-lang#120718 (Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison)
 - rust-lang#121195 (unstable-book: Separate testing and production sanitizers)
 - rust-lang#121205 (Merge `CompilerError::CompilationFailed` and `CompilerError::ICE`.)
 - rust-lang#121233 (Move the extra directives for `Mode::CoverageRun` into `iter_header`)
 - rust-lang#121256 (Allow AST and HIR visitors to return `ControlFlow`)
 - rust-lang#121307 (Drive-by `DUMMY_SP` -> `Span` and fmt changes)
 - rust-lang#121310 (Remove an old hack for rustdoc)
 - rust-lang#121311 (Make `is_nonoverlapping` `#[inline]`)

r? `@ghost`
`@rustbot` modify labels: rollup
@saethlin
Copy link
Member Author

@bors r-
#121320 (comment)
The test should be only-x86_64

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 20, 2024
@saethlin saethlin force-pushed the reasonable-fast-math branch from 41fddb5 to cc73b71 Compare February 20, 2024 17:39
@saethlin
Copy link
Member Author

@bors r=nnethercote

@bors
Copy link
Contributor

bors commented Feb 20, 2024

📌 Commit cc73b71 has been approved by nnethercote

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 20, 2024
@bors
Copy link
Contributor

bors commented Feb 21, 2024

⌛ Testing commit cc73b71 with merge bb8b11e...

@bors
Copy link
Contributor

bors commented Feb 21, 2024

☀️ Test successful - checks-actions
Approved by: nnethercote
Pushing bb8b11e to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 21, 2024
@bors bors merged commit bb8b11e into rust-lang:master Feb 21, 2024
12 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Feb 21, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (bb8b11e): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.4% [0.2%, 3.4%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 1
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.8% [2.8%, 2.8%] 1
Regressions ❌
(secondary)
3.4% [1.1%, 6.5%] 3
Improvements ✅
(primary)
-2.4% [-3.8%, -1.0%] 2
Improvements ✅
(secondary)
-2.7% [-5.3%, -0.9%] 9
All ❌✅ (primary) -0.7% [-3.8%, 2.8%] 3

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 651.363s -> 651.642s (0.04%)
Artifact size: 310.99 MiB -> 311.03 MiB (0.01%)

GuillaumeGomez pushed a commit to GuillaumeGomez/rust that referenced this pull request Mar 5, 2024
…thercote

Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison

Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison.

And this uses the algebraic float ops to fix rust-lang#120720

cc `@orlp`
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Mar 8, 2024
…thercote

Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison

Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison.

And this uses the algebraic float ops to fix rust-lang#120720

cc `@orlp`
@saethlin saethlin deleted the reasonable-fast-math branch May 3, 2024 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Miscompilation in _mm512_reduce_add_pd, assumes values not NaN
10 participants