Implement abstraction over mul_add #22
Apparently, the codegen for mul_add is really bad when FMA is disabled. Let's see if this is true.
I'm actually not sure if it'd be worth using
I read up on libc::fmaf, and it seems like this change will result in reduced accuracy.
Interesting. I'll accept this for now, but I think I actually want to upstream this to the Rust stdlib or at least have them consider it. It seems like a common footgun.
It's a somewhat common misconception; can't find the link now but some people have been discussing the advantages and disadvantages of mul_add in some GH issue (clippy I think? they have a lint that recommends mul_add over The thing is that you shouldn't use So I guess it was our fault for using
It should give improvements when built with FMA in most cases, because without
Yeah that's what I meant. It's a bit weird because on the one hand, rustc doesn't seem to generate FMA instructions without mul_add. On the other hand,
Not surprisingly, Rust won't fix this as it alters the result: rust-lang/rust#112192. There is some more in-depth talk about a
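To make the "alters the result" point concrete, here is a small sketch of a case where the fused and unfused forms disagree. The function name `fused_vs_unfused` and the chosen inputs are only illustrative and do not come from this PR. The fused form rounds once, so it keeps the tiny `-2^-46` term that the separate multiply rounds away.

```rust
/// Illustrative only: returns (unfused, fused) for inputs chosen so that
/// the intermediate product is not exactly representable in f32.
pub fn fused_vs_unfused() -> (f32, f32) {
    let a = 1.0_f32 + f32::EPSILON; // 1 + 2^-23
    let b = 1.0_f32 - f32::EPSILON; // 1 - 2^-23
    let c = -1.0_f32;

    // Two roundings: a * b = 1 - 2^-46 rounds to 1.0, then 1.0 + c = 0.0.
    let unfused = a * b + c;

    // One rounding: the exact value -2^-46 is representable, so it survives.
    let fused = a.mul_add(b, c);

    (unfused, fused)
}
```

On a target with a correctly rounded `fmaf`, the first value should be `0.0` and the second `-(f32::EPSILON * f32::EPSILON)`, which is why swapping one form for the other is not a pure performance change.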
The codegen for `mul_add` is a bit nonsensical when FMA is disabled: Compiler Explorer. I think `fmaf` is a libc function? Anyways, THIS is actually the main cause for non-FMA slowdowns.
Tested with `RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-fma"`.
Before:
After:
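For readers skimming the thread, an abstraction of the kind the title describes might look roughly like the sketch below. This is not the code from the PR: the helper name `mul_add_fast` is hypothetical, and the sketch assumes the decision is made at compile time through the `fma` target feature, which is an x86 feature name (other architectures would need their own condition).

```rust
/// Hypothetical helper, assuming a compile-time switch on the x86 `fma`
/// target feature. With FMA enabled this uses the fused instruction;
/// without it, it avoids the slow library call to `fmaf` and accepts the
/// extra rounding step of a separate multiply and add.
#[inline(always)]
pub fn mul_add_fast(a: f32, b: f32, c: f32) -> f32 {
    if cfg!(target_feature = "fma") {
        // Single rounding, maps to a hardware FMA instruction.
        a.mul_add(b, c)
    } else {
        // Two roundings, but compiles to a plain multiply and add.
        a * b + c
    }
}
```

Because `cfg!` expands to a constant, the unused branch is removed at compile time. The trade-off discussed in the thread still applies: the two branches can return slightly different results for the same inputs.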