runtime: new implementations for nearest lib calls #2171
Thanks for this! Do you have some wall-clock benchmarks as well which show the improvement?
No, I just compared metrics from llvm-mca. Btw it would be great to have some online benchmark tools like
Here are benchmark results on a MacBook Pro (15-inch 2019, 2.3 GHz 8-core i9):
So it seems the newly proposed approach is about 3 times faster. And btw all of this was expected from the llvm-mca metrics. I choose
The wasmtime/cranelift/codegen/meta/src/isa/x86/encodings.rs Lines 1341 to 1345 in 5c5a30f
wasmtime/cranelift/codegen/src/isa/x64/lower.rs Line 1749 in 8ac4bd1
@bjorn3 Good to know. Thanks!
Also added an SSE 4.1 intrinsic to the gist. Upd
This looks good to me; just two possible optimization ideas:
833c27f to 356dd1e
Squashed commits:
use approach with copysign for handling negative zero
format
refactor for better branch prediction
move copysign back to internal branch
format
fix
use abs instead branches
better comments
switch arms for better branch prediction
a7512f0 to 9e58da4
Great, thanks!
As @MaxGraey pointed out (thanks!) in bytecodealliance#4397, `round` has different behavior from `nearest`. And it looks like the native rust implementation is still pending stabilization. Right now we duplicate the wasmtime implementation, merged in bytecodealliance#2171. However, we definitely should switch to the rust native version when it is available.
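To illustrate the difference mentioned above: Rust's `f64::round` rounds halfway cases away from zero, while wasm's `nearest` rounds them to the even neighbour. The helper name `nearest_ties_even` below is hypothetical and is only a small sketch for comparison, not the wasmtime code.

```rust
/// Hypothetical helper: round to the nearest integer, ties to even.
/// Built on top of `f64::round` (ties away from zero) purely for illustration.
fn nearest_ties_even(x: f64) -> f64 {
    let r = x.round();
    // A tie is an input whose fractional part is exactly one half.
    let is_tie = (x - x.trunc()).abs() == 0.5;
    if is_tie && r % 2.0 != 0.0 {
        // `round` picked the odd neighbour: step one unit back toward zero.
        // `copysign` keeps the sign of zero, e.g. -0.5 becomes -0.0.
        (r - x.signum()).copysign(x)
    } else {
        r
    }
}

/// Returns (`round` result, ties-to-even result) for the same input.
fn compare(x: f64) -> (f64, f64) {
    (x.round(), nearest_ties_even(x))
}
```

For example, `compare(2.5)` gives `(3.0, 2.0)` and `compare(-2.5)` gives `(-3.0, -2.0)`; the two functions agree on inputs that are not exactly halfway.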
More efficient implementations for `wasmtime_f32_nearest` and `wasmtime_f64_nearest` based on musl's `rint` and `rintf` implementations.
New / old comparison: https://godbolt.org/z/Gxz3bP
Also, instruction metrics for the new approach with an if / else branch for handling `-0.0`:
and with the new approach but using `copysign` at the end for handling `-0.0`:
Benchmark results
Upd: So I chose the second approach. It is also branchless on ARM32.
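For readers without the godbolt link at hand, here is a rough sketch of the two variants discussed above for `f64`. It assumes the musl-style trick of adding and subtracting 2^52 under the default round-half-to-even mode. The function names and the `TOINT` constant name are my own, and any NaN handling the real code does is left out.

```rust
/// 2^52: adding it pushes the fraction bits of |x| < 2^52 out of the mantissa,
/// so the hardware's round-half-to-even does the rounding for us.
const TOINT: f64 = 4503599627370496.0;

/// True if |x| >= 2^52, or x is infinite or NaN (already integral or not a number).
fn is_large_or_special(x: f64) -> bool {
    let exp = (x.to_bits() >> 52) & 0x7ff;
    exp >= 0x3ff + 52
}

/// Variant 1 (hypothetical name): explicit if / else for the sign of zero.
fn nearest_branch(x: f64) -> f64 {
    if is_large_or_special(x) {
        return x;
    }
    let negative = (x.to_bits() >> 63) != 0;
    let y = if negative { x - TOINT + TOINT } else { x + TOINT - TOINT };
    if y == 0.0 {
        // Restore the sign that the add/subtract round trip lost.
        if negative { -0.0 } else { 0.0 }
    } else {
        y
    }
}

/// Variant 2 (hypothetical name): round |x| and reapply the sign with `copysign`.
fn nearest_copysign(x: f64) -> f64 {
    if is_large_or_special(x) {
        return x;
    }
    (x.abs() + TOINT - TOINT).copysign(x)
}
```

Both variants return the same bit pattern for ordinary inputs, including `-0.0` for small negative values such as `-0.3`. The second simply drops the zero check and the sign-dependent branch.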
Upd 2
Another possible approach:
But this approach has a lower IPC (instructions per cycle).