Abnormal long execute time running wasm code with Wasmtime #7246
Comments
It looks like wasmtime wasn't compiled in release mode, would you be able to re-test using the |
Would you be able to help diagnose what's happening locally? I'm unable to reproduce this and the runtime for this module isn't too bad locally. If you could run
and then copy/paste the top 10-20 functions into this issue that'll help explain where all the time is going. One thing I've noticed is that this module is trying to write to file descriptor 0 inside the wasm itself. Wasmtime is returning Nevertheless though I'm not sure what's taking 20+ seconds for you in Wasmtime. Locally this module executes in 1-3 seconds for me. I unfortunately don't have access to native x86_64 hardware though and so this could be specific to that. |
I'm also unable to reproduce this on x86, it takes ~3 seconds for me. With 98.86% of the time being spent on |
An IPC of 0.08 is abnormally low. Could you show the annotated assembly of |
Yes. The amount of time spent in what should be trivial instructions (mov between registers and xor) is weird. |
Do you have any idea about why it happened? Anything I can do to figure it out? |
Could you share the output of |
Here is my output of
|
Hi, I tried the same code on another machine. The result is almost The OS of this machine is Here is the output of
|
Ok thanks again for both reporting this and helping us follow-up! @elliottt was able to reproduce on hardware he has which resulted in us bottoming out in #7283 as the cause of this issue. I'm going to close this in favor of that issue to keep the issue discussion a bit shorter, but if you'd like to confirm locally @unicornt you can comment out these lines in a custom Wasmtime build and the performance of this program should be better. Note though that it's just a proof-of-concept of what the slowdown is and is not a full complete fix, that'll get tracked by #7283 |
I'm using a MacBook Pro with a 2.6 GHz 6-Core Intel Core i7 CPU. I attempted to run the .wasm module provided by @unicornt , and it was indeed quite slow, taking around 27s. I also tried wasmer, and it took about 7s. Following @alexcrichton's advice, I commented out the last 4 lines in |
@alexcrichton Hi, I tried to reduce the above
But I found that the performance regression is not only related to #7283, because if I change the constant in
By the way, I also checked the binary generated by I think it is also a strange case that I cannot understand or explain...
;; reduce.wat
(module
(type (;0;) (func (param i32)))
(type (;1;) (func))
(type (;2;) (func (result i32)))
(type (;3;) (func (param i32) (result i32)))
(import "wasi_snapshot_preview1" "proc_exit" (func (;0;) (type 0)))
(func (;1;) (type 1)
call 2
i32.const 0
call 0
unreachable)
(func (;2;) (type 1)
(local f64 f64 i32)
f64.const 5.43231e-312 ;; Magic Number!!!
f64.const 1.0
f64.mul
local.set 1
loop ;; label = @1
local.get 1
local.get 0
f64.add
local.set 0
local.get 2 ;; condition variable i
i32.const 1
i32.add
local.tee 2 ;; i++
i32.const 312500000
i32.ne
br_if 0 (;@1;)
end
)
(table (;0;) 2 2 funcref)
(memory (;0;) 8192 8192)
(global (;0;) (mut i32) (i32.const 66576))
(export "memory" (memory 0))
(export "__indirect_function_table" (table 0))
(export "_start" (func 1)))
Environment Version
Also, I used inline assembly to run the binary generated by
#include <stdint.h>
double X = 1.0;
uint64_t test_wasmtime() {
__asm__ __volatile__(
"vxorpd %%xmm5,%%xmm5,%%xmm6\n\t"
"xor %%r11d,%%r11d\n\t"
"START1%=:\n\t"
"add $0x1,%%r11d\n\t"
"movabs $0x100000264d8,%%rsi\n\t"
"vmovq %%rsi,%%xmm0\n\t"
"vmulsd %0,%%xmm0,%%xmm7\n\t"
"vaddsd %%xmm6,%%xmm7,%%xmm6\n\t"
"cmp $0x12a05f20,%%r11d\n\t"
"jne START1%=\n\t"::"m"(X):"rsi","r11","xmm0","xmm6","xmm7","cc");
return 0;
}
uint64_t test_wasmer() {
__asm__ __volatile__(
"movabs $0x100000264d8,%%r11\n\t"
"movq %%r11,%%xmm9\n\t"
"mulsd %0,%%xmm9\n\t"
"xorpd %%xmm7,%%xmm7\n\t"
"xor %%r8d,%%r8d\n\t"
"START2%=:\n\t"
"movdqa %%xmm9,%%xmm15\n\t"
"addsd %%xmm7,%%xmm15\n\t"
"add $0x1,%%r8d\n\t"
"cmp $0x12a05f20,%%r8d\n\t"
"setne %%dil\n\t"
"movzbl %%dil,%%edi\n\t"
"test %%edi,%%edi\n\t"
"je EXIT2%=\n\t"
"movdqa %%xmm15,%%xmm7\n\t"
"jmp START2%=\n\t"
"EXIT2%=:\n\t"::"m"(X):"r11","r8","rdi","xmm7","xmm9","xmm15","cc");
return 0;
}
int main() {
test_wasmtime();
// test_wasmer();
return 0;
} |
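For anyone without the inline-assembly setup, the comparison above can be approximated in portable C. This is only a sketch and not code from the thread: the names `double_from_bits`, `mul_add_loop`, and `seconds_for` are made up for illustration, and the `volatile` qualifiers are an assumption about what is needed to stop the compiler from hoisting the multiply out of the loop. It keeps the multiply and the add inside the loop, like the Wasmtime-style loop, and lets you swap the operand between the `movabs` immediate `0x100000264d8` and an ordinary value such as `1.0`.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Reinterpret a 64-bit pattern as a double, e.g. the movabs immediate. */
double double_from_bits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Multiply and add on every iteration. The volatile reads force the
   operands to be reloaded each time so the multiply stays in the loop. */
double mul_add_loop(double operand, uint32_t iterations) {
    volatile double x = 1.0;
    volatile double op = operand;
    double sum = 0.0;
    for (uint32_t i = 0; i != iterations; i++) {
        sum += op * x;
    }
    return sum;
}

/* Run the loop once and return the CPU time it took, in seconds.
   The loop result is stored through `result` so it is not discarded. */
double seconds_for(double operand, uint32_t iterations, double *result) {
    clock_t start = clock();
    *result = mul_add_loop(operand, iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `seconds_for(double_from_bits(0x100000264d8ULL), 312500000u, &r)` and then `seconds_for(1.0, 312500000u, &r)` gives two timings to compare. Whether they differ, and by how much, depends on the CPU, so treat any numbers as specific to the machine they were measured on.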
Thanks for the follow-up @hungryzzz, although I've got a few questions for you. It's known that Wasmtime is slow with respect to this computation on its Generating the x64 assembly with #7306 I get:
where here the conditional jump at 0x81 goes to 0x72 meaning that the |
@alexcrichton Hi, the performance regression disappeared in the latest But I think the performance regression in the old version After searching, we found the reason in the paper "On Subnormal Floating Point and Abnormal Timing".
This paper finds that floating point operands ranging from
;; reduce.wat
(module
(type (;0;) (func (param i32)))
(type (;1;) (func))
(type (;2;) (func (result i32)))
(type (;3;) (func (param i32) (result i32)))
(import "wasi_snapshot_preview1" "proc_exit" (func (;0;) (type 0)))
(func (;1;) (type 1)
call 2
i32.const 0
call 0
unreachable)
(func (;2;) (type 1)
(local f64 f64 i32)
f64.const 5.43231e-312 ;; Magic Number!!!
f64.const 1.0
f64.mul
local.set 1
loop ;; label = @1
local.get 1
local.get 0
f64.add
local.set 0
local.get 2 ;; condition variable i
i32.const 1
i32.add
local.tee 2 ;; i++
i32.const 312500000
i32.ne
br_if 0 (;@1;)
end
)
(table (;0;) 2 2 funcref)
(memory (;0;) 8192 8192)
(global (;0;) (mut i32) (i32.const 66576))
(export "memory" (memory 0))
(export "__indirect_function_table" (table 0))
(export "_start" (func 1))) |
Fascinating! Not what I would have expected but helps explain the results for sure! |
Test Case
report.zip
Steps to Reproduce
Actual Results
Wasmtime takes an abnormally long time to execute this wasm file, and no exception is thrown.
Expected Results
Other runtimes take much less time or report an out of memory access exception.
Versions and Environment
Wasmtime version or commit: 15.0.0
Operating system: Ubuntu 20.04
Architecture: x86_64