x86(64) runtime performance irregularities #31503
Doing a Configuration: re1: -2.00, re2: 1.00, img1: -1.50, img2: 1.50, max_iter: 2048, img_size: 1024, num_threads: 2
Time taken for this run (serial): 2762.07442 ms
Time taken for this run (scoped_thread_pool): 2256.02250 ms
Time taken for this run (simple_parallel): 2244.52544 ms
Time taken for this run (rayon_join): 1429.93142 ms
Time taken for this run (rayon_par_iter): 1392.21168 ms
Time taken for this run (rust_scoped_pool): 2252.52324 ms
Time taken for this run (job_steal): 2268.11608 ms
Time taken for this run (job_steal_join): 1417.85656 ms
Time taken for this run (kirk_crossbeam): 2259.48977 ms

Looks like the autovectorizer might be too eager, and the availability of SSE2 is actually detrimental.
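The timings above are easier to compare as speedups over the serial run. Below is a small sketch that computes speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by thread count) from the reported numbers with `num_threads: 2`. The function and constant names are illustrative and are not part of mandel-rust.

```rust
// Timings in milliseconds, copied from the run above (num_threads: 2).
const SERIAL_MS: f64 = 2762.07442;
const RUNS: [(&str, f64); 8] = [
    ("scoped_thread_pool", 2256.02250),
    ("simple_parallel", 2244.52544),
    ("rayon_join", 1429.93142),
    ("rayon_par_iter", 1392.21168),
    ("rust_scoped_pool", 2252.52324),
    ("job_steal", 2268.11608),
    ("job_steal_join", 1417.85656),
    ("kirk_crossbeam", 2259.48977),
];

/// Speedup of a parallel run relative to the serial baseline.
fn speedup(serial_ms: f64, parallel_ms: f64) -> f64 {
    serial_ms / parallel_ms
}

/// Parallel efficiency: speedup divided by thread count (1.0 = perfect scaling).
fn efficiency(serial_ms: f64, parallel_ms: f64, threads: u32) -> f64 {
    speedup(serial_ms, parallel_ms) / threads as f64
}

fn main() {
    for (name, ms) in RUNS {
        println!(
            "{:<20} speedup {:.2}x  efficiency {:.0}%",
            name,
            speedup(SERIAL_MS, ms),
            100.0 * efficiency(SERIAL_MS, ms, 2)
        );
    }
}
```

With these numbers, rayon_par_iter comes out at roughly 1.98x on 2 threads, while the pool-based variants sit at roughly 1.22x.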
@eddyb I've profiled just the affected benchmarks together, and turning SSE2 on causes almost a 2x slowdown in multithreaded code.
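When comparing builds with and without SSE2, it can help to confirm what each binary was actually compiled for. A minimal sketch, assuming a Rust toolchain recent enough for inline format arguments: `cfg!(target_feature = "sse2")` reports whether the compiler was allowed to assume SSE2 (always true on x86_64; on 32-bit x86 it depends on the target and on `-C target-cpu` / `-C target-feature`), and `is_x86_feature_detected!("sse2")` checks the running CPU. The helper name `sse2_status` is hypothetical.

```rust
/// Hypothetical helper: reports whether this binary was compiled assuming
/// SSE2 (a compile-time target feature) and whether the running CPU has SSE2.
fn sse2_status() -> (bool, bool) {
    // True if the compiler could emit SSE2 instructions anywhere in the binary.
    let compiled_with_sse2 = cfg!(target_feature = "sse2");

    // Runtime CPUID check; the macro only exists on x86 and x86_64.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let cpu_has_sse2 = is_x86_feature_detected!("sse2");
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let cpu_has_sse2 = false;

    (compiled_with_sse2, cpu_has_sse2)
}

fn main() {
    let (compiled, cpu) = sse2_status();
    println!("compiled with SSE2: {compiled}, CPU supports SSE2: {cpu}");
}
```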
@eddyb As the single-threaded version of this benchmark is also affected (
Honestly, I don't know what to say other than: maybe LLVM's cost model is inaccurate for your CPU? cc @rust-lang/compiler @pcwalton
More like inhibiting optimisations in the default
The mandel-rust benchmark produces the following results:
https://gist.github.com/petevine/b70b6e5a434f23b40ab5
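For readers unfamiliar with the workload, here is a minimal sketch of a serial escape-time Mandelbrot kernel of the kind such a benchmark times. It is not mandel-rust's actual code: `mandel_pixel` and `render` are hypothetical names, and the parameters are taken from the configuration line in this thread (`re1`, `re2`, `img1`, `img2`, `max_iter`, `img_size`). On 32-bit x86, whether this `f64` arithmetic is compiled to x87 or SSE2 instructions depends on the enabled target features, which is the variable being compared here.

```rust
/// Hypothetical escape-time kernel for one pixel: iterates z = z^2 + c and
/// returns the iteration at which |z|^2 first exceeds 4, or max_iter.
fn mandel_pixel(c_re: f64, c_im: f64, max_iter: u32) -> u32 {
    let (mut z_re, mut z_im) = (0.0f64, 0.0f64);
    for i in 0..max_iter {
        let (re_sq, im_sq) = (z_re * z_re, z_im * z_im);
        if re_sq + im_sq > 4.0 {
            return i; // escaped: the point is outside the set
        }
        z_im = 2.0 * z_re * z_im + c_im;
        z_re = re_sq - im_sq + c_re;
    }
    max_iter // never escaped: treated as inside the set
}

/// Serial render of an img_size x img_size grid over [re1, re2] x [img1, img2].
fn render(re1: f64, re2: f64, img1: f64, img2: f64, max_iter: u32, img_size: usize) -> Vec<u32> {
    let dx = (re2 - re1) / img_size as f64;
    let dy = (img2 - img1) / img_size as f64;
    let mut out = Vec::with_capacity(img_size * img_size);
    for y in 0..img_size {
        for x in 0..img_size {
            out.push(mandel_pixel(re1 + x as f64 * dx, img1 + y as f64 * dy, max_iter));
        }
    }
    out
}

fn main() {
    // Configuration from the thread, with img_size reduced to keep the demo fast.
    let image = render(-2.0, 1.0, -1.5, 1.5, 2048, 64);
    let inside = image.iter().filter(|&&n| n == 2048).count();
    println!("{} of {} pixels reached max_iter", inside, image.len());
}
```

The parallel variants in the timings differ only in how the rows or pixels of `render` are distributed across threads; the per-pixel kernel is the same.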
TL;DR
32-bit code performance ranks as follows:
P2/P3 > Core2 > P4 (x86_64 is in the slow group too)
P2/P3 code is the only code that scales on 2 cores in all benchmarks.
This is either a sign that LLVM is buggy, or I was more right than I ever suspected about P4 codegen producing suboptimal code. (x86_64 is affected too, though, so it could be something else.)
Naturally, the common theme could be the use of SSE2, which is absent from the fastest code: