x86(64) runtime performance irregularities #31503

MagaTailor · 2016-02-09T02:40:40Z

The mandel-rust benchmark produces the following results:

https://gist.github.com/petevine/b70b6e5a434f23b40ab5

TL;DR
32-bit code performance looks like this:
P2(3) > Core2 > P4 (x86_64 too)

The P2(3) targets are the only ones that scale on 2 cores in all benchmarks.

This is either a sign of an LLVM bug, or I was more right than I'd ever suspected about P4 codegen producing suboptimal code. (x86_64 is affected too, so it could be something else.)

The common theme could be the use of SSE2, which is absent from the fastest code:

Configuration: re1: -2.00, re2: 1.00, img1: -1.50, img2: 1.50, max_iter: 2048, img_size: 1024, num_threads: 2
Time taken for this run (serial): 2469.21302 ms
Time taken for this run (scoped_thread_pool): 1248.45883 ms
Time taken for this run (simple_parallel): 1284.73761 ms
Time taken for this run (rayon_join): 1246.36625 ms
Time taken for this run (rayon_par_iter): 1337.93075 ms
Time taken for this run (rust_scoped_pool): 1240.33273 ms
Time taken for this run (job_steal): 1241.20777 ms
Time taken for this run (job_steal_join): 1246.34885 ms
Time taken for this run (kirk_crossbeam): 1244.10723 ms
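Since the comparison hinges on whether SSE2 was enabled at compile time, a small check like the following can confirm which kind of build is actually running. This is only a sketch: `compiled_with_sse2` and `fp_path` are hypothetical helper names, and the "x87" label assumes a 32-bit x86 target.

```rust
/// Reports whether this binary was compiled with the SSE2 target feature.
/// `cfg!(target_feature = "sse2")` is resolved at compile time, so it reflects
/// the -C target-cpu / -C target-feature flags, not the CPU it runs on.
fn compiled_with_sse2() -> bool {
    cfg!(target_feature = "sse2")
}

/// Human-readable label for the floating-point path the build was given.
/// Assumption: without SSE2, a 32-bit x86 build falls back to x87.
fn fp_path() -> &'static str {
    if compiled_with_sse2() {
        "SSE2 enabled"
    } else {
        "no SSE2 (x87 on 32-bit x86)"
    }
}

fn main() {
    println!("target arch: {}", std::env::consts::ARCH);
    println!("floating-point path: {}", fp_path());
}
```

Printing this at benchmark start-up would make it harder to mix up results from differently flagged builds.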


MagaTailor · 2016-02-10T14:00:19Z

Compiling with -C target-cpu=pentium2 -C target-feature=+sse2 immediately destroys performance (compared to using just the first flag):

Configuration: re1: -2.00, re2: 1.00, img1: -1.50, img2: 1.50, max_iter: 2048, img_size: 1024, num_threads: 2
Time taken for this run (serial): 2762.07442 ms
Time taken for this run (scoped_thread_pool): 2256.02250 ms
Time taken for this run (simple_parallel): 2244.52544 ms
Time taken for this run (rayon_join): 1429.93142 ms
Time taken for this run (rayon_par_iter): 1392.21168 ms
Time taken for this run (rust_scoped_pool): 2252.52324 ms
Time taken for this run (job_steal): 2268.11608 ms
Time taken for this run (job_steal_join): 1417.85656 ms
Time taken for this run (kirk_crossbeam): 2259.48977 ms

It looks like the autovectorizer might be too eager, so the availability of SSE2 is actually detrimental.
Using target-feature=+sse4.1 improves performance but doesn't recover all of it on x86_64.
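To put numbers on the difference, the two runs above can be reduced to speedup and slowdown ratios. The sketch below uses only the serial and scoped_thread_pool timings copied from those runs; the function names are hypothetical and chosen for illustration.

```rust
/// Parallel speedup: serial time divided by parallel time (same units).
fn speedup(serial_ms: f64, parallel_ms: f64) -> f64 {
    serial_ms / parallel_ms
}

/// Slowdown factor of `slow_ms` relative to `fast_ms`.
fn slowdown(fast_ms: f64, slow_ms: f64) -> f64 {
    slow_ms / fast_ms
}

fn main() {
    // (label, serial ms, scoped_thread_pool ms), copied from the two runs above.
    let runs = [
        ("no SSE2", 2469.21302, 1248.45883),
        ("+sse2", 2762.07442, 2256.02250),
    ];

    for (label, serial, parallel) in runs.iter() {
        println!(
            "{:<8} speedup on 2 threads: {:.2}x",
            label,
            speedup(*serial, *parallel)
        );
    }

    // Same benchmark, compared across the two builds.
    println!(
        "scoped_thread_pool slowdown from +sse2: {:.2}x",
        slowdown(1248.45883, 2256.02250)
    );
    println!(
        "serial slowdown from +sse2: {:.2}x",
        slowdown(2469.21302, 2762.07442)
    );
}
```

By these figures the build without SSE2 scales at about 1.98x on 2 threads while the +sse2 build manages about 1.22x, and scoped_thread_pool itself becomes about 1.81x slower.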

MagaTailor · 2016-08-15T11:48:54Z

#35662 (comment)

MagaTailor · 2016-08-19T21:24:01Z

@eddyb I've profiled just the affected benchmarks together, and turning SSE2 on causes almost a 2x slowdown in multithreaded code.
x87-profile.txt
sse2-profile.txt

MagaTailor · 2016-08-21T18:38:37Z

@eddyb As the single-threaded version of this benchmark is also affected (20% slower), I profiled just that and produced the main assembly files annotated with operf. I hope they give you a clue as to what's wrong here.

x87_asm.txt
sse2_asm.txt
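For anyone wanting to compare the two code paths without the full benchmark, a stand-alone escape-time kernel along these lines could be compiled both ways. This is a minimal sketch and not the mandel-rust source; `mandel_iter`, `total_iters` and the 64x64 grid are assumptions made for illustration, while the coordinate range and max_iter come from the configuration line in the runs above. Building it with `rustc -O --emit asm -C target-cpu=pentium2` for a 32-bit x86 target, with and without `-C target-feature=+sse2`, should give two assembly listings to diff.

```rust
/// Escape-time iteration count for the point c = (cre, cim).
/// Iterates z = z^2 + c from z = 0 until |z|^2 exceeds 4 or max_iter is hit.
fn mandel_iter(max_iter: u32, cre: f64, cim: f64) -> u32 {
    let (mut re, mut im) = (0.0f64, 0.0f64);
    let mut iter = 0;
    while iter < max_iter && re * re + im * im <= 4.0 {
        let new_re = re * re - im * im + cre;
        im = 2.0 * re * im + cim;
        re = new_re;
        iter += 1;
    }
    iter
}

/// Sums iteration counts over a size x size grid so the work is not optimised away.
fn total_iters(size: u32, max_iter: u32) -> u64 {
    // Coordinate range from the benchmark configuration: re in [-2, 1], im in [-1.5, 1.5].
    let (re1, re2, img1, img2) = (-2.0f64, 1.0f64, -1.5f64, 1.5f64);
    let mut total = 0u64;
    for y in 0..size {
        let cim = img1 + (img2 - img1) * (y as f64) / (size as f64);
        for x in 0..size {
            let cre = re1 + (re2 - re1) * (x as f64) / (size as f64);
            total += mandel_iter(max_iter, cre, cim) as u64;
        }
    }
    total
}

fn main() {
    // Small grid (assumption) to keep the run short; the benchmark uses img_size 1024.
    println!("total iterations: {}", total_iters(64, 2048));
}
```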

eddyb · 2016-08-21T18:41:16Z

Honestly, I don't know what to say other than maybe LLVM's cost model is inaccurate for your CPU?

cc @rust-lang/compiler @pcwalton

MagaTailor · 2016-08-21T18:50:54Z

It's more like something is inhibiting optimisations in the default Pentium 4 or generic x86_64 codegen (or simply with +sse2). I was inquisitive enough to discover that basic i686 produces the fastest code for this codebase (and that includes many multithreaded libs like rayon and crossbeam).

MagaTailor changed the title from "x86 runtime performance irregularities" to "x86(64) runtime performance irregularities" on Feb 9, 2016

steveklabnik added the I-slow (Issue: Problems and improvements with respect to performance of generated code) label on Feb 15, 2016

MagaTailor closed this as completed Sep 27, 2016
