Investiage `-Z mir-opt-level` more. #287

vext01 · 2021-03-05T11:50:13Z

We currently don't optimise the external workspace, as hardware tracing requires that LLVM doesn't re-order our blocks.

But that's not to say that we can't optimise the MIR before it is lowered to SIR. And this pretty much amounts to passing -Z mir-opt-level=X, where X is some value, e.g. 3.

I quickly tried passing -Z mir-opt-level=3 to see if the speedup would be dramatic. I fully expected breakage due to unimplemented codegen paths, but actually it wasn't that bad.

I had to implement comparison with constants, which I did quickly and inefficiently:

--- a/internal_ws/ykcompile/src/arch/x86_64/mod.rs
+++ b/internal_ws/ykcompile/src/arch/x86_64/mod.rs
@@ -864,6 +865,42 @@ impl TraceCompiler {
                 }
                 _ => unreachable!(),
             },
+            Location::Const{val, ..} => {
+                let val = val.i64_cast();
+                // CMP can't work on imm64 operands and the immediate operands it does support must
+                // appear on the right-hand side (which would invert our comparison condition). To
+                // work around this, we borrow RCX.
+                dynasm!(self.asm
+                    ; push rcx
+                    ; mov rcx, QWORD val
+                );
+                match ty.size() {
+                    8 => {
+                        dynasm!(self.asm
+                            ; cmp rcx, Rq(*TEMP_REG)
+                        );
+                    },
+                    4 => {
+                        dynasm!(self.asm
+                            ; cmp ecx, Rd(*TEMP_REG)
+                        );
+                    },
+                    2 => {
+                        dynasm!(self.asm
+                            ; cmp ax, Rw(*TEMP_REG)
+                        );
+                    },
+                    1 => {
+                        dynasm!(self.asm
+                            ; cmp al, Rb(*TEMP_REG)
+                        );
+                    },
+                    _ => unreachable!(),
+                }
+                dynasm!(self.asm
+                    ; pop rcx
+                );
+            }
             _ => todo!(),
         }
         dynasm!(self.asm

And then we don't lower constant structs (to get that right, I'd recommend addressing #139). This is only needed for the simple_multithreaded_interpreter test, which uses a constant ArcInner. I just commented the test.

Then the results for running the test suite (10 runs) look like:

single test thread before

1: cargo xtask test
            Mean        Std.Dev.    Min         Median      Max
real        3.114       0.046       3.032       3.115       3.226
user        15.699      1.312       13.055      15.525      17.881
sys         6.504       0.279       6.046       6.530       7.000

single test thread after

1: cargo xtask test
            Mean        Std.Dev.    Min         Median      Max
real        3.727       0.030       3.684       3.731       3.774
user        2.911       0.040       2.852       2.920       2.972
sys         0.719       0.031       0.655       0.722       0.764

many test threads before

1: cargo xtask test
            Mean        Std.Dev.    Min         Median      Max
real        3.072       0.035       3.030       3.063       3.159
user        15.313      0.485       14.307      15.278      16.120
sys         6.196       0.315       5.898       6.119       7.045

many test threads after

1: cargo xtask test
            Mean        Std.Dev.    Min         Median      Max
real        3.098       0.038       3.041       3.089       3.164
user        15.004      0.655       14.240      14.920      16.349
sys         6.649       0.247       6.398       6.570       7.244

This indicates very little performance change. We put this down to the code we trace in tests being small and probably not benefiting much from MIR (and thus SIR) optimisations.

We agreed to come back to this when we have a larger interpreter to test with.

The text was updated successfully, but these errors were encountered:

bjorn3 · 2021-03-05T12:05:02Z

Did you also enable -Zmir-opt-level=3 for building the sysroot? Crate metadata contains the already optimized MIR. It will not be further optimized in downstream crates.

vext01 · 2021-03-05T13:55:43Z

The sysroot is built as-per-usual (with optimisations), so I guess that will be either 2 or 3? Either way, we can't trace the std lib.

bjorn3 · 2021-03-05T14:00:09Z

-Zmir-opt-level=1 is the default. It doesn't depend on the optimization level. rust-lang/rust#82736 will introduce a level between 1 and 2 that will be the default in release mode though.

Either way, we can't trace the std lib.

Yes, but if any #[inline] or generic method from the std lib is called, it is codegened in the current crate and as such traced. Without -Zmir-opt-level=3 when building the sysroot, it will codegen a version that never had more mir optimizations applied than the default (-Zmir-opt-level=1).

vext01 · 2021-03-05T16:20:31Z

Repeating the experiment with opt-level 2 or 3 for the compiler, I often get this crash:

munmap_chunk(): invalid pointer

I'm guessing that's our fault.

bjorn3 · 2021-03-05T16:33:19Z

What is the backtrace?

vext01 · 2021-03-05T16:57:08Z

There's no rust backtrace, as the process crashes in a dirty way.

Getting a backtrace in gdb, the top frame is a ?? frame, so probably jitcode...

vext01 · 2021-03-05T16:59:25Z

Wait, I'm talking rubbish:

#0  0x00007ffff7b317bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7b1c535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff7b73508 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff7b79c1a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff7b7b42c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00005555557ad1b1 in ?? ()
#6  0x00005555557c1ba8 in ?? ()
#7  0x00005555557ccfb0 in ?? ()
#8  0x0000555555b30f4a in ?? ()
#9  0x00007ffff7e4cfa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007ffff7bf34cf in clone () from /lib/x86_64-linux-gnu/libc.so.6

so it's libc cutting us short. That explains why RUST_BACKTRACE doesn't work.

vext01 · 2021-06-24T14:17:52Z

Closing, as this is for the old yk design.

vext01 closed this as completed Jun 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investiage `-Z mir-opt-level` more. #287

Investiage `-Z mir-opt-level` more. #287

vext01 commented Mar 5, 2021

bjorn3 commented Mar 5, 2021

vext01 commented Mar 5, 2021

bjorn3 commented Mar 5, 2021 •

edited

Loading

vext01 commented Mar 5, 2021

bjorn3 commented Mar 5, 2021

vext01 commented Mar 5, 2021

vext01 commented Mar 5, 2021

vext01 commented Jun 24, 2021

Investiage -Z mir-opt-level more. #287

Investiage -Z mir-opt-level more. #287

Comments

vext01 commented Mar 5, 2021

single test thread before

single test thread after

many test threads before

many test threads after

bjorn3 commented Mar 5, 2021

vext01 commented Mar 5, 2021

bjorn3 commented Mar 5, 2021 • edited Loading

vext01 commented Mar 5, 2021

bjorn3 commented Mar 5, 2021

vext01 commented Mar 5, 2021

vext01 commented Mar 5, 2021

vext01 commented Jun 24, 2021

Investiage `-Z mir-opt-level` more. #287

Investiage `-Z mir-opt-level` more. #287

bjorn3 commented Mar 5, 2021 •

edited

Loading