Cranelift: Optimize `op+splat` into `splat+op` in the mid-end #6828

afonso360 · 2023-08-09T21:56:13Z

👋 Hey,

Feature

This was pointed out by @jameysharp in #6815 (review)!

We should try to transform (op (splat x) (splat y) ...) into (splat (op x y ...)) for operations that support this.

Benefit

This transforms SIMD operations into their scalar counterpart which should be beneficial. We also have better constant propagation on scalars, so this is also an opportunity to do that further.

RISC-V specifically really benefits from this optimization since we have opcodes that can eat the splat on the transformed version.

Implementation

The e-graphs mid end is awesome for stuff like this! Here's one rule:

(rule (simplify (imul ty (splat ty x) (splat ty y)))
      (splat ty (imul (lane_type ty) x y)))

Pasting this into cranelift/codegen/src/opts/vector.isle makes this tranform work on imul!

You can try running this test to verify that it works:

test optimize
set opt_level=speed
target x86_64

function %imul_splat_into_splat_imul(i64, i64) -> i64x2 {
block0(v0: i64, v1: i64):
  v2 = splat.i64x2 v0
  v3 = splat.i64x2 v1
  v4 = imul.i64x2 v2, v3
  return v4
  ; check: v5 = imul v0, v1
  ; check: v6 = splat.i64x2 v5
  ; check: return v6
}

(Run this with cargo run -- test ./the-above.clif from the /cranelift directory)

This is also a good test to add to our testsuite in cranelift/filetests/filetests/egraph/....

There are so many opcodes that this optimization works for that I actually can't list them all. Here's a few: iadd,isub,imul,ineg,iabs,umulhi,smulhi,... Just look at our opcode list and a lot of them will work out.

Where it wont work:

Trapping operations (i.e. idiv / urem / srem). This doesn't work because we currently don't perform optimizations on these operations (see cranelift/egraphs: allow simplifying trapping arithmetic #5908)

Where it may not work:

Floating point operations. I'm not 100% sure about this one, it logically makes sense that it wouldn't change the result, but I'm just afraid of doing transforms on floats.

Alternatives

There are so many opcodes for which this rule can be implemented that it may be worth considering auto-generating it. However I doubt it would be worth the effort + maintenance complexity of that. Copy pasting a bunch of times is sometimes better!

The text was updated successfully, but these errors were encountered:

gurry · 2023-08-14T11:24:39Z

@afonso360 I'd like to pick this one up. Will be doing some reading on ISLE first.

gurry · 2023-08-16T10:53:15Z

@afonso360 How do I see the generated code after all the mid-end opts are applied to a func? Some of my tests are failing for some operators. For example for band_not for this simplify rule:

(rule (simplify (band_not ty (splat ty x) (splat ty y)))
      (splat ty (band_not (lane_type ty) x y)))

the following test fails:

function %band_not_splat_into_splat_band_not(i64, i64) -> i64x2 {
block0(v0: i64, v1: i64):
  v2 = splat.i64x2 v0
  v3 = splat.i64x2 v1
  v4 = band_not.i64x2 v2, v3
  return v4
  ; check: v5 = band_not v0, v1
  ; check: v6 = splat.i64x2 v5
  ; check: return v6
}

with the following error:

FAIL filetests\filetests\egraph\simd-splat-simplify.clif: optimize

Caused by:
    filecheck failed for function on line 38:
    #0 check: v5 = band_not v0, v1
    #1 check: v6 = splat.i64x2 v5
    #2 check: return v6
    > function %band_not_splat_into_splat_band_not(i64, i64) -> i64x2 fast {
    Missed #0: \bv5 = band_not v0, v1\b
    > block0(v0: i64, v1: i64):
    >     v2 = splat.i64x2 v0
    >     v3 = splat.i64x2 v1
    >     v5 = bnot v3
    >     v4 = band v2, v5
    >     return v4
    > }

1356 tests
Error: 1 failure
error: test failed, to rerun pass `--test filetests`

So I just wanted to debug and see how the generated code differs from the test expectation.

afonso360 · 2023-08-16T11:01:01Z

Oh! Right I forgot that we have a few special opcodes like band_not that are transformed into multiple ops.

We should be able to ignore those since the mid-end will never see them (I think!) before they are expanded into multiple operations. With everything that you have on that PR, I think you are just missing the transform for bnot and we should be able to transform those two ops.

But to check the optimized code you can use the slightly complicated command:

cargo run -- compile --target x86_64 --set opt_level=speed -p ./the-file.clif

This runs the whole compile pipeline, and -p prints the clif file before being sent to the backends for converting into machine code.

gurry · 2023-08-16T11:42:32Z

Thanks for the info @afonso360 .

The test for bnot was failing for me for some reason. I'll investigate it now with the help of the command you shared.

What about the family of shift and rotate ops such as ishl and rotl? Will this optimization be applicable to them?

afonso360 · 2023-08-16T11:53:49Z

What about the family of shift and rotate ops such as ishl and rotl? Will this optimization be applicable to them?

Yeah, they should be! However note that ishl and friends only support a scalar rhs (the number of bits to shift) and reject vectors in that position.

So the optimization is slightly different. We only need to ensure that the lhs is splatted. So it would look something like this
(ishl (splat lhs) rhs) -> (splat (ishl lhs rhs)).

It's also probably worth adding a comment with that reasoning in the code! I didn't remember it myself until I tried it.

The verifier shouldn't even let you build a ishl with a vector rhs. A good way to check what kinds of values are supported by each operation is to read the documentation for it. We'll usually have a remark stating: "this supports scalar or vectors", or this is scalar only. Something along those lines.

afonso360 mentioned this issue Aug 9, 2023

riscv64: Add SIMD IFMA instruction support #6815

Merged

gurry mentioned this issue Aug 16, 2023

Add splat simplify opt for various ops #6851

Merged

afonso360 closed this as completed in #6851 Aug 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cranelift: Optimize `op+splat` into `splat+op` in the mid-end #6828

Cranelift: Optimize `op+splat` into `splat+op` in the mid-end #6828

afonso360 commented Aug 9, 2023

gurry commented Aug 14, 2023

gurry commented Aug 16, 2023

afonso360 commented Aug 16, 2023 •

edited

Loading

gurry commented Aug 16, 2023

afonso360 commented Aug 16, 2023 •

edited

Loading

Cranelift: Optimize op+splat into splat+op in the mid-end #6828

Cranelift: Optimize op+splat into splat+op in the mid-end #6828

Comments

afonso360 commented Aug 9, 2023

Feature

Benefit

Implementation

Alternatives

gurry commented Aug 14, 2023

gurry commented Aug 16, 2023

afonso360 commented Aug 16, 2023 • edited Loading

gurry commented Aug 16, 2023

afonso360 commented Aug 16, 2023 • edited Loading

Cranelift: Optimize `op+splat` into `splat+op` in the mid-end #6828

Cranelift: Optimize `op+splat` into `splat+op` in the mid-end #6828

afonso360 commented Aug 16, 2023 •

edited

Loading

afonso360 commented Aug 16, 2023 •

edited

Loading