riscv64: Add `fmsub`/`fnmsub`/`fnmadd` instruction lowerings #8588

afonso360 · 2024-05-09T10:12:09Z

👋 Hey,

This PR adds lowering rules for the fmsub/fnmsub/fnmadd instructions.

These instructions were already implemented in the backend, but had no lowering rules associated with them, so they were never emitted.

alexcrichton

Nice!

jameysharp · 2024-05-10T03:03:36Z

This is neat!

Out of curiosity, can't you use these same fnmsub/fnmadd instructions if y is negated instead of x? Similarly if both x and y are negated, I think you can use fmadd/fmsub, depending on whether z is negated or not. And of course you could also match on (fneg (fma ...)), leading to 16 possible rules in total.

That said, I'm a little worried about whether fusing negation into an fma can change its results. I'm hoping the answer is "no", but do either of you happen to know for sure and can you explain it to me? I'm reasonably confident that negating the result is equivalent to negating both terms and both cases are safe to fuse without loss of precision. I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

Anyway, I thought way too much about this and I think you can fold those 16 cases into four rules based on how many of x, y, z, or the result are wrapped in fneg:

Even number of fnegs in (result, x, y): use fm*
Odd number of fnegs in (result, x, y): use fnm*
Even number of fnegs in (x, y, z): use *add
Odd number of fnegs in (x, y, z): use *sub

With the right helpers you can write the four rules all at the same priority. This is a little awkward and would be easier if ISLE had or-patterns or match expressions or something, but I think the following should work and generate about as good of a pattern-matching tree as we're capable of right now.

(type IsFneg (enum (Result (negate u64) (value Value))))

(decl is_fneg (Value) IsFneg)
(rule 1 (is_fneg (fneg x)) (IsFneg.Result 1 x))
(rule 0 (is_fneg x) (IsFneg.Result 0 x))

(rule (lower (has_type ty (fma x y z)))
  (if-let (IsFneg.Result neg_x x) (is_fneg x))
  (if-let (IsFneg.Result neg_y y) (is_fneg y))
  (if-let (IsFneg.Result neg_z z) (is_fneg z))
  (rv_fma ty (u64_xor neg_x neg_y) neg_z x y z))

; this rule will need a rule priority to not overlap with other top-level fneg rules
(rule (lower (has_type ty (fneg (fma x y z))))
  (if-let (IsFneg.Result neg_x x) (is_fneg x))
  (if-let (IsFneg.Result neg_y y) (is_fneg y))
  (if-let (IsFneg.Result neg_z z) (is_fneg z))
  (rv_fma ty (u64_xor 1 (u64_xor neg_x neg_y)) (u64_xor 1 neg_z) x y z))

; parity arguments indicate whether to negate the x*y term or the z term, respectively
(decl rv_fma (Type u64 u64 Value Value Value) InstOutput)
(rule 0 (rv_fma (ty_scalar_float ty) 0 0 x y z) (rv_fmadd ty (FRM.RME) x y z))
(rule 0 (rv_fma (ty_scalar_float ty) 0 1 x y z) (rv_fmsub ty (FRM.RME) x y z))
(rule 0 (rv_fma (ty_scalar_float ty) 1 0 x y z) (rv_fnmsub ty (FRM.RME) x y z))
(rule 0 (rv_fma (ty_scalar_float ty) 1 1 x y z) (rv_fnmadd ty (FRM.RME) x y z))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 0 0 x y z) (rv_vfmacc_vv z y x (unmasked) ty))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 0 1 x y z) (rv_vfmsac_vv z y x (unmasked) ty))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 1 0 x y z) (rv_vfnmsac_vv z y x (unmasked) ty))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 1 1 x y z) (rv_vfnmacc_vv z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 0 0 (splat x) y z) (rv_vfmacc_vf z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 0 1 (splat x) y z) (rv_vfmsac_vf z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 1 0 (splat x) y z) (rv_vfnmsac_vf z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 1 1 (splat x) y z) (rv_vfnmacc_vf z y x (unmasked) ty))

Also I think you might be able to handle (splat y) as well because I guess floating-point multiplication is commutative assuming NaN canonicalization. So you can safely swap x and y, if I'm not mistaken…

afonso360 · 2024-05-10T09:56:56Z

I'm not entirely sure, for these 4 instructions I pretty much copied the LLVM lowerings for them. I also briefly looked at our x64 backend and noted that it had some similar rules for fnmadd. I can't really explain these transformations in great detail, FP Math makes my head hurt 😖 .

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

That case isn't represented in our X64 backend, and I'm not entirely sure how to prove it. My z3 skills are non existent, but I probably can implement it for the x64 backend and fuzz it for a few days.

I also don't think magnitudes should matter here, fneg is a bitwise operation, it shouldn't do any early rounding that would be affected by magnitude.

Edit: I was searching for related fma transformations and found this alive2 testcase that seems like a good starting point to try to play around and prove these transformations. I've never used alive, but I'll give it a go this afternoon, seems like it should be fairly easy.

(fneg (fma ...)) I think is a no-go since LLVM has this rule, but only with the nsz (No Signed Zero) flag, which means it probably swaps -0.0 with 0.0 or vice-versa.

Adding the fneg y variations seems like it would work, at least we have those rules in the x64 backend. And de-duplicating those rules with the vector rules seems really neat, since we have pretty much the same instructions for both. I'll try and merge those rules together like you suggested.

afonso360 · 2024-05-11T07:51:20Z

So, I played around with alive for a bit. Really cool tool! Unfortunately the compiler explorer for alive2 has a short timeout so I had to run these locally.

Out of curiosity, can't you use these same fnmsub/fnmadd instructions if y is negated instead of x?

Yes you can! Here's the proof that I built for this and it does validate.

Result

afonso@DESKTOP-1AHKMV2:~/git/alive2/build$ ./alive-tv --disable-undef-input --smt-to=1800000  ./tgt.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %negx = fneg half %x
  %fma = fma half %negx, half %y, half %z
  ret half %fma
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %negy = fneg half %y
  %fma = fma half %x, half %negy, half %z
  ret half %fma
}
Transformation seems to be correct!

I'm reasonably confident that negating the result is equivalent to negating both terms and both cases are safe to fuse without loss of precision.

This one doesn't seem to pan out, it seems to have the issue with the signed zero. Proof

Result

afonso@DESKTOP-1AHKMV2:~/git/alive2/proofs$ ../build/alive-tv --disable-undef-input --smt-to=1800000 ./neg_fma_eq_fma_negxz.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %fma = fma half %x, half %y, half %z
  %n = fneg half %fma
  ret half %n
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %negy = fneg half %y
  %negz = fneg half %z
  %fma = fma half %x, half %negy, half %negz
  ret half %fma
}
Transformation doesn't verify!

ERROR: Value mismatch

Example:
half %x = #x0000 (+0.0)
half %y = #x8100 (-0.000015258789?)
half %z = #x0000 (+0.0)

Source:
half %fma = #x0000 (+0.0)
half %n = #x8000 (-0.0)

Target:
half %negy = #x0100 (0.000015258789?)
half %negz = #x8000 (-0.0)
half %fma = #x0000 (+0.0)
Source value: #x8000 (-0.0)
Target value: #x0000 (+0.0)

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y

I can't really test this using alive, since it doesn't have a fmsub intrinsic. But I did something similar using fmul+f{add,sub}, which has 2 rounding steps instead of one, which is what I think really matters here. In this case the proof does check out and you can remove an intermediary fneg. Proof

Result

afonso@DESKTOP-1AHKMV2:~/git/alive2/proofs$ ../build/alive-tv --disable-undef-input --smt-to=1800000 ./fma_split_neg.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %mul = fmul half %x, %y
  %negz = fneg half %z
  %res = fadd half %mul, %negz
  ret half %res
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %mul = fmul half %x, %y
  %res = fsub half %mul, %z
  ret half %res
}
Transformation seems to be correct!

We should probably add a mid end rule for this case!

Similarly if both x and y are negated, I think you can use fmadd/fmsub, depending on whether z is negated or not.

I'm not too worried about this case because it's already covered by a mid end rule. That being said, it looks like It comes pretty much for free with the solution that you proposed above, so might as well have it.

I'm going to write up a PR using that. Thanks for looking at this in so much detail!

riscv64: Add fmsub/fnmsub/fnmadd instruction lowerings

3415a4a

afonso360 added the cranelift:area:riscv64 Issues related to the RISC-V 64 backend. label May 9, 2024

afonso360 requested a review from a team as a code owner May 9, 2024 10:12

afonso360 requested review from abrown and removed request for a team May 9, 2024 10:12

github-actions bot added the cranelift Issues related to the Cranelift code generator label May 9, 2024

alexcrichton approved these changes May 9, 2024

View reviewed changes

alexcrichton added this pull request to the merge queue May 9, 2024

Merged via the queue into bytecodealliance:main with commit fe433b7 May 9, 2024
31 checks passed

This was referenced May 11, 2024

riscv64: Improve pattern matching rules for FMA #8596

Merged

x64: Add support for fmsub and fnmsub instructions #8602

Closed

aarch64: Add support for fnmadd/fmsub/fnmsub #8603

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

riscv64: Add `fmsub`/`fnmsub`/`fnmadd` instruction lowerings #8588

riscv64: Add `fmsub`/`fnmsub`/`fnmadd` instruction lowerings #8588

afonso360 commented May 9, 2024

alexcrichton left a comment

jameysharp commented May 10, 2024

afonso360 commented May 10, 2024 •

edited

Loading

afonso360 commented May 11, 2024 •

edited

Loading

riscv64: Add fmsub/fnmsub/fnmadd instruction lowerings #8588

riscv64: Add fmsub/fnmsub/fnmadd instruction lowerings #8588

Conversation

afonso360 commented May 9, 2024

alexcrichton left a comment

Choose a reason for hiding this comment

jameysharp commented May 10, 2024

afonso360 commented May 10, 2024 • edited Loading

afonso360 commented May 11, 2024 • edited Loading

riscv64: Add `fmsub`/`fnmsub`/`fnmadd` instruction lowerings #8588

riscv64: Add `fmsub`/`fnmsub`/`fnmadd` instruction lowerings #8588

afonso360 commented May 10, 2024 •

edited

Loading

afonso360 commented May 11, 2024 •

edited

Loading