Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv64: Add fmsub/fnmsub/fnmadd instruction lowerings #8588

Merged
merged 1 commit into from
May 9, 2024

Conversation

afonso360
Copy link
Contributor

👋 Hey,

This PR adds lowering rules for the fmsub/fnmsub/fnmadd instructions.

These instructions were already implemented in the backend, but had no lowering rules associated with them, so they were never emitted.

@afonso360 afonso360 added the cranelift:area:riscv64 Issues related to the RISC-V 64 backend. label May 9, 2024
@afonso360 afonso360 requested a review from a team as a code owner May 9, 2024 10:12
@afonso360 afonso360 requested review from abrown and removed request for a team May 9, 2024 10:12
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label May 9, 2024
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@alexcrichton alexcrichton added this pull request to the merge queue May 9, 2024
Merged via the queue into bytecodealliance:main with commit fe433b7 May 9, 2024
31 checks passed
@jameysharp
Copy link
Contributor

This is neat!

Out of curiosity, can't you use these same fnmsub/fnmadd instructions if y is negated instead of x? Similarly if both x and y are negated, I think you can use fmadd/fmsub, depending on whether z is negated or not. And of course you could also match on (fneg (fma ...)), leading to 16 possible rules in total.

That said, I'm a little worried about whether fusing negation into an fma can change its results. I'm hoping the answer is "no", but do either of you happen to know for sure and can you explain it to me? I'm reasonably confident that negating the result is equivalent to negating both terms and both cases are safe to fuse without loss of precision. I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

Anyway, I thought way too much about this and I think you can fold those 16 cases into four rules based on how many of x, y, z, or the result are wrapped in fneg:

  • Even number of fnegs in (result, x, y): use fm*
  • Odd number of fnegs in (result, x, y): use fnm*
  • Even number of fnegs in (x, y, z): use *add
  • Odd number of fnegs in (x, y, z): use *sub

With the right helpers you can write the four rules all at the same priority. This is a little awkward and would be easier if ISLE had or-patterns or match expressions or something, but I think the following should work and generate about as good of a pattern-matching tree as we're capable of right now.

(type IsFneg (enum (Result (negate u64) (value Value))))

(decl is_fneg (Value) IsFneg)
(rule 1 (is_fneg (fneg x)) (IsFneg.Result 1 x))
(rule 0 (is_fneg x) (IsFneg.Result 0 x))

(rule (lower (has_type ty (fma x y z)))
  (if-let (IsFneg.Result neg_x x) (is_fneg x))
  (if-let (IsFneg.Result neg_y y) (is_fneg y))
  (if-let (IsFneg.Result neg_z z) (is_fneg z))
  (rv_fma ty (u64_xor neg_x neg_y) neg_z x y z))

; this rule will need a rule priority to not overlap with other top-level fneg rules
(rule (lower (has_type ty (fneg (fma x y z))))
  (if-let (IsFneg.Result neg_x x) (is_fneg x))
  (if-let (IsFneg.Result neg_y y) (is_fneg y))
  (if-let (IsFneg.Result neg_z z) (is_fneg z))
  (rv_fma ty (u64_xor 1 (u64_xor neg_x neg_y)) (u64_xor 1 neg_z) x y z))

; parity arguments indicate whether to negate the x*y term or the z term, respectively
(decl rv_fma (Type u64 u64 Value Value Value) InstOutput)
(rule 0 (rv_fma (ty_scalar_float ty) 0 0 x y z) (rv_fmadd ty (FRM.RME) x y z))
(rule 0 (rv_fma (ty_scalar_float ty) 0 1 x y z) (rv_fmsub ty (FRM.RME) x y z))
(rule 0 (rv_fma (ty_scalar_float ty) 1 0 x y z) (rv_fnmsub ty (FRM.RME) x y z))
(rule 0 (rv_fma (ty_scalar_float ty) 1 1 x y z) (rv_fnmadd ty (FRM.RME) x y z))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 0 0 x y z) (rv_vfmacc_vv z y x (unmasked) ty))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 0 1 x y z) (rv_vfmsac_vv z y x (unmasked) ty))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 1 0 x y z) (rv_vfnmsac_vv z y x (unmasked) ty))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 1 1 x y z) (rv_vfnmacc_vv z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 0 0 (splat x) y z) (rv_vfmacc_vf z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 0 1 (splat x) y z) (rv_vfmsac_vf z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 1 0 (splat x) y z) (rv_vfnmsac_vf z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 1 1 (splat x) y z) (rv_vfnmacc_vf z y x (unmasked) ty))

Also I think you might be able to handle (splat y) as well because I guess floating-point multiplication is commutative assuming NaN canonicalization. So you can safely swap x and y, if I'm not mistaken…

@afonso360
Copy link
Contributor Author

afonso360 commented May 10, 2024

I'm not entirely sure, for these 4 instructions I pretty much copied the LLVM lowerings for them. I also briefly looked at our x64 backend and noted that it had some similar rules for fnmadd. I can't really explain these transformations in great detail, FP Math makes my head hurt 😖 .

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

That case isn't represented in our X64 backend, and I'm not entirely sure how to prove it. My z3 skills are non existent, but I probably can implement it for the x64 backend and fuzz it for a few days.

I also don't think magnitudes should matter here, fneg is a bitwise operation, it shouldn't do any early rounding that would be affected by magnitude.

Edit: I was searching for related fma transformations and found this alive2 testcase that seems like a good starting point to try to play around and prove these transformations. I've never used alive, but I'll give it a go this afternoon, seems like it should be fairly easy.


(fneg (fma ...)) I think is a no-go since LLVM has this rule, but only with the nsz (No Signed Zero) flag, which means it probably swaps -0.0 with 0.0 or vice-versa.


Adding the fneg y variations seems like it would work, at least we have those rules in the x64 backend. And de-duplicating those rules with the vector rules seems really neat, since we have pretty much the same instructions for both. I'll try and merge those rules together like you suggested.

@afonso360
Copy link
Contributor Author

afonso360 commented May 11, 2024

So, I played around with alive for a bit. Really cool tool! Unfortunately the compiler explorer for alive2 has a short timeout so I had to run these locally.

Out of curiosity, can't you use these same fnmsub/fnmadd instructions if y is negated instead of x?

Yes you can! Here's the proof that I built for this and it does validate.

Result
afonso@DESKTOP-1AHKMV2:~/git/alive2/build$ ./alive-tv --disable-undef-input --smt-to=1800000  ./tgt.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %negx = fneg half %x
  %fma = fma half %negx, half %y, half %z
  ret half %fma
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %negy = fneg half %y
  %fma = fma half %x, half %negy, half %z
  ret half %fma
}
Transformation seems to be correct!

I'm reasonably confident that negating the result is equivalent to negating both terms and both cases are safe to fuse without loss of precision.

This one doesn't seem to pan out, it seems to have the issue with the signed zero. Proof

Result
afonso@DESKTOP-1AHKMV2:~/git/alive2/proofs$ ../build/alive-tv --disable-undef-input --smt-to=1800000 ./neg_fma_eq_fma_negxz.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %fma = fma half %x, half %y, half %z
  %n = fneg half %fma
  ret half %n
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %negy = fneg half %y
  %negz = fneg half %z
  %fma = fma half %x, half %negy, half %negz
  ret half %fma
}
Transformation doesn't verify!

ERROR: Value mismatch

Example:
half %x = #x0000 (+0.0)
half %y = #x8100 (-0.000015258789?)
half %z = #x0000 (+0.0)

Source:
half %fma = #x0000 (+0.0)
half %n = #x8000 (-0.0)

Target:
half %negy = #x0100 (0.000015258789?)
half %negz = #x8000 (-0.0)
half %fma = #x0000 (+0.0)
Source value: #x8000 (-0.0)
Target value: #x0000 (+0.0)

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y

I can't really test this using alive, since it doesn't have a fmsub intrinsic. But I did something similar using fmul+f{add,sub}, which has 2 rounding steps instead of one, which is what I think really matters here. In this case the proof does check out and you can remove an intermediary fneg. Proof

Result
afonso@DESKTOP-1AHKMV2:~/git/alive2/proofs$ ../build/alive-tv --disable-undef-input --smt-to=1800000 ./fma_split_neg.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %mul = fmul half %x, %y
  %negz = fneg half %z
  %res = fadd half %mul, %negz
  ret half %res
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %mul = fmul half %x, %y
  %res = fsub half %mul, %z
  ret half %res
}
Transformation seems to be correct!

We should probably add a mid end rule for this case!

Similarly if both x and y are negated, I think you can use fmadd/fmsub, depending on whether z is negated or not.

I'm not too worried about this case because it's already covered by a mid end rule. That being said, it looks like It comes pretty much for free with the solution that you proposed above, so might as well have it.

I'm going to write up a PR using that. Thanks for looking at this in so much detail!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:riscv64 Issues related to the RISC-V 64 backend. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants