-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Codegen fix convert float to int on small types with riscv64. #6015
Conversation
@afonso360 Looks like The test failure indicates |
@afonso360 I have rewritten to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's looking better! I just noticed that we are applying the clamping to all fcvt_
rules, and I suspect that is going to break the other ones.
I'm going to run the fuzzer on this for a while to double check.
Also @jameysharp could you also take a look at this PR?
Edit: The fuzzer found the following:
Testcase
test interpret
test run
target riscv64gc
function %a(f64) -> i16 sext system_v {
block0(v6: f64):
v16 = fcvt_to_sint_sat.i16 v6
return v16
}
; run: %a(-NaN:0x7ffffff666666) == 0
Results
FAIL ./lmao.clif: run
Caused by:
Failed test: run: %a(-NaN:0x7ffffff666666) == 0, actual: -32768
1 tests
Error: 1 failure
I'll see if I can get an example that is exclusive to the other fcvt instructions
@@ -1798,7 +1798,8 @@ | |||
(let | |||
((result WritableReg (temp_writable_reg out_type)) | |||
(tmp WritableReg (temp_writable_reg $F64)) | |||
(_ Unit (emit (MInst.FcvtToInt is_sat result tmp rs is_signed in_type out_type)))) | |||
(rs2 Reg (clamp_fcvt_from_float in_type out_type rs is_signed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to target the whole family of fcvt_*
instructions, but the clamp here is really only needed for fcvt_*_sat
which do the saturating conversions.
@@ -2195,6 +2223,14 @@ | |||
(rule (flt $F32 a b) (fpu_rrr (FpuOPRRR.FltS) $I64 a b)) | |||
(rule (flt $F64 a b) (fpu_rrr (FpuOPRRR.FltD) $I64 a b)) | |||
|
|||
(decl fmin_ (Type Reg Reg) Reg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can't name this fmin
since it conflicts with the preexisting declaration for the fmin
clif instruction. I guess adding a prefix to all RISCV instruction helpers would make sense? The x86 backend has x64_*
so we could add something like rv_*
, what do you guys think about this?
If everyone agrees I don't mind doing a pass on the backend adding the helpers and making sure everything is aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a prefix like rv_
is a good idea. But I also think it would be okay to merge this with the fmin_
/fmax_
names, and clean it up later.
(max_reg Reg (imm in_ty (i32_2_float_bits max_value in_ty)))) | ||
(fmin_ in_ty tmp max_reg))) | ||
|
||
(decl i8_i16_max_value (Type bool) i32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I have a slight preference for having these on ISLE as well, but I'm not too sure. @jameysharp what do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I agree that I'd like to do as much in ISLE as possible.
I think there are a number of tricks we can use which might clean this up, but I'm having trouble putting them together into a complete suggestion.
The pure ISLE version of this probably would return the appropriate hex-literal float for each combination of type and is_signed. I'm not sure how easy that'll be to write, understand, or maintain.
Leaving some of the work to Rust, we can factor it differently to shorten the code. With signed conversion, we can start with i64::MAX
or i64::MIN
, then x >> (64 - ty.bits())
, then use f32_bits(x as f32)
or f64_bits(x as f64)
. Unsigned is the same except for starting with u64::MAX
or u64::MIN
and using unsigned shifts. Since we apparently only need to clamp for fits_in_16
types (why is that though?) we can replace 64
with 16
in the first two steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we apparently only need to clamp for fits_in_16 types (why is that though?)
We have native instructions for this operation in both 32 and 64 bit variants, but for <16 we clamp the values to the maximum range supported by i8/i16 and then do the 32 bit conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I have super helpful advice. Mostly I can say that @afonso360, I think your feedback makes sense.
@@ -2195,6 +2223,14 @@ | |||
(rule (flt $F32 a b) (fpu_rrr (FpuOPRRR.FltS) $I64 a b)) | |||
(rule (flt $F64 a b) (fpu_rrr (FpuOPRRR.FltD) $I64 a b)) | |||
|
|||
(decl fmin_ (Type Reg Reg) Reg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a prefix like rv_
is a good idea. But I also think it would be okay to merge this with the fmin_
/fmax_
names, and clean it up later.
(max_reg Reg (imm in_ty (i32_2_float_bits max_value in_ty)))) | ||
(fmin_ in_ty tmp max_reg))) | ||
|
||
(decl i8_i16_max_value (Type bool) i32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I agree that I'd like to do as much in ISLE as possible.
I think there are a number of tricks we can use which might clean this up, but I'm having trouble putting them together into a complete suggestion.
The pure ISLE version of this probably would return the appropriate hex-literal float for each combination of type and is_signed. I'm not sure how easy that'll be to write, understand, or maintain.
Leaving some of the work to Rust, we can factor it differently to shorten the code. With signed conversion, we can start with i64::MAX
or i64::MIN
, then x >> (64 - ty.bits())
, then use f32_bits(x as f32)
or f64_bits(x as f64)
. Unsigned is the same except for starting with u64::MAX
or u64::MIN
and using unsigned shifts. Since we apparently only need to clamp for fits_in_16
types (why is that though?) we can replace 64
with 16
in the first two steps.
Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Jamey Sharp <jamey@minilop.net>
@afonso360 any idea about this failure.
|
Well, looks like we are not returning 0 for all NaN's. When testing this locally I got Disassembly of the test case``` Disassembly of 112 bytes: 0: 97 06 00 00 auipc a3, 0 4: 83 b6 c6 00 ld a3, 0xc(a3) 8: 6f 00 c0 00 j 0xc c: 00 00 00 00 .byte 0x00, 0x00, 0x00, 0x00 10: 00 00 e0 c0 .byte 0x00, 0x00, 0xe0, 0xc0 14: 53 83 06 f2 fmv.d.x ft6, a3 18: d3 15 a3 2a fmax.d fa1, ft6, fa0 1c: 97 0e 00 00 auipc t4, 0 20: 83 be ce 00 ld t4, 0xc(t4) 24: 6f 00 c0 00 j 0xc 28: 00 00 00 00 .byte 0x00, 0x00, 0x00, 0x00 2c: c0 ff df 40 .byte 0xc0, 0xff, 0xdf, 0x40 30: 53 87 0e f2 fmv.d.x fa4, t4 34: d3 88 e5 2a fmin.d fa7, fa1, fa4 38: 53 a8 18 a3 feq.d a6, fa7, fa7 3c: 63 02 08 02 beqz a6, 0x24 40: 53 98 08 c2 fcvt.w.d a6, fa7, rtz 44: 37 8f 00 00 lui t5, 8 48: 93 0f ff ff addi t6, t5, -1 4c: b3 7f f8 01 and t6, a6, t6 50: 13 58 f8 01 srli a6, a6, 0x1f 54: 13 18 f8 00 slli a6, a6, 0xf 58: 33 68 f8 01 or a6, a6, t6 5c: 6f 00 80 00 j 8 60: 13 08 00 00 mv a6, zero 64: 13 15 08 03 slli a0, a6, 0x30 68: 13 55 05 43 srai a0, a0, 0x30 6c: 67 80 00 00 ret ```I haven't looked too close at this, but it looks like we only do the NaN check after the value is clamped, so could that be the reason for the issues? I also tested building an equivalent test case using LLVM It looks like they do the NaN check on the original input. |
@afonso360 thanks. |
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
Apologies I didn't see this before I ended up making #7327, but I believe that that PR covers this one so I've annotated that to close this when merged. |
This will fix #5992 and #5993
We should normalize float type when convert float to int on {i8,i16}.
Because
riscv64
has no direct instruction to support it.