Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codegen fix convert float to int on small types with riscv64. #6015

Closed
wants to merge 21 commits into from

Conversation

yuyang-ok
Copy link
Contributor

@yuyang-ok yuyang-ok commented Mar 14, 2023

This will fix #5992 and #5993
We should normalize float type when convert float to int on {i8,i16}.
Because riscv64 has no direct instruction to support it.

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Mar 14, 2023
@yuyang-ok
Copy link
Contributor Author

@afonso360 Looks like The test failure indicates x64 not implement for small types.

cranelift/codegen/src/isa/riscv64/lower/isle.rs Outdated Show resolved Hide resolved
tests/spec_testsuite Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
@yuyang-ok
Copy link
Contributor Author

yuyang-ok commented Mar 15, 2023

@afonso360 I have rewritten to ISLE .
fmax_ is That I can't figure out a better name.
Maybe you can review code first.

Copy link
Contributor

@afonso360 afonso360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking better! I just noticed that we are applying the clamping to all fcvt_ rules, and I suspect that is going to break the other ones.

I'm going to run the fuzzer on this for a while to double check.

Also @jameysharp could you also take a look at this PR?


Edit: The fuzzer found the following:

Testcase
test interpret
test run
target riscv64gc

function %a(f64) -> i16 sext system_v {
block0(v6: f64):
    v16 = fcvt_to_sint_sat.i16 v6
    return v16
}

; run: %a(-NaN:0x7ffffff666666) == 0
Results
FAIL ./lmao.clif: run

Caused by:
    Failed test: run: %a(-NaN:0x7ffffff666666) == 0, actual: -32768
1 tests
Error: 1 failure

I'll see if I can get an example that is exclusive to the other fcvt instructions

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
@@ -1798,7 +1798,8 @@
(let
((result WritableReg (temp_writable_reg out_type))
(tmp WritableReg (temp_writable_reg $F64))
(_ Unit (emit (MInst.FcvtToInt is_sat result tmp rs is_signed in_type out_type))))
(rs2 Reg (clamp_fcvt_from_float in_type out_type rs is_signed))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to target the whole family of fcvt_* instructions, but the clamp here is really only needed for fcvt_*_sat which do the saturating conversions.

@@ -2195,6 +2223,14 @@
(rule (flt $F32 a b) (fpu_rrr (FpuOPRRR.FltS) $I64 a b))
(rule (flt $F64 a b) (fpu_rrr (FpuOPRRR.FltD) $I64 a b))

(decl fmin_ (Type Reg Reg) Reg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can't name this fmin since it conflicts with the preexisting declaration for the fmin clif instruction. I guess adding a prefix to all RISCV instruction helpers would make sense? The x86 backend has x64_* so we could add something like rv_*, what do you guys think about this?

If everyone agrees I don't mind doing a pass on the backend adding the helpers and making sure everything is aligned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a prefix like rv_ is a good idea. But I also think it would be okay to merge this with the fmin_/fmax_ names, and clean it up later.

(max_reg Reg (imm in_ty (i32_2_float_bits max_value in_ty))))
(fmin_ in_ty tmp max_reg)))

(decl i8_i16_max_value (Type bool) i32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I have a slight preference for having these on ISLE as well, but I'm not too sure. @jameysharp what do you think about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I agree that I'd like to do as much in ISLE as possible.

I think there are a number of tricks we can use which might clean this up, but I'm having trouble putting them together into a complete suggestion.

The pure ISLE version of this probably would return the appropriate hex-literal float for each combination of type and is_signed. I'm not sure how easy that'll be to write, understand, or maintain.

Leaving some of the work to Rust, we can factor it differently to shorten the code. With signed conversion, we can start with i64::MAX or i64::MIN, then x >> (64 - ty.bits()), then use f32_bits(x as f32) or f64_bits(x as f64). Unsigned is the same except for starting with u64::MAX or u64::MIN and using unsigned shifts. Since we apparently only need to clamp for fits_in_16 types (why is that though?) we can replace 64 with 16 in the first two steps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we apparently only need to clamp for fits_in_16 types (why is that though?)

We have native instructions for this operation in both 32 and 64 bit variants, but for <16 we clamp the values to the maximum range supported by i8/i16 and then do the 32 bit conversion.

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have super helpful advice. Mostly I can say that @afonso360, I think your feedback makes sense.

@@ -2195,6 +2223,14 @@
(rule (flt $F32 a b) (fpu_rrr (FpuOPRRR.FltS) $I64 a b))
(rule (flt $F64 a b) (fpu_rrr (FpuOPRRR.FltD) $I64 a b))

(decl fmin_ (Type Reg Reg) Reg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a prefix like rv_ is a good idea. But I also think it would be okay to merge this with the fmin_/fmax_ names, and clean it up later.

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
(max_reg Reg (imm in_ty (i32_2_float_bits max_value in_ty))))
(fmin_ in_ty tmp max_reg)))

(decl i8_i16_max_value (Type bool) i32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I agree that I'd like to do as much in ISLE as possible.

I think there are a number of tricks we can use which might clean this up, but I'm having trouble putting them together into a complete suggestion.

The pure ISLE version of this probably would return the appropriate hex-literal float for each combination of type and is_signed. I'm not sure how easy that'll be to write, understand, or maintain.

Leaving some of the work to Rust, we can factor it differently to shorten the code. With signed conversion, we can start with i64::MAX or i64::MIN, then x >> (64 - ty.bits()), then use f32_bits(x as f32) or f64_bits(x as f64). Unsigned is the same except for starting with u64::MAX or u64::MIN and using unsigned shifts. Since we apparently only need to clamp for fits_in_16 types (why is that though?) we can replace 64 with 16 in the first two steps.

yuyang-ok and others added 5 commits March 16, 2023 09:26
@yuyang-ok yuyang-ok requested a review from a team as a code owner March 30, 2023 04:12
@yuyang-ok yuyang-ok requested review from jameysharp and removed request for a team March 30, 2023 04:12
@yuyang-ok
Copy link
Contributor Author

@afonso360 any idea about this failure.

test interpret
test run
target riscv64gc

function %a(f64) -> i16 sext system_v {
block0(v6: f64):
    v16 = fcvt_to_sint_sat.i16 v6
    return v16
}

; run: %a(-NaN:0x7ffffff666666) == 0

@afonso360
Copy link
Contributor

Well, looks like we are not returning 0 for all NaN's. When testing this locally I got run: %b(-NaN:0x7ffffff666666) == 0, actual: -32768 so it seems like something is going wrong in the NaN checking.

Disassembly of the test case ``` Disassembly of 112 bytes: 0: 97 06 00 00 auipc a3, 0 4: 83 b6 c6 00 ld a3, 0xc(a3) 8: 6f 00 c0 00 j 0xc c: 00 00 00 00 .byte 0x00, 0x00, 0x00, 0x00 10: 00 00 e0 c0 .byte 0x00, 0x00, 0xe0, 0xc0 14: 53 83 06 f2 fmv.d.x ft6, a3 18: d3 15 a3 2a fmax.d fa1, ft6, fa0 1c: 97 0e 00 00 auipc t4, 0 20: 83 be ce 00 ld t4, 0xc(t4) 24: 6f 00 c0 00 j 0xc 28: 00 00 00 00 .byte 0x00, 0x00, 0x00, 0x00 2c: c0 ff df 40 .byte 0xc0, 0xff, 0xdf, 0x40 30: 53 87 0e f2 fmv.d.x fa4, t4 34: d3 88 e5 2a fmin.d fa7, fa1, fa4 38: 53 a8 18 a3 feq.d a6, fa7, fa7 3c: 63 02 08 02 beqz a6, 0x24 40: 53 98 08 c2 fcvt.w.d a6, fa7, rtz 44: 37 8f 00 00 lui t5, 8 48: 93 0f ff ff addi t6, t5, -1 4c: b3 7f f8 01 and t6, a6, t6 50: 13 58 f8 01 srli a6, a6, 0x1f 54: 13 18 f8 00 slli a6, a6, 0xf 58: 33 68 f8 01 or a6, a6, t6 5c: 6f 00 80 00 j 8 60: 13 08 00 00 mv a6, zero 64: 13 15 08 03 slli a0, a6, 0x30 68: 13 55 05 43 srai a0, a0, 0x30 6c: 67 80 00 00 ret ```

I haven't looked too close at this, but it looks like we only do the NaN check after the value is clamped, so could that be the reason for the issues?

I also tested building an equivalent test case using LLVM It looks like they do the NaN check on the original input.

@yuyang-ok
Copy link
Contributor Author

@afonso360 thanks.

@github-actions github-actions bot added the isle Related to the ISLE domain-specific language label May 9, 2023
@github-actions
Copy link

github-actions bot commented May 9, 2023

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@alexcrichton
Copy link
Member

Apologies I didn't see this before I ended up making #7327, but I believe that that PR covers this one so I've annotated that to close this when merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cranelift: RISC-V wrong result for fcvt_to_uint_sat.{i8,i16}
4 participants