rustc: Improving safe wasm float->int casts #74695

alexcrichton · 2020-07-23T19:35:47Z

This commit improves code generation for WebAssembly targets when
translating floating to integer casts. This improvement is only relevant
when the nontrapping-fptoint feature is not enabled, but the feature
is not enabled by default right now. Additionally this improvement only
affects safe casts since unchecked casts were improved in #74659.

Some more background for this issue is present on #73591, but the
general gist of the issue is that in LLVM the fptosi and fptoui
instructions are defined to return an undef value if they execute on
out-of-bounds values; they notably do not trap. To implement these
instructions for WebAssembly the LLVM backend must therefore generate
quite a few instructions before executing i32.trunc_f32_s (for
example) because this WebAssembly instruction traps on out-of-bounds
values. This codegen into wasm instructions happens very late in the
code generator, so what ends up happening is that rustc inserts its own
codegen to implement Rust's saturating semantics, and then LLVM also
inserts its own codegen to make sure that the fptosi instruction
doesn't trap. Overall this means that a function like this:

#[no_mangle]
pub unsafe extern "C" fn cast(x: f64) -> u32 {
    x as u32
}

will generate this WebAssembly today:

(func $cast (type 0) (param f64) (result i32)
  (local i32 i32)
  local.get 0
  f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
  f64.gt
  local.set 1
  block  ;; label = @1
    block  ;; label = @2
      local.get 0
      f64.const 0x0p+0 (;=0;)
      local.get 0
      f64.const 0x0p+0 (;=0;)
      f64.gt
      select
      local.tee 0
      f64.const 0x1p+32 (;=4.29497e+09;)
      f64.lt
      local.get 0
      f64.const 0x0p+0 (;=0;)
      f64.ge
      i32.and
      i32.eqz
      br_if 0 (;@2;)
      local.get 0
      i32.trunc_f64_u
      local.set 2
      br 1 (;@1;)
    end
    i32.const 0
    local.set 2
  end
  i32.const -1
  local.get 2
  local.get 1
  select)

This PR improves the situation by updating the code generation for
float-to-int conversions in rustc, specifically only for WebAssembly
targets and only for some situations (float-to-u8 still has not great
codegen). The fix here is to use basic blocks and control flow to avoid
speculatively executing fptosi, and instead LLVM's raw intrinsic for
the WebAssembly instruction is used instead. This effectively extends
the support added in #74659 to checked casts. After this commit the
codegen for the above Rust function looks like:

(func $cast (type 0) (param f64) (result i32)
  (local i32)
  block  ;; label = @1
    local.get 0
    f64.const 0x0p+0 (;=0;)
    f64.ge
    local.tee 1
    i32.const 1
    i32.xor
    br_if 0 (;@1;)
    local.get 0
    f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
    f64.le
    i32.eqz
    br_if 0 (;@1;)
    local.get 0
    i32.trunc_f64_u
    return
  end
  i32.const -1
  i32.const 0
  local.get 1
  select)

For reference, in Rust 1.44, which did not have saturating
float-to-integer casts, the codegen LLVM would emit is:

(func $cast (type 0) (param f64) (result i32)
  block  ;; label = @1
    local.get 0
    f64.const 0x1p+32 (;=4.29497e+09;)
    f64.lt
    local.get 0
    f64.const 0x0p+0 (;=0;)
    f64.ge
    i32.and
    i32.eqz
    br_if 0 (;@1;)
    local.get 0
    i32.trunc_f64_u
    return
  end
  i32.const 0)

So we're relatively close to the original codegen, although it's
slightly different because the semantics of the function changed where
we're emulating the i32.trunc_sat_f32_s instruction rather than always
replacing out-of-bounds values with zero.

There is still work that could be done to improve casts such as f32 to
u8. That form of cast still uses the fptosi instruction which
generates lots of branch-y code. This seems less important to tackle now
though. In the meantime this should take care of most use cases of
floating-point conversion and as a result I'm going to speculate that
this...

Closes #73591

rust-highfive · 2020-07-23T19:35:50Z

r? @davidtwco

(rust_highfive has picked a reviewer for you, use r? to override)

src/librustc_codegen_ssa/mir/rvalue.rs

davidtwco

LGTM, but someone else should probably take a look too.

davidtwco · 2020-07-24T09:29:32Z

r? @nagisa

CryZe · 2020-07-24T14:35:00Z

Actually for the 8-bit and 16-bit cases, what I did in my prototype was to emit the cast to u32 / i32 and then downcast the int from there, that should be a reasonably easy change here too. Should we do that in a follow up PR or do you want to squeeze it into this PR? (I could look into it again as well)

nagisa · 2020-07-24T15:00:42Z

src/librustc_codegen_ssa/mir/rvalue.rs

+        //
+        //      ;; ... previous basic block
+        //      %load_result = alloca i32, align 4
+        //      %less_or_nan = fcmp ult double %x, 0xC1E0000000000000


This looks more complicated than it needs to be. I’m aware that this is not very constructive in terms of feedback as it is… but I’m experimenting with some alternative ways to write this down as LLVM IR and hopefully will come back to update this before monday.

Here’s something I have so far (caveat untested, unchecked and not validated):

; check that the number is within the expected bounds and just convert if the number is in range %inbound_lower = fcmp oge double %0, 0xC1E0000000000000 %inbound_upper = fcmp ole double %0, 0x41DFFFFFFFC00000 %inbounds = and i1 %inbound_lower, %inbound_upper br i1 %inbounds, label %convert, label %specialcase specialcase: %isnan = fcmp ord double %0, 0.000000e+00 ; can probably also be `!inbound_lower && !inbound_upper`? ; select a max/min value for the case where float is not nan %result_notnan = select i1 %inbound_lower, i32 2147483647, i32 -2147483648 ; nan or min/max %result_nan = select i1 %isnan, i32 0, i32 %result_notnan ret i32 %result_nan convert: %result = fptosi double %0 to i32 ret i32 %result

This is what I ended up with in the end

;; block so far... %0 is the argument %inbound_lower = fcmp oge double %0, 0xC1E0000000000000 %inbound_upper = fcmp ole double %0, 0x41DFFFFFFFC00000 ;; match (inbound_lower, inbound_upper) { ;; (true, true) => %0 can be converted without trapping ;; (false, false) => %0 is a NaN ;; (true, false) => %0 is too large ;; (false, true) => %0 is too small ;; } ;; ;; The (true, true) check, go to %convert if so. %inbounds = and i1 %inbound_lower, %inbound_upper br i1 %inbounds, label %convert, label %specialcase specialcase: ;; Handle the cases where the number is NaN, too large or too small ;; Either (true, false) or (false, true) %is_not_nan = or i1 %inbound_lower, %inbound_upper ;; Figure out which saturated value we are interested in if not `NaN` %saturated = select i1 %inbound_lower, i32 2147483647, i32 -2147483648 ;; Figure out between saturated and NaN representations %result_nan = select i1 %is_not_nan, i32 %saturated, i32 0 br label %done convert: %result = call i32 @llvm.wasm.trunc.signed.i32.f64(double %0) ; fptosi double %0 to i32 br label %done done: %r = phi i32 [ %result, %convert ], [ %result_nan, %specialcase ] ;; continues here

Here’s a godbolt comparison of the two approaches: https://rust.godbolt.org/z/cnEGWM. Curiously the output in godbolt is very different from what was posted in the PR description, and I’m not sure why the difference (maybe i32 vs u32?).

Anyway, comparing just the godbolt output the code generated from the snippet above looks somewhat better to my wasm-untrained eye. This code is also something LLVM is not really able to optimize much further. So at the very least doing something along the lines of this snippet can be a win in terms of the compile time.

If you don’t want to attempt to integrate something like this, that's also fine with me. In that case let me know and I’ll r+ this.

Oh nice! I'm happy to update this to use that codegen. Do you know if we can natively generate phi nodes though? I wanted to do that instead of using an alloca but the builder trait didn't seem to have a way to manually construct phi nodes.

alexcrichton · 2020-07-28T17:04:35Z

Ok I've updated this to codegen in line with @nagisa's suggestions, although it still uses an alloca to store the result rather than a phi because I can't figure out how to generate a phi node natively :(

@1

This commit improves code generation for WebAssembly targets when translating floating to integer casts. This improvement is only relevant when the `nontrapping-fptoint` feature is not enabled, but the feature is not enabled by default right now. Additionally this improvement only affects safe casts since unchecked casts were improved in rust-lang#74659. Some more background for this issue is present on rust-lang#73591, but the general gist of the issue is that in LLVM the `fptosi` and `fptoui` instructions are defined to return an `undef` value if they execute on out-of-bounds values; they notably do not trap. To implement these instructions for WebAssembly the LLVM backend must therefore generate quite a few instructions before executing `i32.trunc_f32_s` (for example) because this WebAssembly instruction traps on out-of-bounds values. This codegen into wasm instructions happens very late in the code generator, so what ends up happening is that rustc inserts its own codegen to implement Rust's saturating semantics, and then LLVM also inserts its own codegen to make sure that the `fptosi` instruction doesn't trap. Overall this means that a function like this: #[no_mangle] pub unsafe extern "C" fn cast(x: f64) -> u32 { x as u32 } will generate this WebAssembly today: (func $cast (type 0) (param f64) (result i32) (local i32 i32) local.get 0 f64.const 0x1.fffffffep+31 (;=4.29497e+09;) f64.gt local.set 1 block ;; label = @1 block ;; label = @2 local.get 0 f64.const 0x0p+0 (;=0;) local.get 0 f64.const 0x0p+0 (;=0;) f64.gt select local.tee 0 f64.const 0x1p+32 (;=4.29497e+09;) f64.lt local.get 0 f64.const 0x0p+0 (;=0;) f64.ge i32.and i32.eqz br_if 0 (;@2;) local.get 0 i32.trunc_f64_u local.set 2 br 1 (;@1;) end i32.const 0 local.set 2 end i32.const -1 local.get 2 local.get 1 select) This PR improves the situation by updating the code generation for float-to-int conversions in rustc, specifically only for WebAssembly targets and only for some situations (float-to-u8 still has not great codegen). The fix here is to use basic blocks and control flow to avoid speculatively executing `fptosi`, and instead LLVM's raw intrinsic for the WebAssembly instruction is used instead. This effectively extends the support added in rust-lang#74659 to checked casts. After this commit the codegen for the above Rust function looks like: (func $cast (type 0) (param f64) (result i32) (local i32) block ;; label = @1 local.get 0 f64.const 0x0p+0 (;=0;) f64.ge local.tee 1 i32.const 1 i32.xor br_if 0 (;@1;) local.get 0 f64.const 0x1.fffffffep+31 (;=4.29497e+09;) f64.le i32.eqz br_if 0 (;@1;) local.get 0 i32.trunc_f64_u return end i32.const -1 i32.const 0 local.get 1 select) For reference, in Rust 1.44, which did not have saturating float-to-integer casts, the codegen LLVM would emit is: (func $cast (type 0) (param f64) (result i32) block ;; label = @1 local.get 0 f64.const 0x1p+32 (;=4.29497e+09;) f64.lt local.get 0 f64.const 0x0p+0 (;=0;) f64.ge i32.and i32.eqz br_if 0 (;@1;) local.get 0 i32.trunc_f64_u return end i32.const 0) So we're relatively close to the original codegen, although it's slightly different because the semantics of the function changed where we're emulating the `i32.trunc_sat_f32_s` instruction rather than always replacing out-of-bounds values with zero. There is still work that could be done to improve casts such as `f32` to `u8`. That form of cast still uses the `fptosi` instruction which generates lots of branch-y code. This seems less important to tackle now though. In the meantime this should take care of most use cases of floating-point conversion and as a result I'm going to speculate that this... Closes rust-lang#73591

alexcrichton · 2020-08-03T18:28:48Z

ping for a re-r? from @nagisa

nagisa · 2020-08-03T18:54:25Z

Yeah LGTM. Its too bad we cannot easily generate phi – having many casts will ultimately end up generating a larger amount of IR for LLVM to optimize, slowing down compilation. But nevertheless the improvement seems worth it.

@bors r+

bors · 2020-08-03T18:54:27Z

📌 Commit 2c1b046 has been approved by nagisa

bors · 2020-08-03T18:54:41Z

💡 This pull request was already approved, no need to approve it again.

There's another pull request that is currently being tested, blocking this pull request: Rollup of 7 pull requests #75116

bors · 2020-08-03T18:54:42Z

📌 Commit 2c1b046 has been approved by nagisa

bors · 2020-08-03T23:58:02Z

⌛ Testing commit 2c1b046 with merge 1d601d6...

bors · 2020-08-04T01:48:19Z

☀️ Test successful - checks-actions, checks-azure
Approved by: nagisa
Pushing 1d601d6 to master...

This commit fixes an issue with rust-lang#74695 where the fptosi and fptoui specializations on wasm were accidentally used on vector types by the `simd_cast` intrinsic. This issue showed up as broken CI for the stdsimd crate. Here this commit simply skips the specialization on vector kinds flowing into `fpto{s,u}i`.

Don't try to use wasm intrinsics on vectors This commit fixes an issue with rust-lang#74695 where the fptosi and fptoui specializations on wasm were accidentally used on vector types by the `simd_cast` intrinsic. This issue showed up as broken CI for the stdsimd crate. Here this commit simply skips the specialization on vector kinds flowing into `fpto{s,u}i`.

rust-highfive assigned davidtwco Jul 23, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 23, 2020

alexcrichton mentioned this pull request Jul 23, 2020

Redundant checks in floating point to integer casts on WASM #73591

Closed

CryZe reviewed Jul 23, 2020

View reviewed changes

src/librustc_codegen_ssa/mir/rvalue.rs Outdated Show resolved Hide resolved

CryZe approved these changes Jul 23, 2020

View reviewed changes

davidtwco approved these changes Jul 24, 2020

View reviewed changes

rust-highfive assigned nagisa and unassigned davidtwco Jul 24, 2020

nagisa reviewed Jul 24, 2020

View reviewed changes

alexcrichton force-pushed the more-wasm-float-cast-fixes branch 2 times, most recently from 66e5418 to 4a8d24b Compare July 28, 2020 17:03

alexcrichton force-pushed the more-wasm-float-cast-fixes branch from 4a8d24b to 2c1b046 Compare July 28, 2020 17:17

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 3, 2020

bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 4, 2020

bors merged commit 1d601d6 into rust-lang:master Aug 4, 2020

alexcrichton mentioned this pull request Aug 9, 2020

Don't try to use wasm intrinsics on vectors #75316

Merged

cuviper added this to the 1.47.0 milestone May 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rustc: Improving safe wasm float->int casts #74695

rustc: Improving safe wasm float->int casts #74695

alexcrichton commented Jul 23, 2020 •

edited

Loading

rust-highfive commented Jul 23, 2020

davidtwco left a comment

davidtwco commented Jul 24, 2020

CryZe commented Jul 24, 2020 •

edited

Loading

nagisa Jul 24, 2020

nagisa Jul 24, 2020

nagisa Jul 24, 2020 •

edited

Loading

alexcrichton Jul 28, 2020

alexcrichton commented Jul 28, 2020

alexcrichton commented Aug 3, 2020

nagisa commented Aug 3, 2020 •

edited

Loading

bors commented Aug 3, 2020

bors commented Aug 3, 2020

bors commented Aug 3, 2020

bors commented Aug 3, 2020

bors commented Aug 4, 2020

rustc: Improving safe wasm float->int casts #74695

rustc: Improving safe wasm float->int casts #74695

Conversation

alexcrichton commented Jul 23, 2020 • edited Loading

rust-highfive commented Jul 23, 2020

davidtwco left a comment

Choose a reason for hiding this comment

davidtwco commented Jul 24, 2020

CryZe commented Jul 24, 2020 • edited Loading

nagisa Jul 24, 2020

Choose a reason for hiding this comment

nagisa Jul 24, 2020

Choose a reason for hiding this comment

nagisa Jul 24, 2020 • edited Loading

Choose a reason for hiding this comment

alexcrichton Jul 28, 2020

Choose a reason for hiding this comment

alexcrichton commented Jul 28, 2020

alexcrichton commented Aug 3, 2020

nagisa commented Aug 3, 2020 • edited Loading

bors commented Aug 3, 2020

bors commented Aug 3, 2020

bors commented Aug 3, 2020

bors commented Aug 3, 2020

bors commented Aug 4, 2020

alexcrichton commented Jul 23, 2020 •

edited

Loading

CryZe commented Jul 24, 2020 •

edited

Loading

nagisa Jul 24, 2020 •

edited

Loading

nagisa commented Aug 3, 2020 •

edited

Loading