riscv64: Implement `insertlane` #6408

afonso360 · 2023-05-18T11:17:37Z

👋 Hey,

This PR implements insertlane. The implementation is essentially the same as the splat instruction, but using the masked version of vmv where only one lane is enabled. Masked vmv's are called vmerge, and they are only special in the fact that they don't follow the mask policy for masked out elements. Instead, masked out elements are sourced from one of the operands, instead of being ignored.

Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources register v0 as a mask. Masks are represented as bitfields where each bit corresponds to one lane.

The WASM replacelane tests exposed a preexisting bug in the encoding of vmv.s.x and vfmv.s.f, where their source operands are encoded in the vs1 field, instead of vs2. A fix for that is also included in this PR.

cranelift/codegen/src/isa/riscv64/inst/mod.rs

alexcrichton · 2023-05-18T14:18:45Z

cranelift/codegen/src/isa/riscv64/inst_vector.isle

+(rule (gen_vec_mask mask)
+  (rv_vmv_sx (imm $I64 mask) (vstate_from_type $I64X2)))


I don't know much about RISC-V vector instructions, but is this the most optimal this can be? For example are there instructions which move immediates into registers if the immediate is small?

I ask because it seems sort of out of place abstractly to have these sorts of operations when dealing with the immediates of insertlane/extractlane, but hey if it's the way it is then that's how it's gonna be.

One thing I thought though is that this might be good to delegate to a general vconst helper. Right now vconst always performs a load but it might be reasonable to fold a rule like this where if the immediate fits in an i64 it could do the x->v register movement.

I don't know much about RISC-V vector instructions, but is this the most optimal this can be? For example are there instructions which move immediates into registers if the immediate is small?

I'm not sure!

There are a few combos here that I can think of that could potentially be better than this in terms of materializing the mask. One of them like you mentioned is using something like the immediate splat instruction vmv.v.i, but that only covers a weird subset of masks. Where the mask result can fit in a signed imm5 per lane. So with that one we can do a mask like 0b001111 but not 0b011111, though we do get the benefit of it being just a single instruction.

I didn't want to try to pull off those sort of weird edge cases yet, but it might be easier than I think if we can look at the whole thing through the lens of vconst, I'm going to give that a try.

I ask because it seems sort of out of place abstractly to have these sorts of operations when dealing with the immediates of insertlane/extractlane, but hey if it's the way it is then that's how it's gonna be.

Do you mean having a separate move instruction into the X / F registers that has no immediate? We used to have instructions that matched exactly those semantics, but they were removed with the following reasoning:

NOTE: The complementary vins.v.x instruction, which allow a write to
any element in a vector register, has been removed. This instruction
would be the only instruction (apart from vsetvl) that requires two
integer source operands, and also would be slow to execute in an
implementation with vector register renaming, relegating its main use
to debugger modifications to state. The alternative and more
generally useful vslide1up and vslide1down instructions can be
used to update vector register state in place over a debug link
without accessing memory.

vslide1up and vslide1down aren't exactly applicable to insertlane and extractlane. And they have a special note in the standard stating that they are slow instructions, pretty much just reserved for debuggers.

One thing I thought though is that this might be good to delegate to a general vconst helper. Right now vconst always performs a load but it might be reasonable to fold a rule like this where if the immediate fits in an i64 it could do the x->v register movement.

Yeah, that seems like a good idea! I'm also going to try to merge the splat const rules into that.

On a broader note I checked what the other engines are doing here w.r.t insertlane:

V8 seems to be doing exactly what we do.

LLVM does a vslideup with a tail undisturbed policy, which is also an option and it does mean they can avoid materializing a mask since they can use vslideup.vi with the lane as part of the immediate. I have no idea if this is better or worse than the option above since it also emits 2 or 3 vsetvli's.

Interesting, and thanks for explaining all that! To be clear I don't think any changes are necessary for this PR, I was mostly just musing. With prior art in V8 this is definitely fine to land (even without that I think it's ok). I agree it's probably not worth it yet to have lots of special cases for various immediates here and there, better to start simple and build out. I was mostly curious if there was a pattern that was going to be implemented later that would simplify this or whether this would be the way forward for awhile.

I think I would prefer handling the merging all of the vconst stuff in a future PR. But yeah, making mask generation as part of that makes a ton of sense!

Some of these carry their source in vs2

Remove the space between , and the register. This is inline with the rest of our formatting.

…egister

afonso360 requested review from a team as code owners May 18, 2023 11:17

afonso360 requested review from elliottt and removed request for a team May 18, 2023 11:17

github-actions bot added the cranelift Issues related to the Cranelift code generator label May 18, 2023

alexcrichton approved these changes May 18, 2023

View reviewed changes

afonso360 added 9 commits May 20, 2023 11:06

riscv64: Support vector instruction masking

e1eb293

riscv64: Add vmerge instructions

a4afb9a

riscv64: Implement insertlane

d5358b0

riscv64: Fix encoding of vmv instructions

b513fdf

Some of these carry their source in vs2

riscv64: Fix formatting of mask register

1f3cafd

Remove the space between , and the register. This is inline with the rest of our formatting.

riscv64: Restrict insertlane to vector types that fit in a single r…

cd1d8e9

…egister

wasmtime: Enable more RISC-V SIMD tests

0f2909a

riscv64: Use inline format syntax for printing vector instructions

1e40013

riscv64: Add vector mask note

a5952ba

afonso360 force-pushed the riscv-insertlane branch from 89a6b01 to a5952ba Compare May 20, 2023 10:31

afonso360 enabled auto-merge May 20, 2023 10:33

afonso360 added this pull request to the merge queue May 20, 2023

Merged via the queue into bytecodealliance:main with commit 9871098 May 20, 2023

afonso360 deleted the riscv-insertlane branch May 20, 2023 11:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

riscv64: Implement `insertlane` #6408

riscv64: Implement `insertlane` #6408

afonso360 commented May 18, 2023 •

edited

Loading

alexcrichton May 18, 2023

afonso360 May 18, 2023

alexcrichton May 18, 2023

afonso360 May 20, 2023

		(rule (gen_vec_mask mask)
		(rv_vmv_sx (imm $I64 mask) (vstate_from_type $I64X2)))

riscv64: Implement insertlane #6408

riscv64: Implement insertlane #6408

Conversation

afonso360 commented May 18, 2023 • edited Loading

alexcrichton May 18, 2023

Choose a reason for hiding this comment

afonso360 May 18, 2023

Choose a reason for hiding this comment

alexcrichton May 18, 2023

Choose a reason for hiding this comment

afonso360 May 20, 2023

Choose a reason for hiding this comment

riscv64: Implement `insertlane` #6408

riscv64: Implement `insertlane` #6408

afonso360 commented May 18, 2023 •

edited

Loading