Add more SIMD platform-intrinsics #117953

farnoy · 2023-11-15T22:20:02Z

Also added a run-pass test that exercises both intrinsics, plus build-fail and check-fail tests that cover validation for both.

rustbot · 2023-11-15T22:20:10Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @b-naber (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

farnoy · 2023-11-15T22:20:47Z

r? rust-lang/project-portable-simd

rustbot · 2023-11-15T22:20:49Z

Failed to set assignee to miguelraz: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

farnoy · 2023-11-15T22:21:39Z

r? @calebzulawski @programmerjake

rustbot · 2023-11-15T22:21:43Z

Failed to set assignee to calebzulawski: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

compiler/rustc_codegen_llvm/src/intrinsic.rs

tests/codegen/simd-intrinsic/simd-intrinsic-generic-load.rs

compiler/rustc_codegen_cranelift/src/intrinsics/simd.rs

Noratrieb · 2023-11-16T10:07:29Z

r? @workingjubilee as a SIMD person that has bors perms :D

rustbot · 2023-11-17T12:50:19Z

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

farnoy · 2023-11-17T12:51:07Z

Marking this as ready for review. I tested it end to end in my project, with changes to the portable-simd crate linked above. It's been working well.

farnoy · 2023-11-19T15:02:25Z

@RalfJung Letting you know about a new SIMD intrinsic because of rust-lang/miri#1912 (comment)

RalfJung · 2023-11-19T15:07:41Z

Please add somewhere documentation for the behavior of these intrinsics -- in particular documentation for any cases where it is UB.

RalfJung · 2023-11-19T15:10:43Z

I guess the question is where to add those docs... it's a bit annoying that the extern block that imports all these intrinsics is not in the same repository where they are defined :/ Maybe library/portable-simd/crates/core_simd/src/intrinsics.rs should be moved into the rustc repo? That would certainly be the natural place to put these docs.

farnoy · 2023-11-19T15:15:15Z

I'll add it to portable-simd for now, then? This will be an interesting one for Miri because memory is not accessed when the corresponding mask element is disabled. For example, it's legal to use a memory address that is invalid if the mask is all zeros.

RalfJung · 2023-11-19T15:21:19Z

Yes that sounds very important to describe precisely. I assume you ensured that all codegen backends are implementing this in a way that matches that spec (i.e., avoids the UB when the mask is zero).

farnoy · 2023-11-19T15:43:48Z

Yes, LLVM describes it this way:

The memory addresses corresponding to the “off” lanes are not accessed.

And for Cranelift, I implemented it with a scalar loop and guarded the memory access by an if statement.
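A minimal sketch of the scalar-loop approach described above, written as plain Rust rather than Cranelift IR. The function name `masked_load_model`, the `bool` mask, and the array types are illustrative assumptions; the real intrinsic takes SIMD vector types and an integer mask vector.

```rust
/// Scalar model of a masked load over `N` lanes (illustrative only, not the
/// rustc or Cranelift implementation).
///
/// # Safety
/// For every lane `i` with `mask[i] == true`, `ptr.add(i)` must be in bounds,
/// valid for reads, and aligned for `T`. Lanes with `mask[i] == false` are
/// never dereferenced, so `ptr` may be dangling if the mask is all false.
pub unsafe fn masked_load_model<T: Copy, const N: usize>(
    mask: [bool; N],
    ptr: *const T,
    fallback: [T; N],
) -> [T; N] {
    // Start from the fallback values; disabled lanes keep them.
    let mut out = fallback;
    for i in 0..N {
        if mask[i] {
            // The memory access is guarded by the mask check.
            out[i] = unsafe { ptr.add(i).read() };
        }
    }
    out
}
```

With an all-false mask the loop body never runs, so the pointer is not touched and the fallback values are returned unchanged.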

farnoy · 2023-11-29T13:46:53Z

Hi @workingjubilee, is there anything I can do to help expedite the review process?

workingjubilee

Sorry, I had like 200 notifications.

Now I have <100.

I agree with Ralf that the documentation for these intrinsics ought to live inside this repo. That we more-meticulously documented them is more just an accident of no one else beating us to it. A brief sentence and then "this corresponds roughly to XYZ instruction" is fine for now, we can coalesce them later.

compiler/rustc_codegen_cranelift/src/intrinsics/simd.rs

compiler/rustc_codegen_llvm/src/intrinsic.rs

farnoy · 2023-12-03T11:25:27Z

@workingjubilee Thanks, I've pushed out a commit addressing your comments.

farnoy · 2023-12-07T22:13:12Z

@workingjubilee OK I think that should cover it, I've added a check-fail test to verify return types and a build-fail test to verify the argument validation.

What's the final verdict on the order of these arguments? I think we want (mask, pointer, values). As you said, this aligns with simd_select, which takes the mask first, then the argument that is "used" when the mask lane is true, and last the fallback value for when it's false.

workingjubilee · 2023-12-07T22:16:44Z

Yes, that ordering is good by me.

Hypothetically you could use this to argue that the proper order is

simd_masked_load(mask, pointer, values)
simd_masked_store(mask, values, pointer)

But I think (mask, pointer, values) for both is also fine and consistent.
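To make the ordering argument concrete, here is a hedged scalar sketch of a select and of a masked store that both take the mask first. The names `select_model` and `masked_store_model`, and the use of `bool` arrays, are assumptions for illustration and are not the intrinsics' actual signatures.

```rust
/// Scalar model of a lane-wise select: mask first, then the value used when
/// the lane is enabled, then the value used when it is disabled.
pub fn select_model<T: Copy, const N: usize>(
    mask: [bool; N],
    if_true: [T; N],
    if_false: [T; N],
) -> [T; N] {
    let mut out = if_false;
    for i in 0..N {
        if mask[i] {
            out[i] = if_true[i];
        }
    }
    out
}

/// Scalar model of a masked store using the (mask, pointer, values) order.
///
/// # Safety
/// For every enabled lane `i`, `ptr.add(i)` must be in bounds, valid for
/// writes, and aligned for `T`. Disabled lanes are never written.
pub unsafe fn masked_store_model<T: Copy, const N: usize>(
    mask: [bool; N],
    ptr: *mut T,
    values: [T; N],
) {
    for i in 0..N {
        if mask[i] {
            unsafe { ptr.add(i).write(values[i]) };
        }
    }
}
```

In both models the mask comes first and decides, per lane, whether the later arguments take effect.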

farnoy · 2023-12-08T00:29:57Z

@workingjubilee I'm happy with the state of the PR now. The validation coverage feels fine to me; the only downside is the diagnostic messages shown to the user.

I'm fairly sure that other intrinsics (especially the multi-argument ones) also suffer from this, but they don't have build-fail tests where we could see that directly.

I don't think this PR regresses on what we already have, I'm just highlighting a deficiency that already existed.

workingjubilee · 2023-12-08T22:29:12Z

Probably! This is almost certainly an inadequately tested part of the compiler.

In any case, it is fine for monomorphization errors to be gross, or even simply unhelpful, we just want them to be correct if we are emitting them at all.

farnoy · 2023-12-08T23:03:58Z

Are we good to merge then, or is there something else we should do in this PR?

workingjubilee

Everything looks good now modulo a nit.

A test seems to be absent?
Please rebase to minimize the intervening history a bit.

r=me with that added.

tests/codegen/simd-intrinsic/simd-intrinsic-generic-masked-load.rs

workingjubilee · 2023-12-09T05:48:48Z

@bors delegate=farnoy

bors · 2023-12-09T05:48:51Z

✌️ @farnoy, you can now approve this pull request!

If @workingjubilee told you to "r=me" after making some further change, please make that change, then do @bors r=@workingjubilee

This maps to the LLVM intrinsics: llvm.masked.load and llvm.masked.store

farnoy · 2023-12-09T11:43:38Z

@bors r=@workingjubilee

bors · 2023-12-09T11:43:40Z

📌 Commit 97ae509 has been approved by workingjubilee

It is now in the queue for this repository.

…llaumeGomez

Rollup of 6 pull requests

Successful merges:

- rust-lang#117953 (Add more SIMD platform-intrinsics)
- rust-lang#118057 (dedup for duplicate suggestions)
- rust-lang#118638 (More `rustc_mir_dataflow` cleanups)
- rust-lang#118702 (Strengthen well known check-cfg names and values test)
- rust-lang#118734 (Unescaping cleanups)
- rust-lang#118766 (Lower some forgotten spans)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup merge of rust-lang#117953 - farnoy:masked-load-store, r=workingjubilee

Add more SIMD platform-intrinsics

- [x] simd_masked_load
  - [x] LLVM codegen - llvm.masked.load
  - [x] cranelift codegen - implemented but untested
- [ ] simd_masked_store
  - [x] LLVM codegen - llvm.masked.store
  - [ ] cranelift codegen

Also added a run-pass test to test both intrinsics, and additional build-fail & check-fail to cover validation for both intrinsics

Rustup Pulls in rust-lang/rust#117953 (in preparation for implementing those intrinsics)

RalfJung · 2023-12-10T20:41:25Z

tests/codegen/simd-intrinsic/simd-intrinsic-generic-masked-load.rs

+// CHECK-LABEL: @load_f32x2
+#[no_mangle]
+pub unsafe fn load_f32x2(mask: Vec2<i32>, pointer: *const f32,
+                         values: Vec2<f32>) -> Vec2<f32> {


What is the meaning of "values" here? It seems strange that a load is told the values to load...?

These are the fallback/passthrough values used for lanes that are masked off, i.e. lanes whose corresponding bits in mask are all zeros.

Ah, the name values doesn't really convey that. Thanks!

RalfJung · 2023-12-10T20:45:18Z

tests/codegen/simd-intrinsic/simd-intrinsic-generic-masked-store.rs

+#[no_mangle]
+pub unsafe fn store_f32x2(mask: Vec2<i32>, pointer: *mut f32, values: Vec2<f32>) {
+    // CHECK: call void @llvm.masked.store.v2f32.p0(<2 x float> {{.*}}, ptr {{.*}}, i32 {{.*}}, <2 x i1> {{.*}})
+    simd_masked_store(mask, pointer, values)


What are the alignment requirements on the pointer for the store to be valid? (And same question for the load.)

The pointer should be aligned to the individual element, not the whole vector. For example, when loading f32x2, size_of is 8 but the alignment of f32 is 4, so the pointer should be aligned to 4. This applies to both load and store.
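The size-versus-alignment distinction can be checked directly. This is a small sketch using an array `[T; N]` as a stand-in for the vector's lanes; the helper names `masked_access_layout` and `is_element_aligned` are hypothetical.

```rust
use std::mem::{align_of, size_of};

/// Returns (bytes covered by a full N-lane access, alignment required of the
/// pointer under the element-alignment rule).
pub fn masked_access_layout<T, const N: usize>() -> (usize, usize) {
    (size_of::<[T; N]>(), align_of::<T>())
}

/// Checks that `ptr` satisfies the alignment of a single element `T`.
pub fn is_element_aligned<T>(ptr: *const T) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}
```

For two `f32` lanes this gives 8 bytes of data with a required alignment of 4, so a pointer to any element of an `f32` slice qualifies, even one that is not 8-byte aligned.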

What if the mask is all-0, i.e., no load/store actually happens. Is the pointer still required to be aligned?

Even with alignment checking, masked elements never generate an access for the load, thus no hardware implementation requires it. If you have strong reasons to argue otherwise, we could make it UB regardless?

Otherwise: no, it's a non-event, like unsafe { invalid_ptr.add(0) }.

fwiw x86-64 for these doesn't ever generate an alignment check exception even if you turn alignment checking on, while the others are explicit that the logic is that predication prevents the access, and therefore the alignment check.

Yes they do.

I believe everything that's been raised so far has been addressed. Thanks @RalfJung and apologies if my answers earlier were unhelpful. I wasn't aware that the alignment requirement is only relaxed through the from_array/to_array path, relying on the optimizer to elide those later.

@farnoy can you link to the PR(s) that address this?

Mostly I've been asking for documentation. #118853 addresses that, please double-check there that the docs for these new intrinsics match what you implemented now. :)

farnoy · 2023-12-10T21:25:32Z

compiler/rustc_codegen_llvm/src/intrinsic.rs

+
+        // Alignment of T, must be a constant integer value:
+        let alignment_ty = bx.type_i32();
+        let alignment = bx.const_i32(bx.align_of(values_ty).bytes() as i32);


I think this should be bx.align_of(values_elem).bytes() not values_ty. Technically, the LLVM intrinsic accepts any power of two as alignment, so we could relax this down to 1.

https://rust-lang.github.io/unsafe-code-guidelines/layout/packed-simd-vectors.html#packed-simd-vector-types

Am I understanding this correctly or does it not make a difference @workingjubilee?

Having checked the Arm ARM, RISCV V spec, and Intel SDM, we should continue to require element alignment at least. We can expect any masked-load-supporting implementation to support masked loads and stores on element boundaries. However, they may reject accesses on byte boundaries (except where the element is u8, of course), or they may just implement the load/store inefficiently if unaligned, which is honestly about as bad.

There are definitely reasons to do unaligned loads (deserialization, packet framing, etc) that benefit from vectorization, so I think we should at least make writing {}_unaligned versions possible. Though I can't remember, what's the requirement for scatter/gather?

Same, element alignment.

We can adjust them to take a const parameter for alignment, if we want, I guess?

I can only guess the intent, but I am fairly sure about what Simd::load currently does. :)

If it is intended to allow entirely unaligned pointers, it should be implemented as ptr.cast::<Self>().read_unaligned(). Though I guess that would run into issues with non-power-of-2 vectors as the comment indicates, but those aren't properly supported currently anyway.
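A hedged sketch of the `read_unaligned` pattern mentioned above, using an array `[T; N]` as a stand-in for the SIMD vector type (`Self` in the comment). The function name `load_unaligned_model` is an assumption for illustration.

```rust
/// Loads `N` lanes starting at `ptr` with no alignment requirement at all.
///
/// # Safety
/// `ptr` must be valid for reads of `N * size_of::<T>()` bytes, and those
/// bytes must be valid values of `T`. It does not need to be aligned.
pub unsafe fn load_unaligned_model<T: Copy, const N: usize>(ptr: *const T) -> [T; N] {
    // Reinterpret as a pointer to the whole "vector" and read it bytewise.
    unsafe { ptr.cast::<[T; N]>().read_unaligned() }
}
```

This suits cases like deserialization, where data starts at an arbitrary byte offset inside a buffer.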

Opened rust-lang/portable-simd#382 for the load question.

I will open a PR to fix the alignment passed down to LLVM IR. The optimizer converts the masked load to an unmasked, aligned load because the alignment is too high.

https://rust.godbolt.org/z/KEeGbevbb

If we go with element alignment for now, does this require a code change? The documentation makes it seem like bx.align_of(values_ty) is not guaranteed to equal bx.align_of(values_elem) (implementation-defined), so this line should probably be corrected?

Correct.

I don't have opinions either way here. It's just important that we pick a rule and then apply it consistently everywhere. (Which is also IMO why we shouldn't land new intrinsics without documentation of their exact safety requirements. Just because portable SIMD intrinsics got a pass on this in the past doesn't mean we should continue doing that.)

I certainly wasn't intending on being freewheeling here, or I would not have asked a zillion small questions myself. :^) The problem is slightly of the "remembering all the questions I should be asking" nature, especially when there have already been multiple rounds of review.

I guess the main list of questions to keep in mind is:

For memory (i.e. pointer) operations: What is the alignment asserted by the op? (And for SIMD types, you should almost always be picking the alignment of the element or the alignment of the vector.)

For memory operations, is there a condition in which that alignment is not asserted by the intrinsic? (Also, please answer no, especially if there is any doubt about how LLVM will interpret it.)

For memory operations, what happens if you operate on uninit memory? It's UB, right?

For masked operations, what happens for masks derived from alternating [true, false, ..]? (i.e. "what is the normal behavior?")

For masked operations, what happens on all-1s? Is there any special event, or does it strengthen any assertions?

For masked operations, what happens on all-0s? Is there any special event, or does it weaken any assertions?

For indexing operations, what happens when your index is "out-of-bounds"?

Sooo... have you checked the LLVM IR for the answers to all of the above questions? How about after optimizations?

For memory operations, what happens if you operate on uninit memory? It's UB, right?

Note that we do not ever treat uninit memory specially for memory operations in Rust. The UB on uninit derives entirely from the fact that uninit memory violates the validity invariant of most types. So the question that should be asked is: is this an untyped operation, working on raw abstract machine bytes, or is it a typed operation, and at which type?
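The typed-versus-untyped distinction can be sketched with standard library calls. `ptr::copy_nonoverlapping` is documented as an untyped copy that preserves initialization state, while a plain pointer `read` is typed at `T`. The function names `copy_untyped` and `read_typed` are hypothetical.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Untyped copy: moves the bytes of one `u32` slot without asserting that
/// they form a valid `u32`, so copying uninitialized bytes is allowed.
pub fn copy_untyped(src: &MaybeUninit<u32>) -> MaybeUninit<u32> {
    let mut dst = MaybeUninit::<u32>::uninit();
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), 1) };
    dst
}

/// Typed read at type `u32`.
///
/// # Safety
/// `src` must be initialized; reading uninitialized bytes as `u32` violates
/// the validity invariant of `u32` and is undefined behavior.
pub unsafe fn read_typed(src: &MaybeUninit<u32>) -> u32 {
    unsafe { src.as_ptr().read() }
}
```

For a masked load, the question is which of these two kinds of operation each enabled lane performs, and at which element type.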

…workingjubilee

Fix alignment passed down to LLVM for simd_masked_load

Follow up to rust-lang#117953

The alignment for a masked load operation should be that of the element/lane, not the vector as a whole

It can produce miscompilations after the LLVM optimizer notices the higher alignment and promotes this to an unmasked, aligned load followed up by blend/select - https://rust.godbolt.org/z/KEeGbevbb

Rollup merge of rust-lang#118864 - farnoy:masked-load-store-fixes, r=workingjubilee

Fix alignment passed down to LLVM for simd_masked_load

Follow up to rust-lang#117953

The alignment for a masked load operation should be that of the element/lane, not the vector as a whole

It can produce miscompilations after the LLVM optimizer notices the higher alignment and promotes this to an unmasked, aligned load followed up by blend/select - https://rust.godbolt.org/z/KEeGbevbb

Rustup Pulls in rust-lang#117953 (in preparation for implementing those intrinsics)

farnoy changed the title ~~Add more SIMD platform-intrinsic~~ Add more SIMD platform-intrinsics Nov 15, 2023

rustbot assigned b-naber Nov 15, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 15, 2023

farnoy mentioned this pull request Nov 15, 2023

Implement SIMD-specific functions rust-lang/portable-simd#16

Open

4 tasks

programmerjake reviewed Nov 15, 2023

compiler/rustc_codegen_llvm/src/intrinsic.rs

bjorn3 reviewed Nov 15, 2023

tests/codegen/simd-intrinsic/simd-intrinsic-generic-load.rs

farnoy commented Nov 16, 2023

compiler/rustc_codegen_cranelift/src/intrinsics/simd.rs

rustbot assigned workingjubilee and unassigned b-naber Nov 16, 2023

farnoy mentioned this pull request Nov 16, 2023

Add support for masked loads & stores rust-lang/portable-simd#374

Closed

farnoy marked this pull request as ready for review November 17, 2023 12:50

calebzulawski approved these changes Nov 18, 2023


workingjubilee reviewed Dec 3, 2023

compiler/rustc_codegen_cranelift/src/intrinsics/simd.rs

compiler/rustc_codegen_llvm/src/intrinsic.rs

workingjubilee reviewed Dec 9, 2023

tests/codegen/simd-intrinsic/simd-intrinsic-generic-masked-load.rs

Add simd_masked_{load,store} platform-intrinsics

97ae509

This maps to the LLVM intrinsics: llvm.masked.load and llvm.masked.store

farnoy force-pushed the masked-load-store branch from e271119 to 97ae509 Compare December 9, 2023 11:40

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 9, 2023

GuillaumeGomez mentioned this pull request Dec 9, 2023

Rollup of 6 pull requests #118780

Merged

bors merged commit c57b054 into rust-lang:master Dec 9, 2023
11 checks passed

rustbot added this to the 1.76.0 milestone Dec 9, 2023

RalfJung mentioned this pull request Dec 10, 2023

Rustup rust-lang/miri#3217

Merged

bors added a commit to rust-lang/miri that referenced this pull request Dec 10, 2023

Auto merge of #3217 - RalfJung:rustup, r=RalfJung

9828125

Rustup Pulls in rust-lang/rust#117953 (in preparation for implementing those intrinsics)

RalfJung reviewed Dec 10, 2023


farnoy commented Dec 10, 2023


farnoy mentioned this pull request Dec 12, 2023

Fix alignment passed down to LLVM for simd_masked_load #118864

Merged

workingjubilee added A-SIMD Area: SIMD (Single Instruction Multiple Data) PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) labels Dec 14, 2023

RalfJung pushed a commit to RalfJung/rust that referenced this pull request Dec 17, 2023

Auto merge of rust-lang#3217 - RalfJung:rustup, r=RalfJung

92ab9d6

Rustup Pulls in rust-lang#117953 (in preparation for implementing those intrinsics)
