Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nvptx abi adjustment may keep PassMode::Direct for small aggregates in conv extern "C" #117480

Open
kjetilkjeka opened this issue Nov 1, 2023 · 14 comments · Fixed by #117671
Open
Labels
O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@kjetilkjeka
Copy link
Contributor

          @kjetilkjeka I had to disable this test on nvptx64 since the `extern "C"` ABI uses `Direct` pass mode in invalid ways. I think that's caused by this code:

fn classify_arg<Ty>(arg: &mut ArgAbi<'_, Ty>) {
if arg.layout.is_aggregate() && arg.layout.size.bits() > 64 {
arg.make_indirect();
}
}

That's invalid since it doesn't say what to do for smaller aggregates, and they default to the (bad) Direct. Instead you have to say which register they are supposed to be passed in. You can check what the other targets are doing. Targets are expected to set an explicit pass mode for all aggregate arguments.

This should be easy to reproduce by having a function like

#[repr(C)]
struct ReprC1<T: ?Sized>(T);

extern "C" fn myfn(x: ReprC1<i32>) {}

which will likely ICE on the current compiler already.

Originally posted by @RalfJung in #117351 (comment)

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 1, 2023
@kjetilkjeka
Copy link
Contributor Author

@rustbot label +O-NVPTX

@rustbot rustbot added the O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html label Nov 1, 2023
@kjetilkjeka kjetilkjeka changed the title nvptx abi adjustment may keep PassMode::Direct for small aggregates nvptx abi adjustment may keep PassMode::Direct for small aggregates in conv extern "C" Nov 1, 2023
@kjetilkjeka
Copy link
Contributor Author

kjetilkjeka commented Nov 1, 2023

Thanks for pointing this out @RalfJung

As a note I had to use #[no_mangle] to avoid the compiler from removing the symbol before the assert was hit.

Do you know why this test started failing with your changes in #117351 and not before that? The reason I'm asking is that I'm curious to how to make the tests catch it if we have a regression and start producing PassMode::Direct after this issue is resolved

@saethlin saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 1, 2023
@RalfJung
Copy link
Member

RalfJung commented Nov 1, 2023

Do you know why this test started failing with your changes in #117351 and not before that?

We used to only run the assertion in codegen, i.e. when a function with such argument types actually gets generated or called. The abi compatibility test works on function pointers, it computes their ABI but never calls them so we never enter the part of the codegen backend that had the assertion.

I'm going to try to find a way to move this check to the ABI computation logic, but that turns out to be hard since we do compute some ABIs that we don't actually support...

@RalfJung
Copy link
Member

RalfJung commented Nov 1, 2023

I'm curious to how to make the tests catch it if we have a regression and start producing PassMode::Direct after this issue is resolved

#117500 does ensure we'd catch this.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue May 28, 2024
…ct, r=davidtwco

NVPTX: Avoid PassMode::Direct for args in C abi

Fixes rust-lang#117480

I must admit that I'm confused about `PassMode` altogether, is there a good sum-up threads for this anywhere? I'm especially confused about how "indirect" and "byval" goes together. To me it seems like "indirect" basically means "use a indirection through a pointer", while "byval" basically means "do not use indirection through a pointer".

The return used to keep `PassMode::Direct` for small aggregates. It turns out that `make_indirect` messes up the tests and one way to fix it is to keep `PassMode::Direct` for all aggregates. I have mostly seen this PassMode mentioned for args. Is it also a problem for returns? When experimenting with `byval` as an alternative i ran into [this assert](https://github.com/rust-lang/rust/blob/61a3eea8043cc1c7a09c2adda884e27ffa8a1172/compiler/rustc_codegen_llvm/src/abi.rs#L463C22-L463C22)

I have added tests for the same kind of types that is already tested for the "ptx-kernel" abi. The tests cannot be enabled until something like rust-lang#117458 is completed and merged.

CC: `@RalfJung` since you seem to be the expert on this and have already helped me out tremendously

CC: `@RDambrosio016` in case this influence your work on `rustc_codegen_nvvm`

`@rustbot` label +O-NVPTX
@bors bors closed this as completed in 713c852 May 28, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue May 28, 2024
Rollup merge of rust-lang#117671 - kjetilkjeka:nvptx_c_abi_avoid_direct, r=davidtwco

NVPTX: Avoid PassMode::Direct for args in C abi

Fixes rust-lang#117480

I must admit that I'm confused about `PassMode` altogether, is there a good sum-up threads for this anywhere? I'm especially confused about how "indirect" and "byval" goes together. To me it seems like "indirect" basically means "use a indirection through a pointer", while "byval" basically means "do not use indirection through a pointer".

The return used to keep `PassMode::Direct` for small aggregates. It turns out that `make_indirect` messes up the tests and one way to fix it is to keep `PassMode::Direct` for all aggregates. I have mostly seen this PassMode mentioned for args. Is it also a problem for returns? When experimenting with `byval` as an alternative i ran into [this assert](https://github.com/rust-lang/rust/blob/61a3eea8043cc1c7a09c2adda884e27ffa8a1172/compiler/rustc_codegen_llvm/src/abi.rs#L463C22-L463C22)

I have added tests for the same kind of types that is already tested for the "ptx-kernel" abi. The tests cannot be enabled until something like rust-lang#117458 is completed and merged.

CC: ``@RalfJung`` since you seem to be the expert on this and have already helped me out tremendously

CC: ``@RDambrosio016`` in case this influence your work on `rustc_codegen_nvvm`

``@rustbot`` label +O-NVPTX
@RalfJung
Copy link
Member

This should not have been closed without enabling the relevant tests for nvtpx:
#125694.

@RalfJung RalfJung reopened this May 29, 2024
@RalfJung
Copy link
Member

Actually the issue is still open -- the test fails:

`PassMode::Direct` for aggregates only allowed for "unadjusted" and "ptx-kernel" functions and on wasm
Problematic type: TyAndLayout {
    ty: ReprC1<*const i32>,
    layout: Layout {
        size: Size(8 bytes),
        align: AbiAndPrefAlign {
            abi: Align(8 bytes),
            pref: Align(8 bytes),
        },
        abi: Aggregate {
            sized: true,
        },
        fields: Arbitrary {
            offsets: [
                Size(0 bytes),
            ],
            memory_index: [
                0,
            ],
        },
        largest_niche: None,
        variants: Single {
            index: 0,
        },
        max_repr_align: None,
        unadjusted_abi_align: Align(8 bytes),
    },
}

@RalfJung
Copy link
Member

#117671 only changed classify_arg, shouldn't it change classify_ret as well?

@kjetilkjeka
Copy link
Contributor Author

This comment explains the problem but is essentially this.

The only two PassModes that give the correct signature is Direct and Indirect{on_stack: true}. For arguments it's possible to use Indirect{on_stack: true} but that is not the case for returns.

I unfortunately don't think it's possible to have a correct C abi for ptx without accepting Direct or implementing the new ABI handling.

@RalfJung
Copy link
Member

Hm, I only understand half of that comment.^^ I cannot parse the PTX signature. What does it all mean? How does the PTX signature look like when the struct has multiple fields with padding between them?

I also don't know much about on_stack. The fact that that is part of Indirect makes no sense to me as there's not actually a pointer being passed -- this is probably some LLVM stuff leaking through. Is there a way to tell LLVM "return this as a blob of N bytes" that then gets the right signature in PTX?

This is all completely terrible. LLVM's types are already ill-suited to express function call ABIs so LLVM backends have to use heuristics to reconstruct the actual ABI (in particular for targets like PTX that aren't actually assembly but still IRs), and then in Rust we use the PassMode to describe which LLVM types to generate but we can't actually describe all of them, urgh.

That said... I am a bit surprised that Cast doesn't work. You say

PassMode::Cast casted to an uniform does the correct thing for many types, but unfortunately not for single element structs (it passes like it was the underlying type, not the struct itself).

what exactly does that look like? cast will return things in an LLVM struct if prefix contains something, so I would expect that for the struct with a u8 field, you want Cast where prefix[0] is Some(Reg { kind: integer, size: 1 }) and everything else is None. (If you only set the rest it will use an array, or the underlying type for single-element arrays.

For more complicated types this does sound quite tricky to do correctly, though.

@kjetilkjeka
Copy link
Contributor Author

kjetilkjeka commented May 29, 2024

Hm, I only understand half of that comment.^^ I cannot parse the PTX signature. What does it all mean? How does the PTX signature look like when the struct has multiple fields with padding between them?

Ptx is an intermediate format where passing something is basically just listing a named argument of the type it is passed as. The C ABI in ptx is will list a byte array of elements with a specific alignment requirement when passing structs.

When passing struct SingleU8{a: u8} it is listed as .param .align 1 .b8 <name>[1]

Primitive integer types are mainly passed as the actual type they are, except when they are of less size than 32 bits. All integers of less than 32bits will be padded to a 32bit integer. This is somewhat unfortunate, but also just something we have to deal with.

When passing a u8 it will then be listed as .param .b32 <name>. No explicit alignment is required as the datatype alignment is used.

what exactly does that look like? cast will return things in an LLVM struct if prefix contains something, so I would expect that for the struct with a u8 field, you want Cast where prefix[0] is Some(Reg { kind: integer, size: 1 }) and everything else is None. (If you only set the rest it will use an array, or the underlying type for single-element arrays.

When I try to cast to a single 8-bit register it passes it as a u8 instead of the SingleU8. That is, as you say, due to it passing it as the underlying type of single-element arrays.


I also don't know much about on_stack. The fact that that is part of Indirect makes no sense to me as there's not actually a pointer being passed -- this is probably some LLVM stuff leaking through. Is there a way to tell LLVM "return this as a blob of N bytes" that then gets the right signature in PTX?

I have had great problems with figuring out how Indirect and on_stack fits together. My partial conclusion was that when on_stack is set it's really not Indirect anymore. I would have to investigate more to understand how it maps to LLVM.

I think my next point of action in this task will be to look into the LLVM output from clang when compiling these functions. Perhaps that can guide what we need to output to LLVM from Rust. I guess a problem arise if that is exactly what we do generate when using PassMode::Direct. I wouldn't be entirely surprised if that was the case.

@RalfJung
Copy link
Member

.param .align 1 .b8 <name>[1]

So if this was a struct with an i32 field it would be [4] at the end to indicate an array of length 4?

When I try to cast to a single 8-bit register it passes it as a u8 instead of the SingleU8. That is, as you say, due to it passing it as the underlying type of single-element arrays.

How are you setting up the cast?
When the rest is a u8 then it will indeed become a single u8. But when one prefix element is a u8, that should become a struct for LLVM, which is exactly the same as with Direct.

@kjetilkjeka
Copy link
Contributor Author

So if this was a struct with an i32 field it would be [4] at the end to indicate an array of length 4?

Yes, this is correct.

How are you setting up the cast?
When the rest is a u8 then it will indeed become a single u8. But when one prefix element is a u8, that should become a struct for LLVM, which is exactly the same as with Direct.

Ahh, that is not intuitive at all for me, but it works indeed. I made the return tests pass with this code.

   if ret.layout.is_aggregate() {
        let align_bytes = ret.layout.align.abi.bytes();
        let size = ret.layout.size;

        let reg = match align_bytes {
            1 => Reg::i8(),
            2 => Reg::i16(),
            4 => Reg::i32(),
            8 => Reg::i64(),
            16 => Reg::i128(),
            _ => unreachable!("Align is given as power of 2 no larger than 16 bytes"),
        };

        if align_bytes == size.bytes() {
            ret.cast_to(CastTarget {
                prefix: [Some(reg), None, None, None, None, None, None, None],
                rest: Uniform::new(Reg::i8(), Size::from_bytes(0)),
                attrs: ArgAttributes {
                    regular: ArgAttribute::default(),
                    arg_ext: ArgExtension::None,
                    pointee_size: Size::ZERO,
                    pointee_align: None,
                },
            });
        } else {
            ret.cast_to(Uniform::new(reg, size));
        }
    } else if ret.layout.size.bits() < 32 {
        ret.extend_integer_width_to(32);
    }

I will most likely get a chance to create a PR on Friday. Thanks for that trick.

@RalfJung
Copy link
Member

Yeah to be clear I have no idea how CastTarget is intended to be used, I just read the code for how it gets translated into an LLVM type, and then found a way to make that generate the LLVM type you want, for the one case you mentioned. I have no idea how to turn that into a general principle that works for all types.

You seem to make a special case for align_bytes == size.bytes(), but doesn't that mean that for a type with two i32 fields we'll get the wrong encoding? Specifically, currently that will be { i32, i32 } (a struct with two i32 fields), with your patch that will be [2 x i32] (a two-element array of i32). Maybe those generate the same PTX output, maybe not -- please carefully check that.

@kjetilkjeka
Copy link
Contributor Author

You seem to make a special case for align_bytes == size.bytes(), but doesn't that mean that for a type with two i32 fields we'll get the wrong encoding? Specifically, currently that will be { i32, i32 } (a struct with two i32 fields), with your patch that will be [2 x i32] (a two-element array of i32). Maybe those generate the same PTX output, maybe not -- please carefully check that.

I have specifically added a test for this type in #125980. It seems to produce correct results.

workingjubilee added a commit to workingjubilee/rustc that referenced this issue Jun 12, 2024
…ssmode, r=davidtwco

Nvptx remove direct passmode

This PR does what should have been done in rust-lang#117671. That is fully avoid using the `PassMode::Direct` for `extern "C" fn` for `nvptx64-nvidia-cuda` and enable the compatibility test. `@RalfJung` [pointed me in the right direction](rust-lang#117480 (comment)) for solving this issue.

There are still some ABI bugs after this PR is merged. These ABI tests are created based on what is actually correct, and since they continue passing with even more of them enabled things are improving. I don't have the time to tackle all the remaining issues right now, but I think getting these improvements merged is very valuable in themselves and plan to tackle more of them long term.

This also doesn't remove the use of `PassMode::Direct` for `extern "ptx-kernel" fn`. This was also not trivial to make work. And since the ABI is hidden behind an unstable feature it's less urgent.

I don't know if it's correct to request `@RalfJung` as a reviewer (due to team structures), but he helped me a lot to figure out this stuff. If that's not appropriate then `@davidtwco` would be a good candidate since he know about this topic from rust-lang#117671

r​? `@RalfJung`
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jun 12, 2024
Rollup merge of rust-lang#125980 - kjetilkjeka:nvptx_remove_direct_passmode, r=davidtwco

Nvptx remove direct passmode

This PR does what should have been done in rust-lang#117671. That is fully avoid using the `PassMode::Direct` for `extern "C" fn` for `nvptx64-nvidia-cuda` and enable the compatibility test. `@RalfJung` [pointed me in the right direction](rust-lang#117480 (comment)) for solving this issue.

There are still some ABI bugs after this PR is merged. These ABI tests are created based on what is actually correct, and since they continue passing with even more of them enabled things are improving. I don't have the time to tackle all the remaining issues right now, but I think getting these improvements merged is very valuable in themselves and plan to tackle more of them long term.

This also doesn't remove the use of `PassMode::Direct` for `extern "ptx-kernel" fn`. This was also not trivial to make work. And since the ABI is hidden behind an unstable feature it's less urgent.

I don't know if it's correct to request `@RalfJung` as a reviewer (due to team structures), but he helped me a lot to figure out this stuff. If that's not appropriate then `@davidtwco` would be a good candidate since he know about this topic from rust-lang#117671

r​? `@RalfJung`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
4 participants