Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for #[instruction_set] attribute (RFC 2867) #74727

Closed
2 of 3 tasks
nikomatsakis opened this issue Jul 24, 2020 · 32 comments · Fixed by #102458
Closed
2 of 3 tasks

Tracking Issue for #[instruction_set] attribute (RFC 2867) #74727

nikomatsakis opened this issue Jul 24, 2020 · 32 comments · Fixed by #102458
Labels
A-codegen Area: Code generation B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. F-isa_attribute Related to #[instruction_set] attribute introduced by RFC 2867 finished-final-comment-period The final comment period is finished for this PR / Issue. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state S-tracking-ready-to-stabilize Status: This is ready to stabilize; it may need a stabilization report and a PR T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jul 24, 2020

This is a tracking issue for the RFC "#[instruction_set(...)] attribute" (rust-lang/rfcs#2867).

The feature gate for the issue is #![feature(isa_attribute)].

The lang-team liaison for this feature is @pnkfelix.

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also uses as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Unresolved Questions

  • How do we ensure that instruction_set and inline assembly always interact correctly? This isn't an implementation blocker but needs to be resolved before Stabilization of the attribute. (Currently, LLVM will not inline a32 functions into t32 functions and vice versa, because they count as different code targets. However, this is not necessarily a guarantee from LLVM, it could just be the current implementation, so more investigation is needed.)

Implementation history

@nikomatsakis nikomatsakis added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. T-lang Relevant to the language team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Jul 24, 2020
@jonas-schievink jonas-schievink added A-codegen Area: Code generation O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state labels Jul 24, 2020
@nikomatsakis nikomatsakis added the F-isa_attribute Related to #[instruction_set] attribute introduced by RFC 2867 label Jul 28, 2020
@xd009642
Copy link
Contributor

I'm going to take a shot at implementing this RFC. I've already got an initial idea of where to start, adding it to CodegenFnAttrs in librustc_middle. From there I imagine it's just a case of getting it to the llvm codegen. I'll likely open a WIP PR and ask any questions I have there once I've made some progress

@xd009642
Copy link
Contributor

xd009642 commented Sep 1, 2020

@pnkfelix do you mind if I message you for guidance once I have a more concrete idea of what my questions are?

@jonas-schievink
Copy link
Contributor

jonas-schievink commented Sep 2, 2020

I'll write up some pointers.

  • The symbols to be used in the attribute syntax have to be added to symbol.rs.
  • The instruction_set attribute has to be added as a gated attribute in builtin_attrs.rs
  • A new instruction_set field needs to be added to CodegenFnAttrs, and a new InstructionSet enum needs to be added.
  • The codegen_fn_attrs query has to look at the attributes and compute the right value for CodegenFnAttrs. It also has to check:
    • That the instruction_set feature flag is enabled. (actually, adding the attribute might already do this)
    • That the current compilation target (available in tcx.sess.target.target) supports switching between ARM and Thumb mode (notably, this is the case for the thumbv4t-... target, but not for the thumbv6/7/8m-... targets; I'm not sure what the best way to codify this is, maybe by extending target specs).
    • That the attribute is syntactically correct.
    • ...and otherwise, emit an appropriate error (that should also be tested for).
  • The LLVM codegen backend has to be adjusted to apply the target feature attribute to the function, this happens in attributes::from_fn_attrs.
    Thumb mode is enabled on ARM targets by enabling the thumb-mode target feature. If thumb mode is to be disabled, the target feature has to be turned off by prefixing it with - instead of +. This can be probably be done here.

@xd009642
Copy link
Contributor

xd009642 commented Sep 2, 2020

So I've added the symbols to symbol.rs, and instruction_set to CodegenFnAttrs added an InstructionSet enum, updated codegen_fn_attrs (although not all of the sub-bullet points, it's very happy path only right now).

I'll look at the target stuff next since that seems to be the biggest part I'm missing in codegen_fn_attrs and I guess talk to @Lokathor to see what thumb targets I need to do this for.

Thanks that fills in a lot of my open questions 😄

@Lokathor
Copy link
Contributor

Lokathor commented Sep 3, 2020

I don't know the best way to, from inside the compiler, tell if the target is an ARM arch target. And, even then, to tell if the default codegen will be a32 or t32.

I can say that, looking at the platform support page, all the tier2 and tier3 targets that start with arm are ARM with a32 as the default codegen, and then all the targets in tier2 and tier3 that start with thumb are ARM with t32 as the default codegen. For example, thumbv4t-none-eabi, thumbv6m-none-eabi, and so on.

Ideally this works for any target_arch="arm" target, even a custom target profile.

@ketsuban
Copy link
Contributor

ketsuban commented Sep 3, 2020

If there's already a way to tell a target that starts with arm from one that starts with thumb (which tells you whether the default is -thumb_mode or +thumb_mode) I think I agree with @jonas-schievink that the best way forward is probably a new boolean field (has_thumb_interworking?) in the target spec.

@xd009642
Copy link
Contributor

xd009642 commented Sep 6, 2020

So I've attempted to do all the points laid out above to the best of my effort (minus the tests for appropriate errors as I had a feeling my errors are likely to change). There's probably a lot to change as this is my first time working on the compiler, but I think it's ready for some initial feedback - #76260 .

@joshtriplett
Copy link
Member

Current status from today's @rust-lang/lang meeting: updating to account for targets that only support one or another instruction set.

@xd009642
Copy link
Contributor

@joshtriplett are there minutes for the meeting? Or should I just wait till the RFC or issue is updated and adjust the PR?

@Lokathor
Copy link
Contributor

@xd009642 I was at the meeting today and attempted to convey the status, though I may have done so inexpertly.

What I said was that:

  • Your PR is waiting for some T-compiler feedback about how to handle targets that can't switch their instruction set.
  • That probably the solution would be to add an extra flag about if the target allows instruction set switching.

I don't think that the RFC needs an update at all, or that your PR needs any changes other than whatever T-compiler asks for. Feel free to work out any blockers/concerns on the PR with them, and also feel free to comment anything more if I missed a detail.

@xd009642
Copy link
Contributor

So far there's an additional has_thumb_interworking option on the targets, and only ones with that as true can have the instruction_set applied otherwise it's an unsupported target (false being the default). But yeah I need some CI feedback and potentially some feedback on the UI tests

bors added a commit to rust-lang-ci/rust that referenced this issue Oct 9, 2020
Implementation of RFC2867

rust-lang#74727

So I've started work on this, I think my next steps are to make use of the `instruction_set` value in the llvm codegen but this is the point where I begin to get a bit lost. I'm looking at the code but it would be nice to have some guidance on what I've currently done and what I'm doing next 😄
@Lokathor
Copy link
Contributor

Lokathor commented Oct 9, 2020

Update: PR merged, starting tomorrow you can use it in your nightly work, presumably we'll have a trial period of at least few releases before considering possible stabilization.

@lifning
Copy link

lifning commented Nov 25, 2020

Is it possible for core library functions to get inlined despite not being specified as one ISA or the other?
My use case is wanting to possibly take advantage of instructions like a32's ldmia/stmia for some heavy lifting in a perf-critical bit of interrupt handler code living in IWRAM, but looking at rustc's emitted asm, i notice usages of things like core::array::<impl core::ops::index::Index<I> for [T; N]>::index still require branching into thumb_func's that live in ROM.

@ketsuban
Copy link
Contributor

The problem as I understand it is there's only one (T32) copy of the standard library and Rust just emits calls into it despite the function being A32, which LLVM then refuses to inline.

A solution could be for Cargo's unstable build-std attribute to be aware of has_thumb_interworking and include a copy of the standard library compiled for the other code target.

@lifning
Copy link

lifning commented Nov 25, 2020

There're definitely other things besides core that the lack of ability to inline kind of makes a drag too, such as the phantom_fields defined by https://docs.rs/crate/gba-proc-macro (which i think my current best bet is to just.. duplicate each function into foo() and a32_foo() versions and call the latter in my code, but I don't like the idea of forking every dependency crate and making duplicate a32 versions of every stub function i need to call)

For this case it would definitely benefit me to be able to compile arbitrary dependencies in both modes too, though I'm not sure what that would look like ideally. A lot of code here isn't currently delivering on the zero-cost abstraction promise, in other words :P

@Lokathor
Copy link
Contributor

Lokathor commented Apr 7, 2021

Update 2021-04-06

Picking a target that supports using both a32 and t32 code, and that is tier2 so we can see it on goldbolt, please look at the following code: https://rust.godbolt.org/z/9v914d8fe https://rust.godbolt.org/z/91cPYq8bn. Note the build args.

  • -Copt-level=3: this is as release as it gets
  • --target armv5te-unknown-linux-gnueabi: all code (including core) is a32 by default, and t32 can be used.

The exact details of the ARM calling conventions aren't quite important here, but the thing to note within the assembly output is:

  • vol_read_a32 gets inlined. It uses the read_volatile method which is also inlined. You get a fully zero-cost ldr r4, [r0].
  • vol_read_t32 does not get inlined. It uses the read_volatile method which doesn't get inlined, you get a shockingly expensive double function call for what should be a single instruction.

Meanwhile let's look at if we can paper over the issue with inline assembly:

  • asm_read_a32 is inlined, but now it's two instructions instead of one. more on that in a moment.
  • asm_read_t32 is not inlined but (because it's not calling to an intrinisc) at least it's only a single function call and not a double function call.
  • A moment ago I said there's an extra move instruction. Depending on how you jigger the assembly you can get that move extra move either in the asm_read_a32 or the asm_read_t32. Assuming that you use the same assembly pattern for both functions, whichever pattern you pick you have to slow one of them down by a single move. That's not zero cost.

"Loka that's just one example though?" Well, sure, but it's a very important one because being able to use these zero-cost intrinsics is the sales pitch of rust. Here's another example: https://rust.godbolt.org/z/bG3bhTd4z:

  • core::iter::traits::iterator::Iterator::copied: does literally nothing at all.
  • core::slice::<impl [T]>::iter: a single add operation.
  • core::convert::num::<impl core::convert::From<u8> for u32>::from: this does a useless input&255 op as a way to turn u8 to u32 for reasons I can't even guess at. Note that if we remove the instruction_set attribute from this function the properly inlined version doesn't have the 255 anywhere in the code, so somehow llvm is just tripping itself over nothing at all.

Summary

  • Does It Work? Yes
  • Is It Zero Cost? Not even remotely
  • Is It Fit To Be Stable? Interesting question I'm so glad you asked. I think arguably yes. Like the one situation where people are using this (folks doing GBA stuff) is already stuck on Nightly until either the thumbv4t-none-eabi target becomes Tier 2 so that we don't have to use build-std, or build-std becomes stable. So, making this stable immediately isn't going to suddenly unblock Stable GBA developments. However... I don't think that there's a way to improve the developer interface for this attribute. The interface between the programmer and the compiler of "this is how you declare this quality for the function you're describing to the compiler" is so simple that there's not really any other way that it could work. The performance problems are "just" a codegen quality issue. That's a very load-bearing "just", but it's true. The problem is that rust relies a crazy amount on inlining, more than you ever normally think about, and so for the codegen here to get better we need to make inlining happen properly for functions using the non-default instruction set. Maybe more stuff gets inlined in MIR, maybe rustc looks at the call-graph of alternate instruction set functions and then for each thing it calls that doesn't have inline assembly and doesn't have a specific instruciton_set assigned it just emits LLVM IR for the function twice, once in a32 and once in t32, and then LLVM can do its thing at the cost of a little compile time. Whatever the fix ends up being, I don't think it would change the current developer interface for using the attribute in code.
  • Overall Current Status: It was mentioned in a lang team meeting that some things can be rated as +0 instead of +1, and right now this is a good example of a +0 unstable feature.

@repnop
Copy link
Contributor

repnop commented Apr 7, 2021

one thing for consideration is having this not be its own stand-alone attribute applied to functions, but being an available option as a #[repr(instruction_set = "...")] instead since its affecting the representation of the function. food for thought :)

@nikomatsakis
Copy link
Contributor Author

@Lokathor any update this month? Thanks for the update last month, we have a meeting next Wed and we'll talk it over.

@Lokathor
Copy link
Contributor

I did have a build that managed to create an illegal instruction in debug mode but not release mode, though that might be related to the incremental compilation bug that was recently resolved. I haven't had time to check again since that announcement.

@Lokathor
Copy link
Contributor

Lokathor commented Jun 21, 2022

@nikomatsakis Having worked with this some more, I can say that things are ready for stabilization.

the code that you get out of this is pretty bad, but that's just a quality-of-life issue. The interface for how a user marks functions with this attribute wouldn't change at all even if the compiler gets smarter about codegen.

EDIT: particularly, this works with inline_asm, which was the major concern way back when.

@joshtriplett joshtriplett added the I-lang-nominated Nominated for discussion during a lang team meeting. label Jun 21, 2022
@joshtriplett
Copy link
Member

We discussed this in today's @rust-lang/lang meeting. Kicking off stabilization of this seems fine. @pnkfelix, as the liaison, want to start a stabilization FCP if you concur?

@joshtriplett joshtriplett added the S-tracking-ready-to-stabilize Status: This is ready to stabilize; it may need a stabilization report and a PR label Jul 13, 2022
@joshtriplett
Copy link
Member

ping @pnkfelix

@pnkfelix
Copy link
Member

pnkfelix commented Aug 2, 2022

Okay, I reviewed the update from last year and fixed the first godbolt link (asm! has stabilized since it had been posted).

Its an interesting question, whether to stabilize something that has clear performance issues (in terms of not meeting typical zero-cost expectations). But that is a quality-of-life issue; if users can find utility in this feature even in its current state, then that's an argument for stabilization.

I'll go ahead and fire off a stabilization FCP.

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Aug 2, 2022

Team member @pnkfelix has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Aug 2, 2022
@xd009642
Copy link
Contributor

xd009642 commented Aug 2, 2022

First stabilisation report so let me know if I've done anything wrong 🙏

Request for Stabilization

Summary

The instruction set attribute provides a new function attribute for architectures which support more than one form of machine code. Currently, this is just implemented for CPUs in the ARM family which support a program switching between "ARM code" and "Thumb code". Although other instruction sets could be added in future as unstable features.

These two instruction sets appear as follows:

Arm code:

#[instruction_set(arm::a32)]
fn arm_code_fn() {
    // function compiled using arm code
}

Thumb code:

#[instruction_set(arm::t32)]
fn thumb_code_fn() {
    // function compiled using thumb code
}

Documentation

Currently, the only usage is homebrew GameBoy Advance development for ARMv4t and the RFC. Given the number of uses of this feature is currently limited to these people this is potentially fine.

Tests

Unresolved Questions

  • Code generation is poor, what work can improve this in future is unresolved (to my knowledge)

@Mark-Simulacrum Mark-Simulacrum removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 16, 2022
@nikomatsakis
Copy link
Contributor Author

@rfcbot concern documentation-pr

We need an open PR adding this to the Rust reference!

@xd009642
Copy link
Contributor

@nikomatsakis PR opened with an initial go at the docs: rust-lang/reference#1253

@nikomatsakis
Copy link
Contributor Author

@rfcbot resolve documentation-pr

Thanks @xd009642 !

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Aug 31, 2022
@rfcbot
Copy link

rfcbot commented Aug 31, 2022

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Aug 31, 2022
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Sep 10, 2022
@rfcbot
Copy link

rfcbot commented Sep 10, 2022

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@rfcbot rfcbot added the to-announce Announce this issue on triage meeting label Sep 10, 2022
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Sep 22, 2022
@bors bors closed this as completed in 1286ee2 Nov 5, 2022
RalfJung pushed a commit to RalfJung/miri that referenced this issue Nov 10, 2022
Stabilize the `instruction_set` feature

Closes rust-lang/rust#74727
FCP is complete on rust-lang/rust#74727 (comment)
r? `@pnkfelix` and/or `@nikomatsakis`
cc `@xd009642`

Signed-off-by: Yuki Okushi <jtitor@2k36.org>
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 20, 2024
Stabilize the `instruction_set` feature

Closes rust-lang/rust#74727
FCP is complete on rust-lang/rust#74727 (comment)
r? `@pnkfelix` and/or `@nikomatsakis`
cc `@xd009642`

Signed-off-by: Yuki Okushi <jtitor@2k36.org>
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
Stabilize the `instruction_set` feature

Closes rust-lang/rust#74727
FCP is complete on rust-lang/rust#74727 (comment)
r? `@pnkfelix` and/or `@nikomatsakis`
cc `@xd009642`

Signed-off-by: Yuki Okushi <jtitor@2k36.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. F-isa_attribute Related to #[instruction_set] attribute introduced by RFC 2867 finished-final-comment-period The final comment period is finished for this PR / Issue. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state S-tracking-ready-to-stabilize Status: This is ready to stabilize; it may need a stabilization report and a PR T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.