
Stabilize volatile copy and set functions #2728

Open. Wants to merge 1 commit into base: master.

Conversation

@dancrossnyc commented Jul 20, 2019

Rendered

Stabilize the volatile_copy_memory, volatile_copy_nonoverlapping_memory
and volatile_set_memory intrinsics as ptr::copy_volatile,
ptr::copy_nonoverlapping_volatile and ptr::write_bytes_volatile,
respectively.

I initiated discussion on this back in June here: https://internals.rust-lang.org/t/pre-rfc-stabilize-volatile-copy-and-set-functions/9872

Unfortunately, it fell off my plate, but I'd like to dust it off and move it forward.

Signed-off-by: Dan Cross cross@gajendra.net

@Centril added the A-intrinsic, A-volatile, A-machine, A-raw-pointers, T-lang, and T-libs-api labels on Jul 20, 2019.
@Centril (Contributor) commented Jul 20, 2019

cc @rust-lang/wg-unsafe-code-guidelines

@RalfJung (Member) commented Jul 21, 2019

Can't these new methods be implemented (in user code) in terms of the ones we are already exposing? If yes, the RFC should say why new APIs should be added to libstd, given the motivation is an extremely niche use-case. If no, the RFC should explain why the "obvious" implementation in terms of read_volatile and write_volatile is not correct.

EDIT: Ah, I think I found the answer in the "alternatives" section:

inelegant, less likely to perform well, and opens up the possibility of bugs

At least the first and third point can easily be taken care of by writing this in a crate once and auditing it properly. In terms of performance, compilers don't touch volatile accesses anyway (that's their point), so this seems a questionable argument to me.
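For reference, the open-coded implementation under discussion is short. A minimal sketch in terms of the already-stable `read_volatile`/`write_volatile` (the function names mirror the proposed APIs but are hypothetical user-crate helpers, not std functions):

```rust
/// Hypothetical user-crate version of the proposed
/// `ptr::copy_nonoverlapping_volatile`, built only from stable APIs.
///
/// Safety: same requirements as `ptr::copy_nonoverlapping`, and both
/// pointers must be valid and aligned for volatile `T` accesses.
pub unsafe fn copy_nonoverlapping_volatile<T>(src: *const T, dst: *mut T, count: usize) {
    for i in 0..count {
        // Each element is moved with one volatile read and one volatile write.
        dst.add(i).write_volatile(src.add(i).read_volatile());
    }
}

/// Hypothetical user-crate version of the proposed `ptr::write_bytes_volatile`.
pub unsafe fn write_bytes_volatile<T>(dst: *mut T, val: u8, count: usize) {
    let dst = dst as *mut u8;
    for i in 0..count * std::mem::size_of::<T>() {
        dst.add(i).write_volatile(val);
    }
}

fn main() {
    let src = [1u32, 2, 3, 4];
    let mut dst = [0u32; 4];
    unsafe { copy_nonoverlapping_volatile(src.as_ptr(), dst.as_mut_ptr(), 4) };
    assert_eq!(dst, src);

    let mut buf = [0u8; 8];
    unsafe { write_bytes_volatile(buf.as_mut_ptr(), 0xAA, 8) };
    assert_eq!(buf, [0xAA; 8]);
    println!("ok");
}
```

Note this sketch makes element-sized (or byte-sized, for the set) accesses; as discussed below, neither the intrinsics nor this loop guarantee any particular access width.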


Finally, RFC 1467 was in mild error in asserting that no analogue
for the proposed intrinsics exist in C. `memcpy`, `memmove` and `memset`
on volatile-qualified data provide this functionality in standard C.
Member:
Please provide a source for this. A quick search only turns up information that says the opposite, e.g. here:

memcpy is incompatible with volatile objects

Member:

Also looking at this LLVM IR it does not seem like the memcpy is volatile in any sense, or am I missing something?

Author:

You are right and I was wrong.

Indeed, the situation in C is even worse than my simply being incorrect: if I'm reading the standard right, accessing a volatile-qualified object through a non-volatile-qualified lvalue is undefined behavior. So, for example, a single-element struct containing a volatile array, or casting away the volatile, might be UB (in the C sense); yuck. One of the many reasons we are switching to Rust....

I added that language to try and make this proposal independent of the details of LLVM's implementation. I've attempted a revision to make the intent clearer; I'm still not quite happy with it, but I'm also going to be out of town for a few days, so forgive me if I don't revise that right away. :-)

Regarding the other comment:

Can't these new methods be implemented (in user code) in terms of the ones we are already exposing? If yes, the RFC should say why new APIs should be added to libstd, given the motivation is an extremely niche use-case. If no, the RFC should explain why the "obvious" implementation in terms of read_volatile and write_volatile is not correct.

EDIT: Ah, I think I found the answer in the "alternatives" section:

inelegant, less likely to perform well, and opens up the possibility of bugs

At least the first and third point can easily be taken care of by writing this in a crate once and auditing it properly. In terms of performance, compilers don't touch volatile accesses anyway (that's their point), so this seems a questionable argument to me.

I removed the mention of efficiency. I think the main issue is preventing duplication and potential bugs. Memset and memcpy are easy; overlapping memmove is a bit weirder.
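The "weirder" overlapping case is the usual memmove trick: compare the pointers and copy backwards when the destination starts inside the source. A sketch (hypothetical helper, not a std API):

```rust
/// Hypothetical open-coded volatile `memmove`: handles overlapping regions
/// by choosing the copy direction, mirroring what `ptr::copy` guarantees.
///
/// Safety: same requirements as `ptr::copy`, plus validity for volatile access.
pub unsafe fn copy_volatile<T>(src: *const T, dst: *mut T, count: usize) {
    if (dst as usize) <= (src as usize) {
        // dst is at or below src: a forward copy never clobbers unread data.
        for i in 0..count {
            dst.add(i).write_volatile(src.add(i).read_volatile());
        }
    } else {
        // dst overlaps the tail of src: copy backwards so no element is
        // overwritten before it has been read.
        for i in (0..count).rev() {
            dst.add(i).write_volatile(src.add(i).read_volatile());
        }
    }
}

fn main() {
    // Overlapping shift-by-one within the same buffer.
    let mut buf = [1u8, 2, 3, 4, 5];
    let p = buf.as_mut_ptr();
    unsafe { copy_volatile(p, p.add(1), 4) };
    assert_eq!(buf, [1, 1, 2, 3, 4]);
    println!("ok");
}
```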

Author:

Oh one other small note about performance: I think the compiler is in an interesting position where it can do things like loop unrolling while respecting icache alignment and things like that. As long as the semantics of the abstract machine are preserved, I don't see any reason why it couldn't do things like that in a way that would be, perhaps, more challenging with a hand-rolled loop: particularly if that loop lived in another crate.

@RalfJung (Member), Jul 22, 2019

I added that language to try and make this proposal independent of the details of LLVM's implementation.

That's totally great, but the RFC shouldn't state incorrect things about C/C++. ;)

Thanks for updating the RFC!

Regarding the other comment:

Discussion is way easier if you keep the topics within a thread instead of jumping between them.

Author:

I added that language to try and make this proposal independent of the details of LLVM's implementation.

That's totally great, but the RFC shouldn't state incorrect things about C/C++. ;)

Precisely why I acknowledged my error and removed that wording. :-)

Thanks for updating the RFC!

Regarding the other comment:

Discussion is way easier if you keep the topics within a thread instead of jumping between them.

Indeed. Unfortunately, the GitHub UI would not allow me to respond to that thread.

@spunit262

In terms of performance, compilers don't touch volatile accesses anyway (that's their point), so this seems a questionable argument to me.

Which is exactly why these intrinsics are needed. They are a single operation, so there's nothing to optimize, but a hand-written loop does need to be optimized, and can't be, as the writes are volatile.


Finally, it is important that this proposal not tie the Rust language
to the semantics of any given implementation, such as thos defined by
Member:

Suggested change
to the semantics of any given implementation, such as thos defined by
to the semantics of any given implementation, such as those defined by

Author:

Done.

and `write_volatile`: resulting loads and stores cannot be elided, and
the relative order to loads and stores cannot be reordered with respect
to one another, though other operations can be reordered with respect
to volatile operations.
@RalfJung (Member), Jul 22, 2019

I am not entirely sure what this long-winded paragraph is supposed to tell me. :) The open questions around volatile are not new, but can't we specify the new operations entirely in terms of the old? That would avoid even having to talk about volatile semantics in this RFC.

Author:

I'm not entirely sure what you mean. Do you mean that we should define the semantics of these operations to match those of write_volatile and read_volatile? Or that we should mention the LLVM semantics? When I put the proto-RFC on internals, I was asked specifically to avoid tying it to LLVM.

Member:

Do you mean that we should define the semantics of these operations to match those of write_volatile and read_volatile?

That. Volatile accesses are not new to Rust, so this RFC shouldn't have to deal with the complexity of specifying them.

interfaces for the `volatile_copy_memory` or `volatile_set_memory`
intrinsics, as they were "not used often" nor provided in C.
However, when writing low-level code, it is sometimes also useful
to be able to execute volatile copy and set operations.
@RalfJung (Member), Jul 22, 2019

This should state, early in the motivation, that volatile copy and set operations are already possible in Rust, but have to be open-coded. The way this is written now, it sounds as if currently there was no way to do volatile copy or set in Rust.

This RFC, if I am understanding correctly, is entirely a "convenience"-RFC: it's about adding APIs to libcore that could already be implemented in a user library. So this should be clear from the start. Then the motivation should argue why this is an API that should move to libcore.

For such APIs it helps tremendously if there is an (ideally already tested and used) implementation in a user crate that it can point to. Even better if you can point at instances where people tried to implement this but got it wrong.

@RalfJung (Member)

@spunit262

Which is exactly why these intrinsics are needed. They are a single operation so there's nothing to optimized, but a hand written loop does need to be optimized, but can't as the writes are volatile.

The intrinsic is a single operation in the IR that lowers to lots of assembly, and likely pretty much the same assembly as a hand-coded loop. I see no reason why the intrinsic should perform any better. Without an experiment demonstrating a performance difference between using the intrinsic and using the open-coded version, I do not think you can reasonably claim that there will be any difference.

@RustyYato commented Jul 22, 2019

@RalfJung what about unaligned volatile operations, right now {read, write}_volatile assume that the input pointer is aligned, and you would need copy_volatile at least to get unaligned volatile reads and writes and (copy_nonoverlapping_volatile to get efficient unaligned volatile reads and writes).

@hanna-kruppe

While it's not properly documented (nothing about these intrinsics is), they do in fact lower to IR that requires the pointers to be aligned. This is also consistent with how the non-volatile equivalents, and pointer functions in general, work.

Furthermore, you can also get a version without alignment requirements (no matter whether it's an intrinsic or implemented in library code) by treating the memory as having n * size_of::<T>() elements of type u8 instead (or MaybeUninit<u8> to be on the safe side w.r.t. uninitialized memory). That does byte-sized accesses instead of T-sized ones (if the latter are possible at all on the hardware), but nothing about volatile_{copy,set}_memory (neither current docs, nor LLVM's docs, nor this RFC) guarantees the access size anyway, so if you're worried about tearing, those functions don't help you anyway.
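The byte-wise reduction described above can be sketched as follows (hypothetical helper; as noted, it makes only u8-sized accesses, so it gives no guarantee against tearing of larger units):

```rust
/// Hypothetical unaligned volatile copy: treats the region as
/// `count * size_of::<T>()` bytes, so the pointers need no `T`-alignment.
/// Every access is byte-sized; nothing here prevents tearing.
///
/// Safety: same requirements as `ptr::copy_nonoverlapping`, except alignment.
pub unsafe fn copy_nonoverlapping_volatile_unaligned<T>(
    src: *const T,
    dst: *mut T,
    count: usize,
) {
    let src = src as *const u8;
    let dst = dst as *mut u8;
    for i in 0..count * std::mem::size_of::<T>() {
        dst.add(i).write_volatile(src.add(i).read_volatile());
    }
}

fn main() {
    // Copy one "u32" between deliberately misaligned byte offsets; the
    // misaligned u32 pointers are only ever used for byte accesses.
    let src_buf = [0u8, 0xDE, 0xAD, 0xBE, 0xEF];
    let mut dst_buf = [0u8; 5];
    unsafe {
        copy_nonoverlapping_volatile_unaligned(
            src_buf.as_ptr().add(1) as *const u32,
            dst_buf.as_mut_ptr().add(1) as *mut u32,
            1,
        );
    }
    assert_eq!(dst_buf, [0, 0xDE, 0xAD, 0xBE, 0xEF]);
    println!("ok");
}
```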

@spunit262

@RalfJung
I've already looked at the assembly, these intrinsics lower to a call to memset/memcpy/memmove or a small number of loads and stores.

sequence of sending IPIs makes no reference to `proto_sipi_page` and
since the `bytes` field is not visible outside of `new`, this
illustrates a situation in which the compiler _could_ theoretically
elect to elide the copy.
@parched:

It seems like the problem isn't with the memcpy, but with sending the IPI. The IPI needs to be a release operation (from the hardware's view and the compiler's). I'm not familiar with x86 and you haven't given the implementation of send_sipi so I don't know if it is already. If it's not, you might just need some sort of barrier between the mempcy and sending the IPI.

@comex, Jul 26, 2019

Indeed, the motivating example is flawed.

In this case, copy_proto_page_to_phys_mem is apparently an extern function that's called with a pointer to proto_sipi_page, so the compiler has no choice but to ensure the contents of proto_sipi_page were actually written before the call. The fact that the pointer is first cast to usize doesn't change that – since it's first cast to a raw pointer, Stacked Borrows would still allow copy_proto_page_to_phys_mem to access it. So the example as written is 100% well-defined even without a volatile copy.

On the other hand, if this were done in a manner incompatible with Stacked Borrows, such as (I think?) by transmuting the reference to usize, then making the copy volatile would not help.

For one thing, the example code does not copy directly from INIT_CODE to proto_sipi_page. It copies to bytes, a local variable in the new function; that variable is then moved into a SIPIPage, which is a memcpy, and then moved to the caller, which is another memcpy. If you switched to a volatile copy, you'd be ensuring the data was actually written to some stack allocation representing bytes... but not that bytes was actually copied to proto_sipi_page!

This distinction is not just theoretical but, unfortunately, real. If you compile the example in debug mode, the assembly emitted for SIPIPage::new contains a call to core::intrinsics::copy followed by two compiler-inserted calls to memcpy. Ideally LLVM's optimizer would be able to elide both of those memcpys, but if you compile in release mode, it actually only elides one of them. There is still an intermediate copy between bytes and proto_sipi_page.

You could fix the example in practice by directly copying from INIT_CODE to proto_sipi_page. But that wouldn't fix the UB in theory (that would result if you violated Stacked Borrows; as I said, the example as written is fine). If the optimizer for some reason thought that proto_sipi_page was unused, it would be free to, e.g., reuse the stack slot for other data, even if you had previously made volatile writes into it.

More generally, I'm not sure there's any way to redeem the example to actually require volatile memcpy. If the Rust code mapped the physical memory and copied the data directly to that mapping, then you'd finally be at a point where release operations matter (as mentioned by @parched). But those are easily obtained. If the implementation of send_sipi is an extern function, calling it is a compiler release operation. If it's based on asm!, you would have to mark the assembly sequence with the memory clobber (to indicate that, in the words of the GCC manual, it "performs memory reads or writes to items other than those listed in the input and output operands"), which is also a compiler release operation. The one catch is that you'd have to make sure you didn't hold a mutable borrow of the data across the call, since then it would be UB for the function or asm block to read from it. However, using volatile to write the data in the first place wouldn't fix that UB. At best it would make it slightly harder for the compiler to exploit.

I'm happy to see volatile memcpy stabilized, but it seems more useful for something like IPC.

Author:

This is why I dislike contrived code examples in places like this. :-)

What do you mean about utility for IPC, precisely? As in shared-memory message passing or something like that?

@comex, Jul 26, 2019

Yes. Caveat: In this recent thread, there was no real consensus on whether or not using volatile for shared memory in Rust is recommended - or even legal. But it is what C++ people recommend. If volatile is suited for shared memory, then volatile_copy would be an ideal way to safely copy large buffers into or out of shared memory.

impl SIPIPage {
// Note that the _only_ operation on the `bytes` field
// of `SIPIPage` is in `new`. The compiler could, in
// theory, elide the `copy`.
@gnzlbg (Contributor), Sep 24, 2019

No it cannot: unsafe code below calls copy_proto_page_to_phys_mem which reads from this field via a pointer to SIPIPage.

EDIT: @comex expands on this below: https://github.com/rust-lang/rfcs/pull/2728/files#r307566402

@gnzlbg (Contributor) left a comment:

The motivation is unclear. I see no reason why this cannot be implemented in a library.

@gnzlbg (Contributor) commented Sep 24, 2019

In terms of performance, compilers don't touch volatile accesses anyway (that's their point), so this seems a questionable argument to me.

Which is exactly why these intrinsics are needed. They are a single operation so there's nothing to optimized, but a hand written loop does need to be optimized, but can't as the writes are volatile.

A volatile memcpy is a single operation in the IR, but it is still going to lower to N distinct reads and writes, which is the exact same thing a hand-coded performant volatile memcpy implementation would lower to.

@KodrAus added the Libs-Tracked label on Jul 29, 2020.
@scottmcm (Member)

We discussed this in the lang meeting today. The consensus was that since this is currently possible to be written using existing stable volatile things (https://github.com/rust-lang/rfcs/pull/2728/files#r305730848), that there's no lang impact here. As such, untagging lang and leaving to libs to decide whether this is worth doing.

@scottmcm removed the T-lang label on Sep 30, 2020.
@RalfJung (Member)

We discussed this in the lang meeting today. The consensus was that since this is currently possible to be written using existing stable volatile things (https://github.com/rust-lang/rfcs/pull/2728/files#r305730848), that there's no lang impact here. As such, untagging lang and leaving to libs to decide whether this is worth doing.

I hope what I said over there back then is correct, given that nobody responded there.^^

@MikailBag commented Sep 30, 2020

You are talking about https://doc.rust-lang.org/stable/std/ptr/fn.write_volatile.html (stable since 1.9.0), aren't you?

@Lokathor (Contributor) commented Jul 21, 2021

Any sort of atomic-like system for volatile accesses would maybe ensure that there's no tearing, and definitely ruin the times of embedded developers who are currently using repr(transparent) to simply declare volatile accesses of more structured types than raw integers, and it would still be exactly as unsafe and device specific as what we have now. Overall, a big step backwards.

Separately, if we should have these new intrinsics, I don't have a strong opinion.

Ralf is certainly not wrong that the access width must be specified for these to be broadly useful. As one example, I work with a device where part of the mmio must be accessed with u8, and another part must not be accessed with u8, and then the rest doesn't care much. In the absence of info about what will happen with these intrinsics I'd just have to write my own loops anyway.

@Diggsey (Contributor) commented Jul 21, 2021

volatile accesses of more structured types than raw integers

the access width must be specified for these to be broadly useful

How do you reconcile these two statements? Do developers using "structured types" for volatile access expect that a single access width is used for the whole structure? Or are they expecting that individual fields of the structure will be written with separate access widths?

If the former, then I don't really understand why switching to a system like we have for atomics would be a step backwards, since you can implement the "generic volatile copy" on top of eg. volatile 32-bit accesses, and get the same guarantees? If the latter, then isn't that code just wrong, since those guarantees are not provided?

@Lokathor (Contributor)

repr(transparent) wrapper structs are used over top of a raw integer type, and then the appropriate methods are provided for altering the bits within that integer. The volatile locations are then declared to directly be of these various wrapper types.
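The pattern Lokathor describes looks roughly like this; the register name, bit layout, and address are made up for illustration:

```rust
/// Hypothetical MMIO register value: a `repr(transparent)` wrapper over a
/// raw integer, with bit-manipulation methods that run purely in the CPU.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(transparent)]
pub struct DisplayControl(u16);

impl DisplayControl {
    pub const fn new() -> Self {
        DisplayControl(0)
    }
    /// Set the low three "mode" bits, leaving the rest untouched.
    pub const fn with_mode(self, mode: u16) -> Self {
        DisplayControl((self.0 & !0b111) | (mode & 0b111))
    }
    pub const fn mode(self) -> u16 {
        self.0 & 0b111
    }
}

fn main() {
    // In real code the volatile location would be a fixed MMIO address
    // declared as *mut DisplayControl; here a local stands in for it so the
    // example runs on a hosted target.
    let mut fake_register = DisplayControl::new();
    let reg: *mut DisplayControl = &mut fake_register;

    // Because of repr(transparent), this volatile access is u16-sized but
    // operates directly on the structured wrapper type.
    unsafe { reg.write_volatile(DisplayControl::new().with_mode(3)) };
    let v = unsafe { reg.read_volatile() };
    assert_eq!(v.mode(), 3);
    println!("ok");
}
```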

@RalfJung (Member)

The same would work on top of VolatileI32 etc types though, wouldn't it?

@Lokathor (Contributor)

I suppose I don't know what the exact API for that type would be, but if you're saying that someone should repr(transparent) wrap that VolatileI32 value in some new wrapper type that offers the correct bit-manipulation and also has some sort of read and write methods that pass through to read and write on the VolatileI32, that's not a good plan. Often you have the same data at multiple locations (eg: controlling a series of identical peripherals, or something like that). If more than one location stores the same data type you don't want the manipulation of that data type to be linked to a particular location. So to have a good experience there needs to be a separation between the data type that you're manipulating a value of within the CPU and a particular address which stores one of these data values.
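That separation between the data type and the address can be sketched as a small typed-address wrapper, similar in spirit to Lokathor's voladdress crate (the type here is illustrative, not that crate's actual API):

```rust
use core::marker::PhantomData;

/// Hypothetical typed volatile address: couples an address with the type of
/// data stored there, but keeps the data type's own methods independent of
/// any particular location.
#[derive(Clone, Copy)]
pub struct VolAddress<T> {
    addr: usize,
    _marker: PhantomData<*mut T>,
}

impl<T> VolAddress<T> {
    /// Safety: `addr` must be valid and aligned for volatile `T` accesses.
    pub const unsafe fn new(addr: usize) -> Self {
        VolAddress { addr, _marker: PhantomData }
    }
    pub fn read(self) -> T {
        unsafe { (self.addr as *const T).read_volatile() }
    }
    pub fn write(self, val: T) {
        unsafe { (self.addr as *mut T).write_volatile(val) }
    }
}

fn main() {
    // Two "identical peripherals" holding the same data type at different
    // addresses; locals stand in for MMIO so the example runs hosted.
    let mut a = 0u16;
    let mut b = 0u16;
    let reg_a = unsafe { VolAddress::<u16>::new(&mut a as *mut u16 as usize) };
    let reg_b = unsafe { VolAddress::<u16>::new(&mut b as *mut u16 as usize) };
    reg_a.write(0x1234);
    reg_b.write(reg_a.read().wrapping_add(1));
    assert_eq!((a, b), (0x1234, 0x1235));
    println!("ok");
}
```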

I think the fundamental thing here is that solving "the tearing issue" is like the least important part of the entire MMIO story for Rust. I won't say that it's not a problem at all, but I will say that no actual MMIO API for a device, that's like peer reviewed for quality and such, and trusted by the community, will ever end up with declarations that are forced to tear, if they care about tearing at all. Because let's also be clear: you might not care about tearing. You might be totally fine with a tear happening. It depends on the device, it depends on the situation. I spend more of my time being concerned that a careless call to Vec::with_capacity can abort the process on accident than I do worrying about volatile being used to make a torn read.

Back to the main topic: These bulk operations are not at all "unusable" if they're allowed to change access width. They're just slightly less usable. Either way, the docs just need to specify whether a width change is allowed or not. There certainly are situations I've encountered where I've wanted the bulk operations and did not care at all about the access width. I just needed a bulk transfer to happen, and I didn't want to think about the details; I wanted LLVM to do the thinking. In fact, I would have been entirely happy for LLVM to do 256-byte transfers rather than 4-byte transfers to complete the copy. Other times, I happen to know that I need it to emit precisely an STRB instruction within a given region of memory, and I wouldn't be happy with a bulk copy at any other width.

So just have the bulk ops say that they do or don't allow a change in access width and then we'll be fine.

@anp (Member) commented Jul 22, 2021

I agree that trying to specify access width for larger ranges or composite types is fraught -- any decision is likely to block off optimizations using wider hardware in the future. For the use cases that have come up in Fuchsia we've had no need for specific access width in the bulk shared memory operations I've seen and it would be fine to leave the width deliberately unspecified/unstable. As an example, the current volatile_copy_nonoverlapping_memory intrinsic lowers to a non-elided memcpy which AFAIK makes no guarantees about access width.

IMO, if doing MMIO or another operation that relies on specific (non-)tearing behavior, it's better to use the existing intrinsics with explicit-width-and-alignment pointers. Also IMO, I suspect that Rust should not attempt to define a type like Volatile<T> for all of the spec ambiguities already cited here that AIUI currently affect C++. All of the use cases I've seen would be satisfied by abstractions built atop raw pointer access using bulk volatile access for variable-length regions.

@anp (Member) commented Jul 22, 2021

One possible point of confusion that came up from an offline discussion: is it intended that the new functions would be generic or that they would take *mut u8? I had assumed the latter and was confused about the need to specify volatile behavior for complex types but I think that might have been an incorrect assumption.

@dancrossnyc (Author)

I'm sorry, this totally slipped off my radar. The main motivation when I originally wrote this was to have an unelideable memcpy available from a stable compiler by leveraging the intrinsic. Issues of tearing and so forth seem mostly orthogonal; if that matters for whatever one is trying to apply this to, perhaps one is better served with a hand-rolled loop or sequence of statements.

It made sense to me that this could be generic over some type T in the same way that std::ptr::copy is generic over T.

@RalfJung (Member)

I think the fundamental thing here is that solving "the tearing issue" is like the least important part of the entire MMIO story for Rust. I won't say that it's not a problem at all, but I will say that no actual MMIO API for a device, that's like peer reviewed for quality and such, and trusted by the community, will ever end up with declarations that are forced to tear, if they care about tearing at all.

But this is just because what Rust happens to do wrt tearing is sensible, right?

Right now, Rust could implement write_volatile and read_volatile as a sequence of individual 1-byte-accesses (maximal tearing) and this would not violate any part of the spec or docs. I would be extremely surprised if this change would actually be okay, for those MMIO users where tearing matters. In other words, right now all those MMIO users depend on unspecified behavior. That's hardly a good situation to be in.

Or am I missing something?

@RalfJung (Member)

Issues of tearing and so forth seem mostly orthogonal; if that matters for whatever one is trying to apply this to, perhaps one is better served with a hand-rolled loop or sequence of statements.

The problem is that there currently is no way to hand-roll that loop. We simply don't offer the APIs for that.

You can use read_volatile::<i32> and hope that this will be a single, non-torn 4-byte access. But hoping is the best you can do here, with current Rust.

@Lokathor (Contributor)

Right now, Rust could implement write_volatile and read_volatile as a sequence of individual 1-byte-accesses (maximal tearing) and this would not violate any part of the spec or docs.

Wellllll, so people are relying on the fact that when rustc doesn't say / care then LLVM rules generally take over. And LLVM has the rule:

the backend should never split or merge target-legal volatile load/store instructions.

https://llvm.org/docs/LangRef.html#volatile-memory-accesses (close to the end of the section).

So people are relying on rustc keeping to that, and not splitting or merging any target-legal access, such as an i32 access on most any target.

So I would advocate that Rust officially adopt at least that much of the rules to apply to all backends.

Or the backend could just error out if volatile is used and the backend doesn't support it, I guess? I mean a backend without volatile support can still build plenty of other programs, so maybe that's optional for alternative backends, whatever, that's for T-compiler to care about I think.

@anp (Member) commented Jul 23, 2021

Right now, Rust could implement write_volatile and read_volatile as a sequence of individual 1-byte-accesses (maximal tearing) and this would not violate any part of the spec or docs.

I think this would be bad for MMIO use cases of those functions, although I agree with @Lokathor that in practice most users of volatile in Rust are relying on the LLVM spec as well given our underspecification (it's what I did to convince myself the copy_nonoverlapping version was what we needed). Extending ours to reflect that somehow seems reasonable to me but I could be missing something.

In my case, I'm not doing MMIO, I just want to ensure that LLVM never decides to eliminate a "dead store" to memory that it can't see used (the unelidable memcpy @dancrossnyc refers to, IIUC). For that purpose it would probably be an unacceptable performance regression to split the operations like that but I wouldn't rely on the language specification there -- I'd rely on LLVM's desire to maintain performance and the lack of access width guarantees for the copy_nonoverlapping_volatile variant. (If I'm honest, that's the only function I've needed.)

@RalfJung (Member) commented Jul 24, 2021

the backend should never split or merge target-legal volatile load/store instructions.

"should" seems awfully weak for something that code relies on. And "target-legal" also seems underdefined? This means when I write Rust code that uses volatile accesses I need to exactly know the target to reason about tearing -- this is completely against how Rust usually handles target differences, e.g. for Atomic* we make sure that either the target supports what the program needs or the build fails. The current volatile API does not even let the program express what it needs, though.

But I don't actually have much of a stake in the volatile game, so if the people relying on volatile are happy with such a guarantee, that's fine for me. However, we should put this guarantee into the Rust docs, since "falling back to what LLVM guarantees" will easily give wrong results. (And I think this is T-lang territory, since it is about the guarantees the language provides.)

@Lokathor (Contributor)

I agree that "should" and "target-legal" are vague enough that I'd prefer them to be more specified if possible. However, as you've said yourself before, we can't tell people we're going to give them something that LLVM won't implement.

Maybe the LLVM devs would be willing to make it more strongly specified? I've never asked them.

The current volatile API does not even let the program express what it needs, though.

I'm not quite clear on what you mean here. The type of the pointer determines the type of the access that you want, so it seems expressible to me.

for Atomic* we make sure that either the target supports what the program needs or the build fails.

While that's true, I think that's just part of the story. To me the full story is that by offering specific AtomicFoo types we can give them a fully safe API. Creation is safe, accessing is safe. The safety isn't sensitive to any particular address being used for the atomic value. Any aligned address of the correct span is just fine, on any device that supports the atomic ops at all. We can know the minimum alignment and memory span statically without knowing anything about the device that the code will run on just by having the reference in the first place. It all lines up nicely.

Contrast with something like a VolatileU16 type. Can we make this have a fully safe API? No. We can make creation or access be safe, but we can't make both of them safe. Fine, I have to write a little unsafe code, so what are the safety rules? You have to go consult the hardware data sheets. Does knowing the target triple help me? No, the data sheets are specific to an exact device, not a general target triple. Compiling for the thumbv4t-none-eabi target does not mean that 0x0400_0000 is necessarily an MMIO address for anything at all. The code is required to be more specific than a rustc target profile ever really gets.

Which means that my summary would be:

  • If we use pointers and read_volatile/write_volatile: we have to write device-specific code, and the target triple doesn't help us much.
  • If we use a restrictive set of VolatileFoo types similar to the AtomicFoo types: we have to write device specific code, and the target triple doesn't help us much.

@anp
Member

anp commented Jul 24, 2021

This might be a minefield, so grain of salt. What if Rust specified its volatile behavior as implementation-defined, and the current rustc implementation could specify how it translates accesses to LLVM IR, deferring actual semantics to that specification?

It would avoid needing to change LLVM's guarantees to be able to spec how we do volatile. Obviously would come with a number of costs that may not be worth it.

@RalfJung
Member

I'm not quite clear on what you mean here. The type of the pointer determines the type of the access that you want, so it seems expressible to me.

The type of the pointer determines the size (and alignment) of the access. It does not determine to what extent tearing can be tolerated.

Contrast with something like a VolatileU16 type. Can we make this have a fully safe API? No. We can make creation or access be safe, but we can't make both of them safe.

We can totally make both of them safe. Why shouldn't we?

Unless by "creation" you mean turning an integer address into a ptr-to-volatile. But then you would overload the term "creation" to mean something very different for atomics and volatiles, so this is not a fair comparison. The true difference is that with MMIO volatiles, you typically access memory that is "allocated" by other parties (usually statically assigned); this has nothing inherently to do with volatile.

@Lokathor
Contributor

Yes I do mean the int-to-ptr part of it. I mean, for both atomic and volatile, the process of not having the value to somehow having the value. Whatever that process, that's "the creation". The fact that atomics basically aren't address sensitive and mmio volatiles entirely are is kinda my whole point here.

And yes, volatile can be used for non-MMIO, but then the accesses have no side effects, and the location cannot be modified concurrently (volatile isn't synchronized). That means it's unobservable whether a tear occurred during the read, or even what access pattern was actually used for the memory.

And if you are doing MMIO then the tear tolerance depends on specifically the device and location you're doing the MMIO with.

So the tear tolerance isn't really part of the type. It's part of the address within a particular memory map.

@RalfJung
Member

I agree with most of what you said, except...

So the tear tolerance isn't really part of the type. It's part of the address within a particular memory map.

It's not part of the types we have. That is exactly the problem. It could be part of the type -- VolatileI32 would mean that this is a type for 32-bit-accesses that do not tear. As a type system fan I think it should be part of the type, but I might be overestimating the benefits of that.

@Lokathor
Contributor

Lokathor commented Jul 25, 2021

No, I mean that even if you were to have a bunch of special types that are guaranteed to not have tearing access, deciding to use those types at all is still a decision that you have to make when looking at a particular memory map's layout. You can make a group of types that all have "this doesn't tear" as a property for various access widths, but you still can't know when to use those types without knowing what the device's memory map is. It's like, uh, a category error, I think it's called. You could have every fancy brick in the world to build with and that doesn't tell you if you followed the blueprint or not when you put the bricks together.

Let's get kinda specific for a moment, because perhaps this will illustrate just one possible example situation. Suppose we have the following struct:

#[repr(C, align(4))]
pub struct ObjAttrs {
  pub attr1: u16,
  pub attr2: u16,
  pub attr3: u16,
  pub p: i16,
}

If we use this on an older ARM architecture that doesn't support LDRD, then LLVM will fall back to using two LDR instructions. Indeed, we can double-check this in godbolt to see what we get:

example::get_obj0:             @ here's some comments added by me:
        ldr     r0, .LCPI0_0   @ r0 = 0x0700_0004
        ldr     r1, [r0]       @ r1 = read_4_bytes(r0)
        movs    r0, #7         @ r0 = 7
        lsls    r0, r0, #24    @ r0 = r0 << 24
        ldr     r0, [r0]       @ r0 = read_4_bytes(r0)
        bx      lr             @ return
.LCPI0_0:
        .long   117440516

As we can see, the read is "torn" across two LDR instructions. But despite that, this struct is still correctly defined if you're trying to manipulate the video object attributes on a Game Boy Advance. The fact that this struct definition also causes the volatile accesses to tear does not matter. That's still the correct struct definition for how this particular device works.

@RalfJung
Member

RalfJung commented Jul 25, 2021

I mean, for both atomic and volatile, the process of not having the value to somehow having the value.

AtomicI32::new is safe, and VolatileI32::new would likewise be safe. This would be an apples-to-apples comparison of "creation".

Of course, "turn 0x123400 into a &VolatileI32" is unsafe. The same goes for "turn 0x123400 into a &AtomicI32". No surprise here.

Comparing AtomicI32::new with "turn 0x123400 into a &VolatileI32" is comparing apples with oranges.

No, I mean that even if you were to have a bunch of special types that are guaranteed to not have tearing access, deciding to use those types at all is still a decision that you have to make when looking at a particular memory map's layout. You can make a group of types that all have "this doesn't tear" as a property for various access widths, but you still can't know when to use those types without knowing what the device's memory map is.

I mean, of course. This has nothing to do (that I can see) with my comments though so I am confused why you bring it up. I'm not suggesting that by adding types suddenly one does not have to write code any more...

The same is true for Atomic*: just because we have those types doesn't mean concurrent data structures magically write themselves. You still have to know which atomics to use how and when. I never suggested anything else. volatile, however, has a problem that is even more fundamental than this, and that problem can be easily solved with types: AtomicI32 clearly expresses that we need to access 32bit of data in a single instruction; for volatile, no way to express the same intent exists.

Of course the programmer needs to first figure out their intent -- volatile accesses of which size go to which addresses. But then the programmer needs to put that intent into terms the compiler can understand, and that is currently not really possible. Types like VolatileI32 would let the programmer encode that intent directly, basically providing a direct way to describe MMIO registers of various sizes in terms of Rust types.

I never said this would magically make it impossible to write incorrect MMIO code -- as you know very well of course, the type system can prevent some bugs but not all.

@RalfJung
Member

RalfJung commented Jul 25, 2021

I agree that "should" and "target-legal" are vague enough that I'd prefer them to be more specified if possible. However, as you've said yourself before, we can't tell people we're going to give them something that LLVM won't implement.

There are several options we have here:

  • Figure out a stronger wording for LLVM, and suggest it to them. Like, replacing "should" by "must". Combine that with Rust knowledge of what "target-legal" means for our various targets.
  • Instead of just setting the volatile flag, also set the unordered flag. This makes the access an atomic operation as far as LLVM is concerned, and disallows tearing. That way we could offer volatile accesses that will definitely not tear.

@Lokathor
Contributor

AtomicI32::new is safe, and VolatileI32::new would likewise be safe. This would be an apples-to-apples comparison of "creation".

Okay I finally see the disconnect here. I wasn't imagining that a person would ever care about creating an owned atomic value. That's pretty useless, because atomics are about synchronization. To me, the entire time I was referring to "using an atomic" as "using a shared reference to an atomic", which is why I kept talking about how they're not address sensitive and such. If the atomic is owned or has an exclusive reference then, at least while that's the case, it might as well not be atomic, because no one is looking at it. You're not synchronizing with anyone. So I was only considering &AtomicValue, and I was comparing that to *mut VolatileValue.

But then the programmer needs to put that intent into terms the compiler can understand, and that is currently not really possible. Types like VolatileI32 would let the programmer encode that intent directly, basically providing a direct way to describe MMIO registers of various sizes in terms of Rust types.

Okay, so:

  • I think that we agreed that tears during non-MMIO volatile accesses don't actually matter. So let's exclusively talk about possible MMIO tears.
  • I think we also agree that MMIO is inherently device specific.
  • A particular device has whichever peripherals, but it also has a particular CPU, which is of a particular architecture, which is part of that build target profile.
  • LLVM will not split/merge target-legal accesses.

Conclusion: a volatile u32 MMIO access programmed for a specific device is already expressing the level of allowed tearing. If a u32 access is a target-legal access then the allowed tearing is "none" and if it's not a target-legal access then the level of allowed tearing is "any". You don't need additional types on top of that. It was already fully specified by whoever wrote that they were doing a u32 volatile access on a particular device.

@RalfJung
Member

RalfJung commented Jul 25, 2021

Okay I finally see the disconnect here. I wasn't imagining that a person would ever care about creating an owned atomic value. That's pretty useless, because atomics are about synchronization. To me, the entire time I was referring to "using an atomic" as "using a shared reference to an atomic", which is why I kept talking about how they're not address sensitive and such. If the atomic is owned or has an exclusive reference then, at least while that's the case, it might as well not be atomic, because no one is looking at it. You're not synchronizing with anyone.

No that's not the disconnect.^^
Of course the atomic needs to be shared to be useful. But still in basically all cases, atomics are created via AtomicI32::new. They start out fully owned, and then become shared.

So your comment doesn't touch my point at all I am afraid.

@RalfJung
Member

a volatile u32 MMIO access programmed for a specific device is already expressing the level of allowed tearing. If a u32 access is a target-legal access then the allowed tearing is "none" and if it's not a target-legal access then the level of allowed tearing is "any". You don't need additional types on top of that. It was already fully specified by whoever wrote that they were doing a u32 volatile access on a particular device.

So basically the point is, yes it's awfully target-specific and can't even be queried from the language, but that's okay since MMIO is inherently device-specific.

That makes sense, but I still think we should then explicitly list what is considered "target-legal" on each target Rust supports. That seems like a rather underdefined term. (The presence of VolatileI32 could otherwise reflect such information in a systematic way.)

@comex

comex commented Jul 31, 2021

I'd like to point out that the LangRef also says:

Rationale
Platforms may rely on volatile loads and stores of natively supported data width to be executed as single instruction. For example, in C this holds for an l-value of volatile primitive type with native hardware support, but not necessarily for aggregate types. The frontend upholds these expectations, which are intentionally unspecified in the IR. The rules above ensure that IR transformations do not violate the frontend’s contract with the language.

The first sentence seems to confirm that there's a hard guarantee that volatile loads and stores of target-legal types will turn into single instructions, as I'd expect.

The rest of it I don't actually understand. What exactly are the "expectations" that are "intentionally unspecified"? The list of types where volatile accesses are guaranteed to generate a single instruction? Why would you leave that intentionally unspecified?

Besides that:

I agree with @RalfJung that Rust should at least list the types that are guaranteed to produce a single instruction.

I also tend to agree with him that an ideal solution, starting from a clean slate, would involve types like VolatileU32 that encode the single-instruction guarantee in the type system. For cases like @Lokathor's GBA example, I think we would ideally have a separate function to tell the compiler to load some memory using a series of arbitrary-width volatile accesses.

But since we're not starting from a clean slate, I'm not convinced that the benefit of splitting the two is worth the churn.

That said, the current implementation of volatile loads of multi-field structs, like the GBA example, is seriously problematic.

The GBA example produces 32-bit load instructions, which is optimal, but only because the IR rustc generates for core::ptr::read_volatile::<ObjAttrs>, for some reason, does a volatile load at type i64:

  %1 = bitcast %ObjAttrs* %src to i64*, !dbg !9
  %2 = load volatile i64, i64* %1, align 4, !dbg !9

And then LLVM splits the 64-bit load into two 32-bit loads.

On the other hand, if you add more fields to ObjAttrs, the frontend-side optimization no longer occurs, so rustc produces:

 %1 = load volatile %ObjAttrs, %ObjAttrs* inttoptr (i32 117440512 to %ObjAttrs*), align 16777216, !dbg !23, !noalias !27

And then LLVM produces assembly for a series of 16-bit accesses, corresponding to the original field types. It refuses to combine the 16-bit accesses into 32-bit ones, presumably because the access is volatile.

So we have an optimization that LLVM thinks is illegal for volatile accesses but rustc thinks is legal. On rustc's side, the optimization is meant to be an implementation detail and there is presumably no guarantee that the circumstances under which it is performed will always remain the same. If it changes, people will get either invalid code generation (if they assumed that fields wouldn't be merged, and now they are), or suboptimal code generation (if they assumed that fields would be merged, and now they aren't).

We should fix this. We should decide whether merging fields is legal or not. If it's not, rustc should stop doing the i64 optimization. If it is, rustc should try to generate better IR rather than naively generate a LLVM volatile load/store of struct type.

By the way, if you compile C code in Clang that's a transliteration of the original example… Clang turns it into a call to a volatile-memcpy intrinsic, which also sometimes merges fields and sometimes generates suboptimal code that doesn't, but for entirely different reasons.

@Lokathor
Contributor

Having a list of types that are guaranteed to work on a given target, and other types might also work on that target, is entirely great. I'd encourage it even, nice to have things written down.

The reason I object so strongly to a VolatileU32 sort of situation is that I don't think anyone can actually take that idea and run it out into a full API that is good and also easy to abstract over. They are hard to be generic with. They are hard to use newtypes with. This makes it more difficult to program. Further, I think that if those sorts of types were implemented then Rust programmers would be told that they should actually use those types and forget about using ptr::read_volatile and ptr::write_volatile. This would ultimately be a large step backward in expressiveness.

I've spoken with other people that do this MMIO stuff:

  • They almost all say that they want a "volatile cell" type which lets you newtype a user defined struct type. Then a programmer could have &VolatileCell<MyStruct>, and they would write a phrase like &my_struct_ref.my_field, and that projection would turn into &VolatileCell<MyFieldType>. Currently the lack of ability to do projections with MMIO structs is kinda a pain in the butt. The exact best fix here is not known to anyone, but some sort of improvement so that you can properly field project with volatile would be beneficial.
  • Not once have any of them wished for having their data type options limited to types that won't tear on the device. They already declare their addresses to be of the correct types so that accesses don't tear, if tearing is a concern at all.

I strongly encourage Rust to develop in the direction that is consistent with what actual users of this part of the language say that they need. Please avoid making work for yourselves that isn't being asked for by actual users of this part of the language.

@comex your point about poorly defined codegen for aggregates is very valid, and I think that the situation there should be improved as much as possible.

@comex

comex commented Jul 31, 2021

Having a list of types that are guaranteed to work on a given target, and other types might also work on that target, is entirely great. I'd encourage it even, nice to have things written down.

The reason I object so strongly to a VolatileU32 sort of situation is that I don't think anyone can actually take that idea and run it out into a full API that is good and also easy to abstract over. They are hard to be generic with. They are hard to use newtypes with. This makes it more difficult to program.

Well, true. I actually think the atomic types have the same issue. I don't know why we have AtomicU32 instead of Atomic<u32>, where Atomic would be defined as struct Atomic<T: AtomicEligible>, and AtomicEligible would be automatically implemented for the usual integer types but could also be derived by any type of the right size. Marking a struct atomic if it has the same size as an integer is possible in C and C++, and I use this pattern frequently in those languages (often in combination with bitfields).

In my mind this is somewhat orthogonal to the question of whether non-tearing should be encoded in the type system, but I suppose not in practice. You raise good points.

  • They almost all say that they want a "volatile cell" type which lets you newtype a user defined struct type. Then a programmer could have &VolatileCell<MyStruct>, and they would write a phrase like &my_struct_ref.my_field, and that projection would turn into &VolatileCell<MyFieldType>. Currently the lack of ability to do projections with MMIO structs is kinda a pain in the butt. The exact best fix here is not known to anyone, but some sort of improvement so that you can properly field project with volatile would be beneficial.

This would be nice. Even without projections, having a VolatileCell that disables dereferenceable would be a great start.

Stabilize the `volatile_copy_memory`, `volatile_copy_nonoverlapping_memory`
and `volatile_set_memory` intrinsics as `ptr::copy_volatile`,
`ptr::copy_nonoverlapping_volatile` and `ptr::write_bytes_volatile`,
respectively.

Signed-off-by: Dan Cross <cross@gajendra.net>